Context Window
A context window is the maximum amount of text, measured in tokens, that an AI model can read and hold in working memory at one time. It includes the prompt, the conversation history, and any retrieved documents.
Every AI language model has a fixed context window, typically ranging from a few thousand to over a million tokens. A token is roughly three-quarters of a word in English, and Arabic script often yields fewer characters per token. Everything the model needs to "know" for a given response must fit inside this window: system instructions, the full chat history, retrieved reference documents, and the user's latest message. When a conversation grows past that limit, the application typically drops or summarizes the oldest content, which is why a long-running chatbot can suddenly seem to "forget" what was discussed earlier.
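The "drop the oldest content" strategy can be sketched in a few lines. This is a minimal illustration, not any particular product's implementation. It assumes a rough word-based token estimate built on the three-quarters-of-a-word ratio above, where a real system would count tokens with the model's own tokenizer. The names `Message`, `estimate_tokens`, and `fit_to_window` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Message:
    role: str
    content: str


# Hypothetical heuristic: about 0.75 English words per token.
# A real system should count tokens with the model's own tokenizer.
def estimate_tokens(text: str) -> int:
    words = len(text.split())
    return max(1, round(words / 0.75)) if words else 0


def fit_to_window(system: str, history: list[Message], latest: Message,
                  max_tokens: int) -> list[Message]:
    """Always keep the system prompt and the latest message, then add
    history from newest to oldest until the token budget runs out."""
    budget = max_tokens - estimate_tokens(system) - estimate_tokens(latest.content)
    if budget < 0:
        raise ValueError("system prompt and latest message alone exceed the window")
    kept: list[Message] = []
    for msg in reversed(history):
        cost = estimate_tokens(msg.content)
        if cost > budget:
            break  # this message and everything older is dropped
        kept.append(msg)
        budget -= cost
    kept.reverse()  # restore chronological order
    return [Message("system", system), *kept, latest]


if __name__ == "__main__":
    history = [Message("user", "first old message"),
               Message("assistant", "second old message"),
               Message("user", "third old message")]
    latest = Message("user", "Where is my refund")
    window = fit_to_window("You are a helpful support agent", history, latest, max_tokens=21)
    print([m.content for m in window])
```

Walking the history newest-first guarantees that the most recent turns survive. A variant could summarize the dropped turns into a single short message instead of discarding them outright.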
Context window size directly affects cost, speed, and reliability. A larger window lets a WhatsApp AI agent keep a customer's entire order history and past complaints in view, but every extra token processed costs money and can slow the response. This is why well-built systems pair a moderate context window with a vector database and retrieval-augmented generation (RAG). Instead of stuffing every past ticket into the prompt, the system retrieves only the few most relevant pieces of context, for example only the last unresolved delivery complaint from a returning customer in Riyadh. This keeps responses fast, cheap, and focused.
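The retrieval step can be illustrated with a toy example. It ranks stored tickets by cosine similarity to the customer's latest message and places only the top `k` into the prompt. The example makes two loud assumptions: a bag-of-words term-count vector stands in for a learned embedding model, and a plain in-memory list stands in for a vector database. The functions `embed`, `cosine`, `retrieve`, and `build_prompt` are hypothetical names, not a library API.

```python
import math
import re
from collections import Counter


# Hypothetical stand-in for an embedding model: lowercase word counts.
# A production RAG system would use learned embeddings in a vector database.
def embed(text: str) -> Counter:
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, documents: list[str], k: int = 2) -> list[str]:
    """Return the k documents most similar to the query, best match first."""
    q = embed(query)
    ranked = sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]


def build_prompt(query: str, documents: list[str], k: int = 2) -> str:
    """Put only the retrieved snippets, not the whole history, into the prompt."""
    context = "\n".join(f"- {d}" for d in retrieve(query, documents, k))
    return f"Relevant history:\n{context}\n\nCustomer: {query}"


if __name__ == "__main__":
    tickets = ["Delivery to Riyadh is late, still unresolved",
               "Changed billing address",
               "Asked about store opening hours"]
    print(retrieve("Where is my late delivery", tickets, k=1))
    # prints ['Delivery to Riyadh is late, still unresolved']
```

However many tickets are stored, the prompt only grows by `k` snippets. This is the cost and latency trade-off the paragraph above describes.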