Context Engineering: Yet Another Branch of "Engineering" Involving Language Models

Have we just rebranded prompt engineering? Like Apple did with "Glass"? Well, it turns out that there is more to context engineering than writing lengthy prompts.
If prompt engineering was about cleverly wording a single query to coax a good answer, context engineering is about orchestrating everything the AI needs around that query. Ideally, it lets us address the shortcomings of prompt engineering for tasks that require multi-step reasoning or synthesizing answers from information scattered across various knowledge bases.
From Prompt Engineering to Context Engineering
Prompts by themselves couldn't provide memory of past interactions, up-to-date knowledge, or keep the AI on track over a long dialogue. Developers realized that many LLM failures (like irrelevant or made-up answers) weren't due to the model being "bad"; they happened because the system around the model didn't give it the right context or support. In other words, a clever prompt means little if it's buried in a sea of irrelevant text or if the model lacks crucial information.
This insight gave rise to context engineering. Instead of focusing only on the prompt itself, context engineering is about designing the entire environment in which the AI operates. One article succinctly defines it as "structuring everything an LLM needs to complete a task successfully." As Andrej Karpathy described it, if prompt engineering is like writing a single sentence, context engineering is like writing an entire screenplay for the AI. You're not just giving an instruction; you're providing background, setting the scene, and making sure the AI has all the pieces to "get it right."
Building Blocks of Context Engineering
So, what exactly goes into engineering the context for a language model? It typically means managing and curating a variety of inputs around the prompt, such as:
Conversation history and memory: Supplying relevant parts of prior chats or interactions so the model "remembers" important details.
Relevant documents or data: Fetching knowledge from databases, files, or the web (often via retrieval techniques) to inform the answer.
Structured instructions and context format: Framing information in a model-friendly way, for example, providing clear system instructions or formatting data as tables or JSON if needed.
Tools and actions: Allowing the model to invoke external tools or APIs (like calculators, web search, or code execution) and feeding those results into the context.
In essence, context engineering is about filling the model's context window (its working memory) with the right information, at the right time, in the right format. A lot of devs would argue that they've been doing this since day one, and now we've just labeled it.
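To make the building blocks above concrete, here is a minimal sketch of assembling system instructions, retrieved documents, and recent conversation history into one context window. All function names and data are hypothetical, not any particular framework's API; the character budget stands in for a real token budget.

```python
# A minimal sketch of context assembly: combine system instructions,
# retrieved documents, and recent conversation history into one prompt,
# evicting the oldest turns first when the budget is exceeded.

def build_context(system_prompt, documents, history, user_query, max_chars=8000):
    """Assemble a context window within a (character) budget."""
    doc_block = "\n\n".join(f"[Document {i + 1}]\n{d}" for i, d in enumerate(documents))

    def render(turns):
        convo = "\n".join(f"{role}: {text}" for role, text in turns)
        return (f"{system_prompt}\n\n### Knowledge\n{doc_block}\n\n"
                f"### Conversation\n{convo}\nuser: {user_query}\nassistant:")

    turns = list(history)
    prompt = render(turns)
    while turns and len(prompt) > max_chars:
        turns.pop(0)          # drop the oldest turn to fit the budget
        prompt = render(turns)
    return prompt

context = build_context(
    system_prompt="You are a support assistant. Answer only from the documents.",
    documents=["Refunds are processed within 5 business days."],
    history=[("user", "Hi"), ("assistant", "Hello! How can I help?")],
    user_query="How long do refunds take?",
)
```

A real pipeline would count tokens rather than characters and score history turns by relevance instead of recency alone, but the shape is the same: one function that decides what ends up in the window.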
Why Is Context Engineering the Latest Trend?
The move toward context engineering comes from practical necessity. As developers built more "industrial-strength" LLM applications, they needed more reliable, predictable behavior from the model, and context engineering delivers that by reducing guesswork. Instead of hoping a single prompt will make a model magically know about your 100-page knowledge base, you feed the knowledge base (or its highlights) into the model's context.
Instead of the model forgetting what was said 10 messages ago, you make sure to persist important history forward. Rather than just asking the LLM to perform a task blindly, you might give it step-by-step context or even let it call functions and then supply the results back into the conversation.
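The "let it call functions and supply the results back" pattern can be sketched as a simple loop. The dispatch table and the model interface below are hypothetical stand-ins, not a real provider's API:

```python
# Sketch of the tool-use loop: the model requests a tool, we run it,
# and the result is appended to the context for the next model call.

def calculator(expression: str) -> str:
    # Toy arithmetic evaluator for the sketch; a real system would sandbox this.
    return str(eval(expression, {"__builtins__": {}}, {}))

TOOLS = {"calculator": calculator}

def run_with_tools(call_model, user_query, max_steps=5):
    messages = [("user", user_query)]
    for _ in range(max_steps):
        reply = call_model(messages)           # e.g. {"tool": "calculator", "args": "2+2"}
        if isinstance(reply, dict) and reply.get("tool") in TOOLS:
            result = TOOLS[reply["tool"]](reply["args"])
            messages.append(("tool", result))  # feed the result back into context
        else:
            return reply                       # plain string = final answer
    return None

# A scripted stand-in model: first asks for the calculator, then answers.
def fake_model(messages):
    if messages[-1][0] == "tool":
        return f"The answer is {messages[-1][1]}."
    return {"tool": "calculator", "args": "17 * 3"}

print(run_with_tools(fake_model, "What is 17 times 3?"))  # -> The answer is 51.
```

Real tool-calling APIs return structured tool-call objects instead of this ad hoc dict, but the control flow (call, execute, append result, call again) is the core of the pattern.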
Key factors that are driving this trend:
New tooling libraries and frameworks, pushing what's possible in agents' memory.
Growing context windows allow devs to feed in whole documents whenever needed, making better use of the model's working memory.
It's a system, not a sentence! A well-engineered context means devs don't have to tailor prompts for every scenario.
Context engineering treats AI behavior as a systems design problem. Rather than solely relying on clever wording, it emphasizes architecting the information flow around the model.
What Do the Numbers Say About Context Engineering?
There is growing evidence that engineering the context for a language model yields better results and stability than a simple prompt alone.
A Databricks study evaluated retrieval-augmented generation (RAG) using long-context models, such as GPT-4-turbo (128k tokens) and Claude 2 (200k tokens). They found that providing more retrieved documents increases answer quality, especially on QA tasks across corporate and financial domains.

However, models struggle if the context is too long ("lost in the middle"), so properly chunking and curating context is critical [1].
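As a minimal sketch of that chunk-and-curate step (the scoring here is naive word overlap; a real pipeline would use embeddings), one mitigation for "lost in the middle" is to rank chunks and place the strongest ones at the edges of the context:

```python
# Sketch of context curation: split documents into chunks, keep only the
# chunks most relevant to the query, and order them so the best-scoring
# chunks sit at the start and end of the context rather than the middle.

def chunk(text, size=200):
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def score(query, passage):
    q, p = set(query.lower().split()), set(passage.lower().split())
    return len(q & p)  # naive lexical overlap, stands in for embedding similarity

def curate(query, documents, top_k=4):
    chunks = [c for doc in documents for c in chunk(doc)]
    ranked = sorted(chunks, key=lambda c: score(query, c), reverse=True)[:top_k]
    # Interleave so the strongest chunks land first and last, weakest in the middle.
    return ranked[::2] + ranked[1::2][::-1]

docs = ["Employees accrue 1.5 vacation days per month of service.",
        "The cafeteria is open from 8am to 3pm on weekdays."]
print(curate("How many vacation days do employees accrue?", docs))
```

The edge-first ordering is one common heuristic motivated by the "lost in the middle" finding; the more important point is that something, not the raw corpus, decides what enters the window and where.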
Thomson Reuters benchmarked long-context LLMs (up to 1M tokens) against retrieval-based approaches. Their internal tests revealed that feeding full legal documents into the LLM outperformed RAG in most document-based reasoning tasks, and long-context (LC) models handled multi-hop reasoning more effectively than RAG approaches [2].

Additionally, research from Nvidia presented at ICLR 2024 showed:
Best results come from combining retrieval and long-context capabilities.
A LLaMA2-70B with a 32k context window plus retrieval matched or surpassed GPT-3.5-turbo-16k and Davinci-003.
The LC+RAG combo outperformed LC-only baselines, showing clear synergy [3].
Final Takeaway
With growing excitement around context engineering in the community, and quantitative evidence backing that excitement, the practice isn't hype: it is empirically superior for domain-specific knowledge work.
Language models with large context windows shine when that window is utilized well, via targeted retrieval, whereas dump-everything strategies falter. This backs the sentiment that feeding the model the right context, whether via retrieval pipelines or very long context inputs, yields measurable gains over prompt-only methods.
It is about building a system, not a prompt.
References
Leng, Q., Portes, J., Havens, S., Zaharia, M. and Carbin, M. (2024) 'Long Context RAG Performance of LLMs'. Databricks Blog, 12 August. Available at: https://www.databricks.com/blog/long-context-rag-performance-llms (Accessed: 12 July 2025).
Hron, J. (2025) 'Legal AI Benchmarking: Evaluating Long Context Performance for LLMs'. Thomson Reuters Innovation Blog, 14 April. Available at: https://blogs.thomsonreuters.com/en-us/innovation/legal-ai-benchmarking-evaluating-long-context-performance-for-llms/ (Accessed: 12 July 2025).
Xu, P., Ping, W., Wu, X., McAfee, L., Zhu, C., Liu, Z., Subramanian, S., Bakhturina, E., Shoeybi, M. and Catanzaro, B. (2023) 'Retrieval Meets Long Context Large Language Models'. arXiv preprint arXiv:2310.03025. Available at: https://arxiv.org/abs/2310.03025 (Accessed: 12 July 2025).
Join the JigsawStack Community
Have questions or want to show off what you've built? Join the JigsawStack developer community on Discord and X/Twitter. Let's build something amazing together!





