
Guidelines and Documentation — Teaching LLMs Where to Go

Stefan Loesch
(image credit: Gemini)

This is part 3 of the "Does Vibe Coding Work?" series. Part 1, "Does Vibe Coding Work?", discussed why complexity is the fundamental challenge in LLM-assisted programming, and part 2, "Testing and Linting", discussed how automated checks catch violations after the fact.

The road taken so far

In this series we are talking about the benefits and the pitfalls of LLM-assisted software development, aka "vibe coding". In part 1 we introduced the concept and discussed the types of complexity that arise in any software project. This complexity means that progress tends to be very fast in the early stages and slows down later -- something that a lot of vibe coders who have never done end-to-end projects are now experiencing. We also warned that LLMs, when left to their own devices, often produce pretty awful code, and that there is a point beyond which they simply can no longer make progress without skilled architectural oversight. This point can come surprisingly quickly for YOLO vibecoders -- within weeks, possibly days.

We also identified a number of mitigants that help keep a vibe-coded project in shape. The first one, which we discussed in part 2, was testing and linting. Tests make sure that the LLM does not break one thing when changing another. Or attempt to make sure, I should have said. Having the LLM write effective tests is a whole problem of its own, and realistically in production systems you also want to do some simulated end-to-end user testing of the actual functionality provided, because even with passing tests there is room for the actual application to fail.

Linting is about how the code is written. It enforces a certain style and certain structures, which makes the code easier for the LLM to read and maintain. The really interesting part here is structural linting, which is about enforcing certain architectural patterns. For example, you may want to forbid direct database access outside of a certain module, or you may want to enforce that all API calls go through a certain layer, or you may want to ensure that there are no "magic number" numeric literals in the codebase except in certain well-defined places.
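As a rough illustration of the "magic number" rule, a structural check can be sketched with Python's standard ast module. The function name find_magic_numbers, the allowed literals 0 and 1, and the convention that module-level UPPER_CASE constants are the "well-defined places" are all assumptions made for this sketch; a real project would more likely configure an existing linter.

```python
import ast

# Assumption for this sketch: 0 and 1 are too common to count as magic numbers.
ALLOWED_LITERALS = {0, 1}


def find_magic_numbers(source: str) -> list[tuple[int, float]]:
    """Return (line, value) pairs for numeric literals outside named constants."""
    tree = ast.parse(source)

    # Literals on the right-hand side of a module-level UPPER_CASE assignment
    # are treated as the "well-defined places" where numbers may live.
    allowed_nodes: set[int] = set()
    for stmt in tree.body:
        if isinstance(stmt, ast.Assign) and all(
            isinstance(t, ast.Name) and t.id.isupper() for t in stmt.targets
        ):
            allowed_nodes.update(id(n) for n in ast.walk(stmt.value))

    findings = []
    for node in ast.walk(tree):
        if not isinstance(node, ast.Constant):
            continue
        value = node.value
        # bool is a subclass of int, so exclude True/False explicitly.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            continue
        if id(node) in allowed_nodes or value in ALLOWED_LITERALS:
            continue
        findings.append((node.lineno, value))
    return sorted(findings)
```

Run over every file in CI, a check like this fails the build whenever a bare literal such as 42 appears inside a function body, which is the kind of feedback the LLM can act on directly.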

So testing and linting allow us to enforce certain rules -- if the LLM writes code that violates them, this gets caught, and the LLM is encouraged or forced to fix it. This is powerful, but it is only one side of the coin. Ideally the LLM does not have to engage in learning-by-doing, but is told what to do in the first place, with the testing and linting rules as a backstop should the LLM get ahead of itself, as it does quite regularly.

Today's topic: documentation, documentation, documentation

There is another problem not quite unique to LLMs -- humans have it as well -- but arguably for LLMs it is more acute: LLMs are really bad at big picture views. By default, when they look at a codebase, they look at it locally, bottom up. This does not mean they cannot understand structure. In fact, they are quite good at extracting structure by looking at code bottom up. But what I have found at least is that they often don't really link the two.

For example, you can have an LLM analyze the whole codebase, and it is really good at understanding what the modules do, how they are organized, and how they work together. When it comes to programming, however, they often forget all of that, especially when the analysis appeared a bit further back in the context window, and they write their code with complete disregard for the overall structure. Of course one could ask them to review the entire codebase when planning the next step, but this can be extremely expensive. The answer is in principle very simple: documentation, documentation, documentation.

Generally it will be the LLM that writes its own documentation. However, it does need some guidance on how to do it and what should be in there. In my experience the most important rules are the following:

  1. Hierarchical structure. Start with the big picture, then zoom in progressively.
  2. Architecture overviews. Describe apps, modules, their boundaries and APIs.
  3. Declarative rules. State the rules and conventions you want the LLM to follow
  4. Maintenance. Make sure those rules are maintained and remain in line with the code.

We will now go through each of those points in detail.

Hierarchical structure

All information for LLMs should be hierarchical and structured -- context is limited, so they should be able to read the big picture first and then pull in only the details they need. Providing only details often leads to them reading the whole document, polluting the context with unnecessary information, which is expensive and degrades the result. This technique is more generally known as "progressive disclosure", and it applies to all documents written for LLMs to pull information from (as opposed to documents that push information, where you want the LLM to read everything).

Generally we've found it more efficient to ask the LLM -- in our case mostly Claude -- to self-organize and to decide independently where information should go (memory, Claude.md, structured documentation folder, or one-off notes covering a topic of interest in depth, typically as part of the planning process for development or restructuring).

Documentation should be heavily cross-referenced, both vertically (table of contents, summaries) and horizontally (establishing relationships between different topics). Note that this documentation is not usually meant to be read by humans, at least not directly. If a human engages with it, it is generally via an LLM, asking questions and extracting summaries or specifics for certain topics.
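To make the vertical cross-referencing concrete, here is a minimal sketch of how a top-level index could be generated from a documentation folder, so that the LLM can read one short file first and open detailed documents only when needed. The folder layout, the INDEX.md file name, and the convention that each document starts with a title line followed by a one-line summary are assumptions for illustration, not a prescribed format.

```python
from pathlib import Path


def build_index(docs_dir: Path) -> str:
    """Build a one-line-per-document index: path, title, and first summary line."""
    entries = []
    for path in sorted(docs_dir.rglob("*.md")):
        if path.name == "INDEX.md":
            continue  # do not index the index itself
        raw = path.read_text(encoding="utf-8").splitlines()
        non_empty = [line.strip() for line in raw if line.strip()]
        # Assumed convention: first non-empty line is the title,
        # second non-empty line is a one-sentence summary.
        title = non_empty[0].lstrip("# ").strip() if non_empty else path.stem
        summary = non_empty[1] if len(non_empty) > 1 else ""
        rel = path.relative_to(docs_dir).as_posix()
        entries.append(f"- {rel}: {title} -- {summary}")
    return "\n".join(entries)
```

The output is small enough to keep permanently in context, while the detailed documents stay on disk until a task actually touches them.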

Architecture overviews

This point is somewhat related to hierarchical structure, but it deserves its own section because it is fundamental to software development. The LLM needs to understand the units of design in the system and their boundaries. And often it will not be able to do it by looking at the code bottom up -- it often misses the forest for the trees.

For example, you may have a number of services defined in the same repo that launch within their own processes, and that are meant to communicate via REST APIs, or via Redis, or any other well-defined API mechanism. An LLM in its eagerness may find a function it needs in one of those services while working on another one, and if it can overcome its urge to reinvent the wheel it may just import it. That of course is the last thing it should do, because suddenly the whole service is instantiated a second time, within the confines of the first process. If you have a stateless design where all state is stored in some database layer this may actually work perfectly well and you may not even notice the problem. At least not unless you wonder why you are running out of memory or database connections.

So the documentation should explain what the services are, what their responsibilities and boundaries are, and how they are meant to interact. Big picture. Then, following progressive disclosure, the services themselves may have their own architectural patterns to follow, for example one module for the API / server, one for database access, one for calling LLMs etc. Typically there are some common patterns in those modules, e.g. how to access the database, and other common utilities. Especially for LLMs it is important to DRY out the code ("Don't Repeat Yourself") at the earliest possible opportunity, otherwise they'll reinvent the wheel over and over again. For this to have a fighting chance of working, however, the DRY patterns need to be clearly documented.
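Service boundaries of this kind can also be backed by a check. The following sketch assumes a hypothetical repo layout in which each service is a top-level Python package (api, billing, worker) and flags any import that reaches from one service into another; the service names and the function cross_service_imports are made up for illustration.

```python
import ast

# Hypothetical service packages living side by side in one repo.
# Anything else (stdlib, third-party, shared utilities) is ignored.
SERVICES = {"api", "billing", "worker"}


def cross_service_imports(own_service: str, source: str) -> list[tuple[int, str]]:
    """Return (line, module) for imports that reach into another service."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            modules = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            modules = [node.module]
        else:
            continue  # not an absolute import statement
        for module in modules:
            top_level = module.split(".")[0]
            if top_level in SERVICES and top_level != own_service:
                findings.append((node.lineno, module))
    return sorted(findings)
```

For a file belonging to the api service, `from billing.invoices import total` would be reported, whereas an import from a shared utilities package would pass. The documentation states the boundary up front and a check like this catches the cases where the LLM ignores it.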

Declarative rules

Interestingly, this rule was suggested by Claude itself, which after a certain period of reflection stated that declarative rules are preferable to descriptive rules. This is best explained by an example.

  • "We use SQLAlchemy for database access" is a descriptive rule, whilst
  • "All database calls must go through the SQLAlchemy ORM, no raw SQL!" is a declarative rule

On the surface those two rules are equivalent, and humans will generally be able to work with either, quickly establishing the declarative form from the descriptive one and remembering both. LLMs do not operate like that: going from a descriptive to a declarative rule is one extra step of thinking, and LLMs do not have an internal memory where they could store those rules. So either they need to derive the declarative rule over and over again, or they need to store it in some artificial memory.md file. In either case it is better to have the declarative rule available from the start.

For more complex cases it is worth including the reasoning behind a rule. To give an example: "always use snake_case for regular functions" does not need justification. This is just a convention. But "no lazy imports unless it avoids circular dependencies" does. Claude specifically seems to love lazy imports, but telling it that for long-running services we want to see the memory pressure right away, and that we don't want random import failures on uncommon paths, seems to make it slightly less prone to use them. Of course the only thing that reliably prevents Claude from creating lazy imports is a linting rule to that effect.
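A linting rule of that kind can be sketched in a few lines with the standard ast module. The check below reports every import statement that sits inside a function body, unless the line carries an explicit escape comment for the circular-dependency case. The marker text "# lazy-ok" and the function name find_lazy_imports are assumptions for this sketch.

```python
import ast

# Hypothetical escape hatch for the documented exception (circular dependencies).
ALLOW_MARKER = "# lazy-ok"


def find_lazy_imports(source: str) -> list[int]:
    """Return line numbers of imports inside function bodies without the marker."""
    lines = source.splitlines()
    lazy: set[int] = set()
    for func in ast.walk(ast.parse(source)):
        if not isinstance(func, (ast.FunctionDef, ast.AsyncFunctionDef)):
            continue
        for node in ast.walk(func):
            if not isinstance(node, (ast.Import, ast.ImportFrom)):
                continue
            if ALLOW_MARKER in lines[node.lineno - 1]:
                continue  # explicitly justified on the same line
            lazy.add(node.lineno)
    return sorted(lazy)
```

Requiring the marker to include a reason, for example "# lazy-ok: circular dependency with models", keeps the rule and its rationale next to each other in the code.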

Maintenance

Last but not least, documentation needs to be maintained. Whenever the code changes the documentation needs to change with it -- and Claude is awful at remembering to do that. There are a number of strategies to mitigate this problem.

Firstly, one can put it in the rules that the LLM needs to update the documentation whenever code changes. As with all rules this can be a bit hit and miss, but it may work 30-60% of the time. However, there is a serious question to be asked whether this is token-efficient. In my experience, Claude is not particularly good at updating the documentation from memory after the task, and therefore updating documentation too often may consume a significant number of tokens without commensurate benefit.

What I do therefore is to explicitly ask Claude to update the documentation after major changes. From time to time I ask it to go through everything to assert correctness, and there is a standing order committed to memory to correct the documentation immediately whenever a discrepancy is found. This to me seems to be a good compromise that balances token budget against the usefulness of documentation, keeping in mind that a/ LLMs are much better than humans at quickly verifying documentation against the actual codebase if things go wrong, and b/ arguably the key benefit of documentation for LLMs is to provide the high-level view, which is not usually subject to sudden changes.
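Some of the cheapest forms of drift can be caught without spending any LLM tokens at all. As one hedged example, the sketch below scans documentation text for backticked references to Python files and reports those that no longer exist in the repo. The backtick convention, the regular expression, and the function name stale_references are assumptions; this only detects renamed or deleted files, not documentation whose content has become wrong.

```python
import re
from pathlib import Path

# Assumption: docs reference source files as backticked relative paths,
# for example `api/routes.py`.
PATH_PATTERN = re.compile(r"`([\w./-]+\.py)`")


def stale_references(doc_text: str, repo_root: Path) -> list[str]:
    """Return referenced .py paths that do not exist under repo_root."""
    referenced = sorted(set(PATH_PATTERN.findall(doc_text)))
    return [ref for ref in referenced if not (repo_root / ref).is_file()]
```

A non-empty result is a good trigger for asking the LLM to review the affected document, rather than re-verifying all documentation after every change.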

Conclusion

Testing and linting tell the LLM what it cannot do. Documentation tells it what it should do. Together they form two sides of the same coin: reactive enforcement and proactive guidance.

The key insight is that LLMs process code bottom-up but need to make decisions top-down. Without documentation that provides the big picture — the architecture, the boundaries, the conventions — they will write locally correct code that is globally incoherent. Hierarchical, declarative, well-maintained documentation bridges that gap. It is not usually meant to be read by humans directly — it is optimized along different axes — but LLMs can distill it on demand, and humans can query it through an LLM.

So in the context of our series of articles, at this point the LLM knows what it cannot do (part 2) and what it should do (this part) within the context of the current codebase. But neither addresses how the codebase itself should be structured to be optimally worked on by LLMs. This will be the topic of the upcoming part 4 where we discuss how to organize modules, enforce boundaries, and keep complexity from spreading — the architectural patterns that make everything else work.


This is part 3 of the "Does Vibe Coding Work?" series. Previously: part 1 on the complexity problem and part 2 on testing and linting. Up next: part 4 on code organization (coming soon).