A pretrained model knows a great deal about the world and nothing about your company. The two standard ways to close that gap are retrieval-augmented generation and fine-tuning. Teams reach for fine-tuning far more often than they should, usually because it sounds more serious. In practice, RAG handles the majority of "make the model know our stuff" problems at a fraction of the cost. Knowing which to use is one of the highest-leverage decisions in an LLM project.

What RAG does

Retrieval-augmented generation does not change the model at all. Instead, when a question comes in, it finds the most relevant chunks of your documents and slips them into the prompt, so the model answers using information it can see right in front of it.

[Figure: hand-drawn diagram of retrieval-augmented generation. A query retrieves relevant document chunks from a vector store, and those chunks are added to the prompt before the model answers. Caption: RAG, one card from the deck.]

The pipeline is: split your documents into chunks, convert each chunk into a vector (an embedding) that captures its meaning, and store the vectors alongside the text. At query time, convert the question into a vector too, find the chunks whose vectors are closest to it, and add those chunks to the prompt. The model never memorizes your documents; it looks them up live, every time.

This is why RAG is the default for anything involving private or changing knowledge: support docs, product information, policies, research. Update a document and the system uses the new version on the next query, with no retraining.
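The chunk, embed, store, retrieve, and prompt steps described above can be sketched in a few lines. This is a minimal illustration, not a production design: the `embed` function here is a toy word-count vector standing in for a real embedding model, the "vector store" is a plain Python list, and every function name is hypothetical.

```python
import math
import re
from collections import Counter


def chunk(text, max_words=40):
    """Split text into chunks of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def embed(text):
    """Toy embedding (assumption): word counts instead of a learned vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def build_index(documents, max_words=40):
    """Chunk every document and store (chunk_text, vector) pairs."""
    return [(c, embed(c)) for doc in documents for c in chunk(doc, max_words)]


def retrieve(index, question, k=2):
    """Return the k chunks whose vectors are closest to the question's vector."""
    q = embed(question)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


def build_prompt(question, chunks):
    """Place the retrieved chunks in the prompt, ahead of the question."""
    context = "\n".join(f"- {c}" for c in chunks)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )


# Made-up example documents, for illustration only.
docs = [
    "Refunds are available within 30 days of purchase.",
    "Support is open Monday to Friday, 9am to 5pm.",
]
index = build_index(docs)
top = retrieve(index, "When is support open?", k=1)
prompt = build_prompt("When is support open?", top)
```

Editing an entry in `docs` and rebuilding the index is all it takes to change what the model sees on the next query, which is the "no retraining" property in miniature.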

What fine-tuning does

Fine-tuning takes the pretrained model and trains it further on your own examples, actually adjusting its weights. The knowledge or behavior becomes baked into the model rather than supplied at query time.

Fine-tuning shines when you need to change how the model responds rather than what facts it has access to: matching a specific format or house style, handling a specialized domain language, or improving reliability on a narrow, repeated task. It is also useful for teaching skills that are hard to specify in a prompt.
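To make "adjusting its weights" concrete, here is a deliberately tiny sketch. It assumes a two-parameter linear model in place of an LLM and plain gradient descent in place of a real training framework; the names and numbers are hypothetical. The point it shows is only that training continues from existing weights, and that the new behavior ends up stored in the weights themselves.

```python
def fine_tune(weights, examples, lr=0.1, epochs=500):
    """Continue gradient descent on y = w * x + b, starting from existing weights."""
    w, b = weights
    n = len(examples)
    for _ in range(epochs):
        # Gradients of mean squared error with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in examples) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in examples) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Stand-in "pretrained" weights (assumption): the model currently computes y = x.
pretrained = (1.0, 0.0)

# Made-up examples of the new behavior we want: y = 2x + 1.
examples = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)]

tuned_w, tuned_b = fine_tune(pretrained, examples)
```

After training, the model produces the new behavior with nothing extra in its input. Changing that behavior again means another training run, not a document edit.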

The decision

The cleanest way to choose is to ask what kind of gap you are closing.

  • Knowledge that changes, or must be cited? Use RAG. Facts live in documents you can update, and you can show the source.
  • Behavior, format, tone, or a narrow repeated skill? Fine-tuning is the better fit.
  • Facts the model gets wrong but that rarely change? Either can work; RAG is usually cheaper to build and maintain.
  • Need both fresh knowledge and a specific style? Use both. Fine-tune for behavior, RAG for knowledge. They are complementary, not competing.
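The checklist above can be written down as a small helper. This is one possible encoding, with hypothetical flag names; it is a reading aid for the bullets rather than a substitute for judgment.

```python
def choose_adaptation(knowledge_changes=False, must_cite=False,
                      behavior_change=False, stable_facts_wrong=False):
    """Return the set of techniques suggested by the checklist above."""
    techniques = set()
    if knowledge_changes or must_cite:
        # Facts live in documents you can update and point to.
        techniques.add("rag")
    if behavior_change:
        # Format, tone, or a narrow repeated skill.
        techniques.add("fine-tuning")
    if stable_facts_wrong and not techniques:
        # Either can work; default to RAG as the usually cheaper option.
        techniques.add("rag")
    return techniques


both = choose_adaptation(knowledge_changes=True, behavior_change=True)
```

An empty result means none of the gaps in the checklist apply, so neither technique is indicated yet.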

Cost and maintenance

RAG is cheaper to start and to keep running. There is no training run, updates are just document changes, and the main ongoing cost is the retrieval infrastructure and the extra tokens you add to each prompt. Fine-tuning has real upfront cost in data preparation and training, and every time the underlying base model improves you may want to redo it. A common and expensive mistake is fine-tuning to inject facts that change monthly, then having to retrain constantly. That is a RAG problem wearing a fine-tuning costume.
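A rough way to see the shape of this tradeoff is to put both cost structures side by side. The sketch below is an assumption-laden back-of-the-envelope model: every parameter name is hypothetical, every number in the example is an invented placeholder rather than a real price, and it leaves out items such as hosting a fine-tuned model.

```python
def rag_annual_cost(queries_per_month, extra_tokens_per_query,
                    price_per_million_tokens, infra_per_month):
    """Yearly RAG cost: extra prompt tokens plus retrieval infrastructure."""
    token_cost = queries_per_month * extra_tokens_per_query * price_per_million_tokens / 1_000_000
    return 12 * (token_cost + infra_per_month)


def finetune_annual_cost(data_prep_cost, cost_per_training_run, retrains_per_year):
    """Yearly fine-tuning cost: one-off data preparation plus each training run."""
    return data_prep_cost + cost_per_training_run * retrains_per_year


# Placeholder inputs only; substitute your own figures.
rag = rag_annual_cost(
    queries_per_month=100_000,
    extra_tokens_per_query=1_500,
    price_per_million_tokens=1.0,
    infra_per_month=200,
)
retrain_monthly = finetune_annual_cost(
    data_prep_cost=5_000, cost_per_training_run=2_000, retrains_per_year=12
)
retrain_once = finetune_annual_cost(
    data_prep_cost=5_000, cost_per_training_run=2_000, retrains_per_year=1
)
```

The structure matters more than the numbers: RAG cost scales with query volume, while fine-tuning cost scales with how often the underlying facts force a retrain.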

A simple rule of thumb

Start with good prompting. If the model lacks knowledge, add RAG. Only reach for fine-tuning when you need to change behavior that prompting and retrieval cannot fix. Most projects never need the third step, and the ones that do usually need it on top of RAG, not instead of it.

The full picture, drawn out

RAG, fine-tuning, LoRA, RLHF, and the rest of the adaptation toolkit are covered in the LLM Flashcards: 180 hand-drawn cards across the whole LLM stack.
