Interviews for roles that touch large language models have settled into a recognizable shape. They are less about reciting paper titles and more about whether you understand how the pieces fit together and where the trade-offs are. If you can explain why the KV cache exists or when RAG beats fine-tuning, in your own words, you are most of the way there.
What actually gets asked
Questions tend to cluster into a few areas. You do not need depth in all of them for every role, but a competent answer in each is what separates a strong candidate from a shaky one.
- Transformer fundamentals. What attention does, what query, key, and value are, why residual connections matter, what the feed-forward layer is for.
- Attention variants and efficiency. Multi-head versus grouped-query versus multi-query attention, and why the differences matter for memory.
- Tokenization and embeddings. What a token is, why BPE is used, what an embedding represents, how position is encoded.
- Training and alignment. Pretraining versus fine-tuning, what RLHF and DPO do, what LoRA is.
- Retrieval and agents. How RAG works end to end, when to use it, what function calling and agent loops add.
- Inference. The KV cache, sampling and temperature, speculative decoding, and what drives latency and cost.
- Judgment questions. "A team wants to do X, how would you approach it?" These test whether you can reason about trade-offs, not recall facts.
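To make the memory point about attention variants concrete, here is a rough back-of-the-envelope sketch. It uses the standard approximation that the KV cache stores one key and one value vector per layer, per KV head, per token. The model shape (32 layers, 32 query heads, head dimension 128, 4096-token context, 16-bit values) and the function name `kv_cache_bytes` are hypothetical, chosen only for illustration, and real serving stacks add overhead this ignores.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len,
                   batch_size=1, bytes_per_value=2):
    """Approximate KV cache size in bytes.

    The leading 2 counts keys and values; bytes_per_value=2 assumes fp16/bf16.
    """
    return (2 * n_layers * n_kv_heads * head_dim
            * seq_len * batch_size * bytes_per_value)


# Hypothetical model shape, for illustration only.
config = dict(n_layers=32, head_dim=128, seq_len=4096)

# The variants differ only in how many KV heads the 32 query heads share.
variants = {"multi-head": 32, "grouped-query": 8, "multi-query": 1}

for name, n_kv_heads in variants.items():
    size = kv_cache_bytes(n_kv_heads=n_kv_heads, **config)
    print(f"{name:>14}: {size / 2**20:6.0f} MiB per sequence")
```

Under these assumptions the cache shrinks in direct proportion to the number of KV heads, which is the trade-off an interviewer usually wants you to articulate: fewer KV heads means less memory per sequence and room for larger batches, at some potential cost in quality.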
How to study, not just what
The most common mistake is passive review: reading explanations, nodding along, and mistaking familiarity for understanding. In an interview you have to produce the explanation, not recognize it. Two things fix this.
Explain it out loud. For each concept, try to explain it in two sentences as if to a colleague, without looking at your notes. If you cannot, you do not know it yet. This is the most direct test of interview readiness, because it is the same task the interview sets.
Use spaced repetition. Concepts fade. Reviewing them on an expanding schedule, which tools like Anki handle automatically, moves them into durable memory far more efficiently than cramming. A few minutes a day over two weeks beats one long session the night before.
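As a minimal sketch of what an "expanding schedule" means, the function below spaces reviews so that each gap is twice the previous one. This is a deliberately simplified illustration, not Anki's actual scheduling algorithm, which also adjusts intervals based on how well you recall each card. The function name `review_dates` and its parameters are assumptions made for this example.

```python
from datetime import date, timedelta


def review_dates(start, first_interval_days=1, multiplier=2.0, horizon_days=14):
    """Return review dates whose gaps grow by `multiplier` each time.

    Simplified illustration: every review is assumed to succeed, so the
    interval always expands. Reviews past `horizon_days` are dropped.
    """
    dates = []
    interval = first_interval_days
    offset = interval  # days since `start` of the next review
    while offset <= horizon_days:
        dates.append(start + timedelta(days=offset))
        interval = interval * multiplier
        offset += round(interval)
    return dates


if __name__ == "__main__":
    first_study_day = date(2025, 1, 1)  # arbitrary example date
    for d in review_dates(first_study_day):
        print(d.isoformat(), f"(day {(d - first_study_day).days})")
```

With the defaults, a concept first studied on day 0 comes back on days 1, 3, and 7 within a two-week window, so the cost per concept stays at a few short reviews.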
A two-week plan
- Days 1 to 4. Fundamentals: transformers, attention, tokenization, embeddings. Get the mental model solid before anything else.
- Days 5 to 8. Training, adaptation, and retrieval: pretraining, fine-tuning, LoRA, RLHF and DPO, then RAG, function calling, and agent loops.
- Days 9 to 11. Inference and efficiency: KV cache, sampling, quantization, latency.
- Days 12 to 14. Review weak spots with spaced repetition, and practice the judgment questions out loud.
Why visual flashcards work for this
Diagrams are easier to recall under pressure than paragraphs. A clear picture of the transformer block or the RAG pipeline is something you can reconstruct on a whiteboard, which is often exactly what an interviewer asks you to do. That is the reason our deck exists: each concept is one diagram you can hold in your head and redraw.
A revision deck built for exactly this
The LLM Flashcards are 180 hand-drawn cards covering every topic above, with an Anki deck included so you can run spaced repetition straight away. Built by a working LLM research lab.
See the deck →