LLM Flashcards: learn how language models actually work.
A hand-drawn deck of 180 cards covering the full LLM stack, from tokenization and attention through RAG, agents, and inference. One concept per card. A clean diagram, a short explanation, and nothing you have to wade through. Built by a working LLM research lab, not a content farm.
A few sample cards
Every card is self-contained: a diagram you could redraw on a whiteboard, plus a few sentences on what it means and why it matters.
What’s inside
180 cards across 19 topics, ordered so each one builds on the last. Read straight through to build a mental model from first principles, or jump to the topic you need.
- I. Transformer architecture. Attention, FFN, layer norm, residuals, the full block. 27 cards
- II. Tokenization. BPE, vocabularies, special tokens, why numbers break. 8 cards
- III. Embeddings. Vectors, similarity, positional encoding, RoPE. 7 cards
- IV. Training fundamentals. Objectives, loss, batching, the language-modeling task. 10 cards
- V. Fine-tuning. SFT, LoRA, adapters, when to tune versus prompt. 9 cards
- VI. RLHF & alignment. Reward models, PPO, DPO, preference data. 11 cards
- VII. Prompting. Few-shot, chain-of-thought, self-consistency, structure. 11 cards
- VIII. Retrieval-augmented generation. Chunking, vector search, RAG end to end. 12 cards
- IX. Agents & tools. Function calling, ReAct, planning, multi-agent systems. 11 cards
- X. Inference & decoding. KV cache, sampling, speculative decoding, latency. 11 cards
- XI. Scaling laws. Chinchilla, compute-optimal training, emergence. 7 cards
- XII. Model architecture variants. MoE, MQA, ALiBi, SwiGLU, RMSNorm. 8 cards
- XIII. Quantization & efficiency. INT8/INT4, GPTQ, AWQ, GGUF, distillation. 7 cards
- XIV. Evaluation & benchmarks. Perplexity, MMLU, LLM-as-judge, contamination. 8 cards
- XV. Context management. Lost in the middle, compression, token budgets. 6 cards
- XVI. Safety & ethics. Hallucination, bias, refusal, memorization. 8 cards
- XVII. APIs & practical. Chat completion, streaming, cost, rate limits. 6 cards
- XVIII. Multimodal & advanced. Vision-language models, CLIP, synthetic data. 6 cards
- XIX. Reasoning. Multi-hop, scratchpads, program-aided, test-time compute. 7 cards
Who it’s for
- Engineers working with LLMs who want a clean visual reference to keep open while reading papers or model cards.
- Anyone preparing for an AI or ML engineering interview. The cards map to the concepts that actually come up: transformer internals, attention variants, KV cache, RAG, fine-tuning, inference trade-offs. Revise them with the Anki deck in the weeks before the interview.
- Students in NLP or deep-learning courses who think better through diagrams than dense paragraphs.
- Self-taught learners who have used an LLM API and want to understand what is actually happening underneath.
How to use it
- Read straight through, topic by topic, to build a foundation.
- Import the Anki deck and review on your commute with spaced repetition.
- Print four cards per page for physical study, or a single card full size as a poster.
- Keep the PDF open as a reference while reading papers or model cards.
Questions
What formats do I get?
Two. A multi-page PDF organized by topic, high-resolution enough to print four cards per page, and an Anki deck (.apkg) you can import for spaced-repetition review on desktop or mobile.
Are these for beginners or experts?
Both, but most useful in the middle. If you have used an LLM API and want to understand what is happening underneath, this is built for you. The diagrams are approachable, but the technical depth assumes some ML background.
Is it good for interview prep?
Yes. The cards cover the concepts that come up in LLM and ML engineering interviews, and the Anki deck makes them easy to revise with spaced repetition in the weeks before the interview.
How often does the deck update?
New cards are added as new techniques and research land. Past buyers get every update free, with no expiry and no resubscription.
Who made it?
The deck started as study notes inside LLMs Research, an independent applied research lab publishing on KV cache compression, adaptive compute, and multi-agent systems. It grows alongside the research work. You can read more about the lab.