Visual reference deck

LLM Flashcards: learn how language models actually work.

A hand-drawn deck of 180 cards covering the full LLM stack, from tokenization and attention through RAG, agents, and inference. One concept per card. A clean diagram, a short explanation, and nothing you have to wade through. Built by a working LLM research lab, not a content farm.

180 hand-drawn cards
19 topics
PDF + Anki, both included

LLM Flashcards cover showing hand-drawn cards on self-attention, query-key-value vectors, and the KV cache.

A few sample cards

Every card is self-contained: a diagram you could redraw on a whiteboard, plus a few sentences on what it means and why it matters.

  • What is a Transformer? Hand-drawn flashcard explaining the transformer architecture: input embedding through self-attention, add and norm, feed-forward, and add and norm, with residual skip connections, repeated N times.
  • What is RAG? Hand-drawn flashcard explaining Retrieval-Augmented Generation: a query retrieving relevant document chunks from a vector store, added to the prompt before the model answers.
  • RoPE position embeddings. Hand-drawn flashcard explaining RoPE, rotary position embedding, which encodes token position by rotating query and key vectors.
  • KV cache at inference. Hand-drawn flashcard explaining the KV cache: storing key and value tensors during autoregressive generation so they are not recomputed each step.
  • The ReAct agent loop. Hand-drawn flashcard explaining the ReAct agent framework: a loop of thought, action, and observation that lets a model use tools across multiple steps.
  • Mixture of Experts. Hand-drawn flashcard explaining Mixture of Experts: a router sending each token to a small subset of expert feed-forward networks.
  • Chain-of-Thought. Hand-drawn flashcard explaining Chain-of-Thought prompting: asking a model to reason step by step before answering, improving accuracy on multi-step problems.
  • Byte Pair Encoding. Hand-drawn flashcard explaining Byte Pair Encoding tokenization: merging the most frequent character pairs into subword tokens.
  • What is quantization? Hand-drawn flashcard explaining quantization: representing model weights with fewer bits to shrink memory and speed up inference.
  • Hallucination. Hand-drawn flashcard explaining hallucination in language models: confident but factually wrong output, and why it happens.

What’s inside

180 cards across 19 topics, ordered so each one builds on the last. Read straight through to build a mental model from first principles, or jump to the topic you need.

  I. Transformer architecture. Attention, FFN, layer norm, residuals, the full block. (27 cards)
  II. Tokenization. BPE, vocabularies, special tokens, why numbers break. (8 cards)
  III. Embeddings. Vectors, similarity, positional encoding, RoPE. (7 cards)
  IV. Training fundamentals. Objectives, loss, batching, the language-modeling task. (10 cards)
  V. Fine-tuning. SFT, LoRA, adapters, when to tune versus prompt. (9 cards)
  VI. RLHF & alignment. Reward models, PPO, DPO, preference data. (11 cards)
  VII. Prompting. Few-shot, chain-of-thought, self-consistency, structure. (11 cards)
  VIII. Retrieval-augmented generation. Chunking, vector search, RAG end to end. (12 cards)
  IX. Agents & tools. Function calling, ReAct, planning, multi-agent systems. (11 cards)
  X. Inference & decoding. KV cache, sampling, speculative decoding, latency. (11 cards)
  XI. Scaling laws. Chinchilla, compute-optimal training, emergence. (7 cards)
  XII. Model architecture variants. MoE, MQA, ALiBi, SwiGLU, RMSNorm. (8 cards)
  XIII. Quantization & efficiency. INT8/INT4, GPTQ, AWQ, GGUF, distillation. (7 cards)
  XIV. Evaluation & benchmarks. Perplexity, MMLU, LLM-as-judge, contamination. (8 cards)
  XV. Context management. Lost in the middle, compression, token budgets. (6 cards)
  XVI. Safety & ethics. Hallucination, bias, refusal, memorization. (8 cards)
  XVII. APIs & practical. Chat completion, streaming, cost, rate limits. (6 cards)
  XVIII. Multimodal & advanced. Vision-language models, CLIP, synthetic data. (6 cards)
  XIX. Reasoning. Multi-hop, scratchpads, program-aided, test-time compute. (7 cards)

Who it’s for

  • Engineers working with LLMs who want a clean visual reference to keep open while reading papers or model cards.
  • Anyone preparing for an AI or ML engineering interview. The cards map to the concepts that actually come up: transformer internals, attention variants, KV cache, RAG, fine-tuning, inference trade-offs. Revise them with the Anki deck in the weeks before the interview.
  • Students in NLP or deep-learning courses who think better through diagrams than dense paragraphs.
  • Self-taught learners who have used an LLM API and want to understand what is actually happening underneath.

How to use it

  • Read straight through, topic by topic, to build a foundation.
  • Import the Anki deck and review on your commute with spaced repetition.
  • Print four cards per page for physical study, or a single card full size as a poster.
  • Keep the PDF open as a reference while reading papers or model cards.

Questions

What formats do I get?

Two. A multi-page PDF organized by topic, high-resolution enough to print four cards per page, and an Anki deck (.apkg) you can import for spaced-repetition review on desktop or mobile.

Are these for beginners or experts?

Both, but most useful in the middle. If you have used an LLM API and want to understand what is happening underneath, this is built for you. The diagrams are approachable, but the technical depth assumes some ML background.

Is it good for interview prep?

Yes. The cards cover the concepts that come up in LLM and ML engineering interviews, and the Anki deck makes them easy to revise with spaced repetition in the weeks before an interview.

How often does the deck update?

New cards are added as new techniques and research land. Past buyers get every update free, with no expiry and no resubscription.

Who made it?

The deck started as study notes inside LLMs Research, an independent applied research lab publishing on KV cache compression, adaptive compute, and multi-agent systems. It grows alongside the research work.
