tinkering and writing about startups and ML

How to get up to speed with LLMs

Machine learning moves super fast. I took a break from ML shortly after the Attention is All You Need paper dropped and unlocked the level of ML performance across language, image, and other applications.

To get caught back up, I went "sponge mode" and started a big personal wiki of research. I thought it would be useful to clean up that wiki and share some of it here for others. Think of it as a sequence break in the learning process to help you learn a little faster.

This focuses on large language models and natural language processing, because I personally have more interest in those fields. Some content applies generically to all deep learning, though.


  1. Getting up to speed ASAP
  2. The ML paper hit list
  3. Types of LLM agents
  4. Tuning models to create your own agents
  5. Email newsletters about AI

Getting up to speed ASAP

If you want to read just one section of this post, just read this one.

The most basic piece of neural networks is gradient descent and backpropagation. This relies on basic calculus and how you can chain together the gradients of a computational graph. Andrej Karpathy, a co-founder at OpenAI, has the excellent Neural Networks: Zero to Hero Youtube playlist, which starts out with auto-gradient calculation, and goes on to cover language modeling, GPT models, audio generation, and some high-level state-of-the-art on how OpenAI trains GPT.

The most used deep learning framework is Pytorch. You should be familiar with it. The Hugging Face Transformers library has a Pytorch implementation and is a great way to work with transformer models. You could also choose to learn Tensorflow or JAX + Flax, but Pytorch is the most common deep learning framework.

There are a variety of LLM models:

The ML paper hit list

For the modern ML revolution, you must read Attention Is All You Need, which outlines the transformer architecture. The transformer architecture unlocked the current revolution of deep learning.

Other good papers:

If you need to catch up on some pre-transformer ML, this blog post by Alpin has some good background reading recommendations:

For language tasks, transformers have taken over and replaced RNNs. For visual tasks, vision transformers are competing with CNNs and outperform them in some large-data scenarios, but underperform in other settings. The forefront of image generation uses diffusion, some of which use transformers blocks.

Types of LLM agents

Here are a few of the most common types of LLM agents. Some of them overlap each other.

  • chatbot :: Talk to an AI and it talks back to you. Sometimes for entertainment, sometimes for informative Q&A.
  • co-pilot :: Runs alongside human-operated tasks and assists the human user. For example, Github Copilot helps you code by suggesting code snippets for you.
  • information retrieval :: Search your database and documents via natural language. ask questions, summarize a conversation, etcetera.
  • action agent :: Designed to execute tasks, mostly via API calls to other services (see Langchain page on agent types).

Tuning models to create your own agents

I think the future will be dominated by two types of AIs:

  1. a few massive base models, such as OpenAI's GPT, Anthropic's Claude, and Google's Bard
  2. a ton of fine-tuned models that extend these base models for specific tasks

Some folks think that there will be one massive super-AGI to rule them all. I disagree. When you fine-tune a model to a task, you can improve its speed, cost, and accuracy. That said, these models will extend powerful base models, because they need all that the latent features buried within the massive model.

So how do you fine tune an LLM? There are a few main ways of specializing an LLM:

  • prompt prefixes, one-shot prompting, and few-shot prompting
  • native fine tuning
  • Low-Rank Adaptation (LoRA)
  • Retrieval-Agumented Generation (RAG)

Prompt prefixes, one-shot prompting, and few-shot prompting

It almost feels like cheating to call this fine tuning, but it accomplishes the same goal. If you provide prompt instruction prefixes or examples to the LLM before passing in your user prompt, it conditions the kind of output that the LLM will produce.

With prompt prefixes, you precondition the LLM with specific instructions or task descriptions to condition the model response. One-shot and few-shot prompting is a subset of prompt prefixes where you provide either one (one-shot) or a few (few-shot) input-response pairs. Think of this like a micro training set.

Here's a prompt prefix example that conditions the style of an LLM output. Compare these two prefixes:

  • "You are a machine learning expert teaching graduate level machine learning."
  • "You are explaining machine learning to a elementary school kid."

And then ask the LLM "What is the BERT model?" The output will be very different and tailored to the audience type you specified.

Here's an example of few-shot prompting used for sentiment analysis:

# Context: "Classify the following text's sentiment. Just provide the label."

# Examples:

"This product broke one week after I bought it." : negative

"I bought these shoes for a hike last year, and they were great. I didn't even have to break them in and they held up well over many future hikes." : positive

"The food wasn't too expensive and the portions were large. I'd eat here if I lived next door, but I wouldn't make a trip just for the food." : neutral

# Input:

Review: ${insert_review_here}

Output: The sentiment of the given product review is:

You can also structure your few-shot instructions in an XML format, since LLMs consume a lot of XML and HTML.

One/few-shot prompting works very well for classification tasks, but it can also be used for arbitrary text generation tasks.

Native fine tuning

With true fine-tuning, you take an existing LLM model and you provide it a number of new training examples. It will fine tune across all of the weights of the network. You tend to need a large training dataset (on the order of 100s of MiB of data) and a powerful computer to run the tuning process.

Low-Rank Adaptation (LoRA) fine tuning

LoRA stands for Low Rank Adaptation and it lets you fine-tune a model quicker and with less data. It uses a pair of rank-decomposition weight matrices, called "update matrices", and freezes the weights of the parent network. These update matrices approximate the weights of the full LLM. We then only train the update matrices. This has some benefits:

  • training is cheaper and faster
  • you can share a pre-trained base model among multiple LoRA fine-tunings, which is substantially cheaper from a hosting perspective
  • help avoid catastrophic forgetting on tasks outside of the fine-tuning dataset

Here is a LoRA example from Hugging Face, using their PEFT library.

Retrieval-Augmented Generation (RAG)

Similar to prompt prefixes, retrieval-augmented generation doesn't tune any weights, but it preconditions the LLM's output by providing it with some prefixed text. Unlike simple prompt prefixing and few-shot prompting, RAG finds documents related to the user's prompt and then feeds it into the LLM as the input prefix. It works like this:

  1. The user enters their prompt
  2. The system takes the prompt and finds related documents from a document or vector DB (via embedding similarity search, keyword search, and other information retrieval methods)
  3. The system takes relevant document chunks and inserts them as an input prefix into the LLM

This combines the fields of information retrieval (search) with natural language generation. You end up with LLM responses that are specific to a dataset and tend to be more accurate against that data.

Email newsletters

The AI field changes constantly. You need to keep feeding yourself training data (booo! bad joke) to keep up to date. Here are some newsletters that I like. Good to skim and see if anything interesting jumps out at you.

  • TLDR AI - high level summary of new products, papers, and news
  • Supervised - a more detailed dive into a specific topic per post (paid)

Subscribe to Dylan Mikus

Don’t miss out on the latest issues. Sign up now to get access to the library of members-only issues.