Everything Is Numbers
Modern AI does things that would have seemed impossible for a computer to do a few years ago. It writes working programs from a description, creates photorealistic images from a sentence, and holds long conversations that genuinely seem to follow what you're saying. Underneath all of it is math. This chapter is about what kind of math, and why a foundation this simple can do so much.
What Can AI Do?
A few years ago, the things AI does routinely now would have read as science fiction. It can translate between hundreds of languages, generate a photorealistic image from a description, write complex computer programs from scratch, or perform specialized tasks at an expert level. The goal of this tutorial is to give you a deep intuitive sense of how AI does these things.
The whole tutorial rests on four ideas that the rest of this chapter introduces:
- Everything inside an AI is numbers. Words, images, sound.
- An AI is a function: numbers go in, different numbers come out.
- The function is itself made of numbers. Billions of them, called parameters.
- Training an AI means finding the right values for those numbers.
Most of the tutorial is spent on the fourth point, because that's where the interesting complications live.
Everything Is Numbers
You probably think of a text message as letters, a photo as colors, a song as sound. Inside a computer, all of them are lists of numbers.
- Text is numbers. Every character has a number assigned to it. The letter "H" is one number, "e" is another, the space between words is yet another. Your text message is a sequence of numbers.
- Colors are numbers. Every color you've ever seen can be described by three numbers: how much red, green, and blue light to mix. (Or how much cyan, magenta, and yellow ink, if you're painting.) Change those three numbers and you get a different color.
- Images are numbers. Every pixel is a color, and every color is three numbers. A photo is a giant grid of number-triplets.
- Sound is numbers. A microphone samples the air pressure thousands of times per second. Each sample is one number. A three-minute song is about 8 million numbers.
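The claims above are easy to check for yourself. The sketch below uses only the Python standard library (the tutorial's playgrounds use PyTorch tensors, but plain lists make the same point); the specific values are illustrative, not anything the tutorial depends on.

```python
import math

# Text is numbers: every character has a code point.
message = "Hi"
text_numbers = [ord(ch) for ch in message]
print(text_numbers)  # [72, 105] — "H" is 72, "i" is 105

# A color is three numbers: how much red, green, and blue (0–255 each).
sky_blue = (135, 206, 235)

# An image is a grid of color triplets. Here, a tiny 2×2 image.
image = [
    [(255, 0, 0), (0, 255, 0)],
    [(0, 0, 255), (255, 255, 255)],
]

# Sound is numbers: air-pressure samples taken thousands of times a second.
sample_rate = 44_100  # samples per second (CD quality)
samples = [math.sin(2 * math.pi * 440 * t / sample_rate)
           for t in range(100)]  # the first 100 samples of a 440 Hz tone

# A three-minute song at this rate, one channel:
print(3 * 60 * sample_rate)  # 7938000 — roughly 8 million numbers
```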
Throughout this tutorial, the boxes that look like the one below are interactive playgrounds. Click the tabs, type things in, drag the sliders. The point is to mess around with them.
Key Insight
Every input a computer ever sees is a list of numbers. Every output it produces is also a list of numbers. So every kind of computer thinking is the act of turning one list of numbers into another.
Thinking Is a Function
Once you accept that everything is numbers, something useful follows. Every kind of intelligent behavior is a function: something that takes numbers in and produces numbers out.
- "Is this a cat?" takes pixel values and returns a number representing how confident it is.
- "Translate to French" takes English character codes and returns French character codes.
- "Write a poem about rain" takes prompt characters and returns poem characters.
- "Generate an image of a sunset" takes text and returns pixel values.
The shape is the same in every case: numbers in, numbers out.
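To make that shape concrete, here are two toy stand-ins, not real AI. The bodies are placeholders (the "cat detector" just averages its inputs); the point is only that every task shares the same signature: a list of numbers in, a list of numbers out.

```python
def is_this_a_cat(pixels: list[float]) -> list[float]:
    """Pixel values in, one confidence number out (a placeholder rule)."""
    confidence = sum(pixels) / len(pixels)  # obviously not real cat detection
    return [confidence]

def translate(char_codes: list[int]) -> list[int]:
    """Character codes in, character codes out (here: just the identity)."""
    return char_codes

# Different tasks, identical shape: numbers in, numbers out.
print(is_this_a_cat([0.2, 0.8, 0.5]))     # [0.5]
print(translate([ord(c) for c in "Hi"]))  # [72, 105]
```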
So all of AI reduces to one problem: find the right function.
If you could find the function that maps photo pixels to the word "cat", that's image recognition. If you could find the function that maps English to French, that's translation. If you could find the function that maps prompts to good answers, that's a chatbot.
But how do you find a function?
Machines with Knobs
The trick is to build a model: a machine with adjustable knobs (called parameters). Different knob settings make the machine compute different functions. Finding the function you want becomes a matter of finding the right knob settings.
Here's a concrete example. The model below has four parameters (a, b, c, d) and computes:
y = a + b·x + c·x² + d·x³
There's a target curve (the blue one) we want the model to match. Your job is to adjust the four knobs so the red curve sits on top of the blue one.
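If you'd rather turn the knobs in code than with sliders, here is the same four-knob model as a Python function. Each setting of (a, b, c, d) computes a different function of x:

```python
def model(x, a, b, c, d):
    """The four-knob model: y = a + b·x + c·x² + d·x³."""
    return a + b * x + c * x**2 + d * x**3

# One setting of the knobs gives one function...
print(model(2.0, a=1, b=0, c=1, d=0))  # 1 + 2² = 5.0

# ...and a different setting gives a completely different function.
print(model(2.0, a=0, b=3, c=0, d=1))  # 3·2 + 2³ = 14.0
```

Matching the blue curve by hand means nudging a, b, c, and d until the model's outputs line up with the target's outputs at every x, which is exactly the frustration the next Key Insight describes.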
Key Insight
A model is a machine with adjustable knobs. Different settings compute different functions. Finding the right function becomes a matter of finding the right knob settings. With four knobs, hand-tuning is already frustrating. Real AI models have billions. We'll need a way to set them automatically.
Can One Machine Compute Everything?
Before going further, a question worth asking: are there limits to what functions a machine can compute?
In 1936, two mathematicians, Alan Turing and Alonzo Church, independently arrived at the same answer. You can build a machine capable of computing any function that can be computed at all. Not just math. Anything. This idea is called the Church-Turing thesis. It can't be formally proven, but no one has ever found a counterexample.
So if "thinking" is computation (turning numbers into numbers), then a computer can in principle do anything a human mind can do. The question isn't whether the right function exists. The question is whether we can find it.
The Lazy Solution: A Giant Lookup Table
Here's a thought experiment. Forget about clever math. What if we just made a giant table? For every possible input, store the correct output.
Want image recognition? Build a table with one row for every possible image, and write down what's in it. Want translation? A row for every possible English sentence, paired with its French translation.
This sounds crazy, but it's worth taking seriously. The number of possible inputs is always finite. A 512×512 image has a fixed number of pixels, and each pixel has a fixed number of possible colors, so there's a definite, countable number of possible 512×512 images. A text message has a maximum length, and each character comes from a finite set, so the number of possible messages is also finite. In principle, you could list every one and write the correct answer next to it.
Problem solved, right?
Not quite. Look at how big that table needs to be:
The lookup table approach fails completely. Even for a tiny 8×8 grayscale image (smaller than the smallest image used in real AI), the number of possible inputs is 256⁶⁴: a number with over 150 digits. That already dwarfs the number of atoms in the observable universe. A real photo has millions of pixels. You could never build a table that big. You could never even count that high.
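The arithmetic behind that claim takes two lines to verify: an 8×8 grayscale image has 64 pixels, each one of 256 shades, so there are 256⁶⁴ possible images, and the observable universe is commonly estimated to hold on the order of 10⁸⁰ atoms.

```python
# Count the possible 8×8 grayscale images: 256 shades per pixel, 64 pixels.
possible_images = 256 ** 64
print(len(str(possible_images)))  # 155 — a number with over 150 digits

# Compare with a rough estimate of the atoms in the observable universe.
atoms_in_universe = 10 ** 80
print(possible_images > atoms_in_universe)  # True — not even close
```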
The Real Challenge
So here's where this leaves us:
- Everything is numbers. Text, images, sound, all of it.
- Thinking is a function. Every intelligent task is numbers in, numbers out.
- Models are machines with knobs. Different settings compute different functions.
- The right function exists. The Church-Turing thesis says so.
- Brute force is impossible. A lookup table would be bigger than the universe.
What's needed are models that compress: capture the right pattern with far fewer parameters than a lookup table would need entries. And a way to find the right parameter values automatically rather than by hand.
That process of automatically finding the right knob settings is called learning, and the rest of this tutorial is about how it works.
In the next chapter we'll see the surprising answer: you don't need to design the solution. You only need a way to make reliable tiny improvements, and then to make a lot of them, very fast. The same algorithm behind biological evolution, A/B testing, and the scientific method can find the right parameter values automatically. It's called incremental optimization, and it's the engine behind all of modern AI.
Try it in PyTorch — Optional
Encode text, images, and sound as PyTorch tensors, visualize a parameterized function with different knob settings, and calculate why lookup tables are impossibly large.