Describing the World with Numbers
The word "vector" sounds intimidating. It isn't. A vector is just a list of numbers. But this simple idea turns out to be the key to understanding what neurons actually do, and why stacking them in layers works at all.
Describing Things with Vectors
You already use lists of numbers to describe things. Your position on the planet is two numbers: (latitude, longitude). A color on screen is three numbers: (red, green, blue). An RPG character has stats like (strength, magic, speed, defense). Each of these is a vector: a list of numbers that together describe something.
We can use the same idea to describe almost anything. Take animals: rate each one from 0 to 1 on properties like big, scary, hairy, cuddly, fast, and fat, and you get a vector that captures what makes that animal distinctive.
The number of measurements is the vector's dimension. Our animal vectors have 6 dimensions, one per property. More dimensions let you make finer distinctions. With just "big" and "scary," a bear and a dog might look similar. Add "cuddly" and they separate out.
The vector captures what we chose to measure. If we'd picked different properties, the same items would get different vectors. The choice of dimensions determines what the vector can distinguish.
Key Insight
A vector is a compact description. It's a list of numbers that together describe something. More dimensions mean finer distinctions. Vectors can describe positions, colors, characters, animals, foods, anything you can capture with numbers.
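In code, a vector really is just a list of numbers. A tiny sketch, with invented 0-to-1 ratings for the six animal properties:

```python
# Invented ratings for (big, scary, hairy, cuddly, fast, fat).
bear  = [0.9, 0.8, 0.9, 0.4, 0.5, 0.7]
mouse = [0.1, 0.1, 0.4, 0.5, 0.6, 0.2]

# The dimension is just how many numbers are in the list.
dimension = len(bear)
print(dimension)  # 6
```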
Measuring Similarity
How do we measure how similar two things are? There's a simple operation: multiply each matching pair of numbers, then add up all the results. This is called the dot product:
dot product = (a₁ × b₁) + (a₂ × b₂) + … + (aₙ × bₙ)
The result is a single number that captures how similar the two vectors are. A large positive number means they share a lot of the same properties. A number near zero means they have little in common. A negative number means they pull in opposite directions.
Why does multiply-and-add measure similarity? Look at the per-dimension math. When both vectors score high on the same property, multiplying those gives a big number, and that shared trait contributes a lot to the total. But if one scores high and the other low, multiplying gives almost nothing. Add up all these products, and you get a total that's high when the vectors agree and low when they don't.
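Multiply-and-add is a one-liner in code. The animal ratings below are invented for illustration; the point is only that overlapping high scores drive the total up:

```python
def dot(a, b):
    # Multiply each matching pair of numbers, then add up the results.
    return sum(x * y for x, y in zip(a, b))

# Invented ratings for (big, scary, hairy, cuddly, fast, fat).
bear  = [0.9, 0.8, 0.9, 0.4, 0.5, 0.7]
dog   = [0.5, 0.4, 0.8, 0.8, 0.7, 0.4]
mouse = [0.1, 0.1, 0.4, 0.5, 0.6, 0.2]

print(dot(bear, dog))    # larger: bear and dog share many properties
print(dot(bear, mouse))  # smaller: little overlap
```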
The similarity scores here fall cleanly between −1 and 1. That's because all the animal, food, and instrument vectors have been carefully scaled to the same size. The next section explains why that matters, and what goes wrong when vectors aren't the same size.
Unit Vectors: Solving the "Big in Everything" Problem
What if we hadn't been careful about scaling? Imagine an animal that scored high on every property: big, scary, hairy, cuddly, fast, fat. Its dot product with any animal would be huge. It would look similar to everything, which is clearly wrong.
The problem is that the dot product rewards having big numbers, not just agreeing on the same properties. We need a way to put all vectors on equal footing.
The solution: scale every vector to the same size, specifically size 1. A vector with size 1 is called a unit vector.
To find the size of a vector (its magnitude), you square all the dimensions, add them together, and take the square root. This is just the Pythagorean rule extended to any number of dimensions:
magnitude = √(v₁² + v₂² + … + vₙ²)
To make a unit vector, divide every dimension by the magnitude. The result has magnitude 1, but the proportions between dimensions stay the same. A bear is still scarier than it is fast.
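Both steps in a short sketch, using the same made-up bear vector:

```python
import math

def magnitude(v):
    # Pythagorean rule extended to any number of dimensions.
    return math.sqrt(sum(x * x for x in v))

def normalize(v):
    # Divide every dimension by the magnitude.
    m = magnitude(v)
    return [x / m for x in v]

# Invented ratings for (big, scary, hairy, cuddly, fast, fat).
bear = [0.9, 0.8, 0.9, 0.4, 0.5, 0.7]
unit_bear = normalize(bear)

print(magnitude(unit_bear))  # 1.0 (up to rounding)
# Proportions survive: still scarier (index 1) than fast (index 4).
print(unit_bear[1] > unit_bear[4])  # True
```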
For any two unit vectors, the dot product is always between −1 and 1, with a clean interpretation:
- 1 means they're identical
- 0 means they have nothing in common
- −1 means they're completely opposite
All the animal, food, and instrument vectors we've been using are unit vectors, which is why their dot products gave such clean similarity scores.
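The three cases above can be checked directly. A sketch using an invented unit vector, its exact opposite, and a perpendicular one:

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def normalize(v):
    m = math.sqrt(sum(x * x for x in v))
    return [x / m for x in v]

u = normalize([0.9, 0.8, 0.9, 0.4, 0.5, 0.7])  # a made-up animal, as a unit vector

identical = dot(u, u)               # 1: the vector compared with itself
opposite = dot(u, [-x for x in u])  # -1: every property flipped
# Build a vector perpendicular to u: swap two dimensions and negate one.
perp = normalize([u[1], -u[0], 0, 0, 0, 0])
orthogonal = dot(u, perp)           # 0: nothing in common

print(identical, opposite, orthogonal)
```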
Key Insight
A unit vector is a standard-size vector. Normalizing to magnitude 1 preserves proportions but removes the size advantage, enabling fair similarity comparisons. The dot product of two unit vectors gives a pure similarity score between −1 and 1. (This is sometimes called cosine similarity.)
Combining Vectors
Since vectors are just lists of numbers, you can add them. Add each dimension separately. The result is a new vector that combines information from both inputs.
Blending. Add two animal vectors and normalize the result back to a unit vector, and you get an animal "in between" the two. Blend a bear and a rabbit and you get something medium-sized, moderately cuddly, and somewhat hairy. Check the similarity ranking; the blend often lands near an animal you'd expect.
Modifying. You can also add a vector that pushes in a specific direction. Start with a bear and add "more cuddly, less scary, smaller", and you get something like Paddington Bear. Start with a dog and add "bigger, scarier", and you get a wolf. The result describes something that wasn't in our original list.
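Both operations in code. The vectors and the "nudge" direction are invented for illustration, but the mechanics are exactly as described:

```python
import math

def normalize(v):
    m = math.sqrt(sum(x * x for x in v))
    return [x / m for x in v]

def add(a, b):
    return [x + y for x, y in zip(a, b)]

# Invented ratings for (big, scary, hairy, cuddly, fast, fat).
bear   = normalize([0.9, 0.8, 0.9, 0.4, 0.5, 0.7])
rabbit = normalize([0.2, 0.1, 0.8, 0.9, 0.7, 0.3])

# Blending: add, then normalize back to a unit vector.
blend = normalize(add(bear, rabbit))

# Modifying: nudge the bear toward "more cuddly, less scary, smaller".
nudge = [-0.3, -0.4, 0.0, 0.4, 0.0, 0.0]
friendly_bear = normalize(add(bear, nudge))

print(friendly_bear[3] > bear[3])  # True: cuddlier
print(friendly_bear[1] < bear[1])  # True: less scary
```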
This idea, building up complex descriptions by adding simpler vectors together, turns out to be central to how neural networks work. This is something we'll return to in the chapter on embeddings and again for transformers.
What and How Much
Any vector can be split into two parts: a unit vector and a magnitude.
- The unit vector says what kind of thing it is, the pattern of properties, stripped of size.
- The magnitude says how much of that thing there is.
Think of a bear detector: the unit vector says "I'm looking for bear-like things," and the magnitude says "how sensitive is my detector to bears?" A magnitude of 1 is the standard size. Double the magnitude, and the dot product with any input doubles too. The detector gets more sensitive without changing what it detects.
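Doubling the magnitude doubles every dot product, because each term in the sum doubles. A quick check with made-up numbers:

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

w  = [0.5, 0.4, 0.8]     # hypothetical detector weights
x  = [0.9, 0.2, 0.6]     # some input
w2 = [2 * v for v in w]  # same direction, double the magnitude

print(dot(w, x))
print(dot(w2, x))  # exactly twice as large
```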
This decomposition (unit vector for what, magnitude for how much) is one of the most important ideas in the chapter. It's how neurons work, as we'll see next.
A Neuron Is a Dot Product
Remember from the previous chapter that a neuron computes a weighted sum of its inputs, adds a bias, then applies an activation function. That weighted sum is a dot product. The neuron has a weight vector and receives an input vector:
output = activation( w · x + bias )
where w = (w₁, w₂, …, wₙ) is the weight vector and x = (x₁, x₂, …, xₙ) is the input vector.
The neuron is asking: "Is this input what I'm looking for?"
- The unit vector of the weights says what kind of thing the neuron detects.
- The magnitude of the weights says how sensitive the neuron is: a larger magnitude means less evidence is needed to get it excited.
- The bias says how excited it is if it hasn't seen anything at all.
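A minimal sketch of such a neuron, using ReLU as the activation (other activations work the same way) and an invented bear vector as the weights:

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def relu(x):
    # One common activation function: negative evidence is clipped to zero.
    return max(0.0, x)

def neuron(weights, bias, inputs):
    return relu(dot(weights, inputs) + bias)

# A toy "bear detector": weights copied from an invented bear vector,
# bias set so the neuron needs solid evidence before it fires.
w = [0.9, 0.8, 0.9, 0.4, 0.5, 0.7]
bias = -1.5

dog   = [0.5, 0.4, 0.8, 0.8, 0.7, 0.4]
mouse = [0.1, 0.1, 0.4, 0.5, 0.6, 0.2]

print(neuron(w, bias, dog))    # fires: dog shares many bear-like properties
print(neuron(w, bias, mouse))  # stays at zero: not enough evidence
```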
Try the Animals tab and set the weight to "Bear". Now you have a bear detector! Feed it different animals as input and watch the output. A dog gets a high score because it shares many bear-like properties. A mouse scores low. You can do the same with any category: set the weight to "Pizza" on the Foods tab and you have a pizza detector.
Key Insight
A neuron is a pattern detector. It computes the dot product of its input vector with its weight vector, adds a bias to shift the threshold, and squashes the result through an activation function. The weight vector's unit vector defines what the neuron looks for; its magnitude defines how sensitive it is.
Seeing It in 2D
Everything so far used lists of six numbers. But when a vector has just two dimensions, we can draw it as an arrow on a flat surface, and all the concepts we've learned become visible.
A 2D vector like (0.8, 0.6) becomes an arrow from the origin to the point (0.8, 0.6). The magnitude is the length of the arrow. A unit vector sits on the unit circle (the circle of radius 1). And the unit vector × magnitude decomposition we learned earlier is literally visible: the unit vector shows where the arrow points, and the magnitude shows how long it is.
The dot product becomes geometric too. When two arrows point the same direction, their dot product is large and positive. When they're perpendicular, it's zero. When they point in opposite directions, it's negative. You can see this directly:
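A sketch that checks all three cases, building unit vectors on the unit circle from angles:

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def unit(angle_deg):
    # A unit vector on the unit circle, pointing at the given angle.
    a = math.radians(angle_deg)
    return [math.cos(a), math.sin(a)]

east, north, west = unit(0), unit(90), unit(180)

print(dot(east, east))   # 1: same direction
print(dot(east, north))  # 0 (up to rounding): perpendicular
print(dot(east, west))   # -1: opposite directions
```

For unit vectors, the dot product is exactly the cosine of the angle between the two arrows, which is where the name "cosine similarity" comes from.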
This geometric picture will return in later chapters, especially when we look at how attention works and how positional information is encoded.
What's Next
A vector is just a list of numbers, but it's a powerful idea. We've seen how the dot product measures similarity, how unit vectors enable fair comparison, and how a neuron uses the dot product to detect patterns in its input.
But so far we've been describing things with hand-picked properties: big, scary, hairy. That works for animals, because animals are one kind of thing and we can choose sensible properties for them. But what about words? "Dog" and "democracy" and "purple" and "running". There's no single set of properties that makes sense for all of them. "How scary is the color blue?" isn't a meaningful question.
In the next chapter, we'll see how AI solves this problem: instead of choosing dimensions by hand, it learns vectors where similar words end up nearby and directions encode meaningful relationships, all without any human labeling.
Try it in PyTorch — Optional
Create vectors in PyTorch, compute dot products and cosine similarity, and build a neuron as a dot product.
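A sketch of what that might look like, reusing the invented animal vectors from this chapter (assumes PyTorch is installed):

```python
import torch

# Invented ratings for (big, scary, hairy, cuddly, fast, fat).
bear = torch.tensor([0.9, 0.8, 0.9, 0.4, 0.5, 0.7])
dog  = torch.tensor([0.5, 0.4, 0.8, 0.8, 0.7, 0.4])

# Dot product: multiply matching pairs, add up the results.
print(torch.dot(bear, dog))

# Cosine similarity: normalize both to unit vectors, then take the dot product.
sim = torch.dot(bear / bear.norm(), dog / dog.norm())
print(sim)  # same value as torch.nn.functional.cosine_similarity(bear, dog, dim=0)

# A neuron as a dot product: weighted sum + bias, squashed by an activation.
weights, bias = bear, torch.tensor(-1.5)
output = torch.relu(torch.dot(weights, dog) + bias)
print(output)
```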