Understanding LLM Embeddings: The Hidden Language AI Speaks
URL Source: https://medium.com/@miaoli1315/understanding-llm-embeddings-the-hidden-language-ai-speaks-9d2e46295beb
Published Time: 2025-06-07T05:21:18Z

A high-dimensional space of numbers — image generated with AI
How large language models turn words into numbers that capture meaning
How does ChatGPT understand that “dog” and “puppy” are related? How can Google find relevant results even when you don’t use the exact keywords? The secret lies in embeddings — the mathematical foundation that gives AI its understanding of language.
Think of embeddings as AI’s universal translator. They convert human language into numbers that computers can understand and work with. But these aren’t just random numbers — they’re carefully arranged to capture the essence of meaning itself.
The Problem That Broke Computers’ Brains
Picture trying to teach a computer the sentence “The king walked to his castle.”
Sounds simple, right? But computers only speak numbers. They don’t understand words like “king” or “castle.” For decades, this was like trying to teach someone who only speaks Mandarin using a book written in Egyptian hieroglyphics.
Early AI systems treated words as random symbols. To them, “cat” was just as similar to “king” as it was to “dog.” The word “king” might be represented as 1,247, while “queen” was 8,952. These numbers had no relationship whatsoever, even though we humans know kings and queens are closely related concepts.
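The arbitrary-ID problem is easy to demonstrate. A tiny sketch, using the ID numbers from the example above (the other IDs are invented for illustration):

```python
# Arbitrary integer IDs carry no meaning: the numeric "distance" between
# two IDs says nothing about how related the words are.
word_ids = {"king": 1247, "queen": 8952, "cat": 3, "dog": 9001}

# |1247 - 8952| is huge even though king/queen are closely related,
# while |8952 - 9001| is tiny even though queen/dog are unrelated.
print(abs(word_ids["king"] - word_ids["queen"]))  # 7705
print(abs(word_ids["queen"] - word_ids["dog"]))   # 49
```

No arithmetic on these IDs can recover similarity, which is exactly the roadblock embeddings were invented to remove.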
This created a massive roadblock: How could computers ever truly understand language if they couldn’t grasp that some words are more similar than others?
What Are LLM Embeddings, Really?
Imagine organizing your music library not by artist name, but by sound. Similar-sounding songs sit close together — jazz near blues, soul nearby, while metal stays far from classical. That’s essentially what embeddings do for words and concepts.
LLM embeddings are vector representations of words, phrases, or entire texts. These vectors capture semantic meaning in high-dimensional space.
Each word becomes a list of numbers — typically hundreds or thousands of them — that work together to represent not just the word itself, but its meaning, context, and relationships.
Here’s what makes this remarkable: the word “bank” gets different embeddings in “river bank” and “bank account.” The AI understands context automatically.
The Magic Behind the Numbers
When an AI model processes text, it follows three key steps:
Step 1: Breaking Down Text
First, text gets chopped into smaller pieces called tokens — whole words, parts of words, or even individual characters. Think of it like breaking a sentence into LEGO blocks.
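To make the LEGO-block idea concrete, here is a minimal greedy longest-match subword tokenizer, in the style of WordPiece. The tiny vocabulary below is invented for illustration; real tokenizers learn vocabularies of tens of thousands of pieces from data:

```python
# Toy WordPiece-style tokenizer: repeatedly take the longest vocabulary
# piece that matches the start of the remaining word. "##" marks a piece
# that continues a word rather than starting one.
VOCAB = {"the", "king", "walk", "##ed", "to", "his", "castle", "##s"}

def tokenize_word(word):
    """Split one word into the longest matching vocabulary pieces."""
    pieces, start = [], 0
    while start < len(word):
        end = len(word)
        while end > start:
            piece = word[start:end] if start == 0 else "##" + word[start:end]
            if piece in VOCAB:
                pieces.append(piece)
                start = end
                break
            end -= 1
        else:
            return ["[UNK]"]  # no vocabulary piece matched
    return pieces

print(tokenize_word("walked"))  # ['walk', '##ed']
print(tokenize_word("castle"))  # ['castle']
```

Splitting "walked" into "walk" + "##ed" is why models can handle words they have never seen whole: rare words decompose into familiar pieces.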
Step 2: Creating the Vector
Each token is converted into its numerical representation: the embedding vector. For modern models, these high-dimensional vectors typically contain 768 to 4,096 numbers.
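Mechanically, this step is a table lookup: the model stores one trainable vector per token and retrieves it by token. A minimal sketch with invented values (real embedding tables are learned during training, not random):

```python
import random
random.seed(0)

VOCAB = ["the", "king", "walk", "##ed", "to", "his", "castle"]
DIM = 8  # toy size; modern models use roughly 768 to 4096 dimensions

# The embedding table: one vector per token. Here the values are random
# placeholders; training gradually adjusts them to encode meaning.
embedding_table = {tok: [random.gauss(0, 1) for _ in range(DIM)] for tok in VOCAB}

tokens = ["the", "king", "walk", "##ed"]
vectors = [embedding_table[t] for t in tokens]
print(len(vectors), len(vectors[0]))  # 4 tokens, 8 numbers each
```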
Step 3: Capturing Relationships
Here’s where magic happens. Semantically similar words cluster together in the embedding space. If you could visualize this high-dimensional space, you’d see “cat” sitting near “dog,” both close to “pet,” and all reasonably near “animal.”
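"Close together" has a precise meaning: the angle between vectors is small, usually measured with cosine similarity. A sketch with toy 4-dimensional vectors invented so that "cat" and "dog" point in similar directions while "car" does not:

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Hand-made toy "embeddings" (real models use hundreds to thousands of
# dimensions and learn these values from text).
embeddings = {
    "cat": [0.9, 0.8, 0.1, 0.0],
    "dog": [0.8, 0.9, 0.2, 0.0],
    "car": [0.0, 0.1, 0.9, 0.8],
}

print(cosine_similarity(embeddings["cat"], embeddings["dog"]))  # high (~0.99)
print(cosine_similarity(embeddings["cat"], embeddings["car"]))  # low (~0.12)
```

Search engines and chatbots use exactly this comparison, at scale, to find texts whose meanings are close even when the words differ.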

Words as Points in Mathematical Space
From Theory to Revolution
For years, researchers treated words as isolated symbols. Then came a radical idea: what if words could be represented as points in space, where their positions reflected their meanings?
In 2003, Yoshua Bengio, one of the “Godfathers of AI”, and his colleagues developed neural language models that learned distributed representations of words, the direct ancestors of modern word embeddings. Yet these early embeddings remained computationally expensive and confined to academic circles.
The real breakthrough came a decade later. In 2013, Google’s Tomáš Mikolov and his team released Word2vec, transforming embeddings from an academic curiosity into a practical tool. The original paper was initially rejected by reviewers, and it took months for the code to be approved for open-sourcing. Yet it would revolutionize natural language processing.
By analyzing massive amounts of text, Word2vec could automatically discover meaningful patterns. “King” clustered with “crown” and “throne,” while “dog” grouped with “bark” and “tail.” The system learned these relationships simply by observing which words appeared together across millions of sentences.
The Vector Arithmetic Miracle
The most significant discovery was that these word representations could perform algebraic operations that actually made sense:
King − Man + Woman ≈ Queen
This wasn’t just a mathematical trick — it revealed something profound about how language works. The computer had discovered that “kingship” is a concept that can be separated from gender, and then reapplied to create “queen.”
Suddenly, computers could solve word analogies:
- Paris is to France as _____ is to Italy? (Rome)
- Swimming is to pool as _____ is to road? (Driving)
These weren’t programmed rules. The computer figured them out by understanding relationships between concepts. During training, Word2vec noticed that “Paris” and “France” appeared together in similar contexts to how “Rome” and “Italy” appeared together. It learned that the mathematical distance between country-capital pairs was consistent across different examples, allowing it to solve analogies by finding words with similar vector relationships.
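The analogy trick can be sketched in a few lines: subtract, add, then find the nearest remaining word by cosine similarity. The 2-dimensional vectors below are invented so that "royalty" and "gender" lie along separate axes; real Word2vec vectors learn such structure from data:

```python
import math

def add(a, b): return [x + y for x, y in zip(a, b)]
def sub(a, b): return [x - y for x, y in zip(a, b)]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

# Toy vectors: axis 0 is "royalty", axis 1 is "maleness" (invented values).
emb = {
    "king":  [1.0, 1.0],
    "queen": [1.0, -1.0],
    "man":   [0.0, 1.0],
    "woman": [0.0, -1.0],
    "apple": [-1.0, 0.0],  # unrelated distractor
}

# king - man + woman, then pick the closest word not in the query.
target = add(sub(emb["king"], emb["man"]), emb["woman"])
best = max((w for w in emb if w not in {"king", "man", "woman"}),
           key=lambda w: cosine(target, emb[w]))
print(best)  # queen
```

In real models the match is approximate (hence the ≈ above): the nearest neighbor of the result vector is usually, not always, the expected word.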
The Evolution of Understanding
Early methods like Latent Semantic Analysis (LSA), built on Singular Value Decomposition (SVD), treated each word as having a single fixed meaning: no flexibility, no context. But language is messy and context-dependent. Modern LLM embeddings are more like having a brilliant linguist who understands nuance, sarcasm, cultural references, and context.
The journey from Word2vec to modern transformers with attention mechanisms represents one of the most remarkable progressions in the history of artificial intelligence. Transformer models don’t just create simple word embeddings; they generate different representations for the same word depending on its context. The word “bank” gets one representation when talking about money and a completely different one when discussing rivers.
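The mechanism behind this can be sketched in miniature: attention replaces a token's static vector with a weighted average of the vectors around it, so the same starting vector for "bank" ends up different in different sentences. This is a drastically simplified, weight-free caricature of one attention step, with invented 2-dimensional vectors:

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b): return sum(x * y for x, y in zip(a, b))

def contextualize(vectors, i):
    """Simplified attention: token i's new vector is a softmax-weighted
    average of every vector in the sentence (real attention adds learned
    query/key/value projections)."""
    weights = softmax([dot(vectors[i], v) for v in vectors])
    dim = len(vectors[0])
    return [sum(w * v[d] for w, v in zip(weights, vectors)) for d in range(dim)]

# Invented static vectors: "bank" starts out identical in both sentences.
money = {"bank": [1.0, 1.0], "account": [1.0, 0.0]}
river = {"bank": [1.0, 1.0], "water":   [0.0, 1.0]}

bank_money = contextualize([money["bank"], money["account"]], 0)
bank_river = contextualize([river["bank"], river["water"]], 0)
print(bank_money, bank_river)  # same word, two different output vectors
```

The context words pull the shared starting vector in different directions, which is the essence of how transformers give one word many context-dependent embeddings.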
Why This Matters
The discovery of embeddings solved one of the biggest puzzles in artificial intelligence: how to teach machines to understand meaning. By representing words, images, and concepts as points in mathematical space, researchers created a bridge between human understanding and computer processing.
This wasn’t just a technical breakthrough — it was a philosophical one. It showed us that meaning itself might be mathematical, that the relationships between concepts can be captured in numbers and equations. In teaching computers to understand us, we learned something new about the nature of understanding itself.
The next time you have a surprisingly good conversation with a chatbot or find exactly the right search results, appreciate the intricate mathematical dance of embeddings that made it possible. In those thousands of numbers lies something approaching digital understanding — and that’s pretty remarkable.