Searching for Meaning (1 of 2): The Technology Behind Semantic Search

Behind the scenes of Generative AI’s (GenAI) emergence is another revolution less discussed.
The way information is searched has changed.
This two-article series covers the fundamentals behind semantic search and offers a guide to bringing the conversation into your classroom. I recently took on improving AI’s search of my writing corpus, and the project made me realize that semantic search is an underdiscussed aspect of modern AI, one well suited to classroom discussions about organizing and mining information.
At this point I have maybe a million words of material in various forms: books, articles, blogs, presentations, social media posts, and miscellaneous documents about my professional background. What I wanted was a way to search that prior content for relevant information within a Claude conversation (Claude is the foundation AI model I use most often). NotebookLM can do the kind of search I want, but I wanted it available within a Claude chat.
I had been throwing all the material into a Google Drive folder and using Claude’s Google Drive integration and search (enabled in account settings), but I knew it was missing a lot of stuff and often giving me irrelevant content.
This wasn’t a mystery. Google Drive search is keyword-based. It looks for my words, not my meaning. You’ve experienced this for decades with search engines: sometimes you get what you want, but success seems to depend on choosing the right magic words, and it’s never clear what you didn’t get. Search engines got better over time at understanding synonyms and judiciously expanding search terms, and GenAI will even pick the terms for you, but a request like handing over a paragraph and asking which other paragraphs are similar in meaning and topic is not a strength of any form of keyword search.
For Claude to find documents semantically similar to my query—based on meaning rather than exact word matches—every file would need to be converted to a mathematical representation and stored in a specialized database.
So I built a RAG (Retrieval Augmented Generation) system for my writing content. This first article talks about the technology behind it. The second will focus on key design decisions and discussions with students.
The difference in the quality of search output from Claude’s Google Drive search and a semantic search enabled by the new database I built is demonstrated below.


Embedding: Representing Text by Meaning
Computers only process numbers. They can’t “read” text any more than a calculator can. Every system that works with language must first convert text into numerical form.
Traditional encoding assigns arbitrary numbers. The letter “a” might be 97 and “b” 98, so “cat” becomes [99, 97, 116]. But those numbers contain no information about what “cat” means—there’s no mathematical relationship between them and the numbers for “kitten” or “pet” or “furry animal that knocks things off tables.”
Keyword search builds on this foundation: does this exact string of characters appear? Techniques like stemming (treating “running” and “run” as equivalent) help at the margins. But the fundamental approach remains matching characters, not meaning. A search for “student struggles with revision” won’t find a document about “learners having difficulty incorporating feedback”—even though they’re discussing the same phenomenon. Behind keyword search there is often a sizable expansion of the search terms based on how words relate, including in ways that shape meaning, but those are still coarse semantic expansions. Subtle differences between longer passages aren’t likely to be captured.
Semantic search runs on a numerical representation that tries to preserve meaning. “Student struggles with revision” and “learners having difficulty incorporating feedback” end up mathematically “close” to one another despite having no words in common. This process is called “embedding.”
Think of it like GPS coordinates. Every location on Earth gets coordinates, and nearby places have similar coordinates. Providence and Boston have more similar coordinates than Providence and Tokyo. An embedding model does the same thing for meaning—it assigns coordinates in “meaning space” to chunks of text. Similar ideas end up near each other.
When you use an embedding model you take a chunk of text—maybe a paragraph, maybe 500 words. You feed the whole chunk into the embedding model. The model outputs a list of numbers (1,536 numbers in my system) that represents where that chunk lives in meaning space. One chunk in, one set of coordinates out. The same chunk always produces the same coordinates.
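That interface can be sketched with a toy stand-in. The function below is not a real embedding model—its numbers carry no meaning—but it mimics the contract described above: one chunk in, one fixed-length list of numbers out, and the same chunk always produces the same coordinates. The 1,536 dimensions match the article’s system; real models would come from a provider’s API.

```python
import hashlib

DIMENSIONS = 1536  # the article's system uses 1,536-number vectors

def toy_embed(chunk: str) -> list[float]:
    """Mimics only the *interface* of an embedding model: text in,
    fixed-length coordinates out, deterministically. Unlike a real
    model, these numbers encode nothing about meaning."""
    digest = hashlib.sha256(chunk.encode("utf-8")).digest()
    # Stretch the 32-byte digest into 1,536 pseudo-coordinates in [0, 1).
    return [digest[i % len(digest)] / 256 for i in range(DIMENSIONS)]

vector = toy_embed("The river bank eroded after the flood.")
print(len(vector))  # 1536 -- one set of coordinates per chunk
print(vector == toy_embed("The river bank eroded after the flood."))  # True
```

A real system would replace `toy_embed` with a call to an embedding model, but the shape of the pipeline—chunk in, fixed-length vector out—stays the same.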
Text run through an embedding model can then become part of a vector database, where a “vector” is the set of numbers representing some text and its associated meaning. When you upload a long document for AI to use, the system runs the text through an embedding model to convert it to numbers.
The embedding model is an AI neural network, and the hard work happened when it was trained. During training, the model processed billions of sentences and learned which patterns of language tend to go together. Words that appear in similar contexts—“cat” and “dog” both follow “fed the,” both precede “chased a squirrel”—get treated as related. The model learned that certain combinations of words and phrases signal certain kinds of meaning.
The embedding model was trained differently than the AI you chat with. ChatGPT, Claude, and their peers first learn by predicting the next word. Embedding models are typically trained with contrastive learning: they see pairs of text that should be related (e.g., questions and answers, or a query and a relevant document) and pairs that shouldn’t be (e.g., random pairings). The neural network learns to push similar pairs closer together numerically and dissimilar pairs further apart.
Once trained, it becomes a machine: text in, coordinates out. When you give it a paragraph about a river bank, it reads the whole thing—“river,” “water,” “eroded”—and outputs coordinates reflecting that meaning. When you give it a paragraph about a financial bank, it reads that context—“loan,” “interest,” “deposit”—and outputs different coordinates. The word “bank” doesn’t get its own separate embedding. The whole chunk gets one embedding, and the surrounding words determine where it lands.
Vector Search: Aligning Meaning by Geometry
Once your documents are embedded—converted to coordinates in meaning space—search becomes geometry.
Your search query gets embedded using the same model. Now you have two sets of coordinates: one for your query, one for each chunk in your database.
The standard way to measure which chunks are closest to your query is by the angle between them. If your query and a chunk point in almost the same direction in meaning space (where the pointing begins at the origin), the angle between them is small—they’re about similar things. If they point in very different directions, the angle is large—unrelated topics. This is called cosine similarity, and it’s just the cosine function from trigonometry applied to meaning. Your high school math teacher would be delighted: search reduced to measuring angles.
You may wonder why only the angle matters. As you may remember from math class, a vector has both a direction and a magnitude. The magnitude of an embedding vector has more to do with the length of the original text than with its meaning, so it’s the angle that matters.
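Cosine similarity is short enough to write out in full. The three-dimensional vectors below are hypothetical stand-ins (real embeddings have hundreds or thousands of dimensions), but the geometry is identical, including the fact that scaling a vector leaves its angle to the query unchanged:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: near 1.0 means they
    point the same way in meaning space, near 0.0 means unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

query = [0.9, 0.1, 0.2]             # hypothetical 3-D "embeddings"
chunk_revision = [0.8, 0.15, 0.25]  # points roughly the same way as query
chunk_recipes = [0.05, 0.9, 0.1]    # points a very different way

print(cosine_similarity(query, chunk_revision))  # close to 1.0
print(cosine_similarity(query, chunk_recipes))   # much lower

# Magnitude doesn't matter: doubling a vector changes its length,
# not the angle it makes with the query.
doubled = [2 * x for x in chunk_revision]
print(math.isclose(cosine_similarity(query, doubled),
                   cosine_similarity(query, chunk_revision)))  # True
```

In a real system, ranking the chunks by this score against the query vector is the entire search step.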
Embedding models can only handle so much text at once. The model I use maxes out around 6,000 words in a chunk. That’s fine for a blog post, but my books run 60,000 words or so. You can’t embed an entire book as one chunk.
Even if you could, you wouldn’t want to. A query like “What specific technique does Tim recommend for giving feedback?” needs to find a specific passage—not return an entire book and hope the user locates the relevant page. Plus, the odds that the entire book is about that topic aren’t good.
So documents must be divided into chunks. And how you chunk shapes what’s findable. Cut in the wrong place and you sever an idea mid-thought. Make chunks too small and they lose context. Too large and specific insights get buried. You’re essentially deciding the granularity of meaning. These design decisions determine whether a retrieval system actually works.
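A minimal chunking strategy looks something like the sketch below: fixed-size word windows with some overlap so ideas cut at a boundary survive in the neighboring chunk. The sizes are illustrative, not the article’s actual scheme (those design decisions are the subject of the next article):

```python
def chunk_words(text: str, chunk_size: int = 200, overlap: int = 40) -> list[str]:
    """Split text into fixed-size word windows with overlap -- a minimal
    illustrative chunking strategy, not the article's actual scheme."""
    words = text.split()
    if len(words) <= chunk_size:
        return [text]
    chunks, step = [], chunk_size - overlap
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # last window already reached the end of the document
    return chunks

doc = " ".join(f"word{i}" for i in range(500))  # a 500-word stand-in document
chunks = chunk_words(doc)
print(len(chunks))  # 3 chunks; neighbors share 40 words of context
```

Every knob here—window size, overlap, whether to cut on word counts at all rather than paragraphs or sections—is one of the design decisions that determines what the system can find.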
One more constraint worth understanding: different embedding models create incompatible coordinate systems. A chunk embedded with OpenAI’s model gets completely different numbers than the same chunk embedded with Google’s model. The coordinates only have meaning relative to other coordinates from the same model. Choosing an embedding model is a commitment—like choosing a map projection. You can’t navigate using coordinates from a different system.
Semantic search is elegant. Text becomes coordinates, similar meanings land near each other, search becomes measuring angles.
But building a working system requires more than understanding the mechanism. You need to make decisions: How do you divide documents into searchable chunks? Does existing structure matter? What do you do with content that’s too long? How do you balance finding everything relevant against avoiding irrelevant noise?
These turn out to be the same questions librarians face when organizing collections, curriculum designers face when structuring courses, and textbook authors face when deciding what goes in each chapter. The next article walks through those decisions using my project as a case study and discusses how students might be engaged in the topic.
©2025 Dasey Consulting LLC


