Searching for Meaning (2 of 2): The Design Choices That Shape What's Findable

Article 1 (“The Technology Behind Semantic Search”) explained how searching-by-meaning works. Text becomes coordinates in meaning space, similar meanings get similar numbers, and search becomes measuring angles between vectors.
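As a refresher, the measuring-angles step from Article 1 fits in a few lines. This is a minimal sketch: the toy three-dimensional vectors stand in for real embeddings, which have hundreds of dimensions.

```python
import math

def cosine_similarity(a, b):
    """Compare two vectors by angle: 1.0 = same direction, near 0.0 = unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy "meaning" coordinates: the query points in nearly the same
# direction as the close passage, and away from the far one.
query = [0.9, 0.1, 0.3]
close_passage = [0.8, 0.2, 0.4]
far_passage = [0.1, 0.9, 0.1]

assert cosine_similarity(query, close_passage) > cosine_similarity(query, far_passage)
```

Search, at bottom, is running that comparison between the query's vector and every stored vector, then ranking by the result.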
But there’s a catch. Documents must be divided into pieces before they can be embedded. How you divide them shapes what’s findable.
The instinct is to start with the content. What do I have? How long are the documents? What’s their structure? Those questions matter, but they’re not first. The first question is: what do I need this system to do?
The example I’ll use is the retrieval system I built for my own writing. At this point I’ve written a lot: two books and a third on the way, around ninety articles, and thousands of social media posts. The decisions I made during the design were driven by two things I needed the system to do. First, find specific things I know I wrote but can’t remember where. Second, survey what I’ve written when a topic spans my corpus and see the landscape, not a pile of fragments.
This article walks through those design decisions and closes with a classroom exercise on how students might learn about meaning and its organizing levels.
Finding Specific Things
I know I’ve written about AI systems deceiving humans. I can’t remember where. I search “AI that can deceive humans” and the system should surface a passage about Meta’s CICERO—an AI that played the game Diplomacy, ranked in the top ten percent of players, and whose human opponents didn’t realize it wasn’t human. That’s a specific example living in a few hundred words somewhere in my book Wisdom Factories. That much I knew.
This need demands that meaning come in small chunks. I know the CICERO story is only a couple of paragraphs long. If my chunks are too large—say, entire chapters—then the CICERO example is buried in 5,000 words of surrounding text. Maybe deception isn’t what the rest of the chapter is about, so the query never surfaces CICERO. If chunks are too small, like a single paragraph each, then I get a fragment without enough context. The query misses the mark there too.
I chose 500-word chunks. Big enough to carry meaning, small enough that a specific example doesn’t drown.
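A minimal sketch of that choice, splitting on whitespace only; a production chunker would respect paragraph and sentence boundaries rather than cutting mid-thought:

```python
def chunk_by_words(text, target_words=500):
    """Split text into consecutive chunks of roughly target_words words."""
    words = text.split()
    return [
        " ".join(words[i:i + target_words])
        for i in range(0, len(words), target_words)
    ]

# A 1,200-word document becomes three chunks: 500 + 500 + 200 words.
doc = " ".join(["word"] * 1200)
chunks = chunk_by_words(doc)
assert [len(c.split()) for c in chunks] == [500, 500, 200]
```

Each chunk then gets its own embedding, so a two-paragraph example like CICERO lives in a chunk small enough for its meaning to dominate.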
However, use case matters a lot. If I were building this for readers exploring my work—people who don’t know what’s in it—different chunk sizes might make sense. Larger chunks would give them more context when they stumble onto unfamiliar ideas. They’re discovering; I’m retrieving. Discovery tolerates more surrounding text. Retrieval needs precision.
Surveying What I’ve Written
The harder problem is thematic queries. I search “AI’s impact on work” and I’ve written about this in Wisdom Factories, in blogs, in sections of AI Wisdom Volume 1 and (soon to come) Volume 2, and in many posts. Without structure, the system dumps 40 chunks on me. I’m staring at fragments, trying to mentally reconstruct what I’ve said across different pieces. There’s no guarantee that the meaning of each chunk aggregates into the meaning of an entire chapter, much less an entire book.
What I need is a landscape view: here’s a book where this is a central theme (and which chapters address it), here are three blog posts that touch on it, here’s a section of AI Wisdom that’s relevant. I need to see “I wrote a whole chapter on this” versus “I mentioned it once.”
Plus, sometimes I know something is in a social media post and not a book, or I know which book or article to search.
Those require hierarchy, so my database has multiple levels:
Books: book → chapter → section → chunks
Articles and blogs: article → section → chunks
Social media posts: post → chunks directly
Each level has its own embedded meaning. When I query “AI’s impact on work,” the system can surface matches at the book level (Wisdom Factories as a coherent argument), chapter level (Chapter 2: “Humanity’s Wisdom Role”), section level, and chunk level. I see the structure of what I’ve written, not just pieces.
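One way to hold that hierarchy is as flat records that each carry a level and a parent, so a single query can return matches at every level. A sketch follows; the scores are made-up stand-ins for real embedding similarity, and the titles other than Chapter 2’s “Humanity’s Wisdom Role” are invented for illustration:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Unit:
    id: str
    level: str              # "book", "chapter", "section", "article", or "chunk"
    parent: Optional[str]   # id of the enclosing unit, None at the top
    title: str
    score: float = 0.0      # stand-in for similarity against the query

# A slice of the corpus after scoring a query like "AI's impact on work".
corpus = [
    Unit("wf", "book", None, "Wisdom Factories", 0.82),
    Unit("wf-c2", "chapter", "wf", "Humanity's Wisdom Role", 0.88),
    Unit("wf-c2-s1", "section", "wf-c2", "Work After AI", 0.75),
    Unit("blog-17", "article", None, "AI and the Future of Jobs", 0.79),
    Unit("wf-c4-k3", "chunk", "wf-c4-s2", "CICERO example", 0.41),
]

def landscape(units, threshold=0.7):
    """Group matches by level so results read as a map, not a pile of fragments."""
    view = {}
    for u in units:
        if u.score >= threshold:
            view.setdefault(u.level, []).append(u.title)
    return view

view = landscape(corpus)
assert view["chapter"] == ["Humanity's Wisdom Role"]
```

Because each record knows its parent, a book-level hit can also report which chapters drove it: the “whole chapter on this” versus “mentioned it once” distinction.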
Again, use case drives this. If my corpus were someone else’s research library—hundreds of papers they haven’t read—flat chunking might work fine. They’re not trying to see the landscape of their own thought. They’re mining for relevant passages. But I wrote this stuff. I need to know the shape of what I’ve already said so I don’t repeat myself or miss connections I’ve already made.
Finding the Seams
The hierarchy only works if the system knows where chapters and sections actually begin. A lot of the work in putting the database together was deciding how to ingest and break up the files.
One option is rules. “Heading 1 = chapter.” “Bold text followed by blank line = section.” This works until it doesn’t. My documents have inconsistent formatting—some use Word heading styles, some use bold text, some use special character separators. Rules become brittle.
I had Claude read each document and return structural boundaries as JSON. AI handles variation that would require dozens of rules.
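The shape of that exchange, sketched with a hard-coded stand-in for the model’s reply. The JSON schema and the “Introduction” title are illustrative, not the exact ones I used; the sanity checks show the kind of validation that catches a mis-detected boundary before it corrupts the hierarchy:

```python
import json

# In the real pipeline this string comes back from the model;
# it is hard-coded here so the sketch is self-contained.
model_reply = """
{
  "chapters": [
    {"title": "Introduction", "start_line": 1},
    {"title": "Humanity's Wisdom Role", "start_line": 240}
  ]
}
"""

def parse_boundaries(reply, total_lines):
    """Validate the model's structural boundaries before trusting them."""
    data = json.loads(reply)
    chapters = data["chapters"]
    starts = [c["start_line"] for c in chapters]
    # Boundaries must be in order and inside the document.
    assert starts == sorted(starts), "chapter starts out of order"
    assert all(1 <= s <= total_lines for s in starts), "boundary outside document"
    return chapters

chapters = parse_boundaries(model_reply, total_lines=500)
assert len(chapters) == 2
```

The checks don’t make the AI right, but they turn a silent corruption into a loud failure you can inspect.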
There’s a trade-off. Rules are predictable but break on edge cases. AI is flexible but occasionally wrong. If structure detection misses a chapter break, the hierarchy gets corrupted. A query about “evidence that experts are poor forecasters” should surface my writings about Phil Tetlock’s work wherever it lives—but if the system misidentified where that section begins, I could get a fragment that starts mid-argument.
A cleaner corpus might favor rules. Academic papers with rigid structure—Abstract, Methods, Results—could be parsed reliably with simple patterns. My messy collection of blogs and books couldn’t.
When Documents Won’t Fit
Embedding models have token limits—roughly 6,000 words for the model I used. Plenty for articles. Not enough for a 40,000-word book. When I added hierarchy to the database, I also added embeddings for sections, chapters, and books. A 6,000-word limit can capture most chapter meanings well, but the first few thousand words of a book don’t capture the book in all its nuance.
My options were to truncate and lose everything after page 20, or to give the “book” level of the hierarchy no embedded meaning at all; it would still be useful for “search my books for…” queries. I chose a third way: have AI compress each chapter into a few-hundred-word summary, then embed that series of summaries as the book-level representation. Summarization is lossy—nuance disappears—but far better than pretending a book is only its first chapter.
This matters for survey queries. “Paradigm shift needed in schools” is a through line of the entire Wisdom Factories book. With summarization, the book surfaces as a unified argument. With truncation, only early themes appear. With no book-level embedding at all, I get chunks with no sense that they’re part of a larger, coherent case.
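Once the per-chapter summaries exist, the third option reduces to simple assembly. In this sketch the summarizer and embedder calls are left out; it only checks that the combined summaries fit under the model’s input limit:

```python
def book_level_text(chapter_summaries, word_limit=6000):
    """Concatenate per-chapter summaries into one book-level document,
    verifying it fits under the embedding model's input limit."""
    combined = "\n\n".join(chapter_summaries)
    assert len(combined.split()) <= word_limit, "summaries still too long to embed"
    return combined

# Ten chapters compressed to ~300 words each easily fit where
# a 40,000-word book would not.
summaries = [" ".join(["word"] * 300) for _ in range(10)]
text = book_level_text(summaries)
assert len(text.split()) == 3000
```

The returned text is what gets embedded as the book-level vector, so the whole argument of the book, not just its opening, shapes the coordinates.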
Conversations Are Different
Dialogues pose a distinct problem. In a back-and-forth, ideas emerge across speakers. One person asks, another answers, a third pushes back. Where’s the unit of meaning? A single speaker turn might be incomplete. The full exchange might mix multiple topics.
Conversational content might need chunking by topic shift rather than by speaker, or preserving question-answer pairs as atomic units. The right choice depends on how you’ll query it. If you want to find “what Tim said about X,” speaker-based chunking helps. If you want to find “the exchange where X got debated,” you need a different approach.
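The question-answer-pair idea can be sketched with a toy heuristic. The transcript and speaker names below are made up, and real dialogue needs something smarter than “ends with a question mark”:

```python
def qa_pairs(turns):
    """Group a transcript into units: each turn ending with '?' is kept
    together with the turn that follows it as an atomic Q-A pair."""
    pairs, i = [], 0
    while i < len(turns):
        speaker, text = turns[i]
        if text.rstrip().endswith("?") and i + 1 < len(turns):
            pairs.append(turns[i:i + 2])   # atomic question-answer unit
            i += 2
        else:
            pairs.append(turns[i:i + 1])   # standalone turn
            i += 1
    return pairs

transcript = [
    ("Ana", "Does chunk size change what's findable?"),
    ("Tim", "Yes; small chunks favor precision, large ones context."),
    ("Raj", "I'd push back on that for surveys."),
]
units = qa_pairs(transcript)
assert len(units) == 2 and len(units[0]) == 2
```

Speaker-based chunking would instead embed each turn on its own; which grouping is right depends entirely on the queries you expect.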
I didn’t solve this problem. My corpus is mostly authored documents, not transcripts. The key point stands: the structure of the content and the shape of your queries jointly determine what design works.
Every decision traced back to those two needs: finding specific things, and surveying themes. Chunk size serves precision. Hierarchy serves landscape views. Structure detection enables hierarchy. Summarization keeps long documents visible for survey queries.
Different needs, different corpus, different choices. If I’d built this for readers who don’t know what’s inside, discovery would matter more than retrieval. Larger chunks, maybe less hierarchy. If my writing were uniformly structured—all blog posts, similar lengths—rules might beat AI for structure detection. If everything were short, summarization would be unnecessary.
The technical choices matter. But they’re downstream of a more fundamental question: what do you want to do with your data, and what will you be asking it?
Classroom Exercise: Students as Information Architects
Students design a retrieval system for a specific corpus—historical primary sources, scientific articles, a collection of class readings. But they start with use cases, not content:
Who will search this, and what will they need? Someone who knows the content surveying what’s there? Or someone discovering it for the first time?
What kinds of queries should work? Specific passages? Thematic overviews? Both?
What does that imply for structure? How much hierarchy? What chunk size?
Then: What’s in the corpus, and how does its existing structure support or complicate those needs?
The exercise doesn’t require building anything. A design document forces the same thinking. For students who want to build, free database tools exist. AI can write the code. The barrier isn’t technical. It’s knowing what you’re trying to build.
©2025 Dasey Consulting LLC


