In part 1 I described scatter plots, while part 2 discussed optimization terrains and confusion matrices. These three visuals help us understand data relationships, learning processes, and performance evaluation in AI systems.
In this final article on AI visualizations, I discuss associative networks and the fascinating glimpses we can get into an AI's decision-making process. Associative networks allow us to map the intricate connections between human and AI ideas and information, between artificial neurons in machine learning AI, and among members of social networks, including future multi-AI systems. AI internal visualizations such as attention heatmaps and network visualizations offer windows into how AI systems process information, providing insights into their 'thought' processes.
These visualizations not only deepen our understanding of AI but also provide valuable parallels to human cognition and learning.
4. Associative Networks: Mapping Connections in Knowledge and Cognition
Associative networks are powerful visual tools that represent systems of interconnected ideas, concepts, or entities. These networks, which include knowledge graphs, neural networks, concept maps, and semantic webs, offer a way to visualize and understand the intricate relationships that underpin both human knowledge and artificial intelligence systems. Many educators already use this form of visualization to teach other subjects, whether it be the interconnectedness of storylines in a novel, the relationships in ecological systems, or skills in curricula. Special forms of associative networks include hierarchies and flow charts.
Such networks are usually shown as an interconnected system, where each point or node represents a concept or piece of information, and the lines or edges between them represent different forms or strengths of relationships. (Note: math people often confusingly call these associative networks “graphs” and the math associated with them “graph theory”, but neither term refers to x-y plots.)
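To make the node-and-edge idea concrete, here is a minimal sketch in Python of how such a network can be stored as an adjacency list. The concepts and connections are entirely made up for illustration:

```python
# A tiny associative network as an adjacency list:
# each key is a node (a concept), each value lists its connected nodes.
# All concepts here are illustrative, not a real knowledge base.
network = {
    "photosynthesis": ["sunlight", "chlorophyll", "glucose"],
    "sunlight": ["photosynthesis", "vitamin D"],
    "chlorophyll": ["photosynthesis"],
    "glucose": ["photosynthesis", "cellular respiration"],
    "cellular respiration": ["glucose"],
    "vitamin D": ["sunlight"],
}

def neighbors(node):
    """Return the concepts directly associated with a node."""
    return network.get(node, [])
```

Following an edge is then just a dictionary lookup, which is part of why this representation is so common in knowledge graphs and recommendation systems.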
The versatility of associative networks makes them invaluable in various AI applications:
Knowledge Graphs: These are used by search engines and AI assistants to understand context and relationships in information. They help in answering complex queries by connecting disparate pieces of knowledge. For example, biological interaction networks are important in modern drug discovery.
Neural Networks: Neural networks, whether biological or artificial, are essentially complex associative networks where nodes represent neurons and edges represent synaptic connections.
Social Network Analysis: AI uses network theory to analyze social media connections, influence patterns, and information spread.
Recommendation Systems: E-commerce and streaming platforms use associative networks to map relationships between products, users, and preferences.
The power of associative networks lies in their ability to represent complex relationships in an intuitive visual format. They bridge the gap between the interconnected nature of human thought and the structured data processing of computers.
For students, engaging with associative networks can foster systems thinking and the ability to see connections across different areas of study. It encourages a more holistic view of knowledge, where facts aren't isolated pieces of information but part of a larger, interconnected whole.
In the context of AI literacy, understanding associative networks is crucial. They form the backbone of many AI systems, from the way search engines understand queries to how language models generate coherent text. By visualizing these networks, or simplified versions of them, we can better grasp how AI systems "think" and make connections.
Associative networks also have properties that are themselves worth thinking and learning about:
Search and discovery processes: Associative networks offer insights into how information is found and vetted. Breadth-first search (BFS) and depth-first search (DFS) are fundamental strategies. BFS explores all neighboring nodes at the present depth before moving deeper, ideal for finding shortest paths between concepts or discovering diverse, related ideas quickly. This mirrors interdisciplinary exploration in research. DFS, on the other hand, explores as far as possible along each branch before backtracking, useful for exploring complex, hierarchical relationships or finding specific, targeted information. This aligns with specialized, in-depth study. Understanding these search strategies helps in developing more effective methods for information gathering and critical analysis across various fields.
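The contrast between the two strategies can be sketched in a few lines of Python on a small, made-up concept hierarchy. BFS visits everything one step away before anything two steps away, while DFS commits to one branch at a time:

```python
from collections import deque

# A small, hypothetical concept hierarchy for illustration.
graph = {"A": ["B", "C"], "B": ["D"], "C": ["E"], "D": [], "E": []}

def bfs(graph, start):
    """Breadth-first: visit all neighbors at the current depth before going deeper."""
    seen, order, queue = {start}, [], deque([start])
    while queue:
        node = queue.popleft()
        order.append(node)
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return order

def dfs(graph, start, seen=None):
    """Depth-first: follow one branch as far as possible before backtracking."""
    if seen is None:
        seen = set()
    seen.add(start)
    order = [start]
    for nxt in graph.get(start, []):
        if nxt not in seen:
            order.extend(dfs(graph, nxt, seen))
    return order

print(bfs(graph, "A"))  # ['A', 'B', 'C', 'D', 'E'] -- layer by layer
print(dfs(graph, "A"))  # ['A', 'B', 'D', 'C', 'E'] -- one branch at a time
```

The only difference that matters is the data structure: a queue gives breadth-first's "survey everything nearby" behavior, while recursion (a stack) gives depth-first's "drill down" behavior.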
Information/concept importance: In associative networks, certain nodes and edges hold greater significance due to their strategic positions or connectivity.
Hub nodes are highly connected, serving as central points in the network. They often represent key concepts or influential entities. Hub nodes allow quick access to a wide range of related concepts. In knowledge networks, hubs might be foundational theories, pivotal historical events, or authoritative sources.
Bridge edges are connections that link different clusters or communities within the network. They represent crucial associations that connect disparate areas of knowledge or influence. Bridge edges enable the connection of seemingly unrelated ideas, fostering innovation and interdisciplinary insights.
The importance of these elements lies in their ability to facilitate efficient information flow and access, and the notions are broadly useful. For example, in experimental design, identifying potential hub nodes can guide researchers to areas likely to yield significant insights, and discovering new bridge edges can lead to breakthrough connections between previously separate fields.
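Both ideas can be computed directly from a small edge list. The sketch below, on a made-up six-node network, finds the hub by counting connections and finds bridge edges by the simplest possible test: remove an edge and see whether the network falls apart (real graph libraries use faster algorithms, but the definition is the same):

```python
# A hypothetical undirected network: A is a hub; D-E-F hang off one side.
edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("D", "E"), ("E", "F")]

def degrees(edges):
    """Count connections per node; hubs are the highest-degree nodes."""
    d = {}
    for a, b in edges:
        d[a] = d.get(a, 0) + 1
        d[b] = d.get(b, 0) + 1
    return d

def is_connected(edges, nodes):
    """Check whether every node is reachable from every other."""
    adj = {n: set() for n in nodes}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    seen, stack = set(), [next(iter(nodes))]
    while stack:
        n = stack.pop()
        if n not in seen:
            seen.add(n)
            stack.extend(adj[n] - seen)
    return seen == set(nodes)

def bridges(edges):
    """An edge is a bridge if removing it disconnects the network."""
    nodes = {n for e in edges for n in e}
    return [e for e in edges if not is_connected([x for x in edges if x != e], nodes)]

deg = degrees(edges)
bridge_edges = bridges(edges)
# Node "A" has the most connections (the hub); the A-D, D-E, and E-F
# links are bridges, since cutting any of them splits the network.
```

Removing a bridge splits the network into disconnected clusters, which is exactly why bridge-like associations between fields are so valuable: without them the fields cannot "talk" to each other at all.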
Common types of networks and their properties: Visualizations of associative networks illuminate properties of specialized network types.
Scale-free networks, common in natural and social systems, exhibit a connectivity distribution where a few nodes have many connections and most have few. This property explains phenomena like the "rich get richer" effect in social networks or the robustness of biological systems.
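The "rich get richer" effect can be simulated in a few lines. The sketch below grows a network by preferential attachment (a standard, simplified model of how scale-free structure emerges): each new node links to an existing node chosen with probability proportional to that node's current degree, so well-connected nodes tend to attract still more connections:

```python
import random

def preferential_attachment(n, seed=0):
    """Grow a network where new nodes prefer linking to well-connected nodes."""
    random.seed(seed)
    targets = [0, 1]      # each node appears here once per connection it has
    edges = [(0, 1)]
    for new in range(2, n):
        old = random.choice(targets)  # chosen proportionally to degree
        edges.append((new, old))
        targets += [new, old]
    return edges

edges = preferential_attachment(200)
deg = {}
for a, b in edges:
    deg[a] = deg.get(a, 0) + 1
    deg[b] = deg.get(b, 0) + 1
# Typical result: a few early nodes accumulate many links (hubs),
# while most nodes keep only one or two -- the scale-free signature.
```

Plotting the resulting degree counts is a nice classroom exercise: a histogram shows the long tail of the distribution that defines scale-free networks.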
Small-world networks, characterized by tightly connected information clusters, are evident in social networks and neural systems. Visualizing these networks helps students grasp concepts like "six degrees of separation" in social science or information propagation in communication studies.
The process of describing rules and structural properties of these networks enhances students' understanding of information representation. For instance, exploring network topology aids in comprehending ontologies in computer science and information science, where hierarchical and associative relationships between concepts are crucial. This approach also facilitates understanding of taxonomy in biology, thesauri in linguistics, and conceptual frameworks in various academic disciplines. By engaging with these network visualizations and properties, students develop a more nuanced understanding of how information is structured, interconnected, and accessed across different domains of knowledge.
Associative networks are fundamental to information management and interpretation, serving as powerful tools for organizing, analyzing, and visualizing complex relationships across diverse fields. By representing knowledge as interconnected nodes and edges, these networks enable us to uncover hidden patterns, trace the flow of information, and make meaningful connections between seemingly disparate concepts.
5. Windows into AI's "Mind"
I’m cheating when it comes to the last visualization in my list. It’s really a suite of them, and the understanding of what complex AI is doing (called “AI interpretability”) is advancing quickly, with creative visualization approaches likely to follow. Since complex AI is built using neural networks that are a form of associative network, there is overlap between this visualization suite and associative networks.
One of the frustrations in using AI is that it’s difficult to understand what it’s doing. It’s critical that we get better at deciphering AI internals, because doing so should lead to better control of the AI so that its answers align with user and society objectives. For students, views under the hood illuminate how complex interpretation of information is built upon layers of simpler interpretations – for people as well as machines.
Large Language Models (which now handle audio, images, video, code, etc., not just human-language text) are currently built on the Transformer architecture (the “T” in Generative Pre-trained Transformer, or GPT), which in its classic form is composed of two Deep Neural Networks (DNNs). (Don’t worry, the terminology isn’t critical). DNNs are a family of hierarchical information processing networks that can be used for a huge range of functions. There are many DNN variants, but the principles are the same. The neurons are arranged in hierarchical layers of increasing information pattern complexity. At the lowest layer are the most fundamental information pieces, and the highest layer has the network’s answer. In between are information patterns of increasing complexity.
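The layered idea itself fits in a few lines of Python. This is a toy sketch, with made-up weights and only three tiny layers (a real DNN has millions to billions of learned weights): each layer mixes all of the previous layer's outputs and passes the result through a nonlinearity, so patterns at one layer are built from combinations of patterns at the layer below:

```python
import math

def layer(inputs, weights, biases):
    """One layer: each neuron mixes all inputs, then applies a nonlinearity."""
    return [
        math.tanh(sum(w * x for w, x in zip(ws, inputs)) + b)
        for ws, b in zip(weights, biases)
    ]

# Hypothetical 3-level hierarchy: raw inputs -> simple patterns -> answer.
x = [0.5, -1.0, 0.25]                              # lowest layer: raw features
h = layer(x, [[1, 0, 1], [0, 1, -1]], [0.0, 0.1])  # middle: simple patterns
y = layer(h, [[1, -1]], [0.0])                     # top: the network's "answer"
```

Training is then the process (the optimization terrains of part 2) of adjusting those weights so that the top layer's answer comes out right.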
Unfortunately, explicitly showing hierarchies of language processing is very tricky because the network processes text as number patterns. Some of the learned features are quite complex patterns in language. However, the concept can be shown visually for image processing tasks, as shown below. Features from layers of audio analysis can also be instructive.
In the classic Transformer design, there are two DNNs:
A. Encoder Network
The encoder network's job is to read and understand the text or other information you give the AI when you use it. It takes the words you provide and converts them into a format the model can work with. Imagine the encoder as a sophisticated translator that turns your words into a series of numbers (called embeddings). These embeddings capture the meaning and context of each word based on the surrounding words in the sentence. The encoder knows nothing about what the LLM outputs. It is only analyzing the words you give and how they relate to one another.
A somewhat visual aspect of the encoder network is an attention heatmap, as shown below. It tries to visually tag the strength of connection between various aspects of information you provide.
The attention mechanism allows the encoder to focus on the most relevant parts of the input text when processing each word. Instead of treating all words equally, attention helps the model weigh the importance of different words based on their context. For example, in the sentence "The cat sat on the mat," attention helps the model understand that "cat" and "mat" are more closely related than "cat" and "the."
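A drastically simplified version of that weighting can be sketched in Python. The word "embeddings" below are made-up two-number vectors chosen so that related words point in similar directions; real attention also uses learned transformations of the embeddings, omitted here for clarity. The attention row is just a softmax over similarity scores:

```python
import math

def softmax(scores):
    """Turn raw similarity scores into weights that sum to 1."""
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Made-up 2-d "embeddings"; related words point in similar directions.
vecs = {"the": [0.1, 0.0], "cat": [1.0, 0.2], "sat": [0.3, 0.9], "mat": [0.9, 0.3]}

def attention_row(query, words):
    """How strongly `query` attends to each word: softmax of dot products."""
    scores = [sum(q * k for q, k in zip(vecs[query], vecs[w])) for w in words]
    return dict(zip(words, softmax(scores)))

row = attention_row("cat", ["the", "sat", "mat"])
# With these vectors, "cat" attends more strongly to "mat" than to "the".
```

Stacking one such row per word gives exactly the square grid that an attention heatmap visualizes: each cell shows how much one word attends to another.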
[Note: There is increasing opaqueness as to whether LLMs like ChatGPT have a separate encoder DNN or use only the DNN I’ll describe next (the decoder) and have it chew on your input text too. Regardless, there must still be an encoder as a conceptual element.]
Attention heatmaps, while crucial for understanding AI, also offer valuable insights across various educational domains. In language studies, these visualizations can illustrate how native speakers focus on different parts of sentences, aiding in the teaching of grammar and syntax. For history classes, attention heatmaps applied to historical documents can reveal key phrases or concepts that carry significant weight, helping students understand the nuances of primary sources. In psychology, these heatmaps can be used to demonstrate how human attention shifts during reading or visual processing, providing a tangible way to discuss cognitive processes. For media studies, attention heatmaps can show how viewers engage with different elements of advertisements or film scenes, offering insights into effective communication strategies. By incorporating these visualizations, educators can make abstract concepts more concrete, fostering deeper understanding and critical thinking skills across a wide range of subjects.
B. Decoder Network
The decoder network's job is to take the processed information from the encoder and generate a response, while also considering its own previous outputs. It's like having a skilled writer who uses the context and meaning provided by the encoder to construct coherent and relevant sentences.
In the decoder, attention is used to look back at the encoder's output when generating each word of the response. This helps the decoder focus on the most relevant parts of the input text, ensuring that the generated response is contextually accurate. The decoder also looks at its own output, and an attention heatmap on that output can be visualized as well as that of the encoder input.
As for what the decoder is doing, it’s doing something similar to the face analysis I showed earlier, but with orders of magnitude more layers in the hierarchy, and with complex language features that are hard to visualize. Smaller versions could be used for instruction, but they produce so much gibberish that the teaching point gets obscured, while true LLM decoder networks are daunting in scale, approaching the connectivity of a brain’s cognitive processing. Researchers aren’t yet able to reveal all of the gory detail in language-explainable ways.
However, there is rapid progress, most notably through a recent publication from Anthropic (makers of the Claude LLM) that produced visualizations like the one below.
Network visualizations are particularly useful to illustrate how AI processes visual information, drawing parallels to human processing. In art classes, network visualization can reveal how both humans and AI perceive paintings, helping students understand composition and visual impact. In biology class, the progressively abstract visual concepts of image processing AI can be compared to the human visual system.
By understanding both techniques, students gain a more comprehensive view of AI systems. They learn not just where AI focuses (attention heatmaps) but also how it builds up its understanding (network visualizations).
As we navigate the increasingly AI-driven world, the ability to visualize and understand complex data and AI processes becomes not just a technical skill, but a fundamental aspect of modern literacy. The five key visualizations we've explored – scatter plots, optimization terrains, confusion matrices, associative networks, and attention heatmaps/network visualizations – offer powerful tools for comprehending the intricate workings of AI systems and their parallels in human cognition.
These visualizations serve as bridges between abstract concepts and tangible understanding, allowing students, educators, and professionals alike to grasp the principles that underpin AI technologies. They illuminate the challenges and complexities involved in AI development, from data representation and decision-making to performance evaluation and knowledge structuring.
Moreover, these visualizations are useful beyond the realm of AI. They provide frameworks for critical thinking, problem-solving, and system analysis that are applicable across numerous disciplines and real-world scenarios. By incorporating them into education at various levels, we equip learners with the mental models necessary to navigate and shape our AI-infused future.