Five Key Visualizations for AI Literacy (1 of 3)

Part 1: Scatter Plots of Data Patterns and Relationships

Jul 14, 2024

The ability to visualize is critical to deep competence in almost any field.

In mathematics, students often visualize number lines or geometric shapes to understand abstract concepts like negative numbers or trigonometry.

Musicians might visualize patterns on their instruments or sheet music to understand complex rhythms or harmonies.

Writers often create mental maps of their stories, visualizing the interconnections between characters, plot points, and themes.

Physicists use thought experiments with visual components, like Einstein's famous elevator experiment to understand gravity, or Schrödinger's cat to grasp quantum superposition.

Programmers frequently visualize the flow of data through their code, imagining it as a series of pipes or a flowchart.

Leaders often use mental models like Porter's Five Forces, which visualizes the competitive landscape to make sense of market dynamics.

Even in everyday problem-solving, we "picture" potential solutions or outcomes in our mind's eye. I often see text in my head when considering how to communicate or understand something.

Although our surface interaction with AI is text-based, the most useful AI-related visualizations come from data science. Don’t worry, though: they are intuitive even without deep data science or AI skills.

In understanding AI, my brain uses just a few mental depictions repeatedly:

1. Scatter Plots: Data Patterns and Relationships

2. Optimization Terrains: Learning to be the Best

3. Confusion Matrices: Clarifying Performance

4. Associative Networks: Mapping Connections in Knowledge and Cognition

5. Windows into AI's "Mind"

These visuals are useful for a range of conceptual lessons. Even when an AI system is too complex for a visual to apply directly, a simpler version that our minds can handle usually still informs the abstract issues. For instance, when thinking about how a neural network learns, I often visualize a landscape with hills and valleys, where the AI is trying to find the lowest point (representing the best solution). This simple mental image helps me grasp the concept of optimization in AI, and its challenges, even though the optimization terrain for real AI is too complex to draw.
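The hills-and-valleys picture can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the one-dimensional terrain `height`, the step size, and the starting positions are all made up here, and real AI training works over vastly more dimensions. The sketch simply walks downhill in small steps from two different starting points.

```python
def height(x):
    """A made-up terrain with two valleys; lower is better."""
    return x**4 - 3 * x**2 + x


def slope(x):
    """Derivative of height: tells us which way is downhill and how steep."""
    return 4 * x**3 - 6 * x + 1


def descend(start, step_size=0.01, steps=1000):
    """Repeatedly take a small step downhill from the starting position."""
    x = start
    for _ in range(steps):
        x -= step_size * slope(x)
    return x


left = descend(-2.0)   # starts on the left slope
right = descend(2.0)   # starts on the right slope
print(f"start -2.0 -> x = {left:.2f}, height = {height(left):.2f}")
print(f"start  2.0 -> x = {right:.2f}, height = {height(right):.2f}")
```

The two walks settle in different valleys (near x = -1.30 and x = 1.13), and only the left one is the lowest point of this terrain. That is one of the challenges the mental image captures: where the learner ends up can depend on where it starts.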

The kicker is that these visualizations are also useful in everyday situations that have nothing to do with AI. AI tries to mimic cognition, so the ways we think about it often mirror conceptual issues in human cognition.

Here are five key types of visualizations that can enhance learning across various subjects and grade levels, while also providing insight into the world of AI.

1. Scatter Plots: Data Patterns and Relationships

AI is about finding patterns across many examples of information. For each example, there are a variety of different information pieces (a.k.a. features) about it. Each data example can be shown as a point in a scatter plot, where the axes represent the types of information. A scatter plot of many examples can illustrate the patterns in the data and the challenges in characterizing or making decisions about those patterns.

Scatter plots show up in education all the time. If students are exploring the differences between LeBron James and Michael Jordan, they might generate a scatter plot like the one below, showing points per game and assists per game for each season of their careers.

Scatter plot of LeBron’s and Michael’s season statistics for two measures. Each point is a different season.
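To make the idea of "each example is a point, each axis is a feature" concrete, here is a minimal sketch that draws a small scatter plot as text. The seasons below are invented numbers for two stand-in players, "A" and "B", not real statistics, and `ascii_scatter` is a hypothetical helper written for this illustration rather than a library function.

```python
# Hypothetical (player, points per game, assists per game) seasons.
# The numbers are invented for illustration; they are NOT real statistics.
SEASONS = [
    ("A", 32.0, 5.0), ("A", 34.0, 4.0), ("A", 30.0, 6.0),
    ("B", 26.0, 8.0), ("B", 28.0, 9.0), ("B", 25.0, 7.0),
]


def ascii_scatter(seasons, width=21, height=11):
    """Return the plot as a list of text rows, highest assists at the top.

    Assumes each axis has at least two distinct values (no zero span).
    """
    xs = [points for _, points, _ in seasons]
    ys = [assists for _, _, assists in seasons]
    x_min, x_span = min(xs), max(xs) - min(xs)
    y_max, y_span = max(ys), max(ys) - min(ys)
    grid = [[" "] * width for _ in range(height)]
    for label, points, assists in seasons:
        # Scale each feature onto the grid: x grows rightward, y grows upward.
        col = round((points - x_min) / x_span * (width - 1))
        row = round((y_max - assists) / y_span * (height - 1))
        grid[row][col] = label
    return ["|" + "".join(cells) for cells in grid]


rows = ascii_scatter(SEASONS)
print("assists per game")
print("\n".join(rows))
print("+" + "-" * 21 + " points per game")
```

Even in this crude form, the two letters gather in different corners of the grid, which is the kind of pattern a scatter plot is meant to expose.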

This simple two-dimensional scatter plot introduces a powerful tool for visualizing data patterns and relationships, one that bridges human cognition and artificial intelligence. You can make a huge range of conceptual points about AI using a 2D scatter plot. Both AI and our brains process far more than two features (and LeBron and Michael should be measured by more than these two variables), but even then, a skilled AI practitioner can anticipate the issues that might emerge because they can imagine the phenomenon in 2D or 3D.

This visualization illuminates key AI concepts:

Decision Boundaries: In the LeBron vs. Jordan scatter plot, a decision boundary would be a line (or curve) that separates the points representing LeBron's seasons from those representing Jordan's. This concept is crucial in AI classification tasks, where the system must learn to distinguish between categories based on input features.

Generalization vs. Overfitting: The challenge of drawing the right boundary between the LeBron and Michael clusters illustrates a dilemma of judgment for humans and machines alike. Should the AI draw a simple division that captures the general pattern but is sometimes wrong (generalizing), or a complex, winding boundary that perfectly separates every known point but may fail on new ones (overfitting)?

Understanding Biases and Data Representativeness: Scatter plots can reveal biases in data collection or representation. For instance, if all data points for one player are clustered in one area of the plot, it might indicate a lack of diversity in the data or a bias in how it was collected. This visualization can help students understand the importance of having a representative dataset for AI training and the potential consequences of biased data.

Feature Importance: AI systems must determine which of their many input features are most crucial for making predictions or classifications. Points per game and assists per game may not be the best variables for illustrating the difference between LeBron and Michael, and that can be apparent from the scatter plot. AI often needs to reduce high-dimensional data (many axes) to lower dimensionality (few axes) for analysis or visualization, much as we might reduce a complex problem to the few key variables that matter most.

Anomaly Detection: An unusual season would visually stand out in the LeBron/Michael plot. AI systems use similar principles to identify anomalies in complex datasets.
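Several of these ideas can be tried out on a toy dataset. The sketch below is an illustration under loud assumptions: the seasons are invented numbers for two stand-in players "A" and "B" (not real statistics), the dataset is deliberately contrived to make the effect visible, and the two classifiers are simple stand-ins chosen for this example (a nearest-centroid rule for the straight-line boundary, a one-nearest-neighbor rule for the winding one). The last helper flags the season farthest from its own player's average as a crude anomaly check.

```python
import math

# Hypothetical (points per game, assists per game) seasons for players "A"
# and "B". Invented for illustration; NOT real statistics.
TRAIN = [
    ((32.0, 5.0), "A"), ((34.0, 4.0), "A"), ((30.0, 6.0), "A"),
    ((27.0, 8.0), "A"),  # an A season that looks like a B season
    ((26.0, 8.0), "B"), ((28.0, 9.0), "B"), ((25.0, 7.0), "B"),
    ((31.0, 5.0), "B"),  # a B season that looks like an A season
]
HELD_OUT = [  # seasons the classifiers have not seen
    ((33.0, 4.5), "A"), ((30.5, 5.0), "A"), ((29.5, 6.5), "A"),
    ((27.5, 8.0), "B"), ((25.5, 7.5), "B"), ((28.5, 8.5), "B"),
]


def centroids(data):
    """Average position of each label's points."""
    sums = {}
    for (x, y), label in data:
        sx, sy, n = sums.get(label, (0.0, 0.0, 0))
        sums[label] = (sx + x, sy + y, n + 1)
    return {label: (sx / n, sy / n) for label, (sx, sy, n) in sums.items()}


def simple_rule(point, data):
    """Straight-line boundary: pick the label whose centroid is closest."""
    cents = centroids(data)
    return min(cents, key=lambda label: math.dist(point, cents[label]))


def memorizer(point, data):
    """Winding boundary: copy the label of the single closest known season."""
    _, label = min(data, key=lambda item: math.dist(point, item[0]))
    return label


def accuracy(classify, examples, data):
    """Fraction of examples that classify labels correctly, given data."""
    hits = sum(classify(point, data) == label for point, label in examples)
    return hits / len(examples)


def most_unusual(data, label):
    """Crude anomaly check: the season farthest from its own centroid."""
    center = centroids(data)[label]
    own = [point for point, lab in data if lab == label]
    return max(own, key=lambda point: math.dist(point, center))


for name, rule in [("simple rule", simple_rule), ("memorizer", memorizer)]:
    seen = accuracy(rule, TRAIN, TRAIN)
    unseen = accuracy(rule, HELD_OUT, TRAIN)
    print(f"{name}: known seasons {seen:.2f}, new seasons {unseen:.2f}")
print("most unusual A season:", most_unusual(TRAIN, "A"))
print("most unusual B season:", most_unusual(TRAIN, "B"))
```

On this contrived data, the memorizer is perfect on the seasons it has seen but stumbles on new ones, while the simple rule misses the two odd seasons yet handles the new ones well. The anomaly check singles out those same two odd seasons. Real data will rarely be this tidy, so treat the numbers as a teaching prop rather than a result.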

These concepts parallel human cognition. Our brains also grapple with these issues, albeit less explicitly.

Whether in biology class analyzing ecosystems, history class examining social, economic, or cultural trends, or computer science introducing machine learning, scatter plots are an entry point to understanding the abstract world of complex data.

Part 2 - Optimization Terrains and Confusion Matrices

Part 3 - Associative Networks and AI's ‘Mind’

David Harper, CFA, FRM

Scatterplots are a great first visualization. I was curious and asked Claude to interpret the plot and, despite my experience, it noticed more than I did:

"From this scatterplot comparing LeBron James and Michael Jordan's season averages for points per game and assists per game, we can draw several inferences:

1. Scoring ability:

- Both players consistently averaged over 20 points per game across seasons.

- Jordan (red points) generally had higher scoring averages, with many seasons above 30 points per game.

- LeBron (blue points) has a wider range of scoring averages, mostly between 25-30 points per game.

2. Playmaking:

- LeBron consistently averaged more assists per game than Jordan across seasons.

- Most of LeBron's seasons show 6-9 assists per game.

- Jordan's assist averages are generally lower, mostly in the 4-6 range.

3. Playing styles:

- This data suggests LeBron had a more balanced game between scoring and playmaking.

- Jordan's data points to a more scoring-focused style of play.

4. Versatility:

- LeBron's data points are more spread out, indicating more variation in his role from season to season.

- Jordan's data clusters more tightly, suggesting a more consistent role across seasons.

5. Career progression:

- For both players, there's no clear linear progression, indicating their playing styles and roles varied throughout their careers rather than following a simple trend.

6. Statistical outliers:

- LeBron has a couple of seasons with notably high assist averages (around 9 per game).

- Jordan has a few seasons with exceptionally high scoring averages (35+ points per game).

7. Overall production:

- Both players consistently produced at elite levels, with high combinations of points and assists across multiple seasons.

It's important to note that this data doesn't capture all aspects of their game (e.g., defense, rebounding) and doesn't account for factors like pace of play, team composition, or rule changes across eras. Nonetheless, it provides interesting insights into the statistical profiles of these two basketball legends." -- Claude 3.5 Sonnet
