A talk by Ryan St. Pierre // BSW 2026
My Covariance Matrix
is Better Than
Your Shit-Tier LLM
Tales from the Edge of HS Math
serious ML engineers
are gonna have notes.
that's fine.
cold open
A few years ago I went to the most technical ML talk at BSW.
The speaker asked who in the room was technical.
about 20% of the hands went up.
For a research town like Boulder, that's a fucking bad sign.
who's talking
I'm not an ML researcher. I run a couple of rooms.
- Boulder New Tech (took it over a year ago; it's been around for 17 years)
- Builders' Room at BSW (3 years and counting)
- Threshold Labs (geometric stuff in neural nets)
- Have lived here ~12 years and watched some things
This is the talk I've been punting on for three years. It's the one I want to give.
the thesis
When did we all forget that the most powerful tools we have are the ones we learned in high school?
the claim
For a lot of real problems, simple math beats your LLM.
- Cheaper
- Faster
- Inspectable
- Updates in real time when new data shows up
- Doesn't need a vendor
// caveat
This is not "LLMs are bad." LLMs do things a covariance matrix can't. But we've been so dazzled we stopped asking whether the problem has structure we could exploit directly.
a friendly witness
Even Karpathy thinks LLMs are too damn big.
Recent work: microgpt (4,192 params) and nanochat (a working ChatGPT clone for under $100).
// the actual argument
"LLM size competition is intensifying, backwards. Models are big because we're asking them to memorize the internet, not because reasoning needs that scale."
Translation: most problems don't need that much model. They need the right structure.
the toolkit
Here's the thing nobody tells you. These techniques are all the same family.
Every one of them is asking a slightly different question about the same matrix. Once you see the family, the buzzwords stop being scary. They become moves you make depending on the question you have.
// none of this is new. most of it predates the iPhone.
five questions, one matrix
What do we actually want to know?
// 01 · covariance
"What goes with what?" The map of relationships. Everything below uses this as its base.
// 02 · k-means
"Which points belong together?" Cluster the space into groups. Cheap, interpretable, fails on weird shapes.
// 03 · UMAP
"What does this space look like in 2D?" Make the geometry visible. Preserves local + some global structure.
// 04 · MMR
"Pick a diverse set of relevant things." Relevance penalized by redundancy. Better recommendations, less echo.
// 05 · isolation forest
"Which points don't belong to any cluster?" The interesting outliers. Anomaly detection in O(n log n).
when to reach for what
Each one wins when you ask a specific question.
// covariance
Covariance / co-occurrence
"What's related to what?"
USE WHEN: Tagging, classification, building term graphs, finding sub-fields inside a corpus.
DON'T: When relationships aren't linear. Then reach for kernels or embeddings.
// k-means
K-Means
"Group them into k buckets."
USE WHEN: You roughly know how many groups. Customer segments, persona splits, quick first cuts.
DON'T: When clusters are non-convex or wildly different sizes. K-means assumes spheres.
// UMAP
UMAP
"Show me the space."
USE WHEN: You need to see structure. Exploratory, visual, finding clusters you didn't know existed.
DON'T: As a feature pipeline for downstream models. It's stochastic. Use it to look, not to feed.
// MMR
Maximal Marginal Relevance
"Relevant + diverse."
USE WHEN: Recommending, retrieving, summarizing. Anywhere the top-k looks samey and bores users.
DON'T: When you genuinely want top-k by raw score with hard quality cutoffs.
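A minimal sketch of the MMR loop, assuming rows of cands are unit-normalized vectors (so dot products are cosine similarities). λ = 1 is pure relevance, λ = 0 is pure diversity:

import numpy as np

def mmr(query, cands, k=5, lam=0.7):
    """Greedy MMR: pick k rows of cands, relevant to query, diverse from each other."""
    k = min(k, len(cands))
    relevance = cands @ query                  # cosine similarity to the query
    selected = [int(np.argmax(relevance))]     # seed with the most relevant item
    while len(selected) < k:
        redundancy = (cands @ cands[selected].T).max(axis=1)  # sim to picks so far
        score = lam * relevance - (1 - lam) * redundancy
        score[selected] = -np.inf              # never re-pick
        selected.append(int(np.argmax(score)))
    return selected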
// isolation forest
Isolation Forest
"Find the weirdos."
USE WHEN: Fraud, outliers, novelty detection, "this attendee is unlike anyone else here" interdisciplinary signal.
DON'T: When data is high-dimensional and dense. The random axis-aligned splits lose their bite as dimensions pile up.
// LLM
Large Language Model
"Generate language."
USE WHEN: Open-ended generation, instruction following, language tasks where the answer is itself a sentence.
DON'T: As a wrapper around problems with structure. You're paying for memorization to do math.
the secret
You don't pick one. You stack them.
$ M = covariance(data)
$ C = kmeans(M, k=8)
$ U = umap(M)
$ outl = isolation_forest(M)
$ recs = mmr(query, M, λ=0.7)
$ label = llm(f"name this cluster: {top_terms(C[i])}")
The LLM does one sentence of work. The math does everything else.
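A runnable sketch of that stack with scikit-learn and umap-learn. The random matrix stands in for real bio vectors, and top_terms / llm stay placeholders, exactly as above:

import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import IsolationForest
import umap  # pip install umap-learn

rng = np.random.default_rng(0)
M = rng.poisson(1.0, size=(200, 50)).astype(float)  # stand-in for (bios × terms)

personas = KMeans(n_clusters=8, n_init="auto", random_state=0).fit(M)
coords = umap.UMAP(random_state=0).fit_transform(M)        # 2D map, for looking
weirdos = IsolationForest(random_state=0).fit_predict(M)   # -1 = outlier
# recs = mmr(query, M, k=5, lam=0.7)   # the MMR sketch from earlier
# the single LLM call, if you want cluster names, happens after all of this

Everything except that last commented-out call runs in seconds on a laptop.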
demo · part one
Classify a community by skills and interests.
Input: bios from a real, regional tech community you probably care about.
Goal: figure out what people do, what they're into, and who clusters with whom.
$ bios = load(corpus)
→ N bios loaded
$ M = covariance(tokenize(bios))
→ matrix shape: (V × V)
$ tag(bio, M)
→ {"engineering": 0.82, "design": 0.31, ...}
// no API key. no GPU. no vendor.
co-occurrence matrix
PPMI-normalized co-occurrence. Each cell = how strongly two terms travel together across bios.
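A minimal sketch of building that matrix, assuming tokenization happened upstream. PPMI = positive pointwise mutual information: log p(i,j) / (p(i)·p(j)), clipped at zero:

import numpy as np
from itertools import combinations

def ppmi_matrix(bios_tokens, vocab):
    """bios_tokens: list of token lists. Returns a (V × V) PPMI matrix."""
    idx = {t: i for i, t in enumerate(vocab)}
    co = np.zeros((len(vocab), len(vocab)))
    for toks in bios_tokens:
        for a, b in combinations(sorted(set(toks) & set(vocab)), 2):
            co[idx[a], idx[b]] += 1
            co[idx[b], idx[a]] += 1
    p_ij = co / max(co.sum(), 1.0)             # joint probabilities
    p_i = p_ij.sum(axis=1, keepdims=True)      # marginals
    with np.errstate(divide="ignore", invalid="ignore"):
        pmi = np.log(p_ij / (p_i @ p_i.T))
    return np.nan_to_num(np.maximum(pmi, 0.0))  # clip negatives, kill 0/0 NaNs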
the live moment
Someone in the audience submits a bio. The matrix updates. Live.
No retraining. No fine-tuning run. No "please wait while we redeploy."
You can watch the geometry of the room change as new data arrives.
// why this matters
An LLM classification is frozen at training time. It doesn't know about the person who registered yesterday. The matrix does. By design.
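The live update, continuing the PPMI sketch above: raw counts are additive, so a new bio is a handful of increments, and the PPMI view re-derives in milliseconds.

from itertools import combinations

def add_bio(co, idx, new_tokens):
    """Fold one new bio into the raw co-occurrence counts, in place."""
    present = sorted({t for t in new_tokens if t in idx})
    for a, b in combinations(present, 2):
        co[idx[a], idx[b]] += 1
        co[idx[b], idx[a]] += 1
    # recompute PPMI from co whenever you want a fresh view of the room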
demo · still part one
Now I add the rest of the family. Same data, four more questions.
- K-means on the matrix → personas emerge from the data. No taxonomy needed up front. (Soft-tag sketch after this list.)
- UMAP on the matrix → 2D map of the room you can look at. Show clusters as points.
- Isolation Forest → interdisciplinary outliers. The people who don't fit any cluster cleanly are usually the most interesting in the room.
- MMR on the matrix → "give me 5 people relevant to AI but diverse from each other" — useful for panel curation.
All of this on a laptop. None of it called an LLM.
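For the first move, a hedged sketch of persona soft tags: score one bio vector against the k-means centroids. The persona names are whatever you (or the one LLM call) decided to call the clusters:

import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

def soft_tags(bio_vec, kmeans, persona_names):
    """Soft membership of one bio in each k-means persona."""
    sims = cosine_similarity(bio_vec.reshape(1, -1), kmeans.cluster_centers_)[0]
    return dict(zip(persona_names, np.round(sims, 2)))  # {"engineering": 0.82, ...}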
the room in 2D
UMAP projection of bio vectors. Colors = k-means clusters. Structure emerges from term co-occurrence alone.
the weirdos
Isolation forest anomalies (red). The interdisciplinary outliers who don't fit any cluster cleanly.
part two
So how do you actually vibe code with this in mind?
the trap
The default failure mode of vibe coding is the LLM wrapper.
You ask a coding agent to solve a problem. It reaches for the most legible solution: call an LLM. Wrap a function around it. Return JSON. Ship.
It works. It demos great. And then:
- You can't update without re-prompting
- You can't see why it made a decision
- It costs money every time it runs
- It hallucinates and you don't notice for weeks
- The shape of your data doesn't matter to the code
// the real cost
You've outsourced not just the computation but the understanding of your problem to a vendor. You can't reason about your own product anymore.
the posture
The fix isn't to stop using assistants. It's to stay in the math seat.
Coding assistants are extraordinary at typing. They are mediocre at deciding what should be typed. That's your job.
If you walk in with the geometry of your problem clear in your head, the assistant becomes a power tool. If you walk in with vibes, it becomes a slot machine.
four rules for vibe coding without losing the plot
How to keep the math front and center.
1
Name the shape of your data first
Before any prompt: what's the input? The output? The dimensionality? The volume? What changes over time? If you can't answer in one breath, you're not ready to prompt yet.
2
Specify the technique, not just the task
"Classify these" gives you an LLM wrapper. "Build a co-occurrence matrix and cluster with k-means" gives you a real pipeline. The technique is the prompt.
3
Forbid the LLM call by default
Add it explicitly. "No LLM calls inside the classification function. If you need labels, return cluster IDs and we'll label separately." This single rule prevents 80% of the slop.
4
Demand inspectability
"Every output must include the matrix, the cluster centers, and the distances. I want to be able to debug a single bad classification by looking at numbers." If you can't read the system, you don't own the system.
show your work
Same task. Two prompts. Watch what changes.
// prompt A · the slop default
Here's a CSV of bios.
Classify each by skills and interests.
Output JSON.
// prompt B · math in the room
Here's a CSV of bios.
Build the following pipeline:
1. Tokenize and build a term co-occurrence matrix M.
2. Cluster terms with k-means (try k=4..10, pick by silhouette).
3. For each bio, score against each cluster centroid → soft tags.
4. Run isolation forest to flag interdisciplinary outliers.
5. Project with UMAP to a 2D map for inspection.
NO LLM calls inside this pipeline.
Output: matrix, cluster centers, per-bio soft tags, outlier flags, 2D coords.
Code must support adding a new bio without re-running clustering.
Prompt B doesn't need an LLM at runtime. It needed one in your head.
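If you want step 2 of prompt B in code, a sketch of picking k by silhouette with scikit-learn:

from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

def best_k(M, ks=range(4, 11)):
    """Try k = 4..10, keep the k whose clustering scores the highest silhouette."""
    labels = {k: KMeans(n_clusters=k, n_init="auto", random_state=0).fit_predict(M)
              for k in ks}
    return max(ks, key=lambda k: silhouette_score(M, labels[k]))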
honest numbers · honest room
While we're here, let's look at the technical/non-technical split in our own data.
39.4%
self-described technical, by their own words, in their own bios.
I'm not editorializing. I'm showing you what the matrix found.
// the unsupervised approach didn't pre-decide what "technical" means.
// the clusters emerged from the language people used about themselves.
homework
Where to go from here.
// stack
The starter kit
Python · numpy · scikit-learn · umap-learn · transformers.js for poking at small models locally · ONNX Runtime when you want to look under the hood.
// reading
To go deeper
Karpathy's nanochat. Christopher Olah's older Distill posts on dimensionality. Anything by Leland McInnes (UMAP author) on geometry of high-dim data.
// next
If you liked covariance
PCA → kernel PCA → spectral clustering → graph Laplacians → diffusion maps. Same family. Each one a slightly different question about the same matrix.
the actual point
Right tool. Right problem. Know what your tool is doing.
one last thing
Boulder has the ingredients. The researchers. The builders. The weirdos.
The point of this talk wasn't to dunk on LLMs. It was to make a room where the PhD from CU and the non-technical founder can both feel like the conversation is worth their time.
You don't need to be an expert to ask better questions. You just need to remember the math you already know and use it to connect with the people who know more.
Thanks. Now: questions, fights, beer.
— ryan / threshold labs / boulder new tech
connect
Find me. Build something.
Threshold Labs
getthreshold.com
LinkedIn
linkedin.com/in/ryanastpierre
These Slides
bsw-decks.pages.dev