A talk by Ryan St. Pierre // BSW 2026
My Covariance Matrix
is Better Than
Your Shit-Tier LLM
Tales from the Edge of HS Math
serious ML engineers
are gonna have notes.
that's fine.
cold open
A few years ago I went to the most technical ML talk at BSW.
The speaker asked who in the room was technical.
about 20% of the hands went up.
For a research town like Boulder, that's a fucking bad sign.
who's talking
I'm not an ML researcher. I run a couple of rooms.
- Boulder New Tech (took it over a year ago; it's been around for 17 years)
- Builders' Room at BSW (3 years and counting)
- Threshold Labs (geometric stuff in neural nets)
- Have lived here ~12 years and watched some things
This is the talk I've been punting on for three years. It's the one I want to give.
the thesis
When did we all forget that the most powerful tools we have are the ones we learned in high school?
the claim
For a lot of real problems, simple math beats your LLM.
- Cheaper
- Faster
- Inspectable
- Updates in real time when new data shows up
- Doesn't need a vendor
// caveat
This is not "LLMs are bad." LLMs do things a covariance matrix can't. But we've been so dazzled we stopped asking whether the problem has structure we could exploit directly.
a friendly witness
Even Karpathy thinks LLMs are too damn big.
Recent work: microgpt (4,192 params) and nanochat (a working ChatGPT clone for under $100).
// the actual argument
"LLM size competition is intensifying, backwards. Models are big because we're asking them to memorize the internet, not because reasoning needs that scale."
Translation: most problems don't need that much model. They need the right structure.
the toolkit
Here's the thing nobody tells you. These techniques are all the same family.
Every one of them is asking a slightly different question about the same matrix. Once you see the family, the buzzwords stop being scary. They become moves you make depending on the question you have.
// none of this is new. most of it predates the iPhone.
five questions, one matrix
What do we actually want to know?
// 01 · covariance
"What goes with what?" The map of relationships. Everything below uses this as its base.
// 02 · k-means
"Which points belong together?" Cluster the space into groups. Cheap, interpretable, fails on weird shapes.
// 03 · UMAP
"What does this space look like in 2D?" Make the geometry visible. Preserves local + some global structure.
// 04 · MMR
"Pick a diverse set of relevant things." Relevance penalized by redundancy. Better recommendations, less echo.
// 05 · isolation forest
"Which points don't belong to any cluster?" The interesting outliers. Anomaly detection in O(n log n).
when to reach for what
Each one wins when you ask a specific question.
// covariance
Covariance / co-occurrence
"What's related to what?"
USE WHEN: Tagging, classification, building term graphs, finding sub-fields inside a corpus.
DON'T: When relationships aren't linear. Then reach for kernels or embeddings.
// k-means
K-Means
"Group them into k buckets."
USE WHEN: You roughly know how many groups. Customer segments, persona splits, quick first cuts.
DON'T: When clusters are non-convex or wildly different sizes. K-means assumes spheres.
// UMAP
UMAP
"Show me the space."
USE WHEN: You need to see structure. Exploratory, visual, finding clusters you didn't know existed.
DON'T: As a feature pipeline for downstream models. It's stochastic. Use it to look, not to feed.
// MMR
Maximal Marginal Relevance
"Relevant + diverse."
USE WHEN: Recommending, retrieving, summarizing. Anywhere the top-k looks samey and bores users.
DON'T: When you genuinely want top-k by raw score with hard quality cutoffs.
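A minimal sketch of the MMR loop, assuming rows of cands are unit-normalized vectors (so dot products are cosine similarities). λ = 1 is pure relevance, λ = 0 is pure diversity:

import numpy as np

def mmr(query, cands, k=5, lam=0.7):
    """Greedy MMR: pick k rows of cands, relevant to query, diverse from each other."""
    k = min(k, len(cands))
    relevance = cands @ query                  # cosine similarity to the query
    selected = [int(np.argmax(relevance))]     # seed with the most relevant item
    while len(selected) < k:
        redundancy = (cands @ cands[selected].T).max(axis=1)  # sim to picks so far
        score = lam * relevance - (1 - lam) * redundancy
        score[selected] = -np.inf              # never re-pick
        selected.append(int(np.argmax(score)))
    return selected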
// isolation forest
Isolation Forest
"Find the weirdos."
USE WHEN: Fraud, outliers, novelty detection, "this attendee is unlike anyone else here" interdisciplinary signal.
DON'T: When data is high-dimensional and dense. The random axis-aligned splits lose their bite as dimensions pile up.
// LLM
Large Language Model
"Generate language."
USE WHEN: Open-ended generation, instruction following, language tasks where the answer is itself a sentence.
DON'T: As a wrapper around problems with structure. You're paying for memorization to do math.
the secret
You don't pick one. You stack them.
$ M = covariance(data)
$ C = kmeans(M, k=8)
$ U = umap(M)
$ outl = isolation_forest(M)
$ recs = mmr(query, M, λ=0.7)
$ label = llm(f"name this cluster: {top_terms(C[i])}")
The LLM does one sentence of work. The math does everything else.
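A runnable sketch of that stack with scikit-learn and umap-learn. The random matrix stands in for real bio vectors, and top_terms / llm stay placeholders, exactly as above:

import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import IsolationForest
import umap  # pip install umap-learn

rng = np.random.default_rng(0)
M = rng.poisson(1.0, size=(200, 50)).astype(float)  # stand-in for (bios × terms)

personas = KMeans(n_clusters=8, n_init="auto", random_state=0).fit(M)
coords = umap.UMAP(random_state=0).fit_transform(M)        # 2D map, for looking
weirdos = IsolationForest(random_state=0).fit_predict(M)   # -1 = outlier
# recs = mmr(query, M, k=5, lam=0.7)   # the MMR sketch from earlier
# the single LLM call, if you want cluster names, happens after all of this

Everything except that last commented-out call runs in seconds on a laptop.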
demo · part one
Classify a community by skills and interests.
Input: bios from a real, regional tech community you probably care about.
Goal: figure out what people do, what they're into, and who clusters with whom.
$ bios = load(corpus)
→ N bios loaded
$ M = covariance(tokenize(bios))
→ matrix shape: (V × V)
$ tag(bio, M)
→ {"engineering": 0.82, "design": 0.31, ...}
// no API key. no GPU. no vendor.
co-occurrence matrix
PPMI-normalized co-occurrence. Each cell = how strongly two terms travel together across bios.
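A minimal sketch of building that matrix, assuming tokenization happened upstream. PPMI = positive pointwise mutual information: log p(i,j) / (p(i)·p(j)), clipped at zero:

import numpy as np
from itertools import combinations

def ppmi_matrix(bios_tokens, vocab):
    """bios_tokens: list of token lists. Returns a (V × V) PPMI matrix."""
    idx = {t: i for i, t in enumerate(vocab)}
    co = np.zeros((len(vocab), len(vocab)))
    for toks in bios_tokens:
        for a, b in combinations(sorted(set(toks) & set(vocab)), 2):
            co[idx[a], idx[b]] += 1
            co[idx[b], idx[a]] += 1
    p_ij = co / max(co.sum(), 1.0)             # joint probabilities
    p_i = p_ij.sum(axis=1, keepdims=True)      # marginals
    with np.errstate(divide="ignore", invalid="ignore"):
        pmi = np.log(p_ij / (p_i @ p_i.T))
    return np.nan_to_num(np.maximum(pmi, 0.0))  # clip negatives, kill 0/0 NaNs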
the live moment
Someone in the audience submits a bio. The matrix updates. Live.
No retraining. No fine-tuning run. No "please wait while we redeploy."
You can watch the geometry of the room change as new data arrives.
// why this matters
An LLM classification is frozen at training time. It doesn't know about the person who registered yesterday. The matrix does. By design.
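The live update, continuing the PPMI sketch above: raw counts are additive, so a new bio is a handful of increments, and the PPMI view re-derives in milliseconds.

from itertools import combinations

def add_bio(co, idx, new_tokens):
    """Fold one new bio into the raw co-occurrence counts, in place."""
    present = sorted({t for t in new_tokens if t in idx})
    for a, b in combinations(present, 2):
        co[idx[a], idx[b]] += 1
        co[idx[b], idx[a]] += 1
    # recompute PPMI from co whenever you want a fresh view of the room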
demo · still part one
Now I add the rest of the family. Same data, four more questions.
- K-means on the matrix → personas emerge from the data. No taxonomy needed up front. (Soft-tag sketch after this list.)
- UMAP on the matrix → 2D map of the room you can look at. Show clusters as points.
- Isolation Forest → interdisciplinary outliers. The people who don't fit any cluster cleanly are usually the most interesting in the room.
- MMR on the matrix → "give me 5 people relevant to AI but diverse from each other" — useful for panel curation.
All of this on a laptop. None of it called an LLM.
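For the first move, a hedged sketch of persona soft tags: score one bio vector against the k-means centroids. The persona names are whatever you (or the one LLM call) decided to call the clusters:

import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

def soft_tags(bio_vec, kmeans, persona_names):
    """Soft membership of one bio in each k-means persona."""
    sims = cosine_similarity(bio_vec.reshape(1, -1), kmeans.cluster_centers_)[0]
    return dict(zip(persona_names, np.round(sims, 2)))  # {"engineering": 0.82, ...}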
the room in 2D
UMAP projection of bio vectors. Colors = k-means clusters. Structure emerges from term co-occurrence alone.
the weirdos
Isolation forest anomalies (red). The interdisciplinary outliers who don't fit any cluster cleanly.
part two
So how do you actually vibe code with this in mind?
the trap
The default failure mode of vibe coding is the LLM wrapper.
You ask a coding agent to solve a problem. It reaches for the most legible solution: call an LLM. Wrap a function around it. Return JSON. Ship.
It works. It demos great. And then:
- You can't update without re-prompting
- You can't see why it made a decision
- It costs money every time it runs
- It hallucinates and you don't notice for weeks
- The shape of your data doesn't matter to the code
// the real cost
You've outsourced not just the computation but the understanding of your problem to a vendor. You can't reason about your own product anymore.
the posture
The fix isn't to stop using assistants. It's to stay in the math seat.
Coding assistants are extraordinary at typing. They are mediocre at deciding what should be typed. That's your job.
If you walk in with the geometry of your problem clear in your head, the assistant becomes a power tool. If you walk in with vibes, it becomes a slot machine.
four rules for vibe coding without losing the plot
How to keep the math front and center.
1
Name the shape of your data first
Before any prompt: what's the input? The output? The dimensionality? The volume? What changes over time? If you can't answer in one breath, you're not ready to prompt yet.
2
Specify the technique, not just the task
"Classify these" gives you an LLM wrapper. "Build a co-occurrence matrix and cluster with k-means" gives you a real pipeline. The technique is the prompt.
3
Forbid the LLM call by default
Add it explicitly. "No LLM calls inside the classification function. If you need labels, return cluster IDs and we'll label separately." This single rule prevents 80% of the slop.
4
Demand inspectability
"Every output must include the matrix, the cluster centers, and the distances. I want to be able to debug a single bad classification by looking at numbers." If you can't read the system, you don't own the system.
show your work
Same task. Two prompts. Watch what changes.
// prompt A · the slop default
Here's a CSV of bios.
Classify each by skills and interests.
Output JSON.
// prompt B · math in the room
Here's a CSV of bios.
Build the following pipeline:
1. Tokenize and build a term co-occurrence matrix M.
2. Cluster terms with k-means (try k=4..10, pick by silhouette).
3. For each bio, score against each cluster centroid → soft tags.
4. Run isolation forest to flag interdisciplinary outliers.
5. Project with UMAP to a 2D map for inspection.
NO LLM calls inside this pipeline.
Output: matrix, cluster centers, per-bio soft tags, outlier flags, 2D coords.
Code must support adding a new bio without re-running clustering.
Prompt B doesn't need an LLM at runtime. It needed one in your head.
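If you want step 2 of prompt B in code, a sketch of picking k by silhouette with scikit-learn:

from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

def best_k(M, ks=range(4, 11)):
    """Try k = 4..10, keep the k whose clustering scores the highest silhouette."""
    labels = {k: KMeans(n_clusters=k, n_init="auto", random_state=0).fit_predict(M)
              for k in ks}
    return max(ks, key=lambda k: silhouette_score(M, labels[k]))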
honest numbers · honest room
While we're here, let's look at the technical/non-technical split in our own data.
39.4%
self-described technical, by their own words, in their own bios.
I'm not editorializing. I'm showing you what the matrix found.
// the unsupervised approach didn't pre-decide what "technical" means.
// the clusters emerged from the language people used about themselves.
homework
Where to go from here.
// stack
The starter kit
Python · numpy · scikit-learn · umap-learn · transformers.js for poking at small models locally · ONNX Runtime when you want to look under the hood.
// reading
To go deeper
Karpathy's nanochat. Christopher Olah's older Distill posts on dimensionality. Anything by Leland McInnes (UMAP author) on geometry of high-dim data.
// next
If you liked covariance
PCA → kernel PCA → spectral clustering → graph Laplacians → diffusion maps. Same family. Each one a slightly different question about the same matrix.
the actual point
Right tool. Right problem. Know what your tool is doing.
one last thing
Boulder has the ingredients. The researchers. The builders. The weirdos.
The point of this talk wasn't to dunk on LLMs. It was to make a room where the PhD from CU and the non-technical founder can both feel like the conversation is worth their time.
You don't need to be an expert to ask better questions. You just need to remember the math you already know and use it to connect with the people who know more.
Thanks. Now: questions, fights, beer.
— ryan / threshold labs / boulder new tech
connect
Find me. Build something.
Threshold Labs
getthreshold.com
LinkedIn
linkedin.com/in/ryanastpierre
These Slides
bsw-decks.pages.dev