Latent Language Explorer V2
A navigable terrain built from 36,125 concepts in 384-dimensional embedding space — finding and measuring the unnamed gaps between words.
What It Does
There are ideas that exist but do not have words. Not because they are vague or contested, but because the structure of language left them unnamed — concepts that sit in the gaps between the categories our vocabulary happens to cover. LLE V2 finds them, measures them, and describes them.
The terrain is a navigable map of the embedding space of 36,125 concepts organized by Roget’s Thesaurus (1911). Peaks are dense clusters of named meaning. Valleys are transitions. The deserts — shallow fractures between conceptual territories — are where the embedding model encodes something real that has no syntactic representation in natural language.
Companion to the oil painting Are there deserts in vector space?
Quick Start
# Install & start (first run)
.\start.ps1 -Install # Windows
./start.sh --install # Mac/Linux
# Start both servers
.\start.ps1 # Windows
./start.sh # Mac/Linux
# Backend API: http://localhost:8000/api/docs
# Frontend: http://localhost:3000
Run the Pipeline
# Full pipeline from scratch
.\run_pipeline.ps1 # Windows
./run_pipeline.sh # Mac/Linux
# Downstream only (skip vocab rebuild)
.\run_pipeline.ps1 -Downstream # Windows
./run_pipeline.sh --downstream # Mac/Linux
The terrain data is not committed to the repo (large binary files). Run the full pipeline to generate it from scratch.
Architecture
Frontend
TypeScript + React + Vite. Single Three.js renderer shared between the Landscape and Discovery pages.
Backend
FastAPI Python backend. all-MiniLM-L6-v2 sentence-transformer embeddings. SQLite journal with atomic writes.
The Terrain
KDE density as terrain height — not a third UMAP dimension. 2D UMAP layout (seed 42). Never change the seed after the first embedding run.
Probe Deserts
Measured in 384-dimensional L2 space. Interior probe steps only (α 0.10–0.90). Deepest found: 0.9329 (chairperson vs composure).
V1 → V2
| Dimension | V1 | V2 |
|---|---|---|
| Vocabulary size | 8,735 | 36,125 |
| Embedding model | GloVe 300d (static) | all-MiniLM-L6-v2 (384d) |
| Taxonomy | 9 flat domains | 6 classes, 991 categories |
| Journal entries | 14 | 548 |
| Max desert depth | 0.076 (different scale) | 0.9329 |
| Architecture | Vanilla JS, two canvases | TypeScript, React, single renderer |
Key Discoveries
The unnamed quality of presiding-without-reacting — the composure required to chair.
Deserved standing — authority that derives from character rather than appointment.
Transformation at a scale smaller than observation — change without an observable mechanism.
The credential as a form of passage — the login as a kind of navigation.
Pair Selection & Desert Thresholds
- Gate thresholdProbe desert ≥ 0.50 (L2 on unit sphere)
- Shallow thresholdProbe desert ≥ 0.70
- Zipf frequency filter≥ 3.0 — excludes archaic/rare terms
- Shared neighbourhood≥ 1 common neighbour in top-20
- Cosine similarity≤ 0.85 — excludes near-synonyms
- Single words onlyNo hyphens or underscores
Environment
ANTHROPIC_API_KEY Required for generative decoding
PORT_BACKEND Default: 8000
PORT_FRONTEND Default: 3000
# Copy .env.example to .env — never commit .env