leOS is a local AI substrate where knowledge, tools, media, routing decisions, and cached responses all live as points on the surface of a high-dimensional sphere. Agents don't search by keywords. They search by meaning, route by geometry, and learn by accumulating experience in embedding space.
For an AI agent to be genuinely useful it needs to do thousands of things — read files, search the web, analyze images, transcribe video, call APIs, process astronomy data. Today's agent frameworks hit a hard wall: the more tools you give an agent, the worse it performs. leOS was built to break that wall.
Every tool definition eats context tokens; load 200 tools into an LLM and there's no room for the actual work. leOS keeps the full catalogue in an embedding-indexed registry instead. When a task arrives it's embedded and scored against domain centroids to discard 80-90% of tools instantly (Pass 1), then fine-grained semantic plus keyword scoring runs on the survivors (Pass 2), blended with learned usage history from past sessions (Pass 3). The agent receives only the 6-8 tools that matter. This scales cleanly to thousands.
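As a concrete illustration, the three passes reduce to a few lines of numpy. This is a sketch, not the leOS implementation — the function names, the 80th-percentile cutoff, and the 80/20 blend weights are illustrative assumptions:

```python
import numpy as np

def normalize(v):
    return v / np.linalg.norm(v)

def select_tools(task_vec, domains, tools, usage_counts, k=8):
    """Three-pass tool selection sketch.
    domains: {name: centroid vector}; tools: {name: (domain, vector)};
    usage_counts: {tool_name: times used on similar past tasks}."""
    task_vec = normalize(task_vec)

    # Pass 1: centroid culling — keep only tools whose domain centroid scores high.
    domain_scores = {d: float(task_vec @ normalize(c)) for d, c in domains.items()}
    cutoff = np.percentile(list(domain_scores.values()), 80)  # discard ~80% of domains
    live = {d for d, s in domain_scores.items() if s >= cutoff}

    # Pass 2: fine-grained semantic scoring on the survivors.
    semantic = {name: float(task_vec @ normalize(vec))
                for name, (dom, vec) in tools.items() if dom in live}

    # Pass 3: blend in learned usage history (20% weight, as described above).
    total = sum(usage_counts.values()) or 1
    blended = {name: 0.8 * s + 0.2 * usage_counts.get(name, 0) / total
               for name, s in semantic.items()}
    return sorted(blended, key=blended.get, reverse=True)[:k]
```

The agent's prompt only ever sees the final `k` names, so the catalogue can grow without growing the context.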
"Every capability in the system — from embed text to transcribe a YouTube video to query the SDSS catalog — is a typed atomic operation we call a bone. Bones compose into chains. Chains that work become skeletons. Skeletons become skills."
The FABRIK planner (borrowed from inverse kinematics in character animation) works backward from the desired output and forward from available inputs to assemble chains that achieve goals. Successful chains are saved as skeletons — pre-validated patterns reused at zero-LLM cost. Failed trajectories get recorded as displacements so the next similar task avoids the bad path. The library of known-good chains grows every interaction.
The system is cleanly split into a hardware-analog layer, a software layer, and a kernel that bridges them — with four CPU embedding processors feeding the whole stack and a semantic membrane exposing the inside to the world.
Among the kernel's instructions are the VSA operations themselves (BIND, BUNDLE, PERMUTE, RESONATOR_FACTORIZE).
The same cycle runs whether the agent is answering a question, writing code, analysing a chart, or ingesting a 50 GB astronomy catalog. Each pass leaves the system a little smarter than it found it.
The incoming task becomes a vector on the unit hypersphere. Literal strings are pre-computed at compile time — zero runtime cost.
Three-pass tool selection: centroid culling, semantic scoring, learned history. Agent sees only the 6-8 tools that matter.
FABRIK searches backward from the goal and forward from available inputs. If a known skeleton matches (similarity ≥ 0.80), reuse before assembly.
Successful chains become skeletons. Failed trajectories get recorded. Idle time consolidates and repairs via the dreaming engine.
These four models aren't just listed in a config file. They work together as a system in ways that produce capabilities none of them has individually. Every model is open-weight and every one of them runs on CPU — no GPU required for the perceptual layer.
CROSS_SEARCH queries the text store and the vision store with a single vector. Text-to-image and image-to-text search become the default.
Emergent capabilities
Because nomic-text and nomic-vision emit into the same 768d space, searching images by text description (or text by image) is just cosine similarity. No separate index. No alignment layer. No cross-encoder. CROSS_SEARCH is one kernel instruction.
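Here is what that reduces to in practice. The store layout and names below are hypothetical; the only operation is a dot product against unit-normalised rows:

```python
import numpy as np

def cross_search(query_vec, stores, top_k=3):
    """One query vector against any number of modality stores sharing a space.
    stores: {modality: (ids, matrix of unit row vectors)} — hypothetical layout."""
    q = query_vec / np.linalg.norm(query_vec)
    hits = []
    for modality, (ids, mat) in stores.items():
        sims = mat @ q                        # cosine similarity (rows are unit-norm)
        for i in np.argsort(sims)[::-1][:top_k]:
            hits.append((float(sims[i]), modality, ids[i]))
    return sorted(hits, reverse=True)[:top_k]
```

The same call searches text by image or images by text — the modality is just a label on the store.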
The nomic (768d) and Qwen/ImageBind (1024d) spaces are different geometries. The Rosetta codec learns a projection matrix between them via Procrustes alignment — find the orthogonal W minimising ‖AW − B‖ over paired embeddings. Once calibrated, a displacement learned in one space broadcasts to all four models.
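In the equal-dimension case the calibration is a one-liner via SVD. This sketch assumes matching widths (the 768d→1024d case would first zero-pad A); the function name is an assumption, not the leOS API:

```python
import numpy as np

def procrustes_align(A, B):
    """Return the orthogonal W minimising ‖AW − B‖_F over paired embeddings.
    A, B: (n_pairs, d) matrices of the same content embedded in two spaces."""
    U, _, Vt = np.linalg.svd(A.T @ B)
    return U @ Vt

# Once calibrated, a displacement d learned in space A broadcasts as d @ W.
```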
When two models embed the same content, decomposing their disagreement gives five channels: agreement, A-exclusive, B-exclusive, magnitude dispute, and the purple channel — emergent information in neither model alone. Used for contradiction detection, ad filtering, semantic denoising, and divergence interrupts when the models see something fundamentally differently.
Any 768d nomic vector can be packed into a 16×16 RGB image (768 = 256 × 3) and re-embedded through ImageBind vision. No Rosetta projection needed — the vision encoder preserves local structure automatically. Numerical data becomes frequency sweeps, rhythms, chords, or OFDM-style spectrograms — then embeds through ImageBind audio. Different encoders surface different structural properties of the same data.
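The packing step itself is trivial. The min/max quantisation to uint8 below is an assumption of this sketch (any monotone mapping to pixel values would do), and the inverse needs the original value range stored alongside the image:

```python
import numpy as np

def vector_to_rgb(vec):
    """Pack a 768-d embedding into a 16x16 RGB uint8 image (768 = 16 * 16 * 3)."""
    assert vec.shape == (768,)
    lo, hi = vec.min(), vec.max()
    scaled = np.round((vec - lo) / (hi - lo + 1e-12) * 255).astype(np.uint8)
    return scaled.reshape(16, 16, 3)

def rgb_to_vector(img, lo, hi):
    """Approximate inverse, given the original value range."""
    return img.reshape(768).astype(np.float64) / 255 * (hi - lo) + lo
```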
Giulio Tononi's Integrated Information Theory provides the design principle. A shared embedding space that every subsystem reads and writes has fundamentally higher Φ (integration) than a collection of independent modules reporting to a dashboard — the whole genuinely exceeds the sum of its parts.
Most agent systems have exactly one model doing everything, and it blocks the whole system while it thinks. leOS runs a two-tier architecture: a lightweight intern model and an army of bots, all on CPU, in parallel with the main agent. Nothing ever competes for GPU memory.
The intern is a 0.8-billion-parameter model running CPU-only with num_gpu=0. It's never user-facing. It's called via the kernel's ASSIST instruction, which checks the reflex arc first (maybe the answer is already cached) before invoking the model at all. When it does run, it processes at ~100-200 tokens/second — not fast by GPU standards, but free, because it never touches the GPU the main 9B model is using. The intern handles work across 40+ modules.
Bots run on schedules, monitor data sources, detect anomalies, and only escalate to an LLM when something genuinely needs language understanding. A bot cycle (perceive → evaluate → act) runs entirely on CPU — HTTP requests, file reads, embedding comparisons (~0.1ms each), threshold checks, regex patterns. The system can run dozens of bot cycles per minute without touching the main model.
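The cycle is simple enough to sketch directly. The callback-based shape below is an illustrative reading of perceive → evaluate → act, not the leOS bot runtime:

```python
def bot_cycle(perceive, evaluate, act, escalate):
    """One perceive → evaluate → act pass. Everything stays on CPU unless the
    evaluation decides the observation genuinely needs language understanding."""
    obs = perceive()                 # HTTP request, file read, embedding lookup…
    verdict = evaluate(obs)          # threshold check, regex, cosine comparison
    if verdict.get("needs_llm"):
        return escalate(obs)         # the rare path: wake an LLM
    return act(obs, verdict)         # the common path: record, alert, ingest
```

Because the common path never touches a model, dozens of these cycles per minute cost effectively nothing.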
Bots are assembled, not programmed. The factory combines reusable templates:
Perceive templates: perceive_web, perceive_api, perceive_rss, perceive_file, perceive_kb, perceive_partition, perceive_observation, perceive_kernel, perceive_diff, perceive_multi, perceive_port, and more. Act templates: act_record, act_alert, act_kb, act_escalate, act_chain, act_ingest, act_spawn_bot, act_displace, act_emit, act_llm, and more.

The dreaming engine itself is a scoped agent: during idle time it operates in a System Self-Improvement scope, spawning child scopes for scope health review, capability audits, reflex optimisation, KB gap analysis, and context compaction. The system uses the same machinery to improve itself that it uses to do anything else.
leOS borrows mathematics from character animation, cosmology, the demoscene, neuroscience, and video compression — and applies it directly to the embedding medium. These aren't metaphors. They're the same math on different data.
Three primitives — bundling (addition), binding (circular convolution via FFT, O(d log d)), and permutation (cyclic shift) — form a Turing-complete computing framework (Kleyko et al., Proc. IEEE, 2022). The same three ops compose sets, sequences, trees, and graphs into a single fixed-width vector.
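The three primitives are a few lines of numpy. This sketch uses Plate's involution as the approximate inverse for unbinding; the function names are illustrative:

```python
import numpy as np

def bundle(*vs):          # superposition: element-wise addition
    return np.sum(vs, axis=0)

def bind(a, b):           # circular convolution via FFT, O(d log d)
    return np.real(np.fft.ifft(np.fft.fft(a) * np.fft.fft(b)))

def unbind(a, b):         # bind with the involution (approximate inverse) of b
    inv = np.concatenate(([b[0]], b[:0:-1]))   # Plate's b†: indices negated mod d
    return bind(a, inv)

def permute(a, k=1):      # cyclic shift encodes sequence position
    return np.roll(a, k)
```

Unbinding recovers the bound vector up to HRR noise, which is why the cleanup memory described below exists.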
Every task-to-response is recorded as a tangent vector on the hypersphere. Similar trajectories compress into shared I-frames, P-frames, and B-frames. The codec stores the pattern of transformation, not the output. Reconstructing a response costs a vector lookup.
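Recording a trajectory as a tangent vector uses the spherical logarithmic map and its inverse, the exponential map. A minimal sketch (function names assumed):

```python
import numpy as np

def log_map(x, y):
    """Tangent vector at x pointing toward y on the unit hypersphere.
    Its norm is the geodesic distance — the 'displacement' in the text."""
    c = np.clip(x @ y, -1.0, 1.0)
    theta = np.arccos(c)
    if theta < 1e-9:
        return np.zeros_like(x)
    return theta * (y - c * x) / np.linalg.norm(y - c * x)

def exp_map(x, v):
    """Inverse: walk from x along tangent v, landing back on the sphere."""
    t = np.linalg.norm(v)
    if t < 1e-9:
        return x.copy()
    return np.cos(t) * x + np.sin(t) * v / t
```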
When enough consistent displacements accumulate in a region (5+ by default), the reflex engine fires cached responses with conformal confidence bounds. Familiar patterns bypass the LLM entirely and replay from geometric cache in microseconds.
Named ellipsoidal regions in embedding space define semantic boundaries using signed distance field math. Union is min(a,b), intersection is max(a,b), subtraction is max(a,−b) — arbitrarily complex semantic filters from trivial operations. The gradient gives a free "direction to nearest boundary" vector.
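A sketch of that boolean algebra. The ellipsoid distance below is the common normalised-radius approximation (the exact ellipsoid SDF has no closed form), so treat it as illustrative:

```python
import numpy as np

def ellipsoid_sdf(p, center, radii):
    """Approximate signed distance to an axis-aligned ellipsoid (negative inside)."""
    q = (p - center) / radii
    k = np.linalg.norm(q)
    return (k - 1.0) * float(np.min(radii))   # scale bound; exact SDF has no closed form

union        = lambda a, b: min(a, b)
intersection = lambda a, b: max(a, b)
subtraction  = lambda a, b: max(a, -b)
```

Composing the lambdas gives arbitrarily complex semantic filters from these three operations alone.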
Dense SDF regions deflect nearby queries toward them, like light bending around a galaxy. Implemented with the Barnes-Hut tree — the same O(n log n) algorithm used for galactic N-body simulation. Frequently-used vectors exert more pull over time.
Circular convolution stores multiple key-value pairs in one fixed-width vector: record = k₁⊗v₁ + k₂⊗v₂ + … + kₙ⊗vₙ. Retrieve with vᵢ ≈ kᵢ† ⊗ record. Based on Plate's Holographic Reduced Representations. Noise after 10-20 compositions is handled by the cleanup memory — error correction analogous to digital systems.
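Putting the formula into code — a sketch with three pairs and a cleanup memory that snaps the noisy retrieval to the nearest stored value:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 2048

def bind(a, b):                     # circular convolution via FFT
    return np.real(np.fft.ifft(np.fft.fft(a) * np.fft.fft(b)))

def inv(k):                         # Plate's approximate inverse k†
    return np.concatenate(([k[0]], k[:0:-1]))

def cleanup(noisy, codebook):       # error correction: snap to nearest known vector
    return codebook[int(np.argmax(codebook @ noisy))]

keys   = rng.normal(0, 1 / np.sqrt(d), (3, d))
values = rng.normal(0, 1 / np.sqrt(d), (3, d))

record = sum(bind(k, v) for k, v in zip(keys, values))   # one fixed-width vector
noisy  = bind(inv(keys[1]), record)                      # ≈ v₁ plus crosstalk noise
v1     = cleanup(noisy, values)                          # exact v₁ after cleanup
```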
An information-density field modeled on mycorrhizal networks. Grows toward areas of activity via success-density feedback. Prunes neglected regions. Hub vectors become knowledge redistributors (inspired by Simard's mother tree research on scale-free mycorrhizal topology).
Via the Chinese Remainder Theorem. Pick coprime moduli (e.g. 7, 11, 13, 17, 19, 23 — product ≈ 3.2M), assign random digit vectors, BIND them. Addition becomes binding. Comparison becomes cosine similarity. Integer math up to ~3.2M using only vector ops the embedding hardware already supports. Solves subset-sum via resonator networks.
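A sketch of the encoding with roots-of-unity phasors, three moduli for brevity. Complex vectors with element-wise multiplication stand in here for the circular-convolution binding used elsewhere in this page — an assumption of the sketch, not the leOS kernel:

```python
import numpy as np

rng = np.random.default_rng(7)
moduli = (7, 11, 13)      # coprime: unique integers up to 7 * 11 * 13 = 1001
d = 256                   # phasor dimensions per modulus

# Each modulus gets a random vector of m-th roots of unity.
bases = [np.exp(2j * np.pi * rng.integers(0, m, d) / m) for m in moduli]

def encode(n):
    return np.concatenate([b ** n for b in bases])   # exponents wrap mod each m

def add(x, y):            # integer addition becomes element-wise binding
    return x * y

def sim(x, y):            # comparison becomes (complex) cosine similarity
    return float(np.real(np.vdot(x, y)) / len(x))
```

Because each base entry is an m-th root of unity, exponents wrap modulo m automatically, and the CRT makes the combined representation unique up to the moduli's product.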
Given a composite vector s = x₁ ⊗ x₂ ⊗ … ⊗ xₖ and candidate codebooks, each factor estimate iteratively updates until convergence in 5-50 iterations. The inverse of VSA binding — the mechanism behind NP-hard search inside embedding space.
Based on Karl Friston's active inference framework. A lightweight linear predictor estimates the expected output before running any agent. Confident → use prediction directly (System 1, fast). Uncertain → full LLM runs (System 2). Target ratio: ~80% of routine tasks handled without LLM inference.
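The gate itself is small. In this sketch a nearest-neighbour predictor stands in for the lightweight linear predictor, and the 0.85 confidence threshold is an illustrative assumption:

```python
import numpy as np

def route(task_vec, memory_tasks, memory_responses, threshold=0.85):
    """Active-inference-style gate: predict cheaply, escalate only when uncertain.
    memory_tasks: (n, d) unit vectors of past tasks; memory_responses: their outputs."""
    if len(memory_tasks) == 0:
        return "system2", None                     # nothing known: full LLM
    sims = memory_tasks @ task_vec
    best = int(np.argmax(sims))
    if sims[best] >= threshold:
        return "system1", memory_responses[best]   # confident: use the prediction
    return "system2", None                         # uncertain: full LLM
```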
Same vector queried at multiple Matryoshka scales. 32d shows broad regions ("work", "media"), 128d shows subregions ("project notes", "code"), 1024d shows individual documents. Continuous landscape. Zooming costs nothing — same vector under different projections.
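Zooming is literally truncation plus renormalisation — a minimal sketch:

```python
import numpy as np

def zoom(vec, dims):
    """Matryoshka view: the first `dims` coordinates of a nested embedding,
    renormalised to the unit sphere, give a coarser map of the same point."""
    v = np.asarray(vec, dtype=float)[:dims]
    return v / np.linalg.norm(v)
```

The same stored vector serves the 32d region view and the full-resolution document view; nothing is re-embedded.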
Run the intern on contrastive example pairs, compute the mean hidden-state difference, normalize. At inference, inject via forward hook: hidden += α · steering. One tensor addition per token — microseconds. Replaces fragile prompt engineering with compiled geometric subroutines.
Before the intern generates a single token, a linear probe reads its hidden state after prefill and predicts REFLEX (cache hit), ASSIST (intern handles it), or ESCALATE (main model). Trained online from accumulated outcomes. Reaches 92%+ accuracy with use.
The KB void map probes the space between knowledge clusters. "Know Python, know async, but no article on Python async" is a void. When frequency crosses a threshold, the dreaming engine autonomously researches and fills these gaps during idle time.
Parent and child scopes let agents spawn sub-work without polluting the parent's reasoning. Only deliverables cross scope boundaries. The context assembler builds agent prompts from eight priority tiers, each with its own token budget.
Runs during idle time. Consolidates the displacement codec, runs void detection, grows and prunes the living medium, compacts stale scopes, audits bones, renders deferred thought monologues, generates reflections from unprocessed learning. The system uses itself to improve itself.
leOS has a quality-control system that catches agents hallucinating, looping, or producing shallow non-answers — without making a single LLM call. Everything is pure vector geometry on the unit hypersphere. The core concept is borrowed from cosmological observation.
The displacement vector — the tangent from task embedding to response embedding, computed via the logarithmic map — is unusually long compared to what the neighborhood predicts. In astronomy, redshift means an object is moving away from the observer.
In leOS, redshift means the response is semantically receding from the task. Drift, off-topic wander, confabulation, hallucination. The agent's mouth is working but it's answering a different question.
The displacement is suspiciously short, or task-response cosine similarity exceeds 0.92. In astronomy, blueshift means an object is approaching.
In leOS, blueshift means the response is echoing the task back in different words. A non-answer like "I'll do that!" or "Great question — let me think about it." Catches the failure mode of appearing to engage without producing output.
For each response, the detector queries the displacement log for the K most similar past tasks and computes the mean and standard deviation of their displacement magnitudes. If the actual displacement exceeds the prediction by more than 1.8σ: redshift. Below the prediction by 1.5σ: blueshift. Fewer than 3 similar past tasks: void — unexplored territory. Pairwise cosine similarity above 0.85 across the last N responses flags a semantic loop even when the text differs.
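The whole detector fits in one short function. The names and the arccos-based magnitude are assumptions of this sketch; the 1.8σ/1.5σ thresholds and the minimum-neighbour count are the ones described above:

```python
import numpy as np

def classify_drift(task, response, past_tasks, past_mags,
                   k=8, red_sigma=1.8, blue_sigma=1.5, min_neighbors=3):
    """Redshift/blueshift test on one task-response pair.
    past_tasks: (n, d) unit task vectors; past_mags: their displacement magnitudes."""
    mag = float(np.arccos(np.clip(task @ response, -1.0, 1.0)))  # geodesic length
    if len(past_tasks) < min_neighbors:
        return "void", mag                     # unexplored territory
    sims = past_tasks @ task
    nearest = np.argsort(sims)[::-1][:k]       # K most similar past tasks
    mu, sigma = past_mags[nearest].mean(), past_mags[nearest].std() + 1e-9
    if mag > mu + red_sigma * sigma:
        return "redshift", mag                 # receding from the task
    if mag < mu - blue_sigma * sigma:
        return "blueshift", mag                # echoing the task back
    return "nominal", mag
```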
The drift detector replaces LLM-based quality checking with pure math. It runs on every agent response automatically and costs zero tokens. The metaphor also drives the emotion parameters for voice synthesis — redshift produces uncertain delivery, convergence produces calm confidence, void produces a contemplative hush.
Drift detection, voice synthesis, visual rendering, and the knowledge base all connect into a single learning loop. A flagged learning experience becomes a narrated thought video. The video gets triple-embedded (vision, audio, text) and stored as a searchable KB article. Future agents find it by meaning and learn from past mistakes without anyone writing documentation.
ChatterboxTTS is a two-stage neural TTS (T3 autoregressive + S3Gen decoder). Voice cloning is zero-shot: provide 5-30 seconds of reference audio and the model matches timbre, pitch, and cadence. No fine-tuning. Reference audio can come from direct uploads, video extraction via ffmpeg, or URLs processed through yt-dlp.
Drift state drives emotion. The EmotionMapper converts the geometric drift classification into TTS parameters per line:
[sniff] (strong: [sigh]); [chuckle] (strong: [laugh]); [clears throat]; [gasp].
The paralinguistic tags are rendered by the same voice model that produces the speech — a [sigh] during redshift sounds like a real sigh from the speaker. Voice modulation also adjusts the TTS sampling itself: blueshift lowers min_p (more creative output), redshift raises it (more stable). The voice isn't just speaking differently — the model is generating differently. All output is watermarked with resemble-perth as AI-generated.
The thought canvas is a 256×224 pixel numpy array (deliberately SNES-era resolution — the visual output is a byproduct of computation, not the point of it) where agents render 2D Gaussian splats while they work. Each splat is 8 floats: position, scale, rotation, color, opacity.
The renderer uses accumulated summation — for each pixel, the color is the sum of all splat contributions weighted by their gaussian falloff. This is order-independent: no z-sorting pass. Hundreds of splats at 256×224 render in single-digit milliseconds on CPU with numpy vectorization. The /thought page streams it live.
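A sketch of the accumulated-summation renderer. The nine-float splat layout here (centre, per-axis scale, rotation, RGB, opacity) is an illustrative reading of the splat parameters, not the exact leOS format:

```python
import numpy as np

def render_splats(splats, h=224, w=256):
    """Accumulated-summation splat renderer: each pixel sums every splat's
    colour weighted by its Gaussian falloff. Order-independent, so no z-sort."""
    ys, xs = np.mgrid[0:h, 0:w]
    canvas = np.zeros((h, w, 3))
    for (cx, cy, sx, sy, rot, r, g, b, alpha) in splats:
        # rotate pixel offsets into the splat's local frame
        dx, dy = xs - cx, ys - cy
        u = dx * np.cos(rot) + dy * np.sin(rot)
        v = -dx * np.sin(rot) + dy * np.cos(rot)
        falloff = np.exp(-0.5 * ((u / sx) ** 2 + (v / sy) ** 2))
        canvas += alpha * falloff[..., None] * np.array([r, g, b])
    return np.clip(canvas, 0.0, 1.0)
```

Because contributions simply add, the splat list can arrive in any order — that is what lets the renderer skip the z-sorting pass.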
When the system ingests images, the SplatFitter decomposes them into splat representations — iterative optimization fits the gaussian parameters to a target image, then stores the parameters alongside the image's embedding. Over time this builds a learned mapping from concept-space to splat-space. An agent wanting to visualize a concept searches this cache for the nearest match, renders splats, embeds the result, and refines via perceptual feedback from nomic-vision. The system learns to draw by practicing.
The MonologueRenderer combines it all: canvas frames + ChatterboxTTS audio + EmotionMapper params → composite MP4 → triple-embed (vision + audio + text) → knowledge base article. Cross-modal search retrieves thought videos by query in any modality.
The media pipeline runs every applicable analyzer on incoming media and lands the results in embedding space. The philosophy: you don't know in advance what you'll want to search for, so extract everything, embed everything, and let the geometry sort out relevance later.
Every incoming image goes through every tool that might produce useful signal:
Videos: ffprobe metadata, multi-strategy keyframe extraction (scene-detect, low-threshold, timed fallback) with perceptual deduplication up to 30 frames, Whisper speech transcription, audio extraction. The entire image pipeline then runs on every extracted keyframe. Audio spectrograms via matplotlib.
Audio: metadata, Whisper transcription, spectrogram, ImageBind audio embedding.
The original video file is deleted after keyframe extraction to save disk — all the information survives in the embeddings and analysis records. A 2-hour video becomes 30 embedded keyframes, a full transcript, and an audio vector. All of it is cross-searchable.
All processing runs through a single-worker job queue so simultaneous submissions don't step on each other. YouTube and TikTok both route through yt-dlp automatically. The same MEDIA_INGEST kernel instruction handles URLs, files, streams, and uploads.
What this unlocks
Because every modality lands in a shared space, questions that usually require three different tools become one query. "Find the video clip where someone is explaining SDR with a red flag visible in frame" runs in a single cross-modal search: the text query embeds, the 30 keyframes per video embed, the transcripts embed, the audio embeds, and cosine similarity does the rest. No "video search API" required.
The temptation is to bolt a traditional API gateway onto leOS — a collection of REST endpoints that external things POST to. We didn't. The membrane treats every entry point as an embedding-space citizen with an intent vector, a topic region, and a subscription fan-out. Data arriving through the membrane gets perceived, embedded, evaluated, and published to whoever is listening in the correct semantic neighborhood.
This works equally well for an IoT temperature reading processed in milliseconds and a 50 GB FITS astronomy catalog processed over hours, streamed incrementally, and resumable across server restarts. The difference is the processing path, not the model.
Create a named port via POST /ports. Every port has an intent vector derived from its name + description, which lets agents and external systems discover it semantically — "what can leOS accept about genomics?" returns matching ports.

Push data with POST /in/<port_id> or streaming with /in/<port_id>/stream. The ingestion pipeline runs normalize → perceive → embed → evaluate → store → notify on every item. Port config decides which field to embed, which reference text to compare against for signal detection, and what partition to land in.
The subscription bus (SSE, webhooks, internal queues) fans out every event to whoever's listening, with topic wildcards and filters.
Upload a file, create a job, watch it stream results. The reader never loads the full file — peak memory is one chunk plus the already-loaded embedding models.
Supported formats: .csv, .tsv, .jsonl, .fasta, .npy, .npz, .fits (astropy), .h5/.hdf5 (h5py), .parquet (pyarrow), plain text.
Jobs are resumable across restarts with checkpoint recovery. Results stream live — you don't wait for a 50 GB file to finish before seeing the first flagged row. Every output port exposes /rows, /download, /csv, /embeddings (as a numpy file), and /search (semantic search within the results, while the job is still running).
Embedding strategies for scientific data
A stellar spectrum, a protein expression profile, a seismic trace, an EEG channel — all become 1024d vectors via ImageBind's audio encoder. They now coexist in the same space as sounds, images, and text. This is the strategy that makes searching spectra by plain English description possible.
Telescope images, microscopy slides, medical scans, FITS image HDUs. Each 2D array → 1024d ImageBind vector. Cross-queryable with text, audio, and any other embedded modality.
For gene expression profiles, physics simulation states, financial time-series — anywhere the numbers themselves are the semantic content. Random projection into 768d, normalized to the unit hypersphere. Preserves Euclidean structure while entering embedding space.
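A sketch of that projection. The fixed-seed Gaussian matrix is an assumption of this sketch; by the Johnson–Lindenstrauss lemma the pairwise geometry survives approximately:

```python
import numpy as np

def numeric_embed(rows, d=768, seed=0):
    """Random projection of raw numeric vectors onto the unit hypersphere.
    A fixed seed keeps the projection stable across runs, so embeddings
    from different batches of the same dataset remain comparable."""
    n_features = rows.shape[1]
    P = np.random.default_rng(seed).normal(0, 1 / np.sqrt(d), (n_features, d))
    out = rows @ P
    return out / np.linalg.norm(out, axis=1, keepdims=True)
```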
Catalog rows become descriptive sentences via user templates: "galaxy ra {ra} dec {dec} redshift {z:.2f} type {class}". Embeds via nomic-text. Mixed-type tabular data where the meaning of each row matters.
Scientific use cases
Upload a FITS catalog. Port embeds each spectrum's flux array via ImageBind audio encoder. The flag_void_region analysis bone identifies spectra landing in low-density regions of the embedding space — anomalous observations that don't cluster with known types. No labeled training set required.

POST /outputs/<id>/search "broadened emission lines consistent with AGN outflow"
Stream a FASTA file sequence-by-sequence. Embed via ImageBind audio (sequences as waveforms) or numeric_direct. Void detection flags unusual sequences automatically. A bot wakes up, escalates the most unusual ones to the main LLM for interpretation, and writes the interpretations back to the knowledge base.
HDF5 snapshots via h5py, NumPy arrays via mmap. Same object at different time steps produces a displacement vector that encodes what changed and in what semantic direction. After 10,000 snapshots, the reflex arc has learned which regions correspond to which regimes — subsequent runs route through cache.
Because everything lands in the same space, a molecular descriptor (numeric_direct) and a paper abstract (Qwen3) and a crystal structure image (imagebind_image) are all cosine-comparable. Find papers related to a compound you've never seen before — by its properties, not its name.
leOS exposes an MCP server with tools for store, search, cross-modal search, embed, status, media-ingest, and arbitrary kernel execution. Any MCP-compatible client can use leOS as a remote brain with full cross-modal semantic search and the learning substrate behind it.
API_LEARN ingests an API spec and stores it as a reusable adapter. The spec itself gets embedded, so agents find the right adapter semantically. leOS publishes its own API as a learnable spec — other leOS instances can learn it and call it.
Traditional tools — pandas, numpy, scikit-learn — treat a million-row dataset as a matrix to be filtered by explicit rules. The researcher must know what they're looking for before they look. leOS treats the same dataset as a million points in semantic space. Anomaly detection requires no labeled training set; low-density regions of the embedding space are unusual by definition. The second million-row dataset of the same type processes faster than the first. That isn't incremental improvement — it's a fundamentally different relationship between a researcher and their data.
Every interaction feeds back into one or more of these systems. None require manual training. The substrate gets faster, smarter, and more knowledgeable automatically.
Every task-to-response trajectory recorded as a tangent vector. Similar trajectories compress via I/P/B frames. The codec stores the pattern of transformation, not the output.
Enough consistent displacements in a region graduate into cached responses with conformal confidence bounds. Familiar patterns bypass the LLM in microseconds.
Successful bone chains become pre-validated patterns. FABRIK tries known skeletons first (similarity ≥ 0.80) before assembling anything new.
Every session records which tools got used. Usage history feeds back as a 20% weight in scoring. The system learns that "PDF table extraction" reliably needs doc_query.
Tracks capability gaps — tasks where no tool scored well. Gap vectors cluster naturally. When a cluster crosses a frequency threshold, the system can generate a new tool from existing parts.
When an LLM escalation succeeds on a novel task, the displacement compiler captures the trajectory and creates a permanent reflex entry. One successful call teaches the system to handle all similar tasks without LLM involvement.
The approach leOS takes is built on recent work across several fields. The mathematical proofs exist and the experimental results are published.
Proved by emulating a (2,4) Turing machine and Rule 110 cellular automaton using only bundling, binding, and permutation. The emulated machine executed over 10⁹ error-free updates.
LLM reasoning entirely in continuous latent space, outperforming chain-of-thought. Continuous thought vectors encode multiple alternative reasoning paths simultaneously — breadth-first search natively in continuous space.
Constraining representations to unit norm and expressing transformations as hypersphere displacements produced a 10× training speedup.
A differentiable, continuous-field computer. O(N) scaling with Turing completeness. Demonstrates cellular automata, PDE solving, and image refinement in one architecture.
Unified residue number systems with HD vectors. Addition and multiplication as separate binding operators. Resources scale only logarithmically with numeric range. Solves NP-complete subset-sum via resonator networks.
First model to learn a complete graphics pipeline without ray tracing or rasterization. Scenes as triangle tokens. Rendering is pure attention over embeddings.
LLM agents collaborating through shared continuous latent space achieve 14.6% higher accuracy, 70-84% fewer tokens, 4× faster inference. The shared space is the coordination mechanism.
Compresses 512d CLIP embeddings to 3d per Gaussian via scene-specific autoencoder. LangSplatV2 reaches 476 FPS for feature splatting — a 42× speedup. Points in a 3D scene carry natural-language meaning.
leOS is a single-developer project built in the open. The mission is to build the growing, adapting substrate that AI agents need to become genuinely capable — not a static tool library, but a living system that gets smarter, faster, and more knowledgeable with every interaction.
The current milestone is a clean public release: a first-time user should be able to install, launch, and complete real tasks — token reports, web research, scientific dataset ingestion, small app creation — without tripping on anything.
A Solana-based community token whose liquidity-pool fees feed directly into ongoing development. Trading the token literally funds the next feature.
Every swap on the official liquidity pool sends a share of fees to the development wallet. As long as people trade the token, leOS keeps shipping — without subscription fees, without investor control, without a roadmap dictated by anyone's exit plan.
The contract address, chart link, and launch date will be dropped on this page as soon as they're live.
leOS is open source and developed publicly. Clone the repo, run it locally, and watch an AI system that actually learns from every interaction.