Quasar: Scaling Context to Millions of Tokens with Linear Attention
How continuous-time attention mechanisms break the quadratic bottleneck and enable 10M+ token context at 1/10th the cost of standard transformers.
Novel architectures. Millions of tokens of context. Trained on decentralized compute. All open-weight.
Novel architectures and training methods: linear attention at scale, decentralized training, and open-weight releases.
Standard transformers scale quadratically. Our Quasar architecture uses continuous-time attention that scales linearly — handling millions of tokens at a fraction of the compute.
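The exact continuous-time formulation isn't spelled out on this page, but the core idea behind linear attention fits in a few lines: replace the softmax over all token pairs with a feature map so the key/value summary becomes a fixed-size running state. The sketch below is a generic kernelized linear attention; the feature map and shapes are illustrative, not Quasar's actual mechanism.

```python
import numpy as np

def linear_attention(Q, K, V, eps=1e-6):
    """Causal linear attention in O(N * d^2) time with O(d^2) state.

    Q, K, V: (seq_len, d) arrays. The feature map phi (elu(x) + 1 here)
    is a common illustrative choice, not Quasar's continuous-time kernel.
    """
    phi = lambda x: np.where(x > 0, x + 1.0, np.exp(x))
    Qf, Kf = phi(Q), phi(K)

    S = np.zeros((Qf.shape[1], V.shape[1]))   # running sum of phi(k) v^T
    z = np.zeros(Qf.shape[1])                 # running sum of phi(k)
    out = np.empty_like(V)
    for t in range(Q.shape[0]):               # one token at a time, causal
        S += np.outer(Kf[t], V[t])
        z += Kf[t]
        out[t] = (Qf[t] @ S) / (Qf[t] @ z + eps)
    return out
```

The state (S, z) never grows with sequence length, which is what removes the quadratic term: standard softmax attention would instead materialize an N-by-N score matrix at every layer.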
We train on Bittensor's distributed compute network. Miners compete to produce the best model checkpoints, making frontier-class training accessible without centralized GPU clusters.
Every model we release ships with full weights under Apache 2.0. No gated access, no waitlists. Download from Hugging Face and deploy on your own infrastructure.
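Assuming the weights load through the standard Hugging Face transformers API, deployment looks like any other causal LM. The repository id below is a placeholder, and a non-standard attention module may require trust_remote_code.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "open-org/quasar-2.4b"  # placeholder; use the actual repo id from the release

tokenizer = AutoTokenizer.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(
    repo,
    torch_dtype="auto",      # load in the checkpoint's native precision
    device_map="auto",       # requires `accelerate` for multi-GPU sharding
    trust_remote_code=True,  # likely needed for a custom attention architecture
)

prompt = "Summarize the key findings of the attached report:\n"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```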
Our first foundation model series. Long-context reasoning with linear attention.
Foundation model for long-context understanding and reasoning
Our subnet architecture: how miners compete to train long-context checkpoints, and how validators verify quality across sequence lengths from 8K to 2M tokens.
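The incentive details live in that post; as a rough, hypothetical illustration of what verifying quality across sequence lengths can look like, a validator might score a candidate checkpoint's held-out loss at several context lengths and weight longer contexts more heavily.

```python
# Hypothetical validator-side scoring sketch. The real subnet's reward
# function and weighting are defined by the subnet code, not reproduced here.
LENGTH_BUCKETS = [8_192, 65_536, 262_144, 1_048_576, 2_097_152]  # 8K .. 2M tokens

def score_checkpoint(eval_loss_at_length, buckets=LENGTH_BUCKETS):
    """eval_loss_at_length: callable mapping a context length to held-out loss.

    Returns a single score where low loss at longer contexts counts more,
    so miners can't win by optimizing short contexts only (an assumption,
    not the published reward function).
    """
    total_weight = sum(buckets)
    score = 0.0
    for n in buckets:
        loss = eval_loss_at_length(n)
        score += (n / total_weight) * (1.0 / (1.0 + loss))
    return score
```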
2.4 billion parameters, 2 million token context window, trained on 2 trillion tokens. Released under Apache 2.0 — download and deploy today.