TST Cuts Pre-training Cost by 60%

Mason

20 May 2026 — 1 min read

Nous Research has unveiled Token Superposition Training (TST), a method that cuts pre-training cost by roughly 60% without altering the model architecture. In a 10B-A1B mixture-of-experts experiment, TST used just 38.7% of the baseline's GPU hours (4,768 B200-hours versus 12,311), a reduction of about 61%, while delivering better loss and downstream performance.
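The headline figure follows directly from the two GPU-hour counts quoted above. A minimal Python check, in which the variable names are illustrative and only the two hour counts come from the reported experiment:

```python
# GPU-hour figures quoted in the article (B200-hours).
tst_hours = 4_768
baseline_hours = 12_311

fraction_used = tst_hours / baseline_hours   # share of baseline compute TST needed
savings = 1 - fraction_used                  # share of baseline compute saved
speedup = baseline_hours / tst_hours         # equivalent compute multiplier

print(f"fraction used: {fraction_used:.1%}")
print(f"savings:       {savings:.1%}")
print(f"speedup:       {speedup:.2f}x")
```

The ratio reproduces the 38.7% figure, which puts the saving slightly above the rounded 60% in the headline.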

TST splits the pre-training process into two distinct phases: a superposition phase that aggregates consecutive tokens for coarse-grained learning, followed by a recovery phase that reverts to standard next-token prediction. This curriculum redesign allows the model to build foundational representations more efficiently early on, reducing the computational burden of processing every token individually from the start.
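To make the two-phase idea concrete, here is a rough Python sketch of what aggregating consecutive tokens might look like. The article does not say how TST combines tokens or when it switches phases, so averaging embeddings in fixed-size groups, the function names `superpose` and `phase_for_step`, and the `group_size` and `superposition_steps` parameters are all assumptions made for illustration, not Nous Research's implementation.

```python
import numpy as np


def superpose(embeddings: np.ndarray, group_size: int) -> np.ndarray:
    """Average each run of `group_size` consecutive token embeddings.

    Assumption: aggregation is a plain mean; the real method may differ.
    """
    seq_len, dim = embeddings.shape
    usable = (seq_len // group_size) * group_size  # drop any ragged tail
    grouped = embeddings[:usable].reshape(-1, group_size, dim)
    return grouped.mean(axis=1)


def phase_for_step(step: int, superposition_steps: int) -> str:
    """Hypothetical schedule: coarse phase first, then standard training."""
    return "superposition" if step < superposition_steps else "recovery"


rng = np.random.default_rng(0)
tokens = rng.normal(size=(8, 4))         # 8 token embeddings of width 4
coarse = superpose(tokens, group_size=4)

print(coarse.shape)                      # (2, 4): 4x fewer positions to process
print(phase_for_step(0, 10), phase_for_step(10, 10))
```

Under these assumptions, a group size of 4 shortens the sequence fourfold during the superposition phase, which is where any compute saving would come from. The recovery phase then trains on the unaggregated tokens with ordinary next-token prediction.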

Compared with DeepSeek's heavy system-level optimization, TST offers a lighter path to efficiency. Rather than overhauling infrastructure or hardware, Nous Research attacks the problem at the level of the learning algorithm, redesigning the early learning curriculum to compress redundant information. The result is a method that is architecture-agnostic and could widen access to large-scale model training, especially for teams with limited compute budgets. Detailed ablation studies and scaling laws remain to be published, so the initial results are best read as early evidence that TST may change how pre-training efficiency is approached.

Kimi K3 Launch: Open-Source Giant Shakes AI Landscape

Moonshot AI released Kimi K3, an open-source model with 2.8 trillion parameters and 100 million token context, delivering performance comparable to top-tier closed-source systems at a fraction of the cost. The release signals a strategic pivot in the AI arms race, where competitive advantage now hinges on cost efficiency

Microsoft Taps AWS as GitHub AI Agents Break SLAs

In an unprecedented move to stabilize its platform under a relentless deluge of AI coding agent traffic, Microsoft has quietly routed core GitHub operations through rival Amazon Web Services (AWS), following a series of crippling outages that saw availability dip below 99% in June and nine incidents in May alone.

Codenotary flags 210,000 risky AI agent actions daily

Codenotary's AgentMon platform now monitors over 3 million AI-agent interactions daily across enterprise clients, flagging approximately 210,000—or 7%—as potentially unsafe or non-compliant, a signal that runtime security gaps in production AI systems are far more widespread than previously recognized. According to the company, the vast

AWS shows agentic AI future in advertising at Cannes

AWS is returning to the Cannes Lions International Festival of Creativity this June with a hands-on activation called Rue Visionnaire, placing AI agents directly in the hands of advertisers. From June 22 to 26, 2026, attendees can guide these agents through the complete creative workflow—starting with ideation and progressing
