Subquadratic AI has released SubQ-1.1-Small, a model built on the company's Smart Sparse Attention technique. According to the company, the model achieves near-perfect long-context retrieval at up to 12 million tokens on demanding tests such as needle-in-a-haystack, in which a single fact is hidden in a long stretch of filler text and the model must retrieve it. The company also reports that SubQ-1.1-Small cuts attention compute by nearly 1,000 times.

SubQ-1.1-Small is designed to balance long-context optimization with general reasoning. It maintains strong performance across a range of benchmarks, including knowledge assessments, coding tasks, and non-coding enterprise agent evaluations.

At 1 million tokens, SubQ-1.1-Small requires 64.5 times less compute than standard dense attention and runs 56 times faster than FlashAttention-2, according to independent verification. Together, the lower compute requirement and higher speed make SubQ-1.1-Small a strong option for enterprises that need long-context AI.
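To put the 64.5× figure in perspective, here is a back-of-the-envelope sketch. It rests on a simplified cost model that is our assumption, not something stated by Subquadratic AI. Under this model, attention compute is proportional to the number of query-key pairs scored. Dense attention scores every pair, which gives n² pairs for n tokens. A sparse scheme in which each query attends to k keys on average scores n·k pairs. The helper names below are hypothetical and only illustrate the arithmetic. They make no claim about how Smart Sparse Attention actually chooses keys.

```python
# Simplified, hypothetical cost model: attention compute is proportional to the
# number of query-key pairs scored. This ignores constant factors, MLP layers,
# memory traffic, and how a sparse method selects which keys to score.

def dense_pairs(n_tokens: int) -> int:
    """Pairs scored by full (dense) attention: every query sees every key."""
    return n_tokens * n_tokens


def keys_per_query_for_reduction(n_tokens: int, reduction: float) -> float:
    """Average keys per query such that the total scored pairs are
    `reduction` times fewer than in dense attention."""
    sparse_pairs = dense_pairs(n_tokens) / reduction
    return sparse_pairs / n_tokens


def reduction_factor(n_tokens: int, keys_per_query: float) -> float:
    """Dense pairs divided by sparse pairs, which simplifies to n / k."""
    return dense_pairs(n_tokens) / (n_tokens * keys_per_query)


if __name__ == "__main__":
    n = 1_000_000   # context length quoted in the article
    r = 64.5        # compute reduction quoted in the article at 1M tokens
    k = keys_per_query_for_reduction(n, r)
    print(f"dense pairs:        {dense_pairs(n):.3e}")
    print(f"sparse pairs:       {dense_pairs(n) / r:.3e}")
    print(f"avg keys per query: {k:,.0f}")
```

Running it prints about 1.000e+12 dense pairs, 1.550e+10 sparse pairs, and roughly 15,504 keys per query on average. Under this simplified model the reduction factor is just n / k. With a fixed per-query key budget, the reduction therefore grows linearly with context length. The article does not say how SubQ-1.1-Small sets or varies that budget.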