Mamba: Linear-Time Sequence Modeling with Selective State Spaces

TL;DR

Mamba is an attention-free architecture for sequence modeling. It achieves linear time complexity in sequence length, whereas self-attention is quadratic. Mamba builds on State Space Models (SSMs) and can be viewed as an enhanced RNN with:

  1. Improved selectivity, allowing input-dependent parameter adjustment (sketched in code below)
  2. Faster training through a parallel prefix sum (scan) algorithm, reducing sequential depth from n to log n (see the second sketch below)
  3. Linear-time scaling for both training and inference

This approach enables Mamba to efficiently handle long sequences and achieve state-of-the-art results across diverse domains including language, audio, and long-context tasks.
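
To make point 1 concrete, here is a minimal sketch of the selective SSM recurrence h_t = A_bar_t * h_{t-1} + B_bar_t * x_t, y_t = C_t * h_t, where the step size Delta and the matrices B and C are computed from the current input. The names and shapes (W_delta, W_B, W_C, softplus for Delta) are illustrative assumptions for exposition, not the paper's exact implementation:

```python
import jax
import jax.numpy as jnp

def selective_ssm(x, A, W_delta, W_B, W_C):
    """Sequential reference for a selective SSM; names/shapes are illustrative.

    x: (L, d)  input sequence
    A: (d, n)  per-channel diagonal state matrix
    W_delta (d, d), W_B (d, n), W_C (d, n): projections that make the
    step size Delta and the matrices B, C depend on the current input;
    this input dependence is the "selection" mechanism.
    """
    def step(h, x_t):
        delta = jax.nn.softplus(x_t @ W_delta)   # (d,)   input-dependent step size
        B = x_t @ W_B                            # (n,)   input-dependent input map
        C = x_t @ W_C                            # (n,)   input-dependent output map
        A_bar = jnp.exp(delta[:, None] * A)      # (d, n) discretized A (zero-order hold)
        B_bar = delta[:, None] * B[None, :]      # (d, n) discretized B (Euler approximation)
        h = A_bar * h + B_bar * x_t[:, None]     # (d, n) linear recurrence update
        return h, h @ C                          # new state, output y_t of shape (d,)

    h0 = jnp.zeros((x.shape[1], A.shape[1]))
    _, ys = jax.lax.scan(step, h0, x)
    return ys  # (L, d)
```

Because Delta_t, B_t, and C_t change with the input, the model can decide per token what to write into and read from its state. The price is that the recurrence can no longer be precomputed as one global convolution (as in earlier SSMs such as S4), which is where the parallel scan comes in.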


Over the past four to five years, transformers have upended the deep learning field, breaking records previously held by CNNs across the board. But self-attention, the core technique underlying transformers, has one major weakness: complexity that grows quadratically with sequence length, in both time and space. Mamba uses no self-attention at all, yet matches transformer performance with linear complexity. You can think of Mamba as an evolved RNN: it has a better mechanism for selecting state, and training is faster (a parallel prefix sum reduces sequential steps from n to log n).
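
The n → log n speedup works because each step of the recurrence h_t = a_t * h_{t-1} + b_t is an affine map, and composing affine maps is associative, so all prefixes can be combined in a tree of logarithmic depth. A minimal sketch using jax.lax.associative_scan follows; the function and variable names are mine, and Mamba's actual implementation is a fused, hardware-aware CUDA kernel operating on per-timestep A_bar_t and B_bar_t * x_t rather than scalars:

```python
import jax
import jax.numpy as jnp

# Each step h_t = a_t * h_{t-1} + b_t is the affine map h -> a*h + b.
# Composing two such maps yields another affine map, and composition is
# associative, so all prefixes can be combined in O(log L) parallel depth
# instead of L sequential steps.
def combine(left, right):
    a_l, b_l = left
    a_r, b_r = right
    return a_l * a_r, a_r * b_l + b_r

def parallel_linear_scan(a, b):
    # a, b: (L,) per-step coefficients; returns all hidden states h_1..h_L
    _, h = jax.lax.associative_scan(combine, (a, b))
    return h

a = jnp.array([0.9, 0.8, 0.7, 0.6])
b = jnp.array([1.0, 2.0, 3.0, 4.0])

# Cross-check against the plain sequential recurrence
h, h_seq = 0.0, []
for t in range(4):
    h = a[t] * h + b[t]
    h_seq.append(h)

print(jnp.allclose(parallel_linear_scan(a, b), jnp.stack(h_seq)))  # True
```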

Presentation

[Embedded slide deck]