Mamba: Linear-Time Sequence Modeling with Selective State Spaces

TL;DR

Mamba is an attention-free architecture for sequence modeling. It achieves linear time complexity in sequence length, whereas self-attention is quadratic. Mamba builds on State Space Models (SSMs) and can be viewed as an enhanced RNN with:

  1. Improved selectivity, allowing input-dependent parameter adjustment (sketched in code below)
  2. Faster training through a parallel prefix sum (scan) algorithm, reducing sequential depth from n to log n (see the second sketch below)
  3. Linear-time scaling for both training and inference

This approach enables Mamba to efficiently handle long sequences and achieve state-of-the-art results across diverse domains including language, audio, and long-context tasks.
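
To make point 1 concrete, here is a minimal sketch of the selective SSM recurrence h_t = A_bar_t * h_{t-1} + B_bar_t * x_t, y_t = C_t * h_t, where the step size Delta and the matrices B and C are computed from the current input. The names and shapes (W_delta, W_B, W_C, softplus for Delta) are illustrative assumptions for exposition, not the paper's exact implementation:

```python
import jax
import jax.numpy as jnp

def selective_ssm(x, A, W_delta, W_B, W_C):
    """Sequential reference for a selective SSM; names/shapes are illustrative.

    x: (L, d)  input sequence
    A: (d, n)  per-channel diagonal state matrix
    W_delta (d, d), W_B (d, n), W_C (d, n): projections that make the
    step size Delta and the matrices B, C depend on the current input;
    this input dependence is the "selection" mechanism.
    """
    def step(h, x_t):
        delta = jax.nn.softplus(x_t @ W_delta)   # (d,)   input-dependent step size
        B = x_t @ W_B                            # (n,)   input-dependent input map
        C = x_t @ W_C                            # (n,)   input-dependent output map
        A_bar = jnp.exp(delta[:, None] * A)      # (d, n) discretized A (zero-order hold)
        B_bar = delta[:, None] * B[None, :]      # (d, n) discretized B (Euler approximation)
        h = A_bar * h + B_bar * x_t[:, None]     # (d, n) linear recurrence update
        return h, h @ C                          # new state, output y_t of shape (d,)

    h0 = jnp.zeros((x.shape[1], A.shape[1]))
    _, ys = jax.lax.scan(step, h0, x)
    return ys  # (L, d)
```

Because Delta_t, B_t, and C_t change with the input, the model can decide per token what to write into and read from its state. The price is that the recurrence can no longer be precomputed as one global convolution (as in earlier SSMs such as S4), which is where the parallel scan comes in.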


Over the past four to five years, transformers have upended the deep learning field, breaking records previously held by CNNs across the board. But self-attention, the core technique underlying transformers, has one major weakness: complexity that grows quadratically with sequence length, in both time and space. Mamba uses no self-attention at all, yet matches transformer performance with linear complexity. You can think of Mamba as an evolved RNN: it has a better mechanism for selecting state, and training is faster (a parallel prefix sum reduces sequential steps from n to log n).
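
The n → log n speedup works because each step of the recurrence h_t = a_t * h_{t-1} + b_t is an affine map, and composing affine maps is associative, so all prefixes can be combined in a tree of logarithmic depth. A minimal sketch using jax.lax.associative_scan follows; the function and variable names are mine, and Mamba's actual implementation is a fused, hardware-aware CUDA kernel operating on per-timestep A_bar_t and B_bar_t * x_t rather than scalars:

```python
import jax
import jax.numpy as jnp

# Each step h_t = a_t * h_{t-1} + b_t is the affine map h -> a*h + b.
# Composing two such maps yields another affine map, and composition is
# associative, so all prefixes can be combined in O(log L) parallel depth
# instead of L sequential steps.
def combine(left, right):
    a_l, b_l = left
    a_r, b_r = right
    return a_l * a_r, a_r * b_l + b_r

def parallel_linear_scan(a, b):
    # a, b: (L,) per-step coefficients; returns all hidden states h_1..h_L
    _, h = jax.lax.associative_scan(combine, (a, b))
    return h

a = jnp.array([0.9, 0.8, 0.7, 0.6])
b = jnp.array([1.0, 2.0, 3.0, 4.0])

# Cross-check against the plain sequential recurrence
h, h_seq = 0.0, []
for t in range(4):
    h = a[t] * h + b[t]
    h_seq.append(h)

print(jnp.allclose(parallel_linear_scan(a, b), jnp.stack(h_seq)))  # True
```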

Presentation

[Embedded slide deck]