
CascadiaPrime Cognition - State Space Models (SSMs)


Unlike the brain, existing transformer-based LLMs consume staggering amounts of energy.

Simply put, the transformer models behind today's LLMs are not only expensive to run; they also conflict with sustainability objectives for the planet.

SSMs are models with three equivalent views: a continuous-time view and, once discretized, a recurrent view and a convolutional view. SSMs can handle very long sequences (in number of tokens), generally with fewer parameters than other models (ConvNets or transformers), while remaining very fast. They can be applied to text, vision, audio, and time-series tasks (and even graphs).
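To make the three views concrete, here is a minimal NumPy sketch (random placeholder matrices, a bilinear discretization as used in the S4 line of work, and no particular library's API): it discretizes a small continuous SSM and checks that the recurrent and convolutional forms produce the same outputs.

# A minimal sketch of the three equivalent views of a linear SSM:
#   continuous:    x'(t) = A x(t) + B u(t),        y(t) = C x(t)
#   recurrent:     x_k = Ab x_{k-1} + Bb u_k,      y_k = C x_k   (after discretization)
#   convolutional: y = conv(u, K), K = (C Bb, C Ab Bb, C Ab^2 Bb, ...)
# The matrices below are random placeholders, not a trained model.
import numpy as np

rng = np.random.default_rng(0)
N, L, dt = 4, 32, 0.1                     # state size, sequence length, step size

A = -np.diag(rng.uniform(0.5, 1.5, N))    # stable continuous-time dynamics
B = rng.normal(size=(N, 1))
C = rng.normal(size=(1, N))
u = rng.normal(size=L)                    # 1-D input sequence

# Discretize with the bilinear (Tustin) transform.
I = np.eye(N)
Ab = np.linalg.solve(I - dt / 2 * A, I + dt / 2 * A)
Bb = np.linalg.solve(I - dt / 2 * A, dt * B)

# Recurrent view: an O(L) sequential scan over the input.
x = np.zeros((N, 1))
y_rec = []
for k in range(L):
    x = Ab @ x + Bb * u[k]
    y_rec.append((C @ x).item())
y_rec = np.array(y_rec)

# Convolutional view: unroll the recurrence into a kernel, then convolve.
K = np.array([(C @ np.linalg.matrix_power(Ab, k) @ Bb).item() for k in range(L)])
y_conv = np.array([np.dot(K[: k + 1][::-1], u[: k + 1]) for k in range(L)])

assert np.allclose(y_rec, y_conv)         # both views give the same outputs

The recurrent form is what makes inference cheap (constant work per new token), while the convolutional form is what lets training be parallelized over the whole sequence.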

Jensen Huang, NVIDIA: "And I think the work around state-space models, or SSMs, that allow you to learn extremely long patterns and sequences without growing quadratically in computation, probably is the next transformer."
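For context on the "quadratically" remark: self-attention compares every token with every other token, so its cost grows roughly with the square of the sequence length, whereas an SSM scan does a fixed amount of work per token. A back-of-the-envelope illustration (the sizes below are assumptions, not measurements):

# Rough operation counts only; real costs depend on heads, state size, kernels, hardware.
d, n = 1024, 16            # illustrative model width and SSM state size
for L in (1_000, 10_000, 100_000):
    attn = L * L * d       # self-attention: every token attends to every other token
    ssm = L * n * d        # SSM scan: constant work per token
    print(f"L={L:>7}:  attention ~{attn:.2e} ops   SSM ~{ssm:.2e} ops")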

    

What is a State Space Model?

  Wiki: State Space (computer science)
  
  Mathworks: What are State-Space Models?
  
  State space model (SSM) definition and history in various fields
  
  Wiki: State-space representation
  

State Space Model (SSM)

  The Stanford AI Lab Blog : Can Longer Sequences Help Take the Next Leap in AI? (Chris Ré, Tri Dao, Dan Fu, Karan Goel) (June 9, 2022)
  
  Could State Space Models kill Large Language Models? (January 18, 2024)
  
  Hugging Face: Introduction to State Space Models (SSM)
  
  Structured State Space Models for In-Context Reinforcement Learning (Advances in Neural Information Processing Systems 36, NeurIPS 2023, Main Conference Track)
  
  A Visual Guide to Mamba and State Space Models: An Alternative to Transformers for Language Modeling (February 2024)
  
  
  

State Space Model Papers (SSM)

  arXiv: Learning World Models With Hierarchical Temporal Abstractions: A Probabilistic Perspective Vaisakh Shaj (April 24, 2024)
  
  arXiv: Efficiently Modeling Long Sequences with Structured State Spaces Albert Gu, Karan Goel, Christopher Ré (August 5, 2023)
  
  arXiv: Mamba: Linear-Time Sequence Modeling with Selective State Spaces (December 1, 2023)
  
  arXiv: Combining Recurrent, Convolutional, and Continuous-time Models with Linear State-Space Layers Albert Gu, Isys Johnson, Karan Goel, Khaled Saab, Tri Dao, Atri Rudra, Christopher Ré (October 26, 2021)
  
  arXiv: Convolutional State Space Models for Long-Range Spatiotemporal Modeling Jimmy T.H. Smith, Shalini De Mello, Jan Kautz, Scott W. Linderman, Wonmin Byeon (October 30, 2023)
  
  Amazon Research: Deep State Space Models for Time Series Forecasting (32nd Conference on Neural Information Processing Systems (NeurIPS 2018))
  
  arXiv: Long Range Arena: A Benchmark for Efficient Transformers (November 8, 2020)
  

State Space Model (SSM) Talks (YouTube)

  Efficiently Modeling Long Sequences with Structured State Spaces, Albert Gu, Stanford MedAI
  
  Mamba: Long Range Arena: A Benchmark for Efficient Transformers
  
  Mamba STRIKES again Overview
  