A foundation model of vision, audition, and language for in-silico neuroscience | Research - AI at Meta

A Foundation Model of Vision, Audition, and Language for In-Silico Neuroscience

===========================================================

Introduction


Cognitive neuroscience is fragmented into specialized models, each tailored to a specific experimental paradigm, which has prevented a unified model of cognition in the human brain. This paper introduces TRIBE v2, a tri-modal (video, audio, and language) foundation model that predicts human brain activity across a variety of naturalistic and experimental conditions.

Key Findings


  • TRIBE v2 accurately predicts high-resolution brain responses to novel stimuli, tasks, and subjects, surpassing traditional linear encoding models with several-fold improvements in accuracy.
  • The model enables in-silico experimentation: tested on seminal visual and neuro-linguistic paradigms, it recovers a variety of results established by decades of empirical research.
  • By extracting interpretable latent features, TRIBE v2 reveals the fine-grained topography of multisensory integration.
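
To make the idea of in-silico experimentation concrete, here is a minimal sketch of running a two-condition contrast on model predictions instead of on recorded data. Everything in it is an assumption made for illustration: `predict` is a hypothetical stand-in (a fixed linear readout), not the TRIBE v2 model or its API, and the stimuli, voxel counts, and threshold are synthetic.

```python
import numpy as np

rng = np.random.default_rng(0)
n_features, n_voxels = 8, 20

# Hypothetical stand-in for a trained encoder: a fixed linear readout.
# Voxels 0-4 are built to respond to feature 0; the others ignore it.
W = rng.normal(scale=0.1, size=(n_features, n_voxels))
W[0, :] = 0.0
W[0, :5] = 2.0


def predict(stimuli):
    """Map stimulus features (n_trials, n_features) to predicted voxel responses."""
    return stimuli @ W


def make_condition(n_trials, feature0_level):
    """Synthetic trials that differ between conditions only in feature 0."""
    stimuli = rng.normal(size=(n_trials, n_features))
    stimuli[:, 0] = feature0_level
    return stimuli


def contrast_map(resp_a, resp_b):
    """Per-voxel Welch-style t statistic for condition A minus condition B."""
    diff = resp_a.mean(axis=0) - resp_b.mean(axis=0)
    se = np.sqrt(
        resp_a.var(axis=0, ddof=1) / len(resp_a)
        + resp_b.var(axis=0, ddof=1) / len(resp_b)
    )
    return diff / se


# "Run" the experiment on predictions alone.
resp_a = predict(make_condition(200, feature0_level=1.0))
resp_b = predict(make_condition(200, feature0_level=0.0))
t_map = contrast_map(resp_a, resp_b)
selective = np.flatnonzero(np.abs(t_map) > 5.0)
print("voxels flagged by the contrast:", selective)
```

With a real model, the same pattern would apply: synthesize the stimuli of a classic paradigm, predict responses, and compare the resulting contrast map with published findings.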

Methodology


  • TRIBE v2 is trained on a unified dataset of over 1,000 hours of fMRI across 720 subjects.
  • The model is evaluated on a variety of naturalistic and experimental conditions, including novel stimuli, tasks, and subjects.
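
For context, the traditional linear encoding models used as a point of comparison can be sketched as a ridge regression from stimulus features to voxel responses, scored on held-out stimuli. The sketch below is illustrative only: the data are simulated, and the choice of per-voxel Pearson correlation as the score and `alpha=1.0` as the penalty are assumptions, not details taken from the paper.

```python
import numpy as np


def fit_ridge(X, Y, alpha=1.0):
    """Closed-form ridge regression: W = (X^T X + alpha * I)^-1 X^T Y."""
    n_features = X.shape[1]
    gram = X.T @ X + alpha * np.eye(n_features)
    return np.linalg.solve(gram, X.T @ Y)


def voxelwise_pearson(Y_true, Y_pred):
    """Pearson correlation between true and predicted responses, per voxel."""
    yt = Y_true - Y_true.mean(axis=0)
    yp = Y_pred - Y_pred.mean(axis=0)
    denom = np.linalg.norm(yt, axis=0) * np.linalg.norm(yp, axis=0)
    return (yt * yp).sum(axis=0) / np.where(denom == 0, 1.0, denom)


def simulate(n_train=400, n_test=100, n_features=16, n_voxels=50,
             noise=0.5, seed=0):
    """Simulated features and noisy linear 'voxel' responses (assumption)."""
    rng = np.random.default_rng(seed)
    W_true = rng.normal(size=(n_features, n_voxels))
    X_train = rng.normal(size=(n_train, n_features))
    X_test = rng.normal(size=(n_test, n_features))
    Y_train = X_train @ W_true + noise * rng.normal(size=(n_train, n_voxels))
    Y_test = X_test @ W_true + noise * rng.normal(size=(n_test, n_voxels))
    return X_train, Y_train, X_test, Y_test


X_train, Y_train, X_test, Y_test = simulate()
W = fit_ridge(X_train, Y_train, alpha=1.0)

# Score on stimuli never seen during fitting.
scores = voxelwise_pearson(Y_test, X_test @ W)
print(f"mean held-out correlation: {scores.mean():.3f}")
```

Evaluating on novel subjects or tasks follows the same logic, with the held-out split drawn across subjects or paradigms rather than across stimuli.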

Implications


  • TRIBE v2 establishes artificial intelligence as a unifying framework for exploring the functional organization of the human brain.
  • The model has the potential to revolutionize our understanding of cognitive neuroscience and inform the development of more effective treatments for neurological and psychiatric disorders.

Conclusion


TRIBE v2 provides a unified framework for exploring the functional organization of the human brain. By accurately predicting brain activity and enabling in-silico experimentation, it could deepen our understanding of the brain and inform the development of more effective treatments for neurological and psychiatric disorders.
