MinD-Vis is a framework for decoding human visual stimuli from brain recordings, specifically fMRI. Developed to deepen our understanding of the human visual system, it aims to bridge the gap between human and computer vision through brain-computer interfaces.
By decoding visual stimuli from brain recordings, MinD-Vis offers insight into how the human visual system works and provides a basis for future research on brain-computer interfaces and human vision decoding.
Framework Overview
The MinD-Vis framework consists of two main stages [1]:
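Sparse-Coded Masked Brain Modeling (SC-MBM): This stage learns a self-supervised representation of fMRI data through masked modeling: parts of each recording are hidden, and the model is trained to reconstruct them. The design is inspired by the sparse coding of visual information in the brain, and it helps capture the essential features and patterns in the recordings.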
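Double-Conditioned Latent Diffusion Model (DC-LDM): This stage generates images with a latent diffusion model conditioned on the fMRI representation learned in the first stage. The conditioning is "double" because that representation enters the model through two pathways: cross-attention and time-step conditioning. This helps generate highly plausible images that match the input brain recordings semantically.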
By boosting the information capacity of feature representations learned from a large-scale resting-state fMRI dataset, MinD-Vis can reconstruct highly plausible images with semantically matching details from brain recordings using very few paired annotations [1].
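The masking step behind the first stage can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the patch size of 16, the mask ratio of 0.75, and the random stand-in data are all assumptions chosen for illustration. It only shows how a flattened voxel vector might be split into patches and how a random subset of those patches is hidden before an encoder sees the rest.

```python
import random

def patchify(signal, patch_size):
    """Split a 1D voxel vector into equal-size patches, zero-padding the tail."""
    padded = list(signal) + [0.0] * (-len(signal) % patch_size)
    return [padded[i:i + patch_size] for i in range(0, len(padded), patch_size)]

def random_mask(patches, mask_ratio, rng):
    """Hide a random subset of patches.

    Returns the visible (index, patch) pairs and the sorted masked indices.
    """
    n_masked = int(len(patches) * mask_ratio)
    masked = set(rng.sample(range(len(patches)), n_masked))
    visible = [(i, p) for i, p in enumerate(patches) if i not in masked]
    return visible, sorted(masked)

rng = random.Random(0)
# Hypothetical stand-in for one flattened fMRI sample (100 voxels).
voxels = [rng.gauss(0.0, 1.0) for _ in range(100)]
patches = patchify(voxels, patch_size=16)   # assumed patch size
visible, masked = random_mask(patches, mask_ratio=0.75, rng=rng)  # assumed ratio
print(len(patches), len(visible), len(masked))
```

In a masked-modeling setup, only the visible patches would be passed to the encoder, and a decoder would be trained to reconstruct the patches at the masked indices.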
Benchmarking and Results
MinD-Vis has been benchmarked both qualitatively and quantitatively. The experimental results indicate that MinD-Vis outperforms state-of-the-art methods in both semantic mapping (100-way semantic classification) and generation quality (FID) by 66% and 41%, respectively.
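To make the n-way semantic classification metric more concrete, here is a small sketch of one common way such a score can be computed. It assumes that a pretrained image classifier has already produced per-class scores for a generated image; the function name `n_way_top1`, the 1000-class score list, and the trial count are hypothetical and are not taken from the MinD-Vis code. In each trial, the true class competes against n-1 randomly drawn distractor classes, and the trial counts as a hit if the true class has the highest score.

```python
import random

def n_way_top1(gen_scores, true_class, n, trials, rng):
    """Fraction of trials in which the true class outscores n-1 random distractors.

    gen_scores: classifier scores for the generated image, one per class.
    true_class: index of the class of the ground-truth image.
    """
    others = [c for c in range(len(gen_scores)) if c != true_class]
    hits = 0
    for _ in range(trials):
        distractors = rng.sample(others, n - 1)
        candidates = [true_class] + distractors
        best = max(candidates, key=lambda c: gen_scores[c])
        hits += int(best == true_class)
    return hits / trials

rng = random.Random(0)
# Hypothetical scores: 1000 classes, with the true class (42) scored highest.
scores = [0.1] * 1000
scores[42] = 0.9
accuracy = n_way_top1(scores, true_class=42, n=100, trials=50, rng=rng)
print(accuracy)
```

Averaging this value over all test images gives a single accuracy figure; chance level for a 100-way task is 1%.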
For more information, see the official GitHub repository for MinD-Vis, which documents the framework, its components, and its performance in detail.