VILA_M3_3B

VILA_M3 is a medical vision-language model (VLM) that enhances VLMs with medical expert knowledge, using domain-expert models to improve precision on medical imaging tasks.

Vision-Language | Healthcare | License: https://huggingface.co/MONAI/Llama3-VILA-M3-3B/resolve/main/LICENSE | 3B params

Model Information

Current Version
v1.0.0: initial release of the VILA_M3_3B model
Modality
Not specified
Anatomy Target
Not specified
Authors
OLEJA, Vishwesh Nath, Wenqi Li, Dong Yang, Andriy Myronenko, et al. from NVIDIA, SingHealth, and NI
Model Size
3B parameters

Model Access
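
The following is a minimal sketch of one way to fetch the checkpoint, not an official loading recipe. It assumes the weights live in the same Hugging Face repository as the license file linked above, so the repository id is derived from that URL. The helper names `repo_id_from_resolve_url` and `download_checkpoint` are hypothetical; `snapshot_download` is the `huggingface_hub` function for caching a whole repository locally. Running inference additionally requires the VILA code base, which is not shown here.

```python
from urllib.parse import urlparse

# License URL as listed on this page; the repository id is inferred from it.
LICENSE_URL = "https://huggingface.co/MONAI/Llama3-VILA-M3-3B/resolve/main/LICENSE"


def repo_id_from_resolve_url(url: str) -> str:
    """Extract '<org>/<name>' from a Hub URL shaped like /<org>/<name>/resolve/<rev>/<file>."""
    parts = urlparse(url).path.strip("/").split("/")
    if len(parts) < 4 or parts[2] != "resolve":
        raise ValueError(f"not a Hub resolve URL: {url}")
    return f"{parts[0]}/{parts[1]}"


def download_checkpoint(repo_id: str) -> str:
    """Download the repository snapshot into the local cache and return its path."""
    # Imported lazily so the URL helper works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=repo_id)


REPO_ID = repo_id_from_resolve_url(LICENSE_URL)

if __name__ == "__main__":
    print(REPO_ID)
    # Uncomment to fetch the weights (a large download that needs network access):
    # print(download_checkpoint(REPO_ID))
```

Keeping the download behind a separate function means the repository id can be checked first without triggering a large transfer.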

Research and Citation

BibTeX

@article{nath2025vila,
  title={VILA_M3: Enhancing Vision-Language Models with Medical Expert Knowledge},
  author={Nath, Vishwesh and Li, Wenqi and Yang, Dong and Myronenko, Andriy and Zheng, Mingxin and Lu, Yao and Liu, Zhijian and Yin, Hongxu and Tang, Yucheng and Guo, Pengfei and Zhao, Can and Xu, Ziyue and He, Yufan and Law, Yee Man and Simon, Benjamin and Harmon, Stephanie and Heinrich, Greg and Aylward, Stephen and Edgar, Marc and Zephyr, Michael and Han, Song and Molchanov, Pavlo and Turkbey, Baris and Roth, Holger and Xu, Daguang},
  journal={arXiv preprint arXiv:2411.12915},
  year={2025}
}