VILA_M3_3B

VILA_M3 is a medical vision-language model (VLM) that enhances VLMs with medical expert knowledge, using domain-expert models to improve precision on medical imaging tasks.

Tags: Vision-Language, Healthcare. License: https://huggingface.co/MONAI/Llama3-VILA-M3-3B/resolve/main/LICENSE. Size: 3B params.

Model Information

Current Version

v1.0.0: initial release of the VILA_M3_3B model

Modality

Not specified

Anatomy Target

Not specified

Authors

Vishwesh Nath, Wenqi Li, Dong Yang, Andriy Myronenko, et al. from NVIDIA, SingHealth, and NIH

Model Size

3B parameters

Model Access

HuggingFace model: https://huggingface.co/MONAI/Llama3-VILA-M3-3B
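
For readers who want the files locally, the following is a minimal sketch rather than an official loading recipe. It derives the repository id from the license URL listed on this card and, optionally, downloads the repository with the `huggingface_hub` package. The helper names `repo_id_from_url` and `download_checkpoint` and the target folder `./vila_m3_3b` are assumptions made for illustration; this card does not describe how to run inference, so the sketch stops at the download step.

```python
from urllib.parse import urlparse

# License URL exactly as listed on this card.
LICENSE_URL = "https://huggingface.co/MONAI/Llama3-VILA-M3-3B/resolve/main/LICENSE"


def repo_id_from_url(url: str) -> str:
    """Return the '<org>/<name>' repository id from a huggingface.co URL."""
    parts = [p for p in urlparse(url).path.split("/") if p]
    if len(parts) < 2:
        raise ValueError(f"no repository id in URL: {url}")
    return "/".join(parts[:2])


def download_checkpoint(repo_id: str, local_dir: str = "./vila_m3_3b") -> str:
    """Download every file in the repository and return the local path.

    Requires `pip install huggingface_hub`; it is imported lazily so that
    repo_id_from_url works without it. `local_dir` is a hypothetical folder.
    """
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=repo_id, local_dir=local_dir)


REPO_ID = repo_id_from_url(LICENSE_URL)

if __name__ == "__main__":
    print(REPO_ID)  # MONAI/Llama3-VILA-M3-3B
    # Uncomment to fetch the files (large download, needs network access):
    # print(download_checkpoint(REPO_ID))
```

Review the license at the URL above before downloading or redistributing the weights.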

Research and Citation

BibTeX

@article{nath2025vila,
  title={VILA_M3: Enhancing Vision-Language Models with Medical Expert Knowledge},
  author={Nath, Vishwesh and Li, Wenqi and Yang, Dong and Myronenko, Andriy and Zheng, Mingxin and Lu, Yao and Liu, Zhijian and Yin, Hongxu and Tang, Yucheng and Guo, Pengfei and Zhao, Can and Xu, Ziyue and He, Yufan and Law, Yee Man and Simon, Benjamin and Harmon, Stephanie and Heinrich, Greg and Aylward, Stephen and Edgar, Marc and Zephyr, Michael and Han, Song and Molchanov, Pavlo and Turkbey, Baris and Roth, Holger and Xu, Daguang},
  journal={arXiv preprint arXiv:2411.12915},
  year={2025}
}