Development and Validation of a Multimodal Multitask Vision Foundation Model for Generalist Ophthalmic Artificial Intelligence - Nanjing Medical University Library


Posted: 2024-12-03

ORIGINAL ARTICLES

Development and Validation of a Multimodal Multitask Vision Foundation Model for Generalist Ophthalmic Artificial Intelligence

J. Qiu and Others

Abstract

BACKGROUND

Specialized single-use, single-modality models often have limited or no generalization to new diseases, modalities, and clinical tasks. Foundation models are built for multipurpose use, enabling them to perform tasks even when not specifically pretrained for them, and to adapt to different clinical applications.

METHODS

We present VisionFM, an artificial intelligence foundation model for ophthalmology pretrained on 3.4 million images from over 500,000 individuals, covering diverse ophthalmic diseases, imaging modalities and devices, and clinical scenarios. Pretrained on eight imaging modalities, VisionFM was tested for multiple applications, including disease screening and detection, prognosis and prediction, and segmentation of lesions and anatomical structures, on an ophthalmic database comprising 53 public and 12 private datasets. We compared the model against ophthalmologists with varying levels of experience in diagnosing ophthalmic and systemic diseases.

RESULTS

VisionFM outperformed baseline deep learning approaches in diagnosing ocular diseases, achieving an average area under the receiver operating characteristic curve (AUROC) of 0.950 (95% confidence interval [CI], 0.941 to 0.959) across eight disease categories and five imaging modalities in internal validation. In external validation, VisionFM achieved an AUROC of 0.945 (95% CI, 0.934 to 0.956) in fundus-based diabetic retinopathy recognition, and an AUROC of 0.974 (95% CI, 0.966 to 0.983) in optical coherence tomography–based age-related macular degeneration recognition. In a comparative study of diagnostic accuracy for 12 ocular diseases from fundus photographs, VisionFM showed diagnostic accuracy close to that of intermediate-level ophthalmologists. Its generalizability extends to new imaging modalities and devices, effectively handling dataset shifts. For example, VisionFM accurately graded diabetic retinopathy with an AUROC of 0.935 (95% CI, 0.902 to 0.964) using an imaging modality it was never exposed to during pretraining. Furthermore, VisionFM is able to predict both glaucoma progression and the presence of intracranial tumors directly from fundus photographs.
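The results above are reported as AUROC values with 95% confidence intervals. As a minimal sketch of how such numbers can be computed, the Python below estimates AUROC through its rank (Mann-Whitney) interpretation and attaches a percentile-bootstrap 95% CI. The abstract does not state how the intervals were derived or how the "average AUROC" across disease categories was formed, so the bootstrap procedure, the unweighted mean in `macro_auroc`, and all function names here are illustrative assumptions, not VisionFM's evaluation code.

```python
import numpy as np


def auroc(labels, scores):
    """AUROC as the probability that a random positive outscores a random
    negative (Mann-Whitney formulation); tied pairs count as 0.5."""
    labels = np.asarray(labels, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    pos, neg = scores[labels], scores[~labels]
    if pos.size == 0 or neg.size == 0:
        raise ValueError("AUROC needs at least one positive and one negative case")
    # Compare every positive with every negative: O(n_pos * n_neg) memory.
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (pos.size * neg.size)


def bootstrap_ci(labels, scores, n_boot=2000, alpha=0.05, seed=0):
    """Point AUROC plus a percentile-bootstrap (1 - alpha) CI.
    Assumption: cases are resampled with replacement, one case per row."""
    labels = np.asarray(labels, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    point = auroc(labels, scores)  # raises early if a class is missing
    rng = np.random.default_rng(seed)
    n = labels.size
    stats = []
    while len(stats) < n_boot:
        idx = rng.integers(0, n, size=n)
        y = labels[idx]
        if y.all() or not y.any():
            continue  # skip resamples that lack one of the two classes
        stats.append(auroc(y, scores[idx]))
    lo, hi = np.quantile(stats, [alpha / 2, 1 - alpha / 2])
    return point, lo, hi


def macro_auroc(per_category):
    """Unweighted mean of per-category AUROCs; one plausible reading of an
    'average AUROC across disease categories' (assumption)."""
    return float(np.mean([auroc(y, s) for y, s in per_category]))
```

For example, `auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])` returns 0.75, because three of the four positive-negative pairs are ranked correctly. The pairwise comparison keeps the definition transparent but grows quadratically with the test-set size; for large evaluations a rank-based formula or an established library routine is the more practical choice.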

CONCLUSIONS

VisionFM provides an efficient platform for diagnosis or prediction of multiple diseases using multiple imaging modalities and is scalable to incorporate additional data, modalities, and applications via its open-sourced model weights and codebase. (Funded by the Research Grants Council (RGC) of Hong Kong SAR and others.)
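One common way to adapt a pretrained encoder to a new clinical task is to train a small task-specific head on its outputs. The abstract does not describe VisionFM's adaptation procedure, so the following is only a sketch of one such approach, a linear probe: a logistic-regression layer fit by gradient descent on frozen embeddings. The synthetic Gaussian "embeddings" are a stand-in for encoder outputs, and `train_linear_probe` and `predict_proba` are hypothetical helpers, not part of the VisionFM codebase.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def train_linear_probe(features, labels, lr=0.1, epochs=500, l2=1e-3):
    """Fit weights w and bias b on frozen features; the encoder that produced
    the features is never updated. Minimizes mean log-loss plus an L2 penalty."""
    X = np.asarray(features, dtype=float)
    y = np.asarray(labels, dtype=float)
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        p = sigmoid(X @ w + b)               # predicted probabilities
        grad_w = X.T @ (p - y) / n + l2 * w  # gradient w.r.t. weights
        grad_b = float(np.mean(p - y))       # gradient w.r.t. bias
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def predict_proba(features, w, b):
    return sigmoid(np.asarray(features, dtype=float) @ w + b)


# Hypothetical stand-in for frozen encoder embeddings: two 16-dim clusters.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-1.0, 1.0, (50, 16)), rng.normal(1.0, 1.0, (50, 16))])
y = np.array([0] * 50 + [1] * 50)

w, b = train_linear_probe(X, y)
acc = np.mean((predict_proba(X, w, b) > 0.5) == y)
print(f"training accuracy: {acc:.2f}")
```

Because only `w` and `b` are learned, a probe like this is cheap to train for each new task. Accuracy measured on the training data, as here, is optimistic; a real evaluation would score a held-out set, for instance with an AUROC and confidence interval.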

DOI: 10.1056/AIoa2300221

Full-text link: https://ai.nejm.org/doi/abs/10.1056/AIoa2300221

NEJM AI, Volume 1 No. 12 December 2024