(1) Wirawan, R. Benchmarking Large Language Models on Diagnostic Inference Tasks in Medical Texts. ATCAEP 2024, 14 (9), 15-31.