Background: Conversational large language models (LLMs) are rapidly entering the medical domain, raising interest in their potential role as information-support tools in cardiology. However, objective, head-to-head evaluations of different LLMs in high-risk cardiovascular scenarios remain limited, particularly with respect to clinical accuracy and contextual adequacy. Aim: To benchmark the…