FROM DISCHARGE LETTERS TO STRUCTURED CARDIOVASCULAR DATA: DEVELOPMENT AND FEASIBILITY OF AN INTERPRETABLE NLP PIPELINE

Galeazzi Michele Ancona (Ancona) – Cardiac Surgery Unit, Lancisi Cardiovascular Center, Polytechnic University of Marche, Ancona, Italy | Spagnolo Francesca Ancona (Ancona) – Cardiac Surgery Unit, Lancisi Cardiovascular Center, Polytechnic University of Marche, Ancona, Italy | Giusti Martina Ancona (Ancona) – Cardiac Surgery Unit, Lancisi Cardiovascular Center, Polytechnic University of Marche, Ancona, Italy | Mali Erlil Ancona (Ancona) – Cardiac Surgery Unit, Lancisi Cardiovascular Center, Polytechnic University of Marche, Ancona, Italy | Malvindi Pietro Giorgio Ancona (Ancona) – Cardiac Surgery Unit, Lancisi Cardiovascular Center, Polytechnic University of Marche, Ancona, Italy | Berretta Paolo Ancona (Ancona) – Cardiac Surgery Unit, Lancisi Cardiovascular Center, Polytechnic University of Marche, Ancona, Italy | Pierri Michele Danilo Ancona (Ancona) – Cardiac Surgery Unit, Lancisi Cardiovascular Center, Polytechnic University of Marche, Ancona, Italy | Di Eusanio Marco Ancona (Ancona) – Cardiac Surgery Unit, Lancisi Cardiovascular Center, Polytechnic University of Marche, Ancona, Italy

DIGITAL CARDIOLOGY – ARTIFICIAL INTELLIGENCE

Background: In cardiovascular research and quality assessment, structured clinical data are often limited, and much clinically relevant information remains embedded in unstructured discharge summaries. This is a major barrier to real-world outcome analysis, particularly for the evaluation of procedural and postoperative complications.

Aim: To describe the development and feasibility of a natural language processing (NLP) pipeline designed to derive clinically meaningful postoperative information from cardiovascular discharge letters.

Methods: We developed a modular NLP pipeline that combines rule-based pattern recognition with concept-level retrieval. The system was tailored to discharge summaries of coronary artery bypass grafting (CABG) patients and applied to 2,569 CABG discharge summaries retrieved from the publicly available MIMIC database. The pipeline locates the postoperative course using clinically relevant textual anchors and extracts predefined complication-related events. Development addressed heterogeneous narrative structures, variable section headers, and ambiguous temporal references. Transformer-based language models (BERT) supported identification of complication-related information within postoperative sections. The design systematically prioritized interpretability, traceability, and reproducibility over black-box predictions.

Results: Using anchor-based identification alone, 59% of discharge summaries were classified as containing postoperative complications. Subsequent characterization of complication-related content showed that explicitness and temporal clarity varied across event types. Clearly documented postoperative complications were consistently captured, whereas implicit or temporally ambiguous descriptions required additional contextual interpretation or manual review. Qualitative expert assessment confirmed clinically coherent outputs and transparent rule-to-concept traceability, while also identifying recurrent sources of uncertainty inherent to free-text clinical documentation.

Conclusions: This study shows that transparent NLP pipelines can structure unstructured cardiovascular clinical data while explicitly exposing sources of ambiguity and limitation. Such approaches may support real-world research and quality assessment, provided that their constraints are clearly acknowledged and systematically addressed in subsequent validation phases.
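
Illustrative sketch: the abstract does not report the pipeline's actual rules, so the following Python fragment is only a minimal example of what anchor-based section identification followed by traceable rule-to-concept matching might look like. All anchors, concept names, regular expressions, and the simple negation check are hypothetical assumptions chosen for illustration, and the transformer-based (BERT) component is not represented.

```python
import re
from dataclasses import dataclass

# Hypothetical textual anchors marking the start of the postoperative course.
POSTOP_ANCHORS = [r"hospital course", r"post[- ]?operative(?:ly)?", r"post[- ]?op\b"]

# Hypothetical rule-to-concept map; each concept is tied to one readable rule.
COMPLICATION_RULES = {
    "atrial_fibrillation": r"\batrial fibrillation\b|\ba[- ]?fib\b",
    "acute_kidney_injury": r"\bacute kidney injury\b|\baki\b|\bacute renal failure\b",
    "bleeding_reexploration": r"\bre-?exploration\b|\btake-?back\b",
}

# Naive negation cue: a negating word shortly before the match, same sentence.
NEGATION = re.compile(r"\b(?:no|without|denies|negative for)\b[^.;]{0,40}$", re.I)


@dataclass
class Finding:
    concept: str    # structured concept label
    rule: str       # the rule that fired (rule-to-concept traceability)
    sentence: str   # source sentence kept as supporting evidence
    negated: bool   # True if a negation cue precedes the mention


def postoperative_section(text):
    """Return the text from the first postoperative anchor onward, or None."""
    match = re.search("|".join(POSTOP_ANCHORS), text, re.I)
    return text[match.start():] if match else None


def extract_findings(text):
    """Apply every complication rule to each sentence of the anchored section."""
    section = postoperative_section(text)
    if section is None:
        return []  # no anchor found: such a letter would need manual review
    findings = []
    for sentence in re.split(r"(?<=[.;])\s+", section):
        for concept, rule in COMPLICATION_RULES.items():
            match = re.search(rule, sentence, re.I)
            if match:
                negated = bool(NEGATION.search(sentence[:match.start()]))
                findings.append(Finding(concept, rule, sentence.strip(), negated))
    return findings
```

Because each finding carries the rule that fired and the sentence it fired on, a reviewer can audit every structured value against its source text; implicit or temporally ambiguous mentions would still fall outside rules of this kind.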

CONGRESS ABSTRACT
