Unlocking the Potential of Electronic Health Records With Danish Clinical Language Models for Text Mining

Jannik Skyttegaard Pedersen

doi:10.21996/e9cj-9c53

Translated title of the contribution: Frigørelse af potentialet for Elektroniske Patientjournaler med Danske Kliniske Sprogmodeller til Tekstmining

Publication: Thesis › Ph.D. thesis

Abstract

This dissertation focuses on the development of language technology for extracting clinical information from Danish electronic health records (EHRs). EHRs contain important health-related information that can be used to guide the treatment of patients. However, a large part of the information in the record is written as unstructured text, which makes it difficult and time-consuming to extract the relevant details, especially in acute situations. As a result, important information may be lost, which can increase the risk of misdiagnosis and poorer treatment outcomes.

The recent paradigm shift in natural language processing (NLP), driven by self-supervised neural networks and the transformer architecture, has produced automatic text-processing tools with unprecedented accuracy. These tools can be used to extract and structure the information in the unstructured text of the EHR automatically. However, language technology research has mostly focused on high-resource languages such as English, while the development of Danish language technology has been more stagnant, especially for specialized domains such as the clinical domain.

This dissertation examines the potential of language technology to automatically extract information from the unstructured part of the EHR. It also describes the importance of developing language resources specifically for the Danish clinical domain, since they can be used to improve patient treatment and open up new clinical research opportunities.

The dissertation describes the development of two Danish pre-trained language models that show improved performance compared with existing Danish language models. It also explores how data curation can affect bias in clinical language models. The dissertation further investigates how language models can be used to extract information about bleeding from Danish EHRs, and evaluates physicians' ability to extract relevant information when using the bleeding algorithm as an assistive tool. Finally, the dissertation presents a pre-trained language model that can be used to extract clinical information such as diseases, symptoms, and treatments from the unstructured text of Danish EHRs.

Translated title of the contribution	Frigørelse af potentialet for Elektroniske Patientjournaler med Danske Kliniske Sprogmodeller til Tekstmining
Original language	English
Awarding institution	Syddansk Universitet
Supervisors/advisors	Savarimuthu, Thiusius R. (principal supervisor); Vinholt, Pernille Just (co-supervisor)
Publisher	Syddansk Universitet. Det Tekniske Fakultet
DOI	https://doi.org/10.21996/e9cj-9c53
Status	Published - 2 Nov 2023

Note regarding the thesis

The full thesis can be read at the SDU library.

Documents and links

10.21996/e9cj-9c53

Open Access Version (reduced): publisher's published version, 2.6 MB

Related publications

4 conference contributions in proceedings
2 journal articles

Danish Clinical Named Entity Recognition and Relation Extraction
Laursen, M. S., Pedersen, J. S., Hansen, R. S., Savarimuthu, T. R. & Vinholt, P. J., 2023, Proceedings of the 24th Nordic Conference on Computational Linguistics (NoDaLiDa). University of Tartu Library, pp. 655–666
Publication: Chapter in book/report/conference proceeding › Conference contribution in proceedings › Research › peer-reviewed

Open access

Doctors identify hemorrhage better during chart review when assisted by artificial intelligence
Laursen, M. S., Pedersen, J. S., Hansen, R. S., Savarimuthu, T. R., Lynggaard, R. B. & Vinholt, P., Aug. 2023, In: Applied Clinical Informatics. 14, 4, pp. 743–751
Publication: Contribution to journal › Journal article › Research › peer-reviewed

Open access

Investigating anatomical bias in clinical machine learning algorithms
Pedersen, J. S., Laursen, M. S., Vinholt, P. J., Alnor, A. B. & Savarimuthu, T. R., 2023, Findings of the Association for Computational Linguistics: EACL 2023. Association for Computational Linguistics (ACL), pp. 1368–1380
Publication: Chapter in book/report/conference proceeding › Conference contribution in proceedings › Research › peer-reviewed

Open access

Citation formats

@misc{491593cd57294e2aad78a45838296d0f,

title = "Unlocking the Potential of Electronic Health Records With Danish Clinical Language Models for Text Mining",

abstract = "This PhD dissertation focuses on the development of language technology that can be used to extract clinical information from Danish electronic health records (EHRs). EHRs contain important health-related information that can be used to guide the treatment of patients. However, a large part of the information is stored in unstructured narrative text of the EHR, making it difficult and time-consuming to extract the relevant details, especially in acute situations. Consequently, important information may be lost which can increase the risk of misdiagnosis and adverse treatment outcomes. The recent paradigm shift in the field of natural language processing (NLP), driven by self-supervised neural networks and the transformer architecture, has produced automatic text-processing tools with unprecedented performances. These tools could be used to extract and structure the information from the narrative text of EHRs automatically. However, research in language technology has mostly been explored for high-resource languages like English, while the development of Danish language technology has received less attention, especially for specialized domains such as the clinical. This dissertation explores the potential of language technology to automatically extract information from the narrative text of Danish EHRs. Moreover, it emphasizes the importance of developing language resources tailored for the Danish clinical domain, as it can be used to enhance clinical research possibilities and improve patient treatment. The dissertation covers the development of two Danish pre-trained language models which show improved performance compared to existing Danish language models. Moreover, it explores the impact of dataset curation on potential biases in clinical language models. The dissertation also investigates how language models can be used to extract bleeding events from Danish EHRs and evaluates the performance of medical doctors in identifying relevant information when using the bleeding algorithm as an assistive tool. Finally, the dissertation presents a pre-trained language model that can be used to extract clinical information such as diseases, symptoms, and treatments in the narrative text of Danish EHRs.",

author = "Pedersen, {Jannik Skyttegaard}",

year = "2023",

month = nov,

day = "2",

doi = "10.21996/e9cj-9c53",

language = "English",

publisher = "Syddansk Universitet. Det Tekniske Fakultet",

address = "Denmark",

school = "SDU",

}

TY - GEN

T1 - Unlocking the Potential of Electronic Health Records With Danish Clinical Language Models for Text Mining

AU - Pedersen, Jannik Skyttegaard

PY - 2023/11/2

Y1 - 2023/11/2

AB - This PhD dissertation focuses on the development of language technology that can be used to extract clinical information from Danish electronic health records (EHRs). EHRs contain important health-related information that can be used to guide the treatment of patients. However, a large part of the information is stored in unstructured narrative text of the EHR, making it difficult and time-consuming to extract the relevant details, especially in acute situations. Consequently, important information may be lost which can increase the risk of misdiagnosis and adverse treatment outcomes. The recent paradigm shift in the field of natural language processing (NLP), driven by self-supervised neural networks and the transformer architecture, has produced automatic text-processing tools with unprecedented performances. These tools could be used to extract and structure the information from the narrative text of EHRs automatically. However, research in language technology has mostly been explored for high-resource languages like English, while the development of Danish language technology has received less attention, especially for specialized domains such as the clinical. This dissertation explores the potential of language technology to automatically extract information from the narrative text of Danish EHRs. Moreover, it emphasizes the importance of developing language resources tailored for the Danish clinical domain, as it can be used to enhance clinical research possibilities and improve patient treatment. The dissertation covers the development of two Danish pre-trained language models which show improved performance compared to existing Danish language models. Moreover, it explores the impact of dataset curation on potential biases in clinical language models. The dissertation also investigates how language models can be used to extract bleeding events from Danish EHRs and evaluates the performance of medical doctors in identifying relevant information when using the bleeding algorithm as an assistive tool. Finally, the dissertation presents a pre-trained language model that can be used to extract clinical information such as diseases, symptoms, and treatments in the narrative text of Danish EHRs.

U2 - 10.21996/e9cj-9c53

DO - 10.21996/e9cj-9c53

M3 - Ph.D. thesis

PB - Syddansk Universitet. Det Tekniske Fakultet

ER -
