TY - GEN
T1 - Unlocking the Potential of Electronic Health Records With Danish Clinical Language Models for Text Mining
AU - Pedersen, Jannik Skyttegaard
PY - 2023/11/2
Y1 - 2023/11/2
N2 - This PhD dissertation focuses on the development of language technology that can be used to extract clinical information from Danish electronic health records (EHRs). EHRs contain important health-related information that can be used to guide the treatment of patients. However, a large part of the information is stored in the unstructured narrative text of the EHR, making it difficult and time-consuming to extract the relevant details, especially in acute situations. Consequently, important information may be lost, which can increase the risk of misdiagnosis and adverse treatment outcomes. The recent paradigm shift in the field of natural language processing (NLP), driven by self-supervised neural networks and the transformer architecture, has produced automatic text-processing tools with unprecedented performance. These tools could be used to extract and structure the information from the narrative text of EHRs automatically. However, research in language technology has mostly focused on high-resource languages like English, while the development of Danish language technology has received less attention, especially for specialized domains such as the clinical domain. This dissertation explores the potential of language technology to automatically extract information from the narrative text of Danish EHRs. Moreover, it emphasizes the importance of developing language resources tailored for the Danish clinical domain, as they can be used to enhance clinical research possibilities and improve patient treatment. The dissertation covers the development of two Danish pre-trained language models which show improved performance compared to existing Danish language models. Moreover, it explores the impact of dataset curation on potential biases in clinical language models.
The dissertation also investigates how language models can be used to extract bleeding events from Danish EHRs and evaluates the performance of medical doctors in identifying relevant information when using the bleeding algorithm as an assistive tool. Finally, the dissertation presents a pre-trained language model that can be used to extract clinical information such as diseases, symptoms, and treatments from the narrative text of Danish EHRs.
U2 - 10.21996/e9cj-9c53
DO - 10.21996/e9cj-9c53
M3 - Ph.D. thesis
PB - Syddansk Universitet. Det Tekniske Fakultet
ER -