In Korea, becoming a doctor with a specialty normally takes eleven years: six years of education in a medical college, one year of internship in a training hospital (usually a university hospital), and four years of training as a resident in a specific medical specialty, such as internal medicine. A specialist who decides to remain in the university hospital then moves up the hospital ladder over a few decades through the following stages: fellow, assistant professor, associate professor, and (full) professor.
The lives of medical students and interns are probably not far from what the general public imagines. However, the lives of residents and fellows vary widely by specialty, so those in one department often have little idea what their friends in other departments do.
Still, there is one thing that almost every resident and fellow does regardless of specialty: data cleaning, also called data cleansing. Because writing journal articles is a common requirement, residents and fellows have to extract information from the electronic medical record (EMR) and then de-identify, clean, and analyze the data.
The final goal of research is to obtain useful knowledge that can make a change, such as better treatment strategies or more accurate diagnostic tools. In that sense, gathering, preprocessing, and analyzing data should be fulfilling. The reality is different. I'm a fellow in a university hospital, and when I talk to a doctor friend, we are likely to tease one another about the never-ending overtime while dreaming of transforming dirty data into analyzable data. In other words, when we prepare a medical research article, we normally spend most of our time copying and pasting records, filling in missing values, correcting typos, and fixing errors in data files. No matter what the final result is, whether or not the article is published in a great journal, we feel overwhelmed and frustrated while collecting and cleaning a dataset of several thousand cases. We understand it is a necessary job, but it feels like a chore.
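To give a rough sense of what this chore looks like, here is a minimal Python sketch of the kind of cleanup described above: normalizing inconsistent entries, correcting a known typo, and marking missing values explicitly. The rows, field names, and lookup tables are all hypothetical and made up for illustration; a real EMR export would need far more rules than this.

```python
import re

# Hypothetical rows, as they might look after manual copy-and-paste from an EMR.
raw_rows = [
    {"id": "001", "sex": "M", "age": "63", "diagnosis": "Pnuemonia "},
    {"id": "002", "sex": "female", "age": "", "diagnosis": "pneumonia"},
    {"id": "003", "sex": " f", "age": "57 yrs", "diagnosis": "PNEUMONIA"},
]

# Hypothetical lookup tables; in practice these grow as new variants turn up.
SEX_MAP = {"m": "M", "male": "M", "f": "F", "female": "F"}
TYPO_MAP = {"pnuemonia": "pneumonia"}


def clean_row(row):
    """Return a cleaned copy of one record; values that cannot be read become None."""
    # Unrecognized sex entries map to None instead of being guessed.
    sex = SEX_MAP.get(row["sex"].strip().lower())

    # Take the first run of digits as the age; an empty field stays missing.
    age_match = re.search(r"\d+", row["age"])
    age = int(age_match.group()) if age_match else None

    # Normalize case and whitespace, then fix known typos.
    diagnosis = row["diagnosis"].strip().lower()
    diagnosis = TYPO_MAP.get(diagnosis, diagnosis)

    return {"id": row["id"], "sex": sex, "age": age, "diagnosis": diagnosis}


cleaned = [clean_row(r) for r in raw_rows]
for record in cleaned:
    print(record)
```

The sketch deliberately leaves a missing age as None rather than filling it with a guess, so that the decision about how to handle missing values can be made later, during analysis.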
Why is collecting and cleaning medical data a bothersome and physically demanding chore?
First, medical records are mostly unstructured data by nature. Clinical notes, surgical records, discharge records, radiology reports, and pathology reports all contain narrative data. They lack a common structural framework, and each doctor's writing style and practice style also affect the data. Sometimes data attributes are lost because of an EMR system failure. These features increase the complexity of data preprocessing.
Second, the people in charge often have an incomplete understanding of directional and purposeful data aggregation. In my experience, data collection is mainly carried out by a professor's research assistant, and a fellow or resident receives the data and completes the preprocessing before analysis. However, many of them, including me, never learned how to clean or work with the data. Assistants are often graduates who don't necessarily have a degree related to medicine or statistics, and doctors often lack the knowledge and data handling skills required for particular software environments. As a result, the whole process is inefficient.
Third, everybody's business is nobody's business. Sometimes people who have no interest in the task take turns collecting data from the EMR. For example, a few residents may work for a professor for free (none of them is an author of the article), so they lack the motivation to complete the task thoroughly. I haven't been in such a situation myself, but I have seen many doctor friends complain about it. In this situation, a lot of human errors are bound to happen, which undermines the integrity of the data.
As a fellow, I can't help but give deeper thought to this ongoing battle with messy data. I believe we could save time in collecting and cleaning data if my hospital offered practical, hands-on educational sessions to improve data handling skills. (I remember many educational sessions that ended with only abstract ideas and motivational speeches.) Also, every contributor to a research project should be rewarded appropriately, which would keep them motivated and reduce manual errors.
professor / prə-ˈfe-sər
process / ˈprä-ˌses, ˈprō-, -səs
aggregate / ˈa-gri-gət / not 어그리게이션 (eo-geu-ri-ge-i-syeon) but 애그리것 (ae-geu-ri-geot)
project / ˈprä-ˌjekt, -jikt also ˈprō-
cleanse / ˈklenz