
JPMPH : Journal of Preventive Medicine and Public Health

Brief Report
Missing Occupation, Missing Risk: Insights From COVID-19 Case Investigation Data in Busan, South Korea
Jin-Hwan Kim1*, Daseul Moon2†, Changhoon Kim2,3
Journal of Preventive Medicine and Public Health 2026;59(2):204-210.
DOI: https://doi.org/10.3961/jpmph.25.718
Published online: March 12, 2026

1Institute of Health and Environment, Seoul National University, Seoul, Korea

2Busan Center for Infectious Disease Control & Prevention, Pusan National University Hospital, Busan, Korea

3Department of Preventive Medicine, Pusan National University School of Medicine, Busan, Korea

Corresponding author: Daseul Moon, Busan Center for Infectious Disease Control & Prevention, Pusan National University Hospital, 179 Gudeok-ro, Seo-gu, Busan 49241, Korea, E-mail: moon912390@gmail.com
* Current affiliation: Department of Preventive Medicine, Kyung Hee University College of Medicine, Seoul, Korea.
† Current affiliation: Department of Preventive Medicine, Inje University College of Medicine, Busan, Korea.
• Received: September 5, 2025   • Revised: October 9, 2025   • Accepted: October 31, 2025

Copyright © 2026 The Korean Society for Preventive Medicine

This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

  • Objectives
    This study evaluated the quality and analytic utility of occupational data in coronavirus disease 2019 (COVID-19) case investigation records from Busan, South Korea, during the period of comprehensive surveillance in 2020–2021, when occupation was inconsistently integrated into routine case reporting despite its importance for infection-risk assessment.
  • Methods
    We analyzed 25 283 confirmed COVID-19 cases reported between February 21, 2020, and December 31, 2021. Occupational information was extracted from investigation forms, epidemiological reports, and electronic medical records. We assessed completeness, internal inconsistencies, and codability to the Korean Standard Classification of Occupations (KSCO), and examined temporal trends across pandemic phases. Descriptive statistics and manual reviews of free-text entries were conducted.
  • Results
    Occupational information was recorded for nearly all investigated cases in 2020–2021 (>99%), but entries were often vague (“unemployed,” “other”) or institutional (“school,” “hospital”), which limited their utility. Because of ambiguous wording or contradictions between occupation and workplace, only a minority of entries could be standardized to KSCO 1–3-digit categories. Although data collection virtually ceased in 2022 and 2023 after individual-level investigations were discontinued, patterns in the 2020–2021 dataset already showed that design flaws in the occupation field reduced analytic value.
  • Conclusions
    Busan’s early COVID-19 surveillance system recorded occupation for nearly all cases but produced limited analyzable information. The disconnect between data entry and analytic usefulness highlights the need for structured, dual occupation–industry coding, searchable picklists, and real-time quality checks so that occupational risk can be systematically identified and incorporated into future pandemic preparedness and response.
During the coronavirus disease 2019 (COVID-19) pandemic, infection risk was unevenly distributed across populations, often shaped by occupational conditions. From call centers in Seoul [1] to meatpacking plants in the United States [2], workplace clusters demonstrated how job-related environments mediated exposure. Workers in undervalued or insufficiently protected sectors experienced disproportionately high risks [3].
South Korea’s epidemiological investigation system, strengthened after the 2015 MERS (Middle East Respiratory Syndrome) outbreak, played a central role in early COVID-19 containment. The Epidemiological Investigation Support System integrated location data, credit card usage, and CCTV footage to support rapid contact tracing [4,5] (Supplemental Material 1). However, the system retained a critical limitation: occupation was not systematically incorporated into surveillance workflows.
Although the Korea Disease Control and Prevention Agency (KDCA) included an occupation field in investigation forms, the field was optional, unstructured, and lacked a standardized coding protocol (Supplemental Material 2). As a result, entries frequently consisted of vague descriptors (“무직” [unemployed], “기타” [other]) or institutional labels (“학교” [school] or “병원” [hospital]), which provided little analytic value. In public health surveillance, such unstructured fields are well known to degrade the quality of occupation and industry data [6]. Unlike routinely collected demographic variables such as age and sex, occupational information required manual interpretation, and data quality varied widely across investigators. The only partially structured exception involved healthcare workers, who were prompted to specify job roles more precisely.
This omission is not trivial. Occupation captures exposure dimensions such as physical proximity [7,8], ventilation characteristics [9], and the feasibility of remote work [10]. A prior modeling study estimated that 12.2 million Korean workers, about 24% of the population, faced high exposure risk [11], identifying 30 high-risk occupations, including medical and welfare-related services, security services, household helpers, cooking attendants, and elementary sales occupations. However, these models did not rely on confirmed case data, limiting their empirical grounding.
The consequences of these limitations were evident in Busan, a major logistics hub. In July 2020, an outbreak with no clear initial source emerged. Subsequent epidemiological investigations traced infections to ship repair engineers who boarded foreign vessels despite national quarantine protocols. Although these workers had not disembarked abroad, they contracted the virus in high-risk environments that were not captured by the surveillance system [12]. The outbreak later spread from the workplace into the surrounding community, although on a relatively small scale.
Without structured occupational data, public health agencies were unable to systematically identify transmission patterns or validate modeled exposure risk against real-world outcomes. Agencies therefore lacked the empirical basis needed to target high-risk groups or allocate resources effectively. In this context, the present study examines COVID-19 case investigation data from Busan (2020–2021), focusing on the structure, completeness, and analytic usability of occupational information. By assessing over two million records, we evaluate whether the available occupational data were sufficiently robust to inform pandemic policy and surveillance decision-making.
Study Setting and Data Sources
This study analyzed COVID-19 epidemiological investigation data collected by the Busan Metropolitan Government during the period of comprehensive surveillance from February 21, 2020, to August 30, 2023. The dataset comprised 2 091 976 individual-level records derived from three main sources: (1) case investigation forms, (2) initial epidemiological reports, and (3) electronic medical record extracts compiled for public health surveillance. All personally identifiable information was removed before analysis. Although an extended dataset covering all pandemic phases was available, analyses were restricted to 2020–2021 to minimize bias arising from systematic, phase-specific missingness attributable to deficiencies in later data collection protocols. From 2022 onward, during the Omicron-dominant phase, occupation recording completion rates fell sharply to 20.4% (346 072 of 1 695 744 cases) in 2022 and 12.6% (46 659 of 370 949 cases) in 2023, with 99.2% of all recorded entries limited to “unemployed,” “student,” or “children.” This pattern reflects a de facto cessation of meaningful occupation data collection, rendering post-2021 records unsuitable for outcome analyses. Nevertheless, we included descriptive analyses for all pandemic years (2020–2023) to characterize overall incidence trends in Busan.
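The completion rates quoted above follow directly from the counts in Table 1. As a minimal illustration (written in Python rather than the R used for the study; the names `recorded`, `total`, and `completion_rate` are ours and purely illustrative), they can be recomputed as follows:

```python
# Recorded-occupation counts and total confirmed cases by year, copied from Table 1.
recorded = {2020: 1919, 2021: 23253, 2022: 346072, 2023: 46659}
total = {2020: 1919, 2021: 23364, 2022: 1695744, 2023: 370949}


def completion_rate(year: int) -> float:
    """Percentage of cases in `year` with any occupation entry, to one decimal."""
    return round(100 * recorded[year] / total[year], 1)


rates = {year: completion_rate(year) for year in recorded}
for year, rate in rates.items():
    # Columns: year, % recorded, % missing
    print(year, rate, round(100 - rate, 1))
```

The same calculation gives the complementary missing proportions reported in Table 1.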
Occupational Data Structure
Throughout the study period, occupation was captured through an open-ended, free-text field with no required format or standardized coding scheme. Although the KDCA provided general guidance—such as listing job titles, indicating school level for students, or identifying healthcare workers—data entry remained inconsistent. In contrast to tuberculosis investigation forms, which use predefined occupational checkboxes and structured templates, COVID-19 investigation forms relied almost entirely on manual text entry. Healthcare workers were the only group prompted to specify detailed roles (e.g., physician, nurse, technician), yet even these entries were applied unevenly across cases.
Analytical Approach
To evaluate the structure and quality of occupational data, we assembled an analytic dataset from the Busan COVID-19 Master Database, comprising initial investigation records from February 21, 2020, to December 31, 2021 (n=25 283). We examined four primary outcomes: (1) completeness, defined as the proportion of cases with any occupational entry; (2) content inconsistency, defined as the frequency and types of ambiguous or unclassifiable entries; (3) standardizability, defined as the proportion of entries that could be mapped to 1–3-digit Korean Standard Classification of Occupations (KSCO) categories; and (4) temporal trends in completeness and classifiability across major pandemic phases.
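To make the definitions of completeness, standardizability, and temporal trends concrete, the following sketch computes them on a handful of invented records. It is not the study's analysis code: the records, the placeholder codes, and the choice to use entries with any text as the denominator for standardizability are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical toy records: (month, free-text occupation, assigned code or None).
# The codes are placeholders for illustration, not verified KSCO assignments.
records = [
    ("2020-03", "간호사", "243"),
    ("2020-03", "학교", None),
    ("2020-03", "", None),
    ("2021-08", "무직", None),
    ("2021-08", "택배기사", "9"),
    ("2021-08", "요리사", "44"),
]


def pct(numerator: int, denominator: int) -> float:
    """Percentage rounded to one decimal; 0.0 when the denominator is zero."""
    return round(100 * numerator / denominator, 1) if denominator else 0.0


def summarize(rows):
    """Return (completeness %, standardizability %) for a list of records."""
    with_entry = [r for r in rows if r[1].strip()]       # any occupational text
    coded = [r for r in with_entry if r[2] is not None]  # mapped to a code
    return pct(len(with_entry), len(rows)), pct(len(coded), len(with_entry))


# Group records by month to obtain a simple temporal trend.
by_month = defaultdict(list)
for row in records:
    by_month[row[0]].append(row)

overall = summarize(records)
trend = {month: summarize(rows) for month, rows in sorted(by_month.items())}
```

Grouping by pandemic phase instead of calendar month would only require replacing the grouping key.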
We conducted descriptive analyses for all outcomes and performed subgroup comparisons by time period and data source. A stratified manual review of free-text entries was used to identify common sources of ambiguity and to assess the feasibility of retrospective coding. In cases where both occupational descriptors and institutional identifiers were available, we evaluated the consistency between inferred job roles and corresponding industrial classifications. All statistical analyses were conducted using R version 4.5.0 (R Foundation for Statistical Computing, Vienna, Austria), and occupational terminology was cross-referenced against KSCO coding manuals.
Ethics Statement
This study was approved by the Institutional Review Board of Pusan National University Hospital (IRB No. 2506-013-152) following ethical review of de-identified public health data.
Between February 21, 2020, and December 31, 2021, a total of 25 283 confirmed COVID-19 cases were reported in Busan. Case numbers were low in 2020 (1919) and 2021 (23 364) but surged in 2022, with over 1.69 million cases during the Omicron wave. By the end of August 2023, an additional 370 949 cases had been reported (Table 1). Demographic patterns were broadly consistent across years. Most cases involved Korean nationals residing in Busan, with a balanced sex distribution. Infections were concentrated among adults aged 20–69, though older age groups accounted for a growing share in 2023. Because the incompleteness of the 2022–2023 data warrants mention, we retained these years in Table 1; the proportion of cases with missing occupation information rose sharply after the Omicron surge, from 0.0% in 2020 and 0.5% in 2021 to 79.6% in 2022 and 87.4% in 2023.
As described above, Figures 1 and 2 were restricted to 2020–2021. Figure 1 shows monthly trends in occupational recording and coding rates. Occupational data were nearly complete in 2020–2021, when individual-level investigations were feasible. Data collection largely ceased in 2022 due to the overwhelming case burden, and entries in 2022–2023 were predominantly for children/students (98.8% of 392 731, data not shown). While field completion remained consistently high across pandemic phases, the share of records that could be mapped to standard occupation codes was substantially lower. We also report the proportion of entries recorded as “unemployed” or “student,” which represent economically inactive individuals without a current occupation.
Figure 2 illustrates the distribution of KSCO classification levels, providing a breakdown of the overall coding rate shown in Figure 1. The majority of coded entries were classified only at the 1-digit level, providing limited information, and the proportion of records containing detailed occupational data was very low. Although the pattern does not perfectly coincide, the coding rate tended to decline when the relative administrative burden increased during local surges in case numbers, for example, in March 2020, October 2020, late 2020 to early 2021, and late 2021.
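One simple way to derive the classification levels summarized in Figure 2 is to label each assigned code by its digit depth. The sketch below assumes that codes are stored as digit strings and treats anything else as uncoded; the sample codes are invented, and the function name `ksco_level` is hypothetical.

```python
from collections import Counter

LEVELS = {1: "major (1-digit)", 2: "intermediate (2-digit)", 3: "minor (3-digit)"}


def ksco_level(code):
    """Label a code by its digit depth; None or malformed codes are 'uncoded'."""
    if code and code.isdigit() and len(code) in LEVELS:
        return LEVELS[len(code)]
    return "uncoded"


# Invented example codes, including missing and malformed values.
codes = ["2", "24", "243", None, "9", "", "4", "x1"]

distribution = Counter(ksco_level(c) for c in codes)
shares = {level: round(100 * n / len(codes), 1) for level, n in distribution.items()}
```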
A frequent source of non-codability was the entry of institutional descriptors in place of job titles, for example, “초등학교” (elementary school), “요양원” (nursing home), or “외국인선원” (foreign seafarer), none of which correspond to an occupation without further interpretation. Classification inconsistencies also arose when the occupation and institution fields conflicted (e.g., “간호사” [nurse] linked to a school rather than a healthcare facility), yielding ambiguous industry–occupation assignments. These patterns reveal a structural limitation in Korea’s COVID-19 surveillance workflow: widespread collection of occupational text did not translate into usable, standardized codes for risk assessment. Without consistent KSCO coding, local health authorities were constrained in identifying workplace clusters, quantifying occupational vulnerability, and targeting protections to high-risk groups.
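The two patterns described above could, in principle, be screened for automatically before manual coding. The following sketch is an illustration under stated assumptions, not the procedure used in this study: the keyword sets contain only examples quoted in the text, and `EXPECTED_WORKPLACES` is a hypothetical lookup table.

```python
# Hypothetical keyword sets built from examples quoted in the text;
# a real system would need far larger, curated dictionaries.
INSTITUTIONAL = {"학교", "초등학교", "병원", "요양원"}
VAGUE = {"무직", "기타"}
# Assumed for illustration only: workplace types that are unsurprising
# for a given job title.
EXPECTED_WORKPLACES = {"간호사": {"병원", "요양원"}}


def triage(occupation: str, workplace: str = "") -> str:
    """Assign a free-text entry to a coarse data-quality category."""
    occupation, workplace = occupation.strip(), workplace.strip()
    if not occupation:
        return "missing"
    if occupation in VAGUE:
        return "vague"
    if occupation in INSTITUTIONAL:
        return "institution, not occupation"
    expected = EXPECTED_WORKPLACES.get(occupation)
    if expected and workplace and workplace not in expected:
        return "occupation-workplace conflict"
    return "candidate for KSCO coding"


for entry in [("초등학교", ""), ("간호사", "학교"), ("간호사", "병원"), ("무직", "")]:
    print(entry, "->", triage(*entry))
```

A flagged conflict is a prompt for review rather than an error, since a nurse may legitimately work in a school.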
This study reveals a fundamental disconnect between occupational data collection and analytical utility in Busan’s COVID-19 surveillance system. Although occupation was recorded for more than 99% of cases investigated in 2020–2021, only a subset of entries could be mapped to KSCO at the 1–3-digit level. This mapping gap widened during surge periods when caseloads increased and data entry became decentralized (Figures 1 and 2), resulting in a surveillance system that documented infections but struggled to identify where, and under what circumstances, exposure occurred.
Two primary failure modes explain this analytical gap. First, the occupation field relied on optional free-text entry with minimal guidance, producing vague responses (“unemployed/other”) or institutional descriptors (“school,” “hospital”) instead of job titles. Second, when both occupation and workplace were recorded, contradictions frequently emerged—for example, “nurse” attached to a non-healthcare facility—creating ambiguity about whether the reported information reflected job tasks or environmental exposure. These problems intensified from early 2022, when individual-level investigations largely ceased under Omicron surge pressures. Occupation fields were rarely completed and disproportionately captured children or students, generating phase-specific missingness rather than random data loss. This pattern likely delayed recognition of workplace clusters. Importantly, the gap between completion and usability is neither inevitable nor unique to Korea; jurisdictions that integrate both occupational and industrial codes demonstrate that appropriate system design can close this gap. International experience shows that dual classification systems enable routine sector-specific monitoring, targeted interventions, and policy evaluation.
Three core reforms would convert nominal completeness into decision-grade intelligence. First, dual coding should be required at data entry, mandating both KSCO for occupation and the Korean Standard Industrial Classification (KSIC) for industry, supported by simple cross-field validation. Because occupation reflects job tasks and industry reflects the work environment, collecting both reduces contradictions and enables routine data quality checks. For example, a “nurse” working in a “manufacturing plant” versus a “hospital” entails markedly different exposure contexts despite identical job titles, and automated validation can flag implausible combinations such as “teacher” plus “mining industry” for manual review. Second, free-text fields should be replaced with searchable picklists featuring auto-complete functionality, multilingual synonym dictionaries for Korean and English variants, and lightweight validation to detect unlikely occupation–industry pairs. The interface should also include dedicated options for students, homemakers, and unemployed individuals, who were frequently misclassified in the free-text system. This approach mirrors successful implementations in Wisconsin [13] and Toronto [14,15], where user-friendly interfaces substantially improved data quality without increasing entry time. Third, real-time quality assurance should be supported through dashboards tracking codability rates, missing data, and error patterns by district and time period. Automated alerts can flag unusual patterns for immediate review, while feedback loops provide targeted training to field teams based on recurrent errors. Routine 1–2% random audits would help maintain accuracy during high-pressure surge periods when shortcuts are more likely.
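The data-entry reforms proposed above can be sketched in a few functions. Everything below is hypothetical: the picklist contents, the placeholder codes (which are not real KSCO or KSIC values), the implausible-pair rule, and the function names are assumptions chosen to illustrate synonym-aware search, cross-field validation, and random audit sampling.

```python
import random

# Hypothetical picklist: canonical label -> (placeholder code, synonyms).
# Codes are illustrative stand-ins, not verified KSCO or KSIC values.
PICKLIST = {
    "nurse": ("OCC-NURSE", {"간호사", "registered nurse", "rn"}),
    "teacher": ("OCC-TEACHER", {"교사", "school teacher"}),
    "student": ("NONWORK-STUDENT", {"학생"}),
    "unemployed": ("NONWORK-UNEMPLOYED", {"무직"}),
}
# Assumed occupation-industry pairs that should be routed to manual review.
IMPLAUSIBLE_PAIRS = {("teacher", "mining")}


def search(query: str) -> list[str]:
    """Return canonical labels whose name or any synonym starts with the query."""
    q = query.strip().lower()
    if not q:
        return []
    return sorted(
        label
        for label, (_, synonyms) in PICKLIST.items()
        if label.startswith(q) or any(s.lower().startswith(q) for s in synonyms)
    )


def validate(occupation: str, industry: str) -> str:
    """Cross-field check: reject unknown labels, flag implausible combinations."""
    if occupation not in PICKLIST:
        return "reject: occupation not in picklist"
    if (occupation, industry) in IMPLAUSIBLE_PAIRS:
        return "flag: manual review"
    return "accept"


def audit_sample(record_ids: list[int], fraction: float = 0.01, seed: int = 0) -> list[int]:
    """Draw a reproducible random audit sample of at least one record."""
    if not record_ids:
        return []
    k = max(1, round(len(record_ids) * fraction))
    return random.Random(seed).sample(record_ids, k)
```

For example, `search("간호")` returns the canonical label `nurse`, `validate("teacher", "mining")` routes the record to manual review, and `audit_sample(list(range(1000)), 0.02)` selects 20 records for a 2% audit.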
These reforms would facilitate sector-specific surveillance, targeted protections (e.g., personal protective equipment, ventilation upgrades, vaccination outreach, paid sick leave, hazard pay), and evaluation of interventions—capabilities essential for equitable infectious disease control. Making occupational risk visible, measurable, and governable aligns with international occupational health surveillance standards (International Labor Organization) [16] and emergency preparedness frameworks (World Health Organization) [17].
This analysis provides the first systematic, citywide assessment of KSCO codability in Korean COVID-19 surveillance, covering all pandemic phases at census scale. The methodological approach distinguishes recording, coding, and population-level metrics while restricting key analyses to high-ascertainment periods to reduce bias. Limitations include the single-city context, reliance on routine fields without interview verification, absence of direct linkage to workplace outbreaks, residual misclassification despite manual review, and a descriptive design that cannot quantify causal relationships between occupational categories and infection risk. Even so, the findings remain robust and directly relevant for policy.
Korea’s COVID-19 surveillance captured extensive occupational text but yielded too few standardized codes for analysis. This limitation reflects a broader failure to incorporate occupational risk factors into routine risk assessment and high-risk group identification for infectious disease response. Efforts to systematize occupational data have progressed since 2024: the KDCA has integrated previously fragmented epidemiological investigation systems and replaced manual text entry with searchable KSCO codes that now support detailed occupational coding at the 3-digit level, improving data reliability. Implementing KSCO/KSIC dual coding with searchable interfaces, automated validation, and real-time quality assurance, while retrospectively enhancing legacy data, would substantially strengthen surveillance capacity. These reforms, implemented now and sustained beyond emergencies, would enhance pandemic preparedness and advance health equity by enabling timely identification and protection of high-risk workforces.
Supplemental materials are available at https://doi.org/10.3961/jpmph.25.718.

Conflict of Interest

The authors have no conflicts of interest associated with the material presented in this paper.

Funding

This study was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (grant No. RS-2023-00271195).

Acknowledgements

This study was developed based on an issue paper published by the Busan Center for Infectious Disease Control & Prevention.

Author Contributions

Conceptualization: Kim JH, Moon D. Data curation: Kim JH. Formal analysis: Kim JH. Funding acquisition: Kim JH. Methodology: Kim JH. Project administration: Moon D. Visualization: Kim JH. Writing – original draft: Kim JH, Moon D. Writing – review & editing: Kim JH, Moon D, Kim C.

Figure 1
Monthly completeness, content consistency, and proportion of student/unemployed entries in recorded occupation data with monthly case counts, 2020–2021. Lines: left axis (%); Bars: right axis (cases).
Figure 2
Monthly proportion of occupation entries classifiable using the Korean Standard Classification of Occupations at major (1-digit), intermediate (2-digit), and minor (3-digit) levels with case counts, 2020–2021. Lines: left axis (%); Bars: right axis (cases).
Table 1
Characteristics of confirmed COVID-19 cases in Busan

| Characteristics | Category | Total | 2020 | 2021 | 2022 | 2023¹ |
|---|---|---|---|---|---|---|
| Total | | 2 091 976 (100) | 1919 (100) | 23 364 (100) | 1 695 744 (100) | 370 949 (100) |
| Sex | Male | 914 252 (43.7) | 835 (43.5) | 11 550 (49.4) | 748 651 (44.1) | 153 216 (41.3) |
| | Female | 1 177 724 (56.3) | 1084 (56.5) | 11 813 (50.6) | 947 093 (55.9) | 217 733 (58.7) |
| Age (y) | <10 | 181 302 (8.7) | 64 (3.3) | 1877 (8.0) | 163 081 (9.6) | 16 280 (4.4) |
| | 10s | 236 965 (11.3) | 132 (6.9) | 2273 (9.7) | 200 308 (11.8) | 34 252 (9.2) |
| | 20s | 285 460 (13.6) | 204 (10.6) | 3109 (13.3) | 237 557 (14.0) | 44 590 (12.0) |
| | 30s | 275 897 (13.2) | 186 (9.7) | 2941 (12.6) | 224 526 (13.2) | 48 244 (13.0) |
| | 40s | 297 405 (14.2) | 196 (10.2) | 3171 (13.6) | 244 228 (14.4) | 49 810 (13.4) |
| | 50s | 268 886 (12.9) | 371 (19.3) | 3060 (13.1) | 215 936 (12.7) | 49 519 (13.3) |
| | 60s | 288 394 (13.8) | 357 (18.6) | 4116 (17.6) | 221 485 (13.1) | 62 436 (16.8) |
| | 70s | 168 737 (8.1) | 204 (10.6) | 1842 (7.9) | 123 942 (7.3) | 42 749 (11.5) |
| | ≥80 | 88 930 (4.3) | 205 (10.7) | 975 (4.2) | 64 681 (3.8) | 23 069 (6.2) |
| Nationality | Korean | 2 069 800 (98.9) | 1897 (98.9) | 22 708 (97.2) | 1 676 970 (98.9) | 368 225 (99.3) |
| | Other (incl. missing) | 22 176 (1.1) | 22 (1.1) | 656 (2.8) | 18 774 (1.1) | 2724 (0.7) |
| Region (based on residence) | Busan | 1 990 822 (95.2) | 1622 (84.5) | 22 969 (98.3) | 1 607 425 (94.8) | 358 806 (96.7) |
| | Other | 101 154 (4.8) | 297 (15.5) | 395 (1.7) | 88 319 (5.2) | 12 143 (3.3) |
| Occupation | Recorded | 417 903 (20.0) | 1919 (100) | 23 253 (99.5) | 346 072 (20.4) | 46 659 (12.6) |
| | Missing | 1 674 073 (80.0) | 0 (0) | 111 (0.5) | 1 349 672 (79.6) | 324 290 (87.4) |

Values are presented as number (%).

COVID-19, coronavirus disease 2019; incl., including.

¹ 2023 figures represent values up to August 30.
