Original Article

Development and optimisation strategies for a nomogram-based predictive model of malignancy risk in thyroid nodules

Hong Kong Med J 2026 Feb;32(1):30–40 | Epub 30 Jan 2026

ORIGINAL ARTICLE (HEALTHCARE IN CHINA) CME

Peng He, MD, PhD¹ #; Yu Liang, MD² #; Yuan Zou, MD¹; Zhou Zou, BM³; Bo Ren, MD¹; Shan Peng, MD⁴; Hongmei Yuan, MD, PhD¹; Qin Chen, MD²

¹ Department of Ultrasound Medicine and Ultrasonic Medical Engineering Key Laboratory of Nanchong City, Affiliated Hospital of North Sichuan Medical College, Nanchong, China

² Department of Ultrasound, Sichuan Academy of Medical Sciences and Sichuan Provincial People’s Hospital, School of Medicine, University of Electronic Science and Technology of China, Chengdu, China

³ Department of Orthopedics, Sichuan Academy of Medical Sciences and Sichuan Provincial People’s Hospital, School of Medicine, University of Electronic Science and Technology of China, Chengdu, China

⁴ Department of Rehabilitation, Second Clinical College of North Sichuan Medical College, Nanchong, China

# Equal contribution

Corresponding author: Dr Yuan Zou (zouyuanxiao@163.com)

Abstract

Introduction: This study aimed to develop and validate a clinical prediction model to assist radiologists in optimising the diagnostic classification of the Chinese Thyroid Imaging Reporting and Data System (C-TIRADS).

Methods: A total of 1659 patients from two hospitals were included in this study. The derivation cohort comprised 909 patients for model development and internal validation, while 750 patients formed the external validation cohort. A binary logistic regression model was constructed. Model performance in the derivation set was evaluated using receiver operating characteristic (ROC) curves and visualised with a nomogram. In the external validation set, ROC and calibration curves were used to assess discrimination and calibration.

Results: The original C-TIRADS category, abnormal cervical lymph node sonographic findings, and changes in thyroid nodule size emerged as significant predictors of C-TIRADS optimisation. The optimised nomogram demonstrated an area under the ROC curve (AUC) of 0.730 (95% confidence interval=0.697-0.762), with a sensitivity of 63.2%, specificity of 74.9%, and overall accuracy of 67.7% for predicting optimisation. Probability thresholds of ≥60% and <30% were used to recommend an upgrade and a downgrade, respectively. The calibration curve showed good agreement between predicted and observed outcomes, and decision curve analysis demonstrated a favourable net clinical benefit. External validation confirmed excellent discrimination (AUC=0.865; 95% confidence interval=0.839-0.891).

Conclusion: An optimised C-TIRADS model that integrates imaging features of thyroid nodules with clinical risk factors may aid radiologists in improving the diagnostic efficiency and clinical utility of the TIRADS classification.

New knowledge added by this study

This is the first study to integrate clinical risk factors with imaging features to optimise the Chinese Thyroid Imaging Reporting and Data System (C-TIRADS) classification.
This work established a risk threshold–based decision-making framework to guide C-TIRADS classification adjustments.
External validation demonstrated the model’s generalisability across diverse clinical settings.

Implications for clinical practice or policy

Our model improved diagnostic precision through the integration of imaging and clinical risk factors.
This research has the potential to optimise resource allocation and reduce interobserver diagnostic variability.

Introduction

Thyroid nodules are a common clinical finding, with a prevalence of approximately 4% to 7% in the general population, and are most often detected by ultrasonography.1 2 Although most thyroid nodules are benign, distinguishing malignant from benign nodules remains a clinical priority to avoid unnecessary procedures and ensure timely intervention.3 To standardise risk stratification, various Thyroid Imaging Reporting and Data Systems (TIRADS) have been developed,4 5 including the ACR-TIRADS (American College of Radiology),6 the K-TIRADS (Korean Society of Thyroid Radiology),7 and the EU-TIRADS (European Thyroid Association).8 Recognising the need for a system tailored to the Chinese healthcare context, the Chinese Artificial Intelligence Alliance for Thyroid and Breast Ultrasound proposed the Chinese TIRADS (C-TIRADS) in 2021.2 However, existing TIRADS models primarily focus on sonographic characteristics and often overlook relevant clinical risk factors (eg, patient age, sex, and cervical lymph node [LN] involvement).9 In clinical practice, radiologists frequently incorporate such clinical information into their assessments, contributing to inconsistency and variability in TIRADS classification.

Papillary thyroid carcinoma accounts for approximately 80% to 90% of all thyroid cancers and is typically characterised by indolent behaviour.10 11 A substantial proportion of new cases involve papillary thyroid microcarcinoma, defined as tumours measuring less than 10 mm in diameter, which generally carry a favourable clinical prognosis.12 Increasing recognition of the indolent nature of papillary thyroid microcarcinoma has raised concerns regarding potential overdiagnosis and overtreatment. However, current risk stratification strategies that rely solely on imaging features may either overestimate or underestimate malignancy risk, depending on the patient’s broader clinical context. Approaches that incorporate clinical risk factors into TIRADS classification could address these limitations and enhance diagnostic accuracy, supporting more individualised patient management.

This study aimed to develop and externally validate a predictive model that integrates both imaging characteristics and clinical risk factors to refine the C-TIRADS classification system. To our knowledge, this is the first nomogram-based model to incorporate clinical risk factors into the C-TIRADS framework. The tool is designed to assist radiologists in improving diagnostic consistency and supporting more informed and individualised clinical decision making in the management of thyroid nodules.

Methods

Study design and population

This retrospective diagnostic study included patients with thyroid nodules who underwent surgical resection at two tertiary hospitals in China. The derivation cohort comprised patients treated at Sichuan Provincial People’s Hospital from January to December 2022, while the external validation cohort was drawn from Affiliated Hospital of North Sichuan Medical College during the same period. Inclusion criteria were: (1) thyroid nodules confirmed by postoperative pathology and (2) preoperative ultrasonography of the thyroid and cervical LNs with complete imaging and clinical records. Exclusion criteria were: (1) unclear pathological diagnosis; (2) incomplete clinical data; or (3) poor-quality ultrasound images.

Imaging evaluation and classification

Two junior radiologists, blinded to clinical and pathological information, independently classified all nodules according to the C-TIRADS criteria. Subsequently, two senior radiologists re-evaluated the cases and adjusted the classifications based on additional clinical risk factors, including patient demographics and cervical LN findings. Any modification from the initial C-TIRADS classification was defined as ‘classification optimisation’ (*C-TIRADS), encompassing both upgrades and downgrades.

Data collection

Structured data collection forms were used to record clinical and sonographic variables. The collected data included patient sex, age, nodule size, number of nodules, C-TIRADS classification, and the presence of abnormal cervical LNs on ultrasonography.

Predictor variables

Sonographic features that directly determine the C-TIRADS score (such as solidity, echogenicity, aspect ratio, microcalcification, and margin irregularity) were not included independently in the multivariable analysis to avoid collinearity. Based on clinical relevance and univariate regression analysis, six predictors were selected for model development, namely, patient sex, age-group (≤40, 40-60, and >60 years),13 14 nodule size, number of nodules (single vs multiple), presence of abnormal cervical LNs, and original C-TIRADS classification.

Model development and internal validation

A binary logistic regression model was developed using the derivation cohort from Sichuan Provincial People’s Hospital (n=909). For categorical variables with more than two levels, dummy variables were created. The C-TIRADS category 5 was used as the reference group as it represents the highest level of suspicion and the most definitive management pathway (surgical resection), making it an appropriate clinical baseline to estimate relative malignancy risk and the need for reclassification. Model performance in the derivation cohort was evaluated using the area under the receiver operating characteristic (ROC) curve (AUC), and calibration was assessed by comparing predicted probability (PP) with observed outcomes using calibration plots.
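The dummy coding described above can be sketched as follows. This is a minimal illustration, not the authors' code: the five category labels and the `ctirads_` column prefix are assumptions, and only the choice of category 5 as the reference level comes from the text.

```python
# Assumed labels for the five C-TIRADS levels; "5" is the reference level, as in the text.
CATEGORIES = ["3", "4A", "4B", "4C", "5"]
REFERENCE = "5"


def dummy_encode(category, levels=CATEGORIES, reference=REFERENCE):
    """Return 0/1 indicators, one per non-reference level.

    The reference level is represented by all indicators being 0, so each
    fitted coefficient is a log-odds contrast against category 5.
    """
    if category not in levels:
        raise ValueError(f"unknown category: {category!r}")
    return {f"ctirads_{lvl}": int(category == lvl)
            for lvl in levels if lvl != reference}


print(dummy_encode("4A"))  # one indicator set to 1
print(dummy_encode("5"))   # reference level: all indicators 0
```

With this coding, a five-level predictor contributes four columns to the design matrix.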

We emphasise that the primary outcome variable for model training was the pathological diagnosis (binary: malignant vs benign). The C-TIRADS optimisation, defined as upgrading or downgrading the original category based on PP thresholds, was a post-model clinical decision rule applied to the model output, not the outcome used for model development.

Internal validation was performed using bootstrap resampling with 1000 samples to obtain bias-corrected estimates of model performance and 95% confidence intervals (95% CIs). A fixed random seed was set to ensure reproducibility. The bias-corrected C-statistic was 0.728, compared with the original apparent performance of 0.730 (a difference of 0.002), confirming the model’s stable discriminative ability (online supplementary Table 1).
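The role of the fixed seed can be illustrated with a simplified bootstrap. Note that this sketch is a swapped-in, simpler technique: it computes a percentile bootstrap interval for the AUC of fixed predictions and does not reproduce the bias-corrected (optimism-adjusted) procedure reported above, which would require refitting the model in every resample. The seed value, function names, and toy data are all hypothetical.

```python
import random


def auc(scores, labels):
    """Probability that a random positive outranks a random negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(scores, labels, n_boot=1000, seed=2022, alpha=0.05):
    """Percentile bootstrap CI for the AUC; the same seed gives the same resamples."""
    rng = random.Random(seed)
    n = len(scores)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]  # sample cases with replacement
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples that contain a single class
            stats.append(auc([scores[i] for i in idx], ys))
    stats.sort()
    lower = stats[round(alpha / 2 * n_boot)]
    upper = stats[round((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Toy data, for illustration only
scores = [0.9, 0.8, 0.75, 0.6, 0.55, 0.4, 0.35, 0.3, 0.2, 0.1]
labels = [1, 1, 0, 1, 1, 0, 1, 0, 0, 0]
print(bootstrap_auc_ci(scores, labels))
```

Calling the function twice with the same seed returns an identical interval, which is the reproducibility property the fixed seed is meant to guarantee.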

External validation

The final model was applied to the external cohort from Affiliated Hospital of North Sichuan Medical College (n=750) to evaluate its generalisability. Model discrimination was evaluated by calculating the AUC in the validation set, and calibration was assessed using calibration curves.
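One simple way to inspect calibration is to group cases by predicted probability and compare the mean prediction with the observed event rate in each group. The sketch below assumes equal-width bins, which is an illustrative choice; the article does not state how its calibration curves were constructed, and smoothed curves are a common alternative.

```python
def calibration_table(probs, outcomes, n_bins=5):
    """Return (bin index, n, mean predicted probability, observed rate) per non-empty bin."""
    if len(probs) != len(outcomes):
        raise ValueError("probs and outcomes must have the same length")
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, outcomes):
        k = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls in the last bin
        bins[k].append((p, y))
    rows = []
    for k, cases in enumerate(bins):
        if not cases:
            continue  # empty bins are skipped
        mean_pred = sum(p for p, _ in cases) / len(cases)
        obs_rate = sum(y for _, y in cases) / len(cases)
        rows.append((k, len(cases), mean_pred, obs_rate))
    return rows


# Toy data: a well-calibrated model has mean_pred close to obs_rate in every row
for row in calibration_table([0.1, 0.2, 0.8, 0.9], [0, 0, 1, 1], n_bins=2):
    print(row)
```

Plotting the observed rate against the mean prediction for each row, with the diagonal as reference, gives a basic calibration plot.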

Nomogram construction

A nomogram was developed based on the final multivariable regression model to provide a visual tool for clinical application. Each predictor was assigned a score, and the total score corresponded to the PP of C-TIRADS classification optimisation.
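The scoring logic behind a nomogram can be sketched as a rescaling of logistic regression coefficients: the predictor with the largest effect spans 0 to 100 points, and the total points map back to a probability through the logistic function. The coefficients, intercept, and predictor names below are hypothetical and are not the fitted values of the article's model.

```python
import math

# Hypothetical log-odds contributions, for illustration only.
INTERCEPT = -1.0
EFFECTS = {
    "male_sex": {0: 0.0, 1: 0.3},
    "multiple_nodules": {0: 0.0, 1: 0.4},
    "abnormal_ln": {0: 0.0, 1: 1.8},
}


def points_table(effects):
    """Scale each predictor's contribution so the largest span equals 100 points."""
    lows = {name: min(levels.values()) for name, levels in effects.items()}
    max_span = max(max(levels.values()) - lows[name]
                   for name, levels in effects.items())
    if max_span == 0:
        raise ValueError("all effects are zero")
    scale = 100.0 / max_span  # points per unit of log-odds
    table = {name: {lvl: (beta - lows[name]) * scale for lvl, beta in levels.items()}
             for name, levels in effects.items()}
    return table, scale, lows


def predicted_probability(patient, effects=EFFECTS, intercept=INTERCEPT):
    """Sum the points for one patient and convert the total to a probability."""
    table, scale, lows = points_table(effects)
    total_points = sum(table[name][patient[name]] for name in effects)
    # Undo the rescaling to recover the linear predictor
    linear_predictor = intercept + sum(lows.values()) + total_points / scale
    return total_points, 1.0 / (1.0 + math.exp(-linear_predictor))


points, prob = predicted_probability(
    {"male_sex": 1, "multiple_nodules": 1, "abnormal_ln": 1})
print(round(points, 1), round(prob, 3))
```

Because the points are a linear rescaling of the log-odds, reading a probability off the nomogram gives the same value as evaluating the regression equation directly.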

Decision curve analysis and risk thresholds

Decision curve analysis and clinical impact curves were used to evaluate the clinical utility of the nomogram by quantifying the net benefit across a range of threshold probabilities. Specifically, the nomogram generates a PP indicating whether a nodule’s original C-TIRADS classification should be modified after integrating clinical information. For clinical decision making, we pre-specified probability cut-offs: PP ≥60% (upgrade), PP <30% (downgrade), and PP ≥30% but <60% (unchanged). Based on these thresholds, the model’s recommendations were translated into optimised C-TIRADS categories, which were then compared with radiologists’ optimisation decisions and surgical pathology findings, as appropriate. These thresholds are reported in the Results section and were applied consistently across all performance tables.
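The pre-specified cut-offs amount to a three-way decision rule, which can be written out as follows. The function name and return labels are illustrative; the boundaries (≥60%, <30%) are taken from the text.

```python
def recommend(pp, upgrade_at=0.60, downgrade_below=0.30):
    """Map a predicted probability (0 to 1) to a classification recommendation."""
    if not 0.0 <= pp <= 1.0:
        raise ValueError("pp must be a probability between 0 and 1")
    if pp >= upgrade_at:
        return "upgrade"
    if pp < downgrade_below:
        return "downgrade"
    return "unchanged"  # 30% <= PP < 60%


for pp in (0.25, 0.30, 0.59, 0.60, 0.92):
    print(pp, recommend(pp))
```

Note the boundary handling: a PP of exactly 60% is an upgrade, while a PP of exactly 30% leaves the category unchanged.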

Model performance evaluation

To ensure consistent ROC analysis, all AUCs were calculated using continuous PPs rather than ordinal risk categories. For the original C-TIRADS system, the five-level ordinal classification was transformed into a continuous malignancy probability score using proportional-odds (ordinal logistic) regression. This standard statistical method was employed to model the ordered nature of the C-TIRADS categories and to derive a continuous probability of malignancy for each category, enabling fair comparison in ROC analysis against other models. For the optimised *C-TIRADS system, PPs were directly obtained from the final multivariable logistic regression model. The ROC curves and corresponding AUCs were constructed using these continuous predictions.
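As background to this comparison, the AUC depends only on how cases are ranked, so any strictly increasing mapping from category to probability yields the same AUC as the category ranks themselves. The sketch below illustrates this with toy data; the per-category probabilities are hypothetical and the proportional-odds fit itself is not reproduced.

```python
def auc(scores, labels):
    """AUC as the proportion of positive-negative pairs ranked correctly (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data: ordinal rank of the category (1 = lowest suspicion) and pathology (1 = malignant)
ranks = [1, 2, 2, 3, 3, 4, 4, 5, 5, 5]
labels = [0, 0, 1, 0, 1, 0, 1, 1, 1, 1]

# Hypothetical per-category malignancy probabilities, increasing with category
prob_by_rank = {1: 0.02, 2: 0.10, 3: 0.40, 4: 0.75, 5: 0.95}
probs = [prob_by_rank[r] for r in ranks]

print(auc(ranks, labels), auc(probs, labels))
```

Both calls return the same value, because nodules in the same category remain tied and the order of categories is preserved.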

Statistical analysis

Statistical analyses and data visualisation were performed using SPSS (Windows version 26.0; IBM Corp, Armonk [NY], United States) and RStudio (version 2022). Categorical variables were reported as numbers of cases and percentages, with group comparisons conducted using the Chi squared test or Fisher’s exact test, as appropriate. Multivariable logistic regression analysis was conducted to identify independent predictors. Model discrimination was evaluated using ROC curves, while calibration curves were used to assess agreement between predicted and observed outcomes. Decision curves and clinical impact curves were constructed to assess practical clinical utility. A two-tailed P value of <0.05 was considered statistically significant.

Results

Baseline characteristics

All models were trained to predict pathological malignancy. The optimised *C-TIRADS classifications presented here were derived by applying predefined probability thresholds to the model’s malignancy predictions.

A total of 1659 patients with thyroid nodules were included in the study, comprising 909 patients in the derivation cohort and 750 in the external validation cohort. In the derivation cohort, 71.8% of patients were women, and the majority (90.8%) had nodules measuring ≤30 mm. In total, 81.7% of patients showed no abnormal cervical LNs on ultrasonography. The rate of C-TIRADS optimisation was 60.6%. In the external validation cohort, similar distributions were observed, with a higher proportion of nodules >30 mm (Table 1).

Table 1. Patient and nodule characteristics (n=1659)

Univariate analysis

Univariate binary logistic regression analysis revealed that several variables were either significantly associated (P<0.05) or showed a trend towards association (0.05 < P < 0.1) with C-TIRADS optimisation. These variables included patient sex, age, nodule size (10-30 mm), number of nodules, solid composition, blurred margins, aspect ratio >1, abnormal cervical LNs, and C-TIRADS category (Table 2 and online supplementary Table 2).

Table 2. Predictor distribution and univariate logistic regression odds ratios for malignancy (n=909)

Multivariable model development

A multivariable binary logistic regression model was developed to identify independent predictors associated with C-TIRADS optimisation. Six predictors were independently associated with the outcome. The key predictors of C-TIRADS optimisation were male sex, age 40 to 60 years, thyroid nodule size (per 1-mm increase), multiple thyroid nodules, presence of abnormal cervical LNs, and original C-TIRADS 4A category (online supplementary Table 3). A nomogram model was constructed based on these six independent predictors (Fig 1).

Figure 1. Nomogram prediction model to aid radiologists in optimising the Chinese Thyroid Imaging Reporting and Data System classification

Model performance in the derivation cohort

The model demonstrated good discrimination, with an AUC of 0.730 (95% CI=0.697-0.762) in the derivation cohort (online supplementary Fig a). Internal validation using 1000 bootstrap samples yielded a bias-corrected C-statistic of 0.728, indicating stable model performance (online supplementary Table 1). Calibration curves showed good agreement between PPs and observed outcomes (online supplementary Fig b).

Diagnostic thresholds were evaluated to stratify risk. A PP of ≥60% or <30% was considered indicative of a high likelihood of classification change: a PP of ≥60% suggested upgrading, while a PP of <30% suggested downgrading; PPs between 30% and 60% indicated that the classification was likely to remain unchanged. A detailed summary of sensitivity, specificity, and overall accuracy across these thresholds is presented in online supplementary Table 4.
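For readers who wish to reproduce this type of summary, sensitivity, specificity, and overall accuracy at a given probability threshold can be computed from a 2×2 classification table. The sketch below uses toy data and a hypothetical function name; it does not reproduce the values in the supplementary tables.

```python
def threshold_metrics(probs, outcomes, threshold=0.60):
    """Classify PP >= threshold as positive and summarise against the observed outcome."""
    tp = fp = tn = fn = 0
    for p, y in zip(probs, outcomes):
        predicted_positive = p >= threshold
        if predicted_positive and y == 1:
            tp += 1
        elif predicted_positive and y == 0:
            fp += 1
        elif y == 1:
            fn += 1
        else:
            tn += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative outcome")
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }


# Toy data, for illustration only
probs = [0.9, 0.7, 0.5, 0.2, 0.65, 0.4, 0.1, 0.3]
outcomes = [1, 1, 1, 1, 0, 0, 0, 0]
print(threshold_metrics(probs, outcomes, threshold=0.60))
```

Overall accuracy is a prevalence-weighted average of sensitivity and specificity, so it shifts with the case mix of the cohort even when the other two measures are unchanged.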

External validation

When applied to the external cohort, the model achieved an AUC of 0.865 (95% CI=0.839-0.891) [online supplementary Fig c], demonstrating excellent generalisability. Calibration plots again confirmed close agreement between predicted and observed probabilities (online supplementary Fig d). At the 60% probability threshold, sensitivity was 85.0%, specificity was 69.0%, and overall accuracy was 79.7% in the external validation cohort. Diagnostic performance metrics across various risk thresholds of the final prediction model were analysed in the external validation population (online supplementary Table 5).

Clinical utility

Decision curve analysis (Fig 2a) demonstrated that the nomogram model provided greater net clinical benefit across a wide range of threshold probabilities compared with treating all or no patients. The clinical impact curve (Fig 2b) showed that the number of true positives closely approximated the predicted number across relevant thresholds. The observed distribution of histopathological outcomes was as follows: in the derivation cohort, 769 nodules (84.6%) were confirmed malignant and 140 (15.4%) were benign; in the validation cohort, 434 nodules (57.9%) were malignant and 316 (42.1%) were benign.
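The net benefit plotted in a decision curve follows the standard definition: true positives per patient minus false positives per patient, weighted by the odds of the threshold probability. A minimal sketch with toy data is given below; the function names are hypothetical and the values are not those underlying Figure 2.

```python
def net_benefit(probs, outcomes, pt):
    """Net benefit of acting on cases with PP >= pt: TP/n - FP/n * pt / (1 - pt)."""
    if not 0.0 < pt < 1.0:
        raise ValueError("pt must be strictly between 0 and 1")
    n = len(probs)
    tp = sum(1 for p, y in zip(probs, outcomes) if p >= pt and y == 1)
    fp = sum(1 for p, y in zip(probs, outcomes) if p >= pt and y == 0)
    return tp / n - (fp / n) * pt / (1.0 - pt)


def net_benefit_treat_all(outcomes, pt):
    """Reference strategy: act on every case. Acting on no case has net benefit 0."""
    if not 0.0 < pt < 1.0:
        raise ValueError("pt must be strictly between 0 and 1")
    prevalence = sum(outcomes) / len(outcomes)
    return prevalence - (1.0 - prevalence) * pt / (1.0 - pt)


# Toy data, for illustration only
probs = [0.9, 0.7, 0.5, 0.2, 0.65, 0.4, 0.1, 0.3]
outcomes = [1, 1, 1, 1, 0, 0, 0, 0]
for pt in (0.3, 0.6):
    print(pt, net_benefit(probs, outcomes, pt), net_benefit_treat_all(outcomes, pt))
```

A model is considered useful at a given threshold when its net benefit exceeds both reference strategies, which is the comparison described above.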

Figure 2. Comparison of the diagnostic efficacy of the Chinese Thyroid Imaging Reporting and Data System (C-TIRADS) and optimised C-TIRADS (*C-TIRADS) in the diagnosis of benign and malignant thyroid nodules. (a) Clinical decision curve of the predictive model for radiologist-optimised *C-TIRADS classification in the derivation cohort. (b) Comparison of the diagnostic efficacy of C-TIRADS and *C-TIRADS for the diagnosis of benign and malignant thyroid nodules in the derivation cohort. (c) Clinical impact curves of the predictive model for radiologist-optimised C-TIRADS classification in the derivation cohort, showing the number of patients classified as high risk (solid curve) and the number of true positives among them (dashed curve) across probability thresholds. (d) Comparison of the diagnostic efficacy of C-TIRADS and *C-TIRADS for the diagnosis of benign and malignant thyroid nodules in the validation cohort

Comparison of diagnostic efficacy between the original C-TIRADS and optimised C-TIRADS classifications demonstrated superior performance of the optimised model in both the derivation and validation cohorts (Fig 2c and d, respectively). The optimised classification achieved higher AUC values for differentiating benign from malignant nodules (AUC=0.97 vs 0.94 in the derivation cohort; AUC=0.97 vs 0.95 in the external validation cohort). The predictive model tended to improve C-TIRADS classification by upgrading category 4A nodules to category 4B or 4C, reflecting enhanced clinical utility (Table 3 and Fig 2).

Table 3. Clinical diagnostic performance of the final predictive model in thyroid nodules (n=1659)

Application example of the nomogram model

A 55-year-old man underwent ultrasound examination, which revealed a solid hypoechoic thyroid nodule in the right lobe measuring approximately 7.1 mm × 6.4 mm (Fig 3a). Simultaneously, abnormal LNs were detected on the ipsilateral side of the neck, characterised by indistinct corticomedullary differentiation and suspected microcalcifications (Fig 3b). According to the conventional C-TIRADS system, the nodule was initially classified as category 4B. However, application of the nomogram model yielded a cumulative score of 155 points, corresponding to a malignancy risk of >90%. Based on this result, the TIRADS category was optimised and upgraded to category 5 (Fig 3c). Subsequent histopathological examination confirmed the diagnosis of papillary thyroid microcarcinoma with cervical LN metastasis.

Figure 3. Representative case demonstrating the diagnostic utility of the nomogram-assisted model. (a) A 55-year-old man presenting with a solid hypoechoic nodule in the right lobe of the thyroid gland (arrow). (b) Ultrasound revealing abnormal cervical lymph node architecture, characterised by poorly defined corticomedullary borders and suspected microcalcifications (arrow). (c) Application of the predictive model to the thyroid nodule described above. By summing the scores assigned to six individual indicators, the final total score is approximately 155 points, corresponding to a malignancy risk of >90%. According to the optimised classification system, the lesion should be upgraded from category 4B to category 5

Discussion

This study retrospectively analysed the sonographic characteristics and clinical risk factors of 1659 thyroid nodules from two large tertiary hospitals in western China, with the aim of optimising the C-TIRADS classification. A predictive model integrating clinical parameters and imaging features was developed and externally validated, demonstrating high diagnostic performance (AUC=0.865 in external validation) and clinical benefit, as evidenced by decision curve analysis.

Despite the widespread adoption of various TIRADS frameworks globally,2 4 5 6 7 8 fundamental methodological limitations persist. Current models, such as ACR-TIRADS,6 primarily focus on ultrasound features and rely heavily on consensus-driven rather than statistically validated risk stratification systems.6 15 Although TIRADS demonstrates robust sensitivity in clinical settings, its specificity remains relatively limited.16 Interobserver variability is another key concern: radiologists’ subjective interpretation of ultrasound features can result in inconsistent classification outcomes.17 To address these limitations, various strategies have been proposed, including the integration of artificial intelligence techniques to reduce observer subjectivity.18 19 20 Artificial intelligence has shown promise in matching or even surpassing the specificity achieved by radiologists; however, its clinical implementation remains constrained by challenges in interpretability and low acceptance in routine practice.

Integrating clinical risk factors may enhance risk stratification for thyroid nodules, as suggested by a growing body of evidence.21 In alignment with this, our study incorporated clinical variables including patient age, sex, number of nodules, and cervical LN status into the predictive model, thereby more accurately reflecting routine clinical diagnostic workflows. While previous studies22 23 24 suggested that male patients with thyroid nodules, particularly those with indeterminate fine-needle aspiration cytology undergoing molecular testing, exhibit a higher malignancy risk,25 our study did not identify a significant difference in thyroid cancer incidence between sexes. This discrepancy may be attributable to methodological differences, as molecular testing was not performed in our cohort and all diagnoses were confirmed through postoperative histopathology. The absence of statistical significance for male sex may reflect population-specific characteristics, such as regional variation in risk factor distribution or age composition.26 These methodological and demographic differences may have attenuated the observed sex-related effect. Nonetheless, male patients in our study were assigned higher risk scores, suggesting an association with malignancy risk, despite the lack of statistical significance.

Compared with previous models that primarily focused on intrinsic ultrasound features of thyroid nodules,27 28 29 our nomogram offers a more comprehensive assessment. Although the individual contributions of factors such as sex and age were relatively modest, they reflected subtle clinical patterns often considered by radiologists during decision making. The C-TIRADS optimisation approach demonstrated clear advantages, particularly in reducing unnecessary invasive procedures without compromising diagnostic accuracy, achieving an AUC of 0.972. Furthermore, the new model indicated that a risk threshold of ≥60% favoured the recommendation for C-TIRADS optimisation, whereas a threshold of <30% favoured exclusion. The integration of complex imaging data with clinical information represents a core competency for radiologists.30 With appropriate standardised training and communication frameworks in place, radiologists are well positioned to integrate quantitative metrics generated by the new model into routine diagnostic workflows. This advancement holds promise for improving diagnostic consistency and accuracy in clinical practice.

Limitations

This study has several limitations that should be acknowledged. First, the optimisation of the TIRADS classification was influenced by radiologists’ subjective judgement, which may have contributed to interobserver variability. Second, although data collection was conducted by trained junior radiologists, observer variation and the subjective nature of ultrasound interpretation may have affected the model’s performance.31 Third, internal validation using bootstrap resampling may have overestimated model performance due to potential overfitting; therefore, external validation was essential to confirm generalisability. Fourth, owing to the retrospective design, only a limited set of clinical parameters (eg, sex, age, and cervical LN status) was included. Other relevant factors, such as body mass index, environmental exposures, nodule location, family history of thyroid cancer, and radiation exposure history,32 33 were not assessed. Finally, the study cohort exclusively comprised cases confirmed by surgical pathology, resulting in a relatively low proportion of benign lesions, which may have introduced selection bias. The exclusion of patients diagnosed solely by fine-needle aspiration was intentional but may have affected the generalisability of the findings.

Future directions

To address the limitations of the present study, future research should aim to standardise the application of TIRADS by adopting unified classification frameworks and implementing regular training programmes to enhance interobserver consistency. Prospective multicentre studies involving broader and more diverse populations are warranted, incorporating a wider range of clinical risk factors to improve predictive accuracy. In particular, data regarding family history, radiation exposure, and other relevant variables across centres would support more comprehensive risk assessment and enhance the generalisability of prediction models. In addition, including patients with fine-needle aspiration–confirmed benign nodules may help achieve a more balanced representation of benign and malignant cases. The development and application of nomogram-based structured training programmes for radiologists could also be explored to further improve diagnostic consistency and clinical utility. While the widespread adoption of a revised classification system will require time, we hope that the findings of this study may contribute to that transition.

Conclusion

We developed and externally validated a nomogram-based predictive model that integrates imaging features and clinical risk factors to optimise C-TIRADS classification for thyroid nodules. The model demonstrated good discrimination and calibration across internal and external cohorts, offering a practical tool to assist radiologists in refining diagnostic assessments and improving clinical decision making. Future research incorporating additional clinical variables and prospective validation is warranted to further strengthen the model’s applicability across diverse clinical settings.

Author contributions

Concept or design: Y Liang, Y Zou, P He, Q Chen.
Acquisition of data: Y Liang, Y Zou, Z Zou, B Ren.
Analysis or interpretation of data: Y Liang, S Peng, Y Zou.
Drafting of the manuscript: Y Liang, Y Zou, HM Yuan, Z Zou.
Critical revision of the manuscript for important intellectual content: P He, Y Zou.

All authors had full access to the data, contributed to the study, approved the final version for publication, and take responsibility for its accuracy and integrity.

Conflicts of interest

The authors have disclosed no conflicts of interest.

Declaration

This manuscript was initially posted as a preprint entitled ‘Development and validation of a clinical prediction model to aid radiologists optimize thyroid C-TIRADS classification’ on Research Square (DOI: 10.21203/rs.3.rs-3831900/v1). After peer feedback and extensive revisions undertaken collaboratively by the author team, the current version has substantially evolved and markedly differs from the preprint version.

Funding/support

This research was supported by Sichuan Science and Technology Program (Ref Nos.: 2025ZNSFSC1751, 2026YFHZ0039), the University-Industry Collaborative Education Program (Ref No.: 250505236300920), the University-level Project of North Sichuan Medical College (Ref Nos.: CXSY24-06, CBY22-QNA48), and the Hospital-level Projects of the Affiliated Hospital of North Sichuan Medical College, China (Ref Nos.: 210930, 2023-2GC013, 2025LC010). The funders had no role in the study design, data collection/analysis/interpretation, or manuscript preparation.

Ethics approval

This research was approved by the Ethics Committee of Sichuan Provincial People’s Hospital (Ref No.: ER20210347) and the Ethics Committee of Affiliated Hospital of North Sichuan Medical College, China (Ref No.: 2021ER436-1). The requirement for informed patient consent was waived by both Committees due to the retrospective nature of the research.

Supplementary material

The supplementary material was provided by the authors, and some information may not have been peer reviewed. Accepted supplementary material will be published as submitted by the authors, without any editing or formatting. Any opinions or recommendations discussed are solely those of the author(s) and are not endorsed by the Hong Kong Academy of Medicine and the Hong Kong Medical Association. The Hong Kong Academy of Medicine and the Hong Kong Medical Association disclaim all liability and responsibility arising from any reliance placed on the content.

References

1. Haugen BR, Alexander EK, Bible KC, et al. 2015 American Thyroid Association management guidelines for adult patients with thyroid nodules and differentiated thyroid cancer: the American Thyroid Association guidelines task force on thyroid nodules and differentiated thyroid cancer. Thyroid 2016;26:1-133.

2. Zhou J, Song Y, Zhan W, et al. Thyroid imaging reporting and data system (TIRADS) for ultrasound features of nodules: multicentric retrospective study in China. Endocrine 2021;72:157-70.

3. Trimboli P. Complexity in the interpretation and application of multiple guidelines for thyroid nodules: the need for coordinated recommendations for “small” lesions. Rev Endocr Metab Disord 2025;26:223-7.

4. Park JY, Lee HJ, Jang HW, et al. A proposal for a thyroid imaging reporting and data system for ultrasound features of thyroid carcinoma. Thyroid 2009;19:1257-64.

5. Horvath E, Majlis S, Rossi R, et al. An ultrasonogram reporting system for thyroid nodules stratifying cancer risk for clinical management. J Clin Endocrinol Metab 2009;94:1748-51.

6. Tessler FN, Middleton WD, Grant EG, et al. ACR Thyroid Imaging, Reporting and Data System (TI-RADS): white paper of the ACR TI-RADS Committee. J Am Coll Radiol 2017;14:587-95.

7. Shin JH, Baek JH, Chung J, et al. Ultrasonography diagnosis and imaging-based management of thyroid nodules: revised Korean Society of Thyroid Radiology consensus statement and recommendations. Korean J Radiol 2016;17:370-95.

8. Russ G, Bonnema SJ, Erdogan MF, Durante C, Ngu R, Leenhardt L. European Thyroid Association guidelines for ultrasound malignancy risk stratification of thyroid nodules in adults: the EU-TIRADS. Eur Thyroid J 2017;6:225-37.

9. Chen Z, Wang JJ, Du JB, et al. Development and validation of a dynamic nomogram for predicting central lymph node metastasis in papillary thyroid carcinoma patients based on clinical and ultrasound features. Quant Imaging Med Surg 2025;15:1555-70.

10. Boucai L, Zafereo M, Cabanillas ME. Thyroid cancer: a review. JAMA 2024;331:425-35.

11. Zhang J, Xu S. High aggressiveness of papillary thyroid cancer: from clinical evidence to regulatory cellular networks. Cell Death Discov 2024;10:378.

12. Ma T, Semsarian CR, Barratt A, et al. Rethinking low-risk papillary thyroid cancers <1 cm (papillary microcarcinomas): an evidence review for recalibrating diagnostic thresholds and/or alternative labels. Thyroid 2021;31:1626-38.

13. Kwong N, Medici M, Angell TE, et al. The influence of patient age on thyroid nodule formation, multinodularity, and thyroid cancer risk. J Clin Endocrinol Metab 2015;100:4434-40.

14. Pizzato M, Li M, Vignat J, et al. The epidemiological landscape of thyroid cancer worldwide: GLOBOCAN estimates for incidence and mortality rates in 2020. Lancet Diabetes Endocrinol 2022;10:264-72.

15. Tessler FN, Middleton WD, Grant EG, Hoang JK. Re: ACR Thyroid Imaging, Reporting and Data System (TI-RADS): white paper of the ACR TI-RADS Committee. J Am Coll Radiol 2018;15(3 Pt A):381-2.

16. Angelopoulos N, Goulis DG, Chrisogonidis I, et al. Diagnostic performance of European and American College of Radiology Thyroid Imaging Reporting and Data System classification systems in thyroid nodules over 20 mm in diameter. Endocr Pract 2025;31:72-9.

17. Jin Z, Pei S, Shen H, et al. Comparative study of C-TIRADS, ACR-TIRADS, and EU-TIRADS for diagnosis and management of thyroid nodules. Acad Radiol 2023;30:2181-91.

18. Wildman-Tobriner B, Buda M, Hoang JK, et al. Using artificial intelligence to revise ACR TI-RADS risk stratification of thyroid nodules: diagnostic accuracy and utility. Radiology 2019;292:112-9.

19. Wu SH, Li MD, Tong WJ, et al. Adaptive dual-task deep learning for automated thyroid cancer triaging at screening US. Radiol Artif Intell 2025;7:e240271.

20. Trimboli P, Colombo A, Gamarra E, Ruinelli L, Leoncini A. Performance of computer scientists in the assessment of thyroid nodules using TIRADS lexicons. J Endocrinol Invest 2025;48:877-83.

21. Kobaly K, Kim CS, Mandel SJ. Contemporary management of thyroid nodules. Annu Rev Med 2022;73:517-28. Crossref

22. Xu L, Li G, Wei Q, El-Naggar AK, Sturgis EM. Family history of cancer and risk of sporadic differentiated thyroid carcinoma. Cancer 2012;118:1228-35. Crossref

23. Iglesias ML, Schmidt A, Ghuzlan AA, et al. Radiation exposure and thyroid cancer: a review. Arch Endocrinol Metab 2017;61:180-7. Crossref

24. Saenko V, Mitsutake N. Radiation-related thyroid cancer. Endocr Rev 2024;45:1-29. Crossref

25. Figge JJ, Gooding WE, Steward DL, et al. Do ultrasound patterns and clinical parameters inform the probability of thyroid cancer predicted by molecular testing in nodules with indeterminate cytology? Thyroid 2021;31:1673-82. Crossref

26. Li X, Xing M, Tu P, et al. Urinary iodine levels and thyroid disorder prevalence in the adult population of China: a large-scale population-based cross-sectional study. Sci Rep 2025;15:14273. Crossref

27. Xiao J, Xiao Q, Cong W, et al. Discriminating malignancy in thyroid nodules: the nomogram versus the Kwak and ACR TI-RADS. Otolaryngol Head Neck Surg 2020;163:1156-65. Crossref

28. Xin Y, Liu F, Shi Y, Yan X, Liu L, Zhu J. A scoring system for assessing the risk of malignant partially cystic thyroid nodules based on ultrasound features. Front Oncol 2021;11:731779. Crossref

29. Zhou T, Hu T, Ni Z, et al. Comparative analysis of machine learning-based ultrasound radiomics in predicting malignancy of partially cystic thyroid nodules. Endocrine 2024;83:118-26. Crossref

30. Bluethgen C, Van Veen D, Zakka C, et al. Best practices for large language models in radiology. Radiology 2025;315:e240528. Crossref

31. He Z, Li Y, Zeng W, et al. Can a computer-aided mass diagnosis model based on perceptive features learned from quantitative mammography radiology reports improve junior radiologists’ diagnosis performance? An observer study. Front Oncol 2021;11:773389. Crossref

32. Kim Y, Roh J, Song DE, et al. Risk factors for posttreatment recurrence in patients with intermediate-risk papillary thyroid carcinoma. Am J Surg 2020;220:642-7. Crossref

33. Zhao J, Wen J, Wang S, Yao J, Liao L, Dong J. Association between adipokines and thyroid carcinoma: a meta-analysis of case-control studies. BMC Cancer 2020;20:788. Crossref

Utilisation trends and early outcomes of robotic arm–assisted total hip arthroplasty in a tertiary joint replacement centre in Hong Kong

Hong Kong Med J 2026 Feb;32(1):23–9 | Epub 2 Feb 2026

https://doi.org/10.12809/hkmj2513314

ORIGINAL ARTICLE

KL Fong¹; Amy Cheung, FHKAM (Orthopaedic Surgery), FHKCOS²; Michelle Hilda Luk, FHKAM (Orthopaedic Surgery), FHKCOS²; Thomas KC Leung, FHKAM (Orthopaedic Surgery), FHKCOS²; Lawrence CM Lau, FHKAM (Orthopaedic Surgery), FHKCOS²; PK Chan, FHKAM (Orthopaedic Surgery), FHKCOS¹; KY Chiu, FHKAM (Orthopaedic Surgery), FHKCOS¹; Henry Fu, FHKAM (Orthopaedic Surgery), FHKCOS¹

¹ Department of Orthopaedics and Traumatology, The University of Hong Kong, Hong Kong SAR, China

² Department of Orthopaedics and Traumatology, Queen Mary Hospital, Hong Kong SAR, China

Corresponding author: Prof Henry Fu (drhfu@ortho.hku.hk)

Abstract

Introduction: This study evaluated utilisation trends and early outcomes of robotic arm–assisted primary total hip arthroplasty (rTHA) compared with conventional THA (cTHA) in Hong Kong.

Methods: This retrospective cohort study included all patients who underwent primary THA in public hospitals under the Hong Kong West Cluster (HKWC) from 2019 to 2024. Data were retrieved from the Hospital Authority’s electronic databases. The primary outcome was the percentage utilisation of rTHA relative to cTHA. Secondary outcomes included operating time (skin-to-skin), length of stay (LOS), 30- and 90-day reoperation rates, and 30- and 90-day emergency department attendance. Differences in these outcomes between rTHA and cTHA were examined.

Results: In total, there were 311 and 242 cases of rTHA and cTHA, respectively. Robotic utilisation increased from 32.0% in 2019 to 62.2% in 2024. Regarding patient outcomes, rTHA increased operating time by 14.59 minutes (142.02 ± 53.88 vs 127.43 ± 53.34; P=0.002). There was no significant difference in median LOS between the two groups. Robotic surgery was also associated with a lower 30-day reoperation rate (0.32% vs 2.07%; P=0.049). One reoperation due to dislocation was performed in the rTHA group. In the cTHA group, one dislocation, two periprosthetic fractures, and two infections required revision surgery.

Conclusion: Given the increasing use of rTHA in the HKWC, the present findings suggest that rTHA is associated with a lower 30-day reoperation rate. As the first local study on early outcomes of rTHA, these results may serve as reference data for other centres.

New knowledge added by this study

Utilisation of robotic arm–assisted primary total hip arthroplasty (rTHA) nearly doubled between 2019 and 2024.
Robotic arm–assisted primary total hip arthroplasty was associated with a lower 30-day reoperation rate.

Implications for clinical practice or policy

Early results suggested that rTHA was associated with fewer postoperative complications requiring reoperation.
Long-term data are needed to further evaluate trends in operating time and length of stay, and to determine how these outcomes translate into improved functional outcomes.

Introduction

In Hong Kong, robotic surgery has gained popularity across various specialties, with the Da Vinci robot becoming the standard of care in urology and seeing widespread use in general surgery.1 Orthopaedic robotic systems are often semi-active and partially controlled by the surgeon.2 In total hip replacement, an image-based, semi-active, haptic-constrained robotic arm system is commonly used. The Mako Robotic Arm Assisted Surgical System (Stryker Corp, Fort Lauderdale [FL], US) is a surgical system for total hip replacement approved by the US Food and Drug Administration.3 Surgical planning is performed using three-dimensional computed tomography scans, enabling accurate, patient-specific planning. Bone removal is performed under haptic control by the robotic arm, with component implantation angles also guided by the robot, enhancing precision and accuracy.4 5 Western literature has shown that robotic arm–assisted primary total hip arthroplasty (rTHA) yields better radiological and clinical outcomes.6 7 8 However, local data on the early clinical outcomes of robotic total hip replacement remain limited. Robotics was first introduced locally by the Hong Kong West Cluster (HKWC) in 2019, and its use has been increasing. Our cluster has since accumulated substantial experience and moved beyond the learning curve. This study aimed to evaluate utilisation trends and patient outcomes of rTHA compared with conventional THA (cTHA).

Methods

Objective

The primary outcome was the percentage utilisation of rTHA relative to cTHA in the HKWC from 2019 to 2024. Secondary outcomes included operating time (skin-to-skin), length of stay (LOS), 30-day and 90-day reoperation, and 30-day and 90-day emergency department attendance. Length of stay was defined as the duration of inpatient admission following THA. Discharge criteria included the ability to ambulate with a walking aid and the absence of impending medical conditions. Reoperation was defined as undergoing another hip procedure, such as revision or implant removal, within 30 or 90 days of surgery. Emergency department attendance was defined as presentation to the accident and emergency department within 30 or 90 days following discharge.

Additionally, postoperative complication rates were examined in terms of reoperation, emergency department attendance, and the corresponding diagnoses. Complications of interest included dislocation, periprosthetic fracture, and periprosthetic joint infection. The study adhered to the STROBE (Strengthening the Reporting of Observational Studies in Epidemiology) guideline.

Surgical technique

Total hip arthroplasty in both groups was performed via a posterior approach with the patient in the left lateral decubitus position. All patients received a cementless, proximally coated femoral stem (Accolade II; Stryker Corp, Mahwah [NJ], US) and a porous acetabular shell (Trident Acetabular System; Stryker Corp, Mahwah [NJ], US).3

In the cTHA group, the femoral osteotomy site was marked based on a predetermined distance from the lesser and greater trochanters. The acetabulum was reamed freehand, down to the true floor and healthy bleeding bone. Cup impaction was guided by an alignment guide and intraoperative landmarks, including the transverse acetabular ligament and the anterior and posterior acetabular walls, to determine the orientation of the acetabular component.9 10

All rTHAs were performed using the Mako Robotic Arm Assisted Surgical System, which guided acetabular reaming and component placement within haptically confined boundaries. A trial cup was inserted at the appropriate abduction angle, with anteversion guided by the robotic arm.10

Study design and patient selection

This was a retrospective cohort study. Data were retrieved from the Clinical Data Analysis and Reporting System (CDARS) and the Clinical Management System (CMS). The CDARS is a database containing medical information for research purposes, whereas the CMS is primarily used for day-to-day clinical management. The function to distinguish between rTHA and cTHA was introduced in CDARS in 2021. Therefore, data from 1 January 2021 to 31 December 2024 were collected via CDARS, while data from 2019 to 2020 were obtained through CMS chart review. Both systems follow standardised data protocols and can be used concurrently.

All patients who underwent primary unilateral rTHA or cTHA in the HKWC were included. Diagnoses included osteoarthritis, avascular necrosis, aseptic necrosis, developmental dysplasia of the hip, dislocation, and fractures. Patients with diagnoses of bone malignancy, chronic osteomyelitis, or complex primary THA—such as Crowe type III/IV hip dysplasia or post-traumatic osteoarthritis with retained hardware—were excluded. Patients who had staged bilateral procedures were included as separate cases. During the initial learning phase in 2019, all surgeries were performed by a single surgeon (corresponding author). From 2020 onwards, other surgeons within the division began performing rTHA.

Statistical analysis

All analyses were conducted using SPSS (Windows version 29.0; IBM Corp, Armonk [NY], US). A two-tailed significance threshold was set at P<0.05. The normality of continuous variables was assessed using skewness and kurtosis, as well as the Shapiro–Wilk and Kolmogorov–Smirnov tests. Normally distributed continuous variables, such as operating time, were compared using independent-samples t tests. Length of stay, which was not normally distributed, was analysed using the Mann–Whitney U test. Categorical data were compared using the chi-squared test.

Results

From 2019 to 2024, a total of 311 rTHAs and 242 cTHAs were performed. Patient demographics are summarised in Table 1. Women accounted for 61.7% of patients in the rTHA group and 63.6% of those in the cTHA group. Patients undergoing rTHA were younger at the time of surgery than those receiving cTHA (mean age, 62.48 ± 12.88 vs 66.10 ± 10.52 years; P=0.002), whereas the distribution of diagnostic categories was similar between groups.

Table 1. Baseline characteristics

Osteoarthritis was the most common diagnosis in both groups, accounting for 58.5% of rTHA cases and 51.2% of cTHA cases. The second most common diagnosis was avascular necrosis, representing 15.1% of rTHA cases and 21.1% of cTHA cases (Table 1).

Utilisation trends

The primary outcome was the utilisation rate of rTHA in the HKWC. As shown in Table 2, the proportion of robotic cases increased overall, from 32.0% in 2019 to 62.2% in 2024, peaking at 75.2% in 2023. Correspondingly, the proportion of conventional cases declined, almost halving from 68.0% in 2019 to 37.8% in 2024. This illustrates a clear shift from cTHA to rTHA as the predominant surgical approach over the study period.
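As a quick arithmetic check on these figures, the sketch below recomputes the pooled rTHA share from the overall case counts reported in the Results (311 rTHA and 242 cTHA) and the relative change between the 2019 and 2024 proportions. The variable names are illustrative, and the yearly case counts in Table 2 are not reproduced here.

```python
# Illustrative check only; inputs are the figures quoted in the text.
rtha_cases, ctha_cases = 311, 242      # pooled case counts, 2019 to 2024
share_2019, share_2024 = 32.0, 62.2    # rTHA utilisation (%), as reported

# Share of all primary THAs performed with robotic assistance
pooled_share = 100 * rtha_cases / (rtha_cases + ctha_cases)

# Relative change in the rTHA proportion between the first and last year
fold_change = share_2024 / share_2019

print(f"Pooled rTHA share: {pooled_share:.1f}%")
print(f"2024 vs 2019 utilisation: {fold_change:.2f}-fold")
```

The pooled share works out to about 56.2% of all cases, and the 2024 proportion is a little under twice the 2019 value.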

Table 2. Utilisation trends of robotic arm–assisted primary total hip arthroplasty and conventional total hip arthroplasty from 2019 to 2024

Operating time (skin-to-skin)

The secondary outcomes are presented in Table 3. Robotic arm–assisted primary total hip arthroplasty had a mean operating time of 142.02 minutes, which was 14.59 minutes longer than that of cTHA (127.43 minutes). For rTHA, the mean operating time was 131.53 minutes in 2019, increased to 139.58 minutes in 2020 as more surgeons began their learning curve, and remained elevated over the next 2 years (2021: 146.99 minutes; 2022: 152.79 minutes). In the final 2 years of the study, operating time decreased to 142.00 minutes in 2023 and 133.83 minutes in 2024, suggesting that the surgical team as a whole had passed the learning curve. In contrast, cTHA operating times ranged from 111 to 139 minutes, without a clear trend. In the first 2 years, operating times were similar (2019: 131.04 minutes; 2020: 131.75 minutes); they then increased slightly to 139.38 minutes in 2022, dropped to 111.16 minutes in 2023, and rose moderately to 120.04 minutes in 2024.
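The overall comparison of operating time can be roughly re-derived from the reported summary statistics (142.02 ± 53.88 minutes in 311 rTHA cases vs 127.43 ± 53.34 minutes in 242 cTHA cases). The sketch below is not the SPSS procedure used in the study. It assumes a pooled-variance independent-samples t test and approximates the two-sided P value with the normal distribution, which is reasonable at 551 degrees of freedom. The function name is hypothetical.

```python
import math

def two_sample_t(mean1, sd1, n1, mean2, sd2, n2):
    """Pooled-variance t statistic and approximate two-sided P value."""
    pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2)
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    t = (mean1 - mean2) / se
    # Assumption: with 551 degrees of freedom the t distribution is close to
    # the standard normal, so the normal tail is used for the P value.
    p = math.erfc(abs(t) / math.sqrt(2))
    return t, p

# rTHA vs cTHA operating time (minutes), from the reported means and SDs
t_stat, p_value = two_sample_t(142.02, 53.88, 311, 127.43, 53.34, 242)
print(f"t = {t_stat:.2f}, approximate P = {p_value:.3f}")
```

Under these assumptions the result is in line with the reported P=0.002. An exact t-distribution tail would give a marginally larger value.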

Table 3. Secondary outcomes (n=553)

Length of stay

Discharge criteria remained consistent throughout the study period and included the ability to ambulate independently with a walking aid, effective pain control, absence of immediate wound complications, and no major medical issues. Most patients were discharged directly under the enhanced recovery after surgery protocol; only those undergoing complex primary THA (<10% of the cohort) were transferred to rehabilitation hospitals. The median LOS was the same in both groups (6.00 vs 6.00 days; P=0.260) [Table 3]. When rTHA was first introduced in 2019, all procedures were performed by a single surgeon, which may have influenced early outcomes. In 2020 and 2021, more surgeons began performing rTHA, which may partly explain the longer LOS observed during this learning-curve period.

Reoperation and emergency department attendance

Robotic arm–assisted primary total hip arthroplasty was associated with a lower 30-day reoperation rate compared with cTHA (0.32% vs 2.07%; P=0.049). Similarly, a trend towards a lower 90-day reoperation rate was observed for rTHA (0.64% vs 2.48%; P=0.072) [Table 3].

All 30-day reoperations were hip-related. As shown in Table 4, one reoperation was performed in the rTHA group and five in the cTHA group. In the rTHA group, reoperation was required for a hip dislocation, which was managed by closed reduction. In the cTHA group, two periprosthetic fractures of the proximal femur were treated with open reduction and internal fixation. Two additional reoperations were performed for wound infections, and one hip dislocation was managed by closed reduction.
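The 30-day comparison can be checked by hand from the counts given above (1 of 311 rTHA cases and 5 of 242 cTHA cases). The sketch below assumes a Pearson chi-squared test without continuity correction. The text does not state which variant was run in SPSS, and because the expected cell counts are small, a continuity-corrected or exact test could give a different value. The function name is hypothetical.

```python
import math

def chi_squared_2x2(a, b, c, d):
    """Pearson chi-squared (no continuity correction) for [[a, b], [c, d]]."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # With 1 degree of freedom the upper-tail probability is erfc(sqrt(chi2 / 2)).
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p

# Rows: rTHA, cTHA; columns: reoperation within 30 days, no reoperation.
chi2, p_value = chi_squared_2x2(1, 311 - 1, 5, 242 - 5)
print(f"chi-squared = {chi2:.2f}, P = {p_value:.3f}")
```

Under this assumption the P value falls just below the 0.05 threshold, consistent with the reported P=0.049.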

Table 4. Reoperation and emergency department attendance causes (n=553)

All 90-day reoperations were also hip-related. In the rTHA group, one additional case of dislocation was noted. In the cTHA group, one new case of periprosthetic fracture was identified (Table 4).

Discussion

The number of THAs utilising robotic assistance increased over the study period. The proportion of robotic cases relative to cTHA also rose, with rTHA accounting for 56.2% of all THAs when all years were combined. These findings indicate a shift in the primary surgical approach within the HKWC from conventional to robotic techniques. At present, four public hospitals in Hong Kong have acquired robotic systems, with several additional systems available on loan. Brinkman et al11 reported that public interest in rTHA substantially increased between 2011 and 2020. Compared with online search volumes for conventional arthroplasty, this growth was statistically significant.

Clement et al12 reported that, despite the higher costs associated with robotics, rTHA was a cost-effective intervention compared with cTHA owing to greater gains in health-related quality of life, as measured by the EuroQol 5-Dimension. In addition, the rising popularity of rTHA may be attributed to its favourable clinical, functional, and radiological outcomes, which are discussed further below.

Robotic THA was associated with an increase in operating time of approximately 15 minutes, which is slightly less than the 20-minute increase reported by Han et al (20.72 minutes; P=0.002).13 The longer operating time may be attributable to the need for system registration or placement of positioning pins, as well as the effects of the learning curve. When rTHA was first introduced in Hong Kong in 2019, only one experienced surgeon was performing the procedure, with an average operating time of 131 minutes. As more surgeons began using the robotic system, a learning-curve effect was suggested by an increase in operating time over the next 3 years (139.6, 147.0, and 152.8 minutes, respectively). Notably, robotic operating time then decreased by 11 minutes from 2022 to 2023, and by a further 8 minutes to 133.83 minutes in 2024, suggesting increased familiarity with the system and the possible completion of the learning curve. Kayani et al14 similarly reported that robot-assisted acetabular cup positioning during THA was associated with a learning curve of 12 cases.

There were no statistically significant differences in LOS between the rTHA and cTHA groups; both had a median LOS of 6.00 days. In a retrospective study, Remily et al15 matched patients in a 1:1 ratio between robotic and conventional groups (4630 patients per group) and reported a significantly shorter mean LOS in the rTHA group (3.4 vs 3.7 days; P=0.001). These findings may reflect the ability of robotic technology to execute preoperative plans tailored to each patient’s unique anatomy. The results may also be related to reduced iatrogenic trauma and faster postoperative rehabilitation. Similarly, Heng et al16 found that the mean LOS in the robotic group was approximately 1 day shorter. Nevertheless, differences in data distribution and reporting methods should be noted. While previous authors reported mean LOS, we reported the median LOS due to the non-parametric distribution of our data.

Social and cultural factors may also influence LOS. Western patients often have access to more spacious home environments, whereas patients in Hong Kong may reside in more confined living spaces, potentially reducing their willingness or readiness for early discharge. Furthermore, patients and their families in Hong Kong often adopt a more conservative approach to discharge: they may prefer extended care under medical supervision, and patients may perceive themselves as a burden to family members if they return home early.17 These factors may contribute to a prolonged LOS.

In this study, rTHA was associated with a lower 30-day reoperation rate, with a trend towards a lower 90-day reoperation rate. Our findings are consistent with those of Shaw et al,18 who reported significantly lower dislocation rates with rTHA compared with cTHA (0.6% vs 2.5%; P<0.046). Notably, all cases of unstable rTHA were successfully managed conservatively in the absence of component malposition, whereas 46% of unstable cTHA cases required revision surgery for recurrent instability due to malalignment.18 A previous postoperative analysis in Hong Kong19 showed that 96% of robotically positioned acetabular cups fell within the Lewinnek safe zone (inclination 30°-50°, anteversion 5°-25°).

Although rTHA improves the accuracy of implant positioning and reduces outliers in acetabular cup placement,20 21 there remains a lack of data concerning how these improved radiological outcomes translate into differences in long-term clinical recovery, functional outcomes, implant survivorship, and complication rates when compared with cTHA.22

Limitations

To our knowledge, this is the first territory-wide study in Asia comparing cTHA and rTHA. However, several limitations should be acknowledged. First, the use of big data analysis through the CDARS precluded adjustment for certain confounding factors, such as surgeon- and hospital-related variables. Second, the dataset was confined to the HKWC as ethics approval could not be obtained for multi-cluster or private hospital data. Although other public-sector clusters are also managed by the Hospital Authority, caution should be exercised when comparing our findings to other settings. Nevertheless, the inclusion of multiple surgeons reflects real-world clinical practice. Finally, functional outcomes and patient-reported outcome measures were not assessed; as such, the impact of rTHA from the patient’s perspective could not be evaluated.

Evaluation of longer-term outcomes and registry data from additional clusters will be essential to develop optimal THA strategies that achieve key technical objectives, enhance patient outcomes, and reduce complications.

Conclusion

The use of rTHA nearly doubled between 2019 and 2024 and was associated with a lower 30-day reoperation rate compared with cTHA. However, as this study focused solely on early patient outcomes, further research is warranted to determine whether these findings translate into improved long-term functional outcomes.

Author contributions

Concept or design: KL Fong, H Fu.
Acquisition of data: KL Fong, H Fu.
Analysis or interpretation of data: KL Fong, H Fu.
Drafting of the manuscript: All authors.
Critical revision of the manuscript for important intellectual content: All authors.

All authors had full access to the data, contributed to the study, approved the final version for publication, and take responsibility for its accuracy and integrity.

Conflicts of interest

All authors have disclosed no conflicts of interest.

Declaration

The results of this study were presented as an oral presentation at the 44th Annual Congress of Hong Kong Orthopaedic Association, Hong Kong, 2-3 November 2024.

Funding/support

This research received no specific grant from any funding agency in the public, commercial, or not-for-profit sectors.

Ethics approval

This research was approved by the Institutional Review Board of The University of Hong Kong/Hospital Authority Hong Kong West Cluster, Hong Kong (Ref No.: UW 24-128). The requirement for informed patient consent was waived by the Board due to the retrospective nature of the study.

References

1. Ng AT, Tam PC. Current status of robot-assisted surgery. Hong Kong Med J 2014;20:241-50. Crossref

2. Smith A, Picheca L, Mahood Q. Robotic Surgical Systems for Orthopedics. Ottawa: Canadian Agency for Drugs and Technologies in Health; 2022. Available from: https://www.ncbi.nlm.nih.gov/books/NBK602663/. Accessed 12 Mar 2025.

3. Stryker. Available from: https://www.stryker.com. Accessed 12 Mar 2025.

4. Inabathula A, Semerdzhiev DI, Srinivasan A, Amirouche F, Puri L, Piponov H. Robots on the stage: a snapshot of the American robotic total knee arthroplasty market. JB JS Open Access 2024;9:e24.00063. Crossref

5. Jahng KH, Kamara E, Hepinstall MS. Haptic robotics in total hip arthroplasty. In: Minim Invasive Surg Orthopaedics. New York: Springer; 2015: 1-15. Crossref

6. Salášek M, Pavelka T, Rezek J, et al. Mid-term functional and radiological outcomes after total hip replacement performed for complications of acetabular fractures. Injury 2023;54:110916. Crossref

7. De Santis V, Bonfiglio N, Basilico M, et al. Clinical and radiographic outcomes after total hip arthroplasty with the NANOS neck preserving hip stem: a 10 to 16-year followup study. BMC Musculoskelet Disord 2022;22(Suppl 2):1061. Crossref

8. Perets I, Walsh JP, Close MR, Mu BH, Yuen LC, Domb BG. Robot-assisted total hip arthroplasty: clinical outcomes and complication rate. Int J Med Robot 2018;14:e1912. Crossref

9. Fontalis A, Kayani B, Plastow R, et al. A prospective randomized controlled trial comparing CT-based planning with conventional total hip arthroplasty versus robotic arm-assisted total hip arthroplasty. Bone Joint J 2024;106-B:324-35. Crossref

10. Domb BG, El Bitar YF, Sadik AY, Stake CE, Botser IB. Comparison of robotic-assisted and conventional acetabular cup placement in THA: a matched-pair controlled study. Clin Orthop Relat Res 2014;472:329-36. Crossref

11. Brinkman JC, Christopher ZK, Moore ML, Pollock JR, Haglin JM, Bingham JS. Patient interest in robotic total joint arthroplasty is exponential: a 10-year Google trends analysis. Arthroplast Today 2022;15:13-8. Crossref

12. Clement ND, Gaston P, Hamilton DF, et al. A cost-utility analysis of robotic arm-assisted total hip arthroplasty: using robotic data from the private sector and manual data from the National Health Service. Adv Orthop 2022;2022:5962260. Crossref

13. Han PF, Chen CL, Zhang ZL, et al. Robotics-assisted versus conventional manual approaches for total hip arthroplasty: a systematic review and meta-analysis of comparative studies. Int J Med Robot 2019;15:e1990. Crossref

14. Kayani B, Konan S, Huq SS, Ibrahim MS, Ayuob A, Haddad FS. The learning curve of robotic-arm assisted acetabular cup positioning during total hip arthroplasty. Hip Int 2021;31:311-9. Crossref

15. Remily EA, Nabet A, Sax OC, Douglas SJ, Pervaiz SS, Delanois RE. Impact of robotic assisted surgery on outcomes in total hip arthroplasty. Arthroplast Today 2021;9:46-9. Crossref

16. Heng YY, Gunaratne R, Ironside C, Taheri A. Conventional vs robotic arm assisted total hip arthroplasty (THA) surgical time, transfusion rates, length of stay, complications and learning curve. J Arthritis 2018;7:1000272. Crossref

17. Bayer-Oglesby L, Zumbrunn A, Bachmann N; SIHOS Team. Social inequalities, length of hospital stay for chronic conditions and the mediating role of comorbidity and discharge destination: a multilevel analysis of hospital administrative data linked to the population census in Switzerland. PLoS One 2022;17:e0272265. Crossref

18. Shaw JH, Rahman TM, Wesemann LD, Jiang CZ, G Lindsay-Rivera K, Davis JJ. Comparison of postoperative instability and acetabular cup positioning in robotic-assisted versus traditional total hip arthroplasty. J Arthroplasty 2022;37(8S):S881-9. Crossref

19. Fu CH, Cheung YL, Cheung MH, et al. Robotic arm-assisted total hip replacement: early experience in Hong Kong. In: Proceedings of the 40th Annual Congress of the Hong Kong Orthopaedic Association; 2020 Oct 31-Nov 1; Hong Kong. Hong Kong: Hong Kong Academy of Medicine Press; 2020: 71. Available from: https://hub.hku.hk/handle/10722/305989. Accessed 12 Mar 2025.

20. Beverland DE, O’Neill CK, Rutherford M, Molloy D, Hill JC. Placement of the acetabular component. Bone Joint J 2016;98-B(1 Suppl A):37-43. Crossref

21. Kayani B, Konan S, Thakrar RR, Huq SS, Haddad FS. Assuring the long-term total joint arthroplasty: a triad of variables. Bone Joint J 2019;101-B(1_Supple_A):11-8. Crossref

22. Kayani B, Konan S, Ayuob A, Ayyad S, Haddad FS. The current role of robotics in total hip arthroplasty. EFORT Open Rev 2019;4:618-25. Crossref

Validation of diagnosis codes for pleural diseases and procedure codes for relevant respiratory procedures in a healthcare database in Hong Kong: a single tertiary centre study

Hong Kong Med J 2026 Feb;32(1):13–22 | Epub 30 Jan 2026

https://doi.org/10.12809/hkmj2412275

ORIGINAL ARTICLE

Ken KP Chan, MB, ChB, FRCP^{1,2}; Timothy CC Ng, BSc¹; CY Sze, BSc¹; KC Ling, MPH¹; Christopher Chan, MB, ChB, MRCP¹; Charlotte HY Lau, MB, ChB, MRCP¹; Stephanie WT Ho, MB, ChB, MRCP¹; Joyce KC Ng, MB, ChB, FHKCP¹; Rachel LP Lo, MB, ChB, FHKCP¹; WH Yip, MB, ChB, FHKCP¹; Jenny CL Ngai, MB, ChB, FRCP¹; KW To, MB, ChB, FRCP¹; Fanny WS Ko, MD, FRCP¹; David SC Hui, MD, FRCP¹

¹ Department of Medicine and Therapeutics, Faculty of Medicine, The Chinese University of Hong Kong, Hong Kong SAR, China

² Li Ka Shing Institute of Health Sciences, Faculty of Medicine, The Chinese University of Hong Kong, Hong Kong SAR, China

Corresponding author: Prof David SC Hui (dschui@cuhk.edu.hk)

Abstract

Introduction: There are insufficient population-based epidemiological data on various pleural diseases in Hong Kong. We aimed to validate ICD-9-CM (International Classification of Diseases, Ninth Revision, Clinical Modification) codes for pleural diseases and relevant procedures prior to conducting epidemiological analyses using local electronic health records.

Methods: Hospitalisation episodes coded as ‘pneumothorax’, ‘pleural effusion’, and trauma-related pleural events, as well as procedures beginning with ICD-9-CM codes 33 and 34 between 2013 and 2022, were retrieved from the Hospital Authority. Paediatric patients and uninterrupted hospitalisation episodes were excluded. The cohort was filtered to include those hospitalised at Prince of Wales Hospital (PWH). Up to 50 hospitalisation episodes were randomly selected for manual validation. Positive predictive values (PPVs) with 95% confidence intervals of individual codes were calculated; successful validation was defined as a PPV ≥0.700. The primary endpoint was the PPV of individual diagnosis and procedure codes.

Results: A total of 26 757, 218 018, 1269, 185 154, and 106 450 hospitalisation episodes with non-traumatic pneumothorax, non-traumatic pleural effusion, trauma-related pleural events, procedures with code 33, and procedures with code 34, respectively, were retrieved. Within the PWH cohort, PPVs for these diagnosis and procedure codes were 0.853 (0.787-0.904), 0.928 (0.903-0.948), 0.957 (0.907-0.981), 0.932 (0.913-0.948), and 0.933 (0.916-0.948), respectively. Procedures involving indwelling pleural catheterisation and open drainage of the pleural cavity failed validation due to frequent miscoding.

Conclusion: This is the first validation study of clinical codes for pleural diseases and related procedures in Hong Kong. All diagnosis codes and most procedure codes were successfully validated.

New knowledge added by this study

This is the first validation study of clinical codes (International Classification of Diseases, Ninth Revision, Clinical Modification) for pleural diseases and relevant procedures in Hong Kong.
All diagnosis codes and most procedure codes were successfully validated.
Duplication of codes for similar diagnoses or procedures was identified.

Implications for clinical practice or policy

With the emergence of new respiratory procedures, diagnosis and procedure codes should be updated regularly.
Removal or consolidation of duplicated subcodes in the Hospital Authority system is necessary to facilitate accurate future research and analysis using clinical codes.
Researchers should be reminded to search all relevant diagnosis and procedure codes to minimise missing data when identifying specific diseases or procedures.

Introduction

Pleural diseases are common respiratory conditions that often require hospital admission and have shown an increasing incidence.1 2 In the United States, approximately 1.5 million patients experience pleural effusion annually, with most cases attributed to congestive heart failure, pneumonia, and cancer.3 4 A recent multicentre, cross-sectional study in China estimated the prevalence of pleural effusion at 4684 per 1 million Chinese adults.5 In that study, the most common causes were parapneumonic effusion and empyema (25.1%), malignant neoplasms (23.7%), and tuberculosis (12.3%).5 The median hospitalisation cost was ¥15 534.5 (interquartile range, 9447.2-29 000.0).5 Additionally, an increasing trend in admissions for spontaneous pneumothorax has been observed in England, highlighting the prevalence of the disease and its associated healthcare burden.2

Management of pleural diseases involves various diagnostic and therapeutic procedures that extend beyond the pleural space to include the airway and lung parenchyma. Whether closed or open, these procedures substantially contribute to the overall healthcare burden. However, information about pleural diseases and related respiratory procedures in Hong Kong remains limited, highlighting the need for contemporary, population-based epidemiological data.

The Hospital Authority, which provides healthcare services to over 90% of Hong Kong’s population, maintains extensive healthcare databases. These include the Clinical Management System (CMS) and the Clinical Data Analysis and Reporting System (CDARS), which capture a wide range of longitudinal clinical data. Examples include hospital discharge records, diagnosis and procedure codes for each hospitalisation episode, radiological findings, and laboratory parameters, particularly blood and pleural fluid analyses. This comprehensive dataset provides valuable insights into the burden of pleural diseases and accurately represents the local population.

Before analysing diseases and procedures using administrative data, it is essential to validate the accuracy of diagnosis and procedure codes within the healthcare database. These codes are typically entered by attending physicians, interventionists, or surgeons performing the procedures, which suggests a high degree of reliability. However, no prior local validation study has been conducted. Therefore, we aimed to assess whether diagnosis codes for pleural diseases and procedure codes for relevant respiratory procedures are accurately recorded for each hospitalisation episode within the Hospital Authority systems.

Methods

This retrospective, observational validation study of diagnosis and procedure codes utilised data from a territory-wide healthcare database in Hong Kong. Clinical data were obtained from CDARS, provided by the Hospital Authority. Hospitalisation episodes with the targeted diagnosis and procedure codes between 1 January 2013 and 31 December 2022 were retrieved from the system. Each observation represented a hospitalisation episode rather than a unique patient, and no patient recruitment was involved.

Diagnosis and procedure codes were defined using the International Classification of Diseases, Ninth Revision, Clinical Modification (ICD-9-CM). The basic format of an ICD-9-CM code consists of three to five digits. The Hospital Authority further extends these codes with additional characters after the decimal point to specify particular diagnoses or procedures within an ICD-9-CM code subgroup (‘subcodes’). These subcodes are displayed in CDARS but are not typically accessible to frontline CMS users. All hospitalisation episodes in acute hospitals with a discharge diagnosis code of pneumothorax (codes starting with 512), pleural effusion (codes starting with 012, 197.2, 220.4, 510, or 511), traumatic pneumothorax or haemothorax (trauma-related pleural events, codes starting with 860), or procedure codes for relevant respiratory procedures (codes starting with 33 or 34) were retrieved, regardless of their position in the coding list. Hospitalisation episodes for patients younger than 18 years old or from paediatric departments were excluded from subsequent validation analyses. Uninterrupted hospitalisation episodes following the index episodes, including those in acute or convalescent hospitals with the same diagnosis code of interest, were also excluded, as these may represent duplicate entries for the same clinical event. The remaining hospitalisation episodes after exclusions were grouped as the main cohort.
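The retrieval and exclusion rules described above can be sketched in Python as follows. This is an illustration only and not the authors' code: the code prefixes are taken from the text, whereas the `Episode` record, its field names, and the `continues_index_episode` flag are hypothetical stand-ins for whatever CDARS actually exports.

```python
from dataclasses import dataclass

# Prefixes taken from the Methods text; everything else here is hypothetical.
DIAGNOSIS_PREFIXES = {
    "pneumothorax": ("512",),
    "pleural effusion": ("012", "197.2", "220.4", "510", "511"),
    "trauma-related pleural event": ("860",),
}
PROCEDURE_PREFIXES = ("33", "34")


@dataclass
class Episode:
    episode_id: str
    age: int
    department: str
    diagnosis_codes: list[str]
    procedure_codes: list[str]
    # Hypothetical flag: True for an uninterrupted stay that follows an index episode.
    continues_index_episode: bool = False


def diagnosis_groups(ep):
    """Return the diagnosis groups matched by any code, whatever its position in the list."""
    return {
        group
        for group, prefixes in DIAGNOSIS_PREFIXES.items()
        if any(code.startswith(prefixes) for code in ep.diagnosis_codes)
    }


def has_target_procedure(ep):
    """True if any procedure code starts with 33 or 34."""
    return any(code.startswith(PROCEDURE_PREFIXES) for code in ep.procedure_codes)


def in_main_cohort(ep):
    """Apply the paediatric and uninterrupted-episode exclusions to a retrieved episode."""
    if ep.age < 18 or ep.department.lower().startswith("paediatric"):
        return False
    if ep.continues_index_episode:
        return False
    return bool(diagnosis_groups(ep)) or has_target_procedure(ep)
```

Matching on string prefixes keeps the Hospital Authority subcodes (eg, 512.1.2) inside their parent ICD-9-CM group without listing every subcode explicitly.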

Manual verification of a proportion of the retrieved diagnosis and procedure codes, down to the subcode level, was conducted to ensure data accuracy. The main cohort was first filtered to include only hospitalisation episodes at the authors’ affiliated institution, Prince of Wales Hospital (PWH), forming the PWH cohort. A maximum of 50 hospitalisation episodes for each diagnosis or procedure code were randomly extracted from the PWH cohort to estimate the true positive predictive values (PPVs) within a 13% margin of error at the 95% confidence level. This precision level was chosen pragmatically to balance statistical rigour with the substantial manual effort required for chart review in this validation study. Prince of Wales Hospital is a tertiary care centre with a complex case mix, encompassing a wide range of pleural diseases and advanced respiratory procedures. Within the PWH cohort, the types of pleural disease (pleural effusion, pneumothorax, and trauma-related pleural events) and their underlying aetiologies (eg, non-tuberculous infection, tuberculosis, and malignancy) were determined through retrospective review of clinical notes, discharge summaries, radiological findings, and blood and pleural fluid analysis results using the CMS. Procedure codes were verified by reviewing procedure records within the corresponding hospitalisation episodes. All cases were independently reviewed by two board-certified respiratory physicians. Discrepancies were resolved through joint case review until consensus was reached. Coding accuracy was expressed as PPVs with 95% confidence intervals (95% CIs). The PPV was calculated by dividing the number of true positives (ie, hospitalisation episodes in the PWH cohort where diagnosis and procedure codes were confirmed by manual verification) by the total number of true positives and false positives (ie, episodes where codes were rejected upon manual review). The 95% CI was calculated using the exact binomial method.
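As an illustration of the PPV calculation described above, the following Python sketch computes a PPV with an exact binomial (Clopper-Pearson) interval using only the standard library. It is not the authors' code: the function names are hypothetical, and the interval is found by simple bisection rather than through a statistics package. The `margin_of_error` helper uses the normal approximation; it is an assumption, not stated in the text, that the 13% margin was derived this way, although an expected PPV of 0.700 with 50 episodes gives a half-width of about 0.127 under that approximation.

```python
import math


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def _solve(f, target, increasing, tol=1e-10):
    """Bisection on [0, 1] for a monotone function f such that f(p) = target."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if (f(mid) < target) == increasing:
            lo = mid  # the root lies to the right of mid
        else:
            hi = mid  # the root lies to the left of mid
    return (lo + hi) / 2


def ppv_exact_ci(true_pos, false_pos, alpha=0.05):
    """Return (PPV, lower, upper) with a Clopper-Pearson interval."""
    n = true_pos + false_pos
    if n == 0:
        raise ValueError("no reviewed episodes")
    ppv = true_pos / n
    # Lower limit: the p at which P(X >= true_pos) equals alpha / 2.
    lower = 0.0 if true_pos == 0 else _solve(
        lambda p: 1 - binom_cdf(true_pos - 1, n, p), alpha / 2, increasing=True
    )
    # Upper limit: the p at which P(X <= true_pos) equals alpha / 2.
    upper = 1.0 if true_pos == n else _solve(
        lambda p: binom_cdf(true_pos, n, p), alpha / 2, increasing=False
    )
    return ppv, lower, upper


def margin_of_error(p, n, z=1.96):
    """Half-width of a normal-approximation interval for a proportion."""
    return z * math.sqrt(p * (1 - p) / n)
```

For example, `ppv_exact_ci(35, 15)` returns the PPV and exact interval for a code confirmed in 35 of 50 reviewed episodes, and `margin_of_error(0.7, 50)` gives the approximate precision expected from a sample of that size.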

We hypothesised that the PPVs for the accuracy of diagnosis and procedure codes would be equal to or greater than 0.700, a commonly used threshold for successful validation.6 7 8 The primary endpoint was the determination of PPVs for the listed diagnosis and procedure codes. All statistical analyses were performed using Python (version 3.12.6).

Results

A total of 26 757 non-traumatic pneumothorax, 218 018 non-traumatic pleural effusion, and 1269 trauma-related pleural events were retrieved from CDARS between 2013 and 2022. Following the exclusion of paediatric patients and uninterrupted hospitalisation episodes, 20 888 non-traumatic pneumothorax, 199 323 non-traumatic pleural effusion, and 1127 trauma-related pleural events remained in the main cohort. Of these, 2451 (11.7%), 24 938 (12.5%), and 251 (22.3%) diagnosis codes for non-traumatic pneumothorax, non-traumatic pleural effusion, and trauma-related pleural events, respectively, were identified from PWH (Fig). Additionally, 185 154 and 106 450 relevant respiratory procedures with ICD-9-CM codes starting with 33 and 34, respectively, were retrieved. After exclusions, 181 770 and 101 336 procedure codes remained, of which 16 078 (8.8%) and 17 299 (17.1%) procedure codes, respectively, were identified from PWH (Fig). Tables 1, 2, and 3 list the diagnosis codes included in the validation analysis for non-traumatic pneumothorax (Table 1), non-traumatic pleural effusion (Table 2) and trauma-related pleural events (Table 3), while Tables 4 and 5 present the procedure codes starting with ‘33’ and ‘34’, respectively; the breakdown of hospitalisation episodes retrieved using these codes, and the numbers remaining after screening, are also shown.

Figure. Number of diagnosis and procedure codes identified, from retrieval in the Clinical Data Analysis and Reporting System to inclusion in the Prince of Wales Hospital cohort

Table 1. Diagnosis codes for non-traumatic pneumothorax included in the validation analysis

Table 2. Diagnosis codes for non-traumatic pleural effusion included in the validation analysis

Table 3. Diagnosis codes for trauma-related pleural events included in the validation analysis

Table 4. Procedure codes starting with 33 included in the validation analysis

Table 5. Procedure codes starting with 34 included in the validation analysis

The overall PPVs (95% CIs) for pneumothorax, pleural effusion, trauma-related pleural events, and all diagnosis codes were 0.853 (0.787-0.904), 0.928 (0.903-0.948), 0.957 (0.907-0.981), and 0.919 (0.898-0.936), respectively. The overall PPVs (95% CIs) for procedure codes starting with 33, starting with 34, and for all procedure codes were 0.932 (0.913-0.948), 0.933 (0.916-0.948), and 0.933 (0.920-0.944), respectively.

The PPVs for diagnosis codes related to pneumothorax, pleural effusion, and trauma-related pleural events were all equal to or greater than 0.700, with ranges of 0.700-1.000, 0.833-1.000, and 0.857-1.000, respectively. The lowest PPV (95% CI) was observed for postoperative pneumothorax (diagnosis code 512.1.2) at 0.700 (0.560-0.812). The highest PPVs were seen for iatrogenic pneumothorax (diagnosis code 512.1.0) and postoperative haemothorax (diagnosis code 511.8.7), both at 1.000, with 95% CIs of 0.933-1.000 and 0.762-1.000, respectively. The reasons for false-positive diagnosis codes are summarised in online supplementary Tables 1 to 3, with inappropriate coding of alternative diseases being the most common cause.

The PPVs for procedure codes starting with 33 ranged from 0.700 to 1.000. Procedure codes starting with 34 met the PPV benchmark, except for 34.04.3 (indwelling pleural catheterisation) and 34.09.3 (drainage of the pleural cavity, open). The reasons for false-positive procedure codes are listed in online supplementary Tables 4 and 5, with inappropriate coding of alternative but similar procedures being the most common cause. The low PPV for procedure code 34.04.3 (indwelling pleural catheterisation) arose from its misuse to represent non-tunnelled pleural catheter insertion, or to document the presence of an indwelling pleural catheter (IPC) inserted during prior hospitalisations. Procedure code 34.09.3 (drainage of the pleural cavity, open) failed to meet the PPV benchmark because it was misused to represent closed pleural drainage by drain insertion, rather than an open procedure.

Discussion

This study is the first to validate diagnosis and procedure codes for pleural diseases using a healthcare database in Hong Kong. All diagnosis codes for pleural diseases and the majority of procedure codes for relevant respiratory procedures met the PPV benchmark of 0.700 or higher. Only procedure codes 34.04.3 (indwelling pleural catheterisation) and 34.09.3 (drainage of the pleural cavity, open) failed to meet the validation criteria.

In 2008, the Hong Kong Thoracic Society reported the burden of lung disease in Hong Kong using local data from various governmental sources; however, pleural diseases were not included in the report.9 Over the subsequent decade, the incidence rates of individual pleural diseases were studied in Hong Kong. However, these studies were limited in scope as they focused on single pleural diseases (eg, empyema,10 11 12 malignant mesothelioma,13 and spontaneous pneumothorax14) or were restricted to single-centre settings.10 11

There is a pressing need for contemporary, population-based epidemiological data covering various pleural diseases in Hong Kong. A recent local survey highlighted heterogeneous practices in the management of pleural diseases among medical clinicians and reflected a lack of awareness and dedicated service infrastructure for pleural diseases.15 Given the rapid advancements in diagnostic strategies and therapeutic options for pleural diseases,16 an accurate and up-to-date assessment of their clinical burden is crucial. Such data provide a foundation for guiding future research, benchmarking healthcare standards in Hong Kong against those of other countries, informing the allocation of future healthcare resources for pleural diseases, and estimating the workload of healthcare professionals managing these conditions. All such service developments should be based on an accurate estimation of the current burden and projected future demand. The use of existing healthcare databases offers a practical approach; however, relevant diagnosis and procedure codes must first be validated. A similar research pathway was followed by Arnold et al,17 who validated diagnosis codes prior to assessing the epidemiology of pleural empyema in English hospitals.17 18

Nearly all PPVs of the diagnosis and procedure codes studied exceeded the benchmark of 0.700. Notably, PPVs for procedure codes were generally higher than those for diagnosis codes. This is because diagnosis codes can be carried over from previous hospitalisation episodes, enabling attending physicians to select active or inactive diagnosis codes regardless of their relevance to the current episode. In contrast, procedure codes cannot be carried over and must be entered manually to reflect procedures performed during the corresponding hospitalisation episode. This requirement contributes to the higher accuracy for procedure codes.

The PPV for procedure code 34.04.3 (indwelling pleural catheterisation) was unexpectedly low due to misuse. The absence of a specific diagnosis code indicating the presence of an IPC, combined with the inclusion of the term ‘pleural’ in the code description, contributed to its incorrect use, particularly during searches for non-tunnelled pleural catheter insertion. Updated diagnosis codes to indicate the status ‘presence of IPC’, or a new procedure code for ‘pleural fluid drainage using an existing IPC’, would accurately reflect the clinical scenario. Once available, such codes should be validated before any analyses of IPC use in territory-wide healthcare databases. Alternatively, establishing a clinical registry for IPC use could facilitate more accurate tracking of patients with both malignant and benign causes of pleural effusion.

Some diagnosis codes (eg, hydrothorax related to dialysis [511.8.3] and hydrothorax as complication of peritoneal dialysis [511.8.8]) and procedure codes (eg, video-assisted thoracoscopy for haemostasis [34.09.4] and injection into thoracic cavity [34.92.0]) were used in other hospitals but not at PWH; therefore, they could not be validated in this study. Within the PWH cohort, alternative diagnosis or procedure codes were used and validated. However, the number of hospitalisation episodes associated with these codes was small, and their impact would be minimal in a territory-wide healthcare data analysis where similar codes are grouped together.

Duplication of subcodes for similar diagnoses or procedures was also noted. Several diagnoses and procedures were represented by different codes, including:

Hydrothorax related to dialysis (511.8.3) and hydrothorax as complication of peritoneal dialysis (511.8.8);

Fibreoptic bronchoscopy (33.22.0) and bronchoscopy (33.23.0);

Endoscopic ultrasonography of bronchus (33.23.3) and endobronchial ultrasonography (33.23.5);

Closed endoscopic biopsy of bronchus (33.24.0), bronchoscopic biopsy (33.24.1), fibreoptic bronchoscopy with biopsy (33.24.2), and flexible bronchoscopy with biopsy of bronchus (33.24.7);

Lung biopsy via endoscopy (33.27.0), bronchoscopic biopsy under fluoroscopic guidance (33.27.1), and flexible bronchoscopy with biopsy of lung (33.27.2);

Video-assisted thoracoscopy for haemostasis (34.09.4) and video-assisted thoracoscopy, haemostasis (34.21.5); and

Chemical pleurodesis (34.92.1) and pleurodesis, chemical (34.92.2).

Researchers should be reminded to search all relevant diagnosis and procedure codes to minimise the risk of missing data for specific diseases or procedures during code searches. In the long term, reconciling similar codes may help reduce ambiguity and improve data consistency.
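One way to guard against missing duplicated subcodes during a search is to expand each requested code to its listed equivalents before querying. The Python sketch below uses only the duplicate pairs and groups named above; the concept labels and the `expand_search` helper are hypothetical, and the mapping is not an official Hospital Authority resource.

```python
# Duplicated subcodes listed in the Discussion; the concept labels are hypothetical.
EQUIVALENT_SUBCODES = {
    "hydrothorax_dialysis": {"511.8.3", "511.8.8"},
    "bronchoscopy": {"33.22.0", "33.23.0"},
    "endobronchial_ultrasonography": {"33.23.3", "33.23.5"},
    "bronchoscopic_biopsy_bronchus": {"33.24.0", "33.24.1", "33.24.2", "33.24.7"},
    "bronchoscopic_biopsy_lung": {"33.27.0", "33.27.1", "33.27.2"},
    "vats_haemostasis": {"34.09.4", "34.21.5"},
    "chemical_pleurodesis": {"34.92.1", "34.92.2"},
}

# Reverse lookup from each subcode to its concept label.
CODE_TO_CONCEPT = {
    code: concept
    for concept, codes in EQUIVALENT_SUBCODES.items()
    for code in codes
}


def expand_search(codes):
    """Return the requested codes plus every listed duplicate of each one."""
    expanded = set(codes)
    for code in codes:
        concept = CODE_TO_CONCEPT.get(code)
        if concept is not None:
            expanded |= EQUIVALENT_SUBCODES[concept]
    return expanded
```

Codes without a listed duplicate pass through unchanged, so the helper can be applied to any search list without dropping terms.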

Strengths and limitations

This study has several strengths, notably its status as the first validation study of codes for pleural diseases and relevant respiratory procedures conducted using a large healthcare database in Hong Kong. It successfully validated codes for a wide range of pleural diseases and respiratory procedures, thereby laying the foundation for future epidemiological research. However, several limitations should be acknowledged. Not all codes could be adequately validated due to their small case volumes in the PWH cohort. For example, codes for Meigs’ syndrome (220.4), traumatic pneumothorax with open wound into thorax (860.1), and traumatic haemothorax with open wound into thorax (860.3) had small numbers even in the overall cohort, and some codes were duplicated. As such, future research incorporating patient searches based on these diagnosis and procedure codes should take these limitations into account. The single-centre nature of the study represents a further limitation, as disease patterns and coding practices may vary across district general hospitals.

Conclusion

This is the first validation study of diagnosis codes for pleural diseases and procedure codes for relevant respiratory procedures using a territory-wide healthcare database in Hong Kong. All diagnosis codes and the majority of procedure codes demonstrated high PPVs, indicating accurate coding. Given the emergence of new respiratory procedures, diagnosis and procedure codes should be regularly updated. The removal or consolidation of duplicated subcodes within the Hospital Authority system is also necessary to facilitate accurate future research and analysis using clinical codes. Further evaluation and harmonisation of coding practices across different hospitals would be beneficial. These measures will pave the way for future territory-wide studies and enable monitoring of the overall burden of pleural diseases in Hong Kong.

Author contributions

Concept or design: KKP Chan.
Acquisition of data: KKP Chan, TCC Ng, CY Sze, KC Ling.
Analysis or interpretation of data: KKP Chan, TCC Ng, CY Sze, KC Ling.
Drafting of the manuscript: KKP Chan.
Critical revision of the manuscript for important intellectual content: KKP Chan, TCC Ng, C Chan, CHY Lau, SWT Ho, JKC Ng, RLP Lo, WH Yip, JCL Ngai, KW To, FWS Ko, DSC Hui.

All authors had full access to the data, contributed to the study, approved the final version for publication, and take responsibility for its accuracy and integrity.

Conflicts of interest

All authors have disclosed no conflicts of interest.

Acknowledgement

The authors thank Prof Terry CF Yip from the Department of Medicine and Therapeutics of The Chinese University of Hong Kong for providing statistical support.

Funding/support

This research received no specific grant from any funding agency in the public, commercial, or not-for-profit sectors.

Ethics approval

This research was approved by the Joint Chinese University of Hong Kong–New Territories East Cluster Clinical Research Ethics Committee, Hong Kong (Ref No.: 2022.031). The requirement for patient consent was waived by the Committee due to the retrospective nature of the study.

Supplementary material

The supplementary material was provided by the authors and some information may not have been peer reviewed. Accepted supplementary material will be published as submitted by the authors, without any editing or formatting. Any opinions or recommendations discussed are solely those of the author(s) and are not endorsed by the Hong Kong Academy of Medicine and the Hong Kong Medical Association. The Hong Kong Academy of Medicine and the Hong Kong Medical Association disclaim all liability and responsibility arising from any reliance placed on the content.

References

1. Bodtger U, Hallifax RJ. Epidemiology: why is pleural disease becoming more common? In: Maskell NA, Laursen CB, Lee YCG, et al, editors. Pleural Disease. Vol 87. Schweiz, Switzerland: European Respiratory Society; 2020: 1-12.

2. Hallifax RJ, Goldacre R, Landray MJ, Rahman NM, Goldacre MJ. Trends in the incidence and recurrence of inpatient-treated spontaneous pneumothorax, 1968-2016. JAMA 2018;320:1471-80.

3. Light RW. Pleural effusions. Med Clin North Am 2011;95:1055-70.

4. Taghizadeh N, Fortin M, Tremblay A. US hospitalizations for malignant pleural effusions: data from the 2012 National Inpatient Sample. Chest 2017;151:845-54.

5. Tian P, Qiu R, Wang M, et al. Prevalence, causes, and health care burden of pleural effusions among hospitalized adults in China. JAMA Netw Open 2021;4:e2120306.

6. Kwok WC, Tam TC, Sing CW, Chan EW, Cheung CL. Validation of diagnostic coding for bronchiectasis in an electronic health record system in Hong Kong. Pharmacoepidemiol Drug Saf 2023;32:1077-82.

7. Ye Y, Hubbard R, Li GH, et al. Validation of diagnostic coding for interstitial lung diseases in an electronic health record system in Hong Kong. Pharmacoepidemiol Drug Saf 2022;31:519-23.

8. Kwok WC, Tam TC, Sing CW, Chan EW, Cheung CL. Validation of diagnostic coding for asthma in an electronic health record system in Hong Kong. J Asthma Allergy 2023;16:315-21.

9. Chan-Yeung M, Lai CK, Chan KS, et al. The burden of lung disease in Hong Kong: a report from the Hong Kong Thoracic Society. Respirology 2008;13 Suppl 4:S133-65.

10. Chan KP, Ng SS, Ling KC, et al. Phenotyping empyema by pleural fluid culture results and macroscopic appearance: an 8-year retrospective study. ERJ Open Res 2023;9:00534-2022.

11. Tsang KY, Leung WS, Chan VL, Lin AW, Chu CM. Complicated parapneumonic effusion and empyema thoracis: microbiology and predictors of adverse outcomes. Hong Kong Med J 2007;13:178-86.

12. Chan KP, Ma TF, Sridhar S, Lam DC, Ip MS, Ho PL. Changes in etiology and clinical outcomes of pleural empyema during the COVID-19 pandemic. Microorganisms 2023;11:303.

13. Chang KC, Leung CC, Tam CM, Yu WC, Hui DS, Lam WK. Malignant mesothelioma in Hong Kong. Respir Med 2006;100:75-82.

14. Chan JW, Ko FW, Ng CK, et al. Management and prevention of spontaneous pneumothorax using pleurodesis in Hong Kong. Int J Tuberc Lung Dis 2011;15:385-90.

15. Lui MM, Yeung YC, Ngai JC, et al. Implementation of evidence on management of pleural diseases: insights from a territory-wide survey of clinicians in Hong Kong. BMC Pulm Med 2022;22:386.

16. Lui MM, Lee YC. Twenty-five years of respirology: advances in pleural disease. Respirology 2020;25:38-40.

17. Arnold DT, Hamilton FW, Morris TT, et al. Epidemiology of pleural empyema in English hospitals and the impact of influenza. Eur Respir J 2021;57:2003546.

18. Hamilton F, Arnold D. Accuracy of clinical coding of pleural empyema: a validation study. J Eval Clin Pract 2020;26:79-80.

Hong Kong Med J 2026;32:Epub 30 Jan 2026

https://doi.org/10.12809/hkmj2412275
