Development and optimisation strategies for a nomogram-based predictive model of malignancy risk in thyroid nodules

Hong Kong Med J 2026;32:Epub 30 Jan 2026
© Hong Kong Academy of Medicine. CC BY-NC-ND 4.0
 
ORIGINAL ARTICLE (HEALTHCARE IN CHINA)
Peng He, MD, PhD1 #; Yu Liang, MD2 #; Yuan Zou, MD1; Zhou Zou, BM3; Bo Ren, MD1; Shan Peng, MD4; Hongmei Yuan, MD, PhD1; Qin Chen, MD2
1 Department of Ultrasound Medicine and Ultrasonic Medical Engineering Key Laboratory of Nanchong City, Affiliated Hospital of North Sichuan Medical College, Nanchong, China
2 Department of Ultrasound, Sichuan Academy of Medical Sciences and Sichuan Provincial People’s Hospital, School of Medicine, University of Electronic Science and Technology of China, Chengdu, China
3 Department of Orthopedics, Sichuan Academy of Medical Sciences and Sichuan Provincial People’s Hospital, School of Medicine, University of Electronic Science and Technology of China, Chengdu, China
4 Department of Rehabilitation, Second Clinical College of North Sichuan Medical College, Nanchong, China
# Equal contribution
 
Corresponding author: Dr Yuan Zou (zouyuanxiao@163.com)
 
 
Abstract
Introduction: This study aimed to develop and validate a clinical prediction model to assist radiologists in optimising the diagnostic classification of the Chinese Thyroid Imaging Reporting and Data System (C-TIRADS).
 
Methods: A total of 1659 patients from two hospitals were included in this study. The derivation cohort comprised 909 patients for model development and internal validation, while 750 patients formed the external validation cohort. A binary logistic regression model was constructed. Model performance in the derivation set was evaluated using receiver operating characteristic (ROC) curves and visualised with a nomogram. In the external validation set, ROC and calibration curves were used to assess discrimination and calibration.
 
Results: The original C-TIRADS category, abnormal cervical lymph node sonographic findings, and changes in thyroid nodule size emerged as significant predictors of C-TIRADS optimisation. The optimised nomogram demonstrated an area under the ROC curve (AUC) of 0.730 (95% confidence interval=0.697-0.762), with a sensitivity of 63.2%, specificity of 74.9%, and overall accuracy of 67.7% for predicting optimisation. Using probability thresholds of ≥60% to recommend an upgrade and <30% to recommend a downgrade, the calibration curve showed good agreement, and decision curve analysis demonstrated a favourable net clinical benefit. External validation confirmed excellent discrimination (AUC=0.865; 95% confidence interval=0.839-0.891).
 
Conclusion: An optimised C-TIRADS model that integrates imaging features of thyroid nodules with clinical risk factors may aid radiologists in improving the diagnostic efficiency and clinical utility of the TIRADS classification.
 
 
New knowledge added by this study
  • This is the first study to integrate clinical risk factors with imaging features to optimise the Chinese Thyroid Imaging Reporting and Data System (C-TIRADS) classification.
  • This work established a risk threshold–based decision-making framework to guide C-TIRADS classification adjustments.
  • External validation demonstrated the model’s generalisability across diverse clinical settings.
Implications for clinical practice or policy
  • Our model improved diagnostic precision through the integration of imaging and clinical risk factors.
  • This research has the potential to optimise resource allocation and reduce interobserver diagnostic variability.
 
 
Introduction
Thyroid nodules are a common clinical finding, with a prevalence of approximately 4% to 7% in the general population, and are most often detected by ultrasonography.1 2 Although most thyroid nodules are benign, distinguishing malignant from benign nodules remains a clinical priority to avoid unnecessary procedures and ensure timely intervention.3 To standardise risk stratification, various Thyroid Imaging Reporting and Data Systems (TIRADS) have been developed,4 5 including the ACR-TIRADS (American College of Radiology),6 the K-TIRADS (Korean Society of Thyroid Radiology),7 and the EU-TIRADS (European Thyroid Association).8 Recognising the need for a system tailored to the Chinese healthcare context, the Chinese Artificial Intelligence Alliance for Thyroid and Breast Ultrasound proposed the Chinese TIRADS (C-TIRADS) in 2021.2 However, existing TIRADS models primarily focus on sonographic characteristics and often overlook relevant clinical risk factors (eg, patient age, sex, and cervical lymph node [LN] involvement).9 In clinical practice, radiologists frequently incorporate such clinical information into their assessments, contributing to inconsistency and variability in TIRADS classification.
 
Papillary thyroid carcinoma accounts for approximately 80% to 90% of all thyroid cancers and is typically characterised by indolent behaviour.10 11 A substantial proportion of new cases involve papillary thyroid microcarcinoma, defined as tumours measuring less than 10 mm in diameter, which generally carry a favourable clinical prognosis.12 Increasing recognition of the indolent nature of papillary thyroid microcarcinoma has raised concerns regarding potential overdiagnosis and overtreatment. However, current risk stratification strategies that rely solely on imaging features may either overestimate or underestimate malignancy risk, depending on the patient’s broader clinical context. Approaches that incorporate clinical risk factors into TIRADS classification could address these limitations and enhance diagnostic accuracy, supporting more individualised patient management.
 
This study aimed to develop and externally validate a predictive model that integrates both imaging characteristics and clinical risk factors to refine the C-TIRADS classification system. To our knowledge, this is the first nomogram-based model to incorporate clinical risk factors into the C-TIRADS framework. The tool is designed to assist radiologists in improving diagnostic consistency and supporting more informed and individualised clinical decision making in the management of thyroid nodules.
 
Methods
Study design and population
This retrospective diagnostic study included patients with thyroid nodules who underwent surgical resection at two tertiary hospitals in China. The derivation cohort comprised patients treated at Sichuan Provincial People’s Hospital from January to December 2022, while the external validation cohort was drawn from Affiliated Hospital of North Sichuan Medical College during the same period. Inclusion criteria were: (1) thyroid nodules confirmed by postoperative pathology and (2) preoperative ultrasonography of the thyroid and cervical LNs with complete imaging and clinical records. Exclusion criteria were: (1) unclear pathological diagnosis; (2) incomplete clinical data; or (3) poor-quality ultrasound images.
 
Imaging evaluation and classification
Two junior radiologists, blinded to clinical and pathological information, independently classified all nodules according to the C-TIRADS criteria. Subsequently, two senior radiologists re-evaluated the cases and adjusted the classifications based on additional clinical risk factors, including patient demographics and cervical LN findings. Any modification from the initial C-TIRADS classification was defined as ‘classification optimisation’ (*C-TIRADS), encompassing both upgrades and downgrades.
 
Data collection
Structured data collection forms were used to record clinical and sonographic variables. The collected data included patient sex, age, nodule size, number of nodules, C-TIRADS classification, and the presence of abnormal cervical LNs on ultrasonography.
 
Predictor variables
Sonographic features that directly determine the C-TIRADS score (such as solidity, echogenicity, aspect ratio, microcalcification, and margin irregularity) were not included independently in the multivariable analysis to avoid collinearity. Based on clinical relevance and univariate regression analysis, six predictors were selected for model development, namely, patient sex, age group (≤40, 40-60, and >60 years),13 14 nodule size, number of nodules (single vs multiple), presence of abnormal cervical LNs, and original C-TIRADS classification.
 
Model development and internal validation
A binary logistic regression model was developed using the derivation cohort from Sichuan Provincial People’s Hospital (n=909). For categorical variables with more than two levels, dummy variables were created. The C-TIRADS category 5 was used as the reference group as it represents the highest level of suspicion and the most definitive management pathway (surgical resection), making it an appropriate clinical baseline to estimate relative malignancy risk and the need for reclassification. Model performance in the derivation cohort was evaluated using the area under the receiver operating characteristic (ROC) curve (AUC), and calibration was assessed by comparing predicted probability (PP) with observed outcomes using calibration plots.
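To illustrate the dummy coding described above, the following minimal Python sketch encodes a C-TIRADS category against category 5 as the reference level and converts the resulting linear predictor into a PP. The list of category labels, the intercept, the coefficient values, and the function names are hypothetical placeholders chosen for illustration; they are not the fitted estimates of this study.

```python
import math

# Assumed category labels (illustrative); "5" is the reference level,
# so a category 5 nodule has every dummy variable equal to 0.
LEVELS = ["3", "4A", "4B", "4C", "5"]
REFERENCE = "5"


def dummy_code(category):
    """Return one 0/1 indicator per non-reference C-TIRADS level."""
    if category not in LEVELS:
        raise ValueError(f"unknown category: {category}")
    return {lvl: int(category == lvl) for lvl in LEVELS if lvl != REFERENCE}


def predicted_probability(intercept, coefs, dummies):
    """Logistic model: PP = 1 / (1 + exp(-(b0 + sum(b_k * x_k))))."""
    linear = intercept + sum(coefs[k] * x for k, x in dummies.items())
    return 1.0 / (1.0 + math.exp(-linear))


# Placeholder log odds ratios relative to category 5 (NOT study estimates).
example_coefs = {"3": -3.0, "4A": -2.0, "4B": -1.0, "4C": -0.5}
pp_ref = predicted_probability(2.0, example_coefs, dummy_code("5"))
pp_4a = predicted_probability(2.0, example_coefs, dummy_code("4A"))
```

With category 5 as the reference, each coefficient reads as the change in log odds relative to the most suspicious category, which is why the placeholder values above are all negative.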
 
We emphasise that the primary outcome variable for model training was the pathological diagnosis (binary: malignant vs benign). The C-TIRADS optimisation, defined as upgrading or downgrading the original category based on PP thresholds, was a post-model clinical decision rule applied to the model output, not the outcome used for model development.
 
Internal validation was performed using bootstrap resampling with 1000 samples to obtain bias-corrected estimates of model performance and 95% confidence intervals (95% CIs). A fixed random seed was set to ensure reproducibility. The bias-corrected C-statistic was 0.728, compared with the original apparent performance of 0.730 (a difference of 0.002), confirming the model’s stable discriminative ability (online supplementary Table 1).
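The resampling step can be sketched in Python as follows. This is a simplified illustration under stated assumptions: it computes the C-statistic as a pairwise concordance and derives a percentile bootstrap interval with a fixed seed. It does not reproduce the bias (optimism) correction, which would additionally require refitting the model within each resample. The seed value, function names, and synthetic demonstration data are hypothetical.

```python
import random


def auc(scores, labels):
    """C-statistic: chance that a random positive outranks a random negative.

    Ties between a positive and a negative score count as 0.5.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both outcome classes are required")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(scores, labels, n_boot=1000, seed=2022, alpha=0.05):
    """Percentile bootstrap interval for the C-statistic (fixed seed)."""
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    n = len(scores)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # resample lacked one class; draw again
        stats.append(auc([scores[i] for i in idx], ys))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Synthetic demonstration data (not study data).
demo_rng = random.Random(0)
demo_labels = [1] * 30 + [0] * 30
demo_scores = [demo_rng.gauss(0.6 if y else 0.4, 0.15) for y in demo_labels]
apparent = auc(demo_scores, demo_labels)
ci_low, ci_high = bootstrap_auc_ci(demo_scores, demo_labels, n_boot=200)
```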
 
External validation
The final model was applied to the external cohort from Affiliated Hospital of North Sichuan Medical College (n=750) to evaluate its generalisability. Model discrimination was evaluated by calculating the AUC in the validation set, and calibration was assessed using calibration curves.
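A calibration curve of the kind described here can be approximated by grouping cases into probability bins and comparing the mean PP with the observed event rate in each bin. The Python sketch below assumes equal-width bins; the bin count, function name, and demonstration values are illustrative choices and are not taken from the study.

```python
def calibration_table(probs, outcomes, n_bins=5):
    """Return (mean predicted, observed rate, n) for each non-empty bin.

    Cases are grouped into equal-width bins of predicted probability.
    """
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, outcomes):
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must lie in [0, 1]")
        k = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls in the last bin
        bins[k].append((p, y))
    rows = []
    for members in bins:
        if members:
            mean_pred = sum(p for p, _ in members) / len(members)
            observed = sum(y for _, y in members) / len(members)
            rows.append((mean_pred, observed, len(members)))
    return rows


# Illustrative values only (not study data).
demo_probs = [0.10, 0.15, 0.50, 0.55, 0.90, 0.95]
demo_outcomes = [0, 0, 1, 0, 1, 1]
rows = calibration_table(demo_probs, demo_outcomes)
```

A well-calibrated model yields rows in which the mean PP and the observed rate are close, corresponding to points near the diagonal of a calibration plot.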
 
Nomogram construction
A nomogram was developed based on the final multivariable regression model to provide a visual tool for clinical application. Each predictor was assigned a score, and the total score corresponded to the PP of C-TIRADS classification optimisation.
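The point assignment can be sketched using the conventional nomogram scaling, in which the predictor with the widest effect on the linear predictor spans 0 to 100 points and all other predictors are scaled proportionally. Whether the published nomogram used exactly this scaling is an assumption; the predictor names and contribution values below are hypothetical.

```python
def nomogram_points(contributions):
    """Convert linear-predictor contributions into nomogram points.

    contributions: {predictor: {level: contribution to the linear predictor}}
    Returns {predictor: {level: points}}, where the predictor with the
    widest range of contributions spans 0 to 100 points.
    """
    ranges = {
        name: max(levels.values()) - min(levels.values())
        for name, levels in contributions.items()
    }
    widest = max(ranges.values())
    if widest <= 0:
        raise ValueError("no predictor has any effect")
    return {
        name: {
            level: 100.0 * (value - min(levels.values())) / widest
            for level, value in levels.items()
        }
        for name, levels in contributions.items()
    }


# Hypothetical contributions (NOT the study's regression coefficients).
demo = {
    "sex": {"female": 0.0, "male": 0.4},
    "lymph_nodes": {"normal": 0.0, "abnormal": 2.0},
}
points = nomogram_points(demo)
total = points["sex"]["male"] + points["lymph_nodes"]["abnormal"]
```

The total score is then mapped back to a PP through the logistic function of the underlying model, which is what the bottom axes of a nomogram display.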
 
Decision curve analysis and risk thresholds
Decision curve analysis and clinical impact curves were used to evaluate the clinical utility of the nomogram by quantifying the net benefit across a range of threshold probabilities. Specifically, the nomogram generates a PP indicating whether a nodule’s original C-TIRADS classification should be modified after integrating clinical information. For clinical decision making, we pre-specified probability cut-offs: PP ≥60% (upgrade), PP <30% (downgrade), and PP ≥30% but <60% (unchanged). Based on these thresholds, the model’s recommendations were translated into optimised C-TIRADS categories, which were then compared with radiologists’ optimisation decisions and surgical pathology findings, as appropriate. These thresholds are reported in the Results section and were applied consistently across all performance tables.
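The pre-specified cut-offs amount to a simple three-way rule, sketched below in Python. The function and constant names are hypothetical. The sketch returns only the direction of the recommended change, because the text does not state by how many categories a nodule is moved.

```python
UPGRADE_CUTOFF = 0.60    # PP >= 60%: recommend an upgrade
DOWNGRADE_CUTOFF = 0.30  # PP < 30%: recommend a downgrade


def recommend(pp):
    """Map a predicted probability to 'upgrade', 'downgrade' or 'unchanged'."""
    if not 0.0 <= pp <= 1.0:
        raise ValueError("predicted probability must lie in [0, 1]")
    if pp >= UPGRADE_CUTOFF:
        return "upgrade"
    if pp < DOWNGRADE_CUTOFF:
        return "downgrade"
    return "unchanged"  # 30% <= PP < 60%
```

Note that the boundaries are asymmetric: a PP of exactly 60% triggers an upgrade, whereas a PP of exactly 30% leaves the category unchanged.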
 
Model performance evaluation
To ensure consistent ROC analysis, all AUCs were calculated using continuous PPs rather than ordinal risk categories. For the original C-TIRADS system, the five-level ordinal classification was transformed into a continuous malignancy probability score using proportional-odds (ordinal logistic) regression. This standard statistical method was employed to model the ordered nature of the C-TIRADS categories and to derive a continuous probability of malignancy for each category, enabling fair comparison in ROC analysis against other models. For the optimised *C-TIRADS system, PPs were directly obtained from the final multivariable logistic regression model. The ROC curves and corresponding AUCs were constructed using these continuous predictions.
 
Statistical analysis
Statistical analyses and data visualisation were performed using SPSS (Windows version 26.0; IBM Corp, Armonk [NY], United States) and RStudio (version 2022). Categorical variables were reported as numbers of cases and percentages, with group comparisons conducted using the Chi-squared test or Fisher’s exact test, as appropriate. Multivariable logistic regression analysis was conducted to identify independent predictors. Model discrimination was evaluated using ROC curves, while calibration curves were used to assess model accuracy. Clinical decision and impact curves were established to assess practical clinical utility. A two-tailed P value of <0.05 was considered statistically significant.
 
Results
Baseline characteristics
All models were trained to predict pathological malignancy. The optimised *C-TIRADS classifications presented here were derived by applying predefined probability thresholds to the model’s malignancy predictions.
 
A total of 1659 patients with thyroid nodules were included in the study, comprising 909 patients in the derivation cohort and 750 in the external validation cohort. In the derivation cohort, 71.8% of patients were women, and the majority (90.8%) had nodules measuring ≤30 mm. Approximately 81.7% of patients showed no abnormal cervical LNs on ultrasonography. The rate of C-TIRADS optimisation was 60.6%. In the external validation cohort, similar distributions were observed, with a higher proportion of nodules >30 mm (Table 1).
 

Table 1. Patient and nodule characteristics (n=1659)
 
Univariate analysis
Univariate binary regression analysis revealed that several variables were either significantly associated (P<0.05) or showed a trend towards association (0.05 < P < 0.1) with C-TIRADS optimisation. These variables included patient sex, age, nodule size (10-30 mm), number of nodules, solid composition, blurred margins, aspect ratio >1, abnormal cervical LNs, and C-TIRADS category (Table 2 and online supplementary Table 2).
 

Table 2. Predictor distribution and univariate logistic regression odds ratios for malignancy (n=909)
 
Multivariable model development
A multivariable binary logistic regression model was developed to identify independent predictors associated with C-TIRADS optimisation. Six predictors were independently associated with the outcome. The key predictors of C-TIRADS optimisation were male sex, age 40 to 60 years, thyroid nodule size (per 1-mm increase), multiple thyroid nodules, presence of abnormal cervical LNs, and original C-TIRADS 4A category (online supplementary Table 3). A nomogram model was constructed based on these six independent predictors (Fig 1).
 

Figure 1. Nomogram prediction model to aid radiologists in optimising the Chinese Thyroid Imaging Reporting and Data System classification
 
Model performance in the derivation cohort
The model demonstrated good discrimination, with an AUC of 0.730 (95% CI=0.697-0.762) in the derivation cohort (online supplementary Fig a). Internal validation using 1000 bootstrap samples yielded a bias-corrected C-statistic of 0.728, indicating stable model performance (online supplementary Table 1). Calibration curves showed good agreement between PPs and observed outcomes (online supplementary Fig b).
 
Diagnostic thresholds were evaluated to stratify risk. A PP of ≥60% or <30% was considered indicative of a high likelihood of classification change: a PP of ≥60% suggested upgrading, while a PP of <30% suggested downgrading; PPs between 30% and 60% indicated that the classification was likely to remain unchanged. A detailed summary of sensitivity, specificity, and overall accuracy across these thresholds is presented in online supplementary Table 4.
 
External validation
When applied to the external cohort, the model achieved an AUC of 0.865 (95% CI=0.839-0.891) [online supplementary Fig c], demonstrating excellent generalisability. Calibration plots again confirmed close agreement between predicted and observed probabilities (online supplementary Fig d). At the 60% probability threshold, sensitivity was 85.0%, specificity was 69.0%, and overall accuracy was 79.7% in the external validation cohort. Diagnostic performance metrics across various risk thresholds of the final prediction model were analysed in the external validation population (online supplementary Table 5).
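For readers who wish to reproduce threshold-based metrics of this kind on their own data, the following Python sketch computes sensitivity, specificity, and overall accuracy at a chosen probability cut-off. The function name and demonstration values are hypothetical; the example does not use or reproduce the study data.

```python
def threshold_metrics(probs, labels, cutoff=0.60):
    """Sensitivity, specificity and accuracy when PP >= cutoff is positive."""
    tp = fp = tn = fn = 0
    for p, y in zip(probs, labels):
        predicted_positive = p >= cutoff
        if predicted_positive and y == 1:
            tp += 1
        elif predicted_positive and y == 0:
            fp += 1
        elif y == 1:
            fn += 1
        else:
            tn += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both outcome classes are required")
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }


# Illustrative values only (not study data).
demo_probs = [0.90, 0.70, 0.40, 0.65, 0.20, 0.10]
demo_labels = [1, 1, 1, 0, 0, 0]
metrics = threshold_metrics(demo_probs, demo_labels, cutoff=0.60)
```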
 
Clinical utility
Decision curve analysis (Fig 2a) demonstrated that the nomogram model provided greater net clinical benefit across a wide range of threshold probabilities compared with treating all or no patients. The clinical impact curve (Fig 2c) showed that the number of true positives closely approximated the predicted number across relevant thresholds. The observed distribution of histopathological outcomes was as follows: in the derivation cohort, 769 nodules (84.6%) were confirmed malignant and 140 (15.4%) were benign; in the validation cohort, 434 nodules (57.9%) were malignant and 316 (42.1%) were benign.
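The net benefit plotted in a decision curve follows the standard definition: the true-positive rate per patient minus the false-positive rate per patient weighted by the odds of the threshold probability. The Python sketch below implements that definition alongside the 'treat all' comparator ('treat none' is zero by definition). Function names and demonstration values are hypothetical, and the example is not based on the study data.

```python
def net_benefit(probs, labels, pt):
    """Net benefit at threshold pt: TP/n - FP/n * pt / (1 - pt)."""
    if not 0.0 < pt < 1.0:
        raise ValueError("threshold probability must lie in (0, 1)")
    n = len(labels)
    tp = sum(1 for p, y in zip(probs, labels) if p >= pt and y == 1)
    fp = sum(1 for p, y in zip(probs, labels) if p >= pt and y == 0)
    return tp / n - fp / n * pt / (1.0 - pt)


def net_benefit_treat_all(labels, pt):
    """Net benefit of classifying every patient as positive."""
    if not 0.0 < pt < 1.0:
        raise ValueError("threshold probability must lie in (0, 1)")
    prevalence = sum(labels) / len(labels)
    return prevalence - (1.0 - prevalence) * pt / (1.0 - pt)


# Illustrative values only (not study data).
demo_probs = [0.90, 0.70, 0.40, 0.65, 0.20, 0.10]
demo_labels = [1, 1, 1, 0, 0, 0]
nb_model = net_benefit(demo_probs, demo_labels, pt=0.5)
nb_all = net_benefit_treat_all(demo_labels, pt=0.5)
```

A model is considered clinically useful at a given threshold when its net benefit exceeds both comparators, which is the pattern the decision curve is intended to show.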
 

Figure 2. Comparison of the diagnostic efficacy of the Chinese Thyroid Imaging Reporting and Data System (C-TIRADS) and optimised C-TIRADS (*C-TIRADS) in the diagnosis of benign and malignant thyroid nodules. (a) Clinical decision curve of the predictive model for radiologist-optimised *C-TIRADS classification in the derivation cohort. (b) Comparison of the diagnostic efficacy of C-TIRADS and *C-TIRADS for the diagnosis of benign and malignant thyroid nodules in the derivation cohort. (c) Clinical impact curves of the predictive model for radiologist-optimised C-TIRADS classification in the derivation cohort, showing the number of patients classified as high risk (solid curve) and the number of true positives among them (dashed curve) across probability thresholds. (d) Comparison of the diagnostic efficacy of C-TIRADS and *C-TIRADS for the diagnosis of benign and malignant thyroid nodules in the validation cohort
 
Comparison of diagnostic efficacy between the original C-TIRADS and optimised C-TIRADS classifications demonstrated superior performance of the optimised model in both the derivation and validation cohorts (Fig 2b and d, respectively). The optimised classification achieved higher AUC values for differentiating benign from malignant nodules (AUC=0.97 vs 0.94 in the derivation cohort; AUC=0.97 vs 0.95 in the external validation cohort). The predictive model tended to improve C-TIRADS classification by upgrading category 4A nodules to category 4B or 4C, reflecting enhanced clinical utility (Table 3 and Fig 2).
 

Table 3. Clinical diagnostic performance of the final predictive model in thyroid nodules (n=1659)
 
Application example of the nomogram model
A 55-year-old man underwent ultrasound examination, which revealed a solid hypoechoic thyroid nodule in the right lobe measuring approximately 7.1 × 6.4 mm (Fig 3a). Simultaneously, abnormal LNs were detected on the ipsilateral side of the neck, characterised by indistinct corticomedullary differentiation and suspected microcalcifications (Fig 3b). According to the conventional C-TIRADS system, the nodule was initially classified as category 4B. However, application of the nomogram model yielded a cumulative score of 155 points, corresponding to a malignancy risk of >90%. Based on this result, the TIRADS category was optimised and upgraded to category 5 (Fig 3c). Subsequent histopathological examination confirmed the diagnosis of papillary thyroid microcarcinoma with cervical LN metastasis.
 

Figure 3. Representative case demonstrating the diagnostic utility of the nomogram-assisted model. (a) A 55-year-old man presenting with a solid hypoechoic nodule in the right lobe of the thyroid gland (arrow). (b) Ultrasound revealing abnormal cervical lymph node architecture, characterised by poorly defined corticomedullary borders and suspected microcalcifications (arrow). (c) Application of the predictive model to the thyroid nodule described above. By summing the scores assigned to six individual indicators, the final total score is approximately 155 points, corresponding to a malignancy risk of >90%. According to the optimised classification system, the lesion should be upgraded from category 4B to category 5
 
Discussion
This study retrospectively analysed the sonographic characteristics and clinical risk factors of 1659 thyroid nodules from two large tertiary hospitals in western China, with the aim of optimising the C-TIRADS classification. A predictive model integrating clinical parameters and imaging features was developed and externally validated, demonstrating high diagnostic performance (AUC=0.865 in external validation) and clinical benefit, as evidenced by decision curve analysis.
 
Despite the widespread adoption of various TIRADS frameworks globally,2 4 5 6 7 8 fundamental methodological limitations persist. Current models, such as ACR-TIRADS,6 primarily focus on ultrasound features and rely heavily on consensus-driven rather than statistically validated risk stratification systems.6 15 Although TIRADS demonstrates robust sensitivity in clinical settings, its specificity remains relatively limited.16 Interobserver variability is another key concern: radiologists’ subjective interpretation of ultrasound features can result in inconsistent classification outcomes.17 To address these limitations, various strategies have been proposed, including the integration of artificial intelligence techniques to reduce observer subjectivity.18 19 20 Artificial intelligence has shown promise in matching or even surpassing the specificity achieved by radiologists; however, its clinical implementation remains constrained by challenges in interpretability and low acceptance in routine practice.
 
Integrating clinical risk factors may enhance risk stratification for thyroid nodules, as suggested by a growing body of evidence.21 In alignment with this, our study incorporated clinical variables including patient age, sex, number of nodules, and cervical LN status into the predictive model, thereby more accurately reflecting routine clinical diagnostic workflows. While previous studies22 23 24 suggested that male patients with thyroid nodules, particularly those with indeterminate fine-needle aspiration cytology undergoing molecular testing, exhibit a higher malignancy risk,25 our study did not identify a significant difference in thyroid cancer incidence between sexes. This discrepancy may be attributable to methodological differences, as molecular testing was not performed in our cohort and all diagnoses were confirmed through postoperative histopathology. The absence of statistical significance for male sex may reflect population-specific characteristics, such as regional variation in risk factor distribution or age composition.26 These methodological and demographic differences may have attenuated the observed sex-related effect. Nonetheless, male patients in our study were assigned higher risk scores, suggesting an association with malignancy risk, despite the lack of statistical significance.
 
Compared with previous models that primarily focused on intrinsic ultrasound features of thyroid nodules,27 28 29 our nomogram offers a more comprehensive assessment. Although the individual contributions of factors such as sex and age were relatively modest, they reflected subtle clinical patterns often considered by radiologists during decision making. The C-TIRADS optimisation approach demonstrated clear advantages, particularly in reducing unnecessary invasive procedures without compromising diagnostic accuracy, achieving an AUC of 0.972. Furthermore, the new model indicated that a risk threshold of ≥60% favoured the recommendation for C-TIRADS optimisation, whereas a threshold of <30% favoured exclusion. The integration of complex imaging data with clinical information represents a core competency for radiologists.30 With appropriate standardised training and communication frameworks in place, radiologists are well positioned to integrate quantitative metrics generated by the new model into routine diagnostic workflows. This advancement holds promise for improving diagnostic consistency and accuracy in clinical practice.
 
Limitations
This study has several limitations that should be acknowledged. First, the optimisation of the TIRADS classification was influenced by radiologists’ subjective judgement, which may have contributed to interobserver variability. Second, although data collection was conducted by trained junior radiologists, observer variation and the subjective nature of ultrasound interpretation may have affected the model’s performance.31 Third, internal validation using bootstrap resampling may have overestimated model performance due to potential overfitting; therefore, external validation was essential to confirm generalisability. Fourth, owing to the retrospective design, only a limited set of clinical parameters (eg, sex, age, and cervical LN status) was included. Other relevant factors, such as body mass index, environmental exposures, nodule location, family history of thyroid cancer, and radiation exposure history,32 33 were not assessed. Finally, the study cohort exclusively comprised cases confirmed by surgical pathology, resulting in a relatively low proportion of benign lesions, which may have introduced selection bias. The exclusion of patients diagnosed solely by fine-needle aspiration was intentional but may have affected the generalisability of the findings.
 
Future directions
To address the limitations of the present study, future research should aim to standardise the application of TIRADS by adopting unified classification frameworks and implementing regular training programmes to enhance interobserver consistency. Prospective multicentre studies involving broader and more diverse populations are warranted, incorporating a wider range of clinical risk factors to improve predictive accuracy. In particular, data regarding family history, radiation exposure, and other relevant variables across centres would support more comprehensive risk assessment and enhance the generalisability of prediction models. In addition, including patients with fine-needle aspiration–confirmed benign nodules may help achieve a more balanced representation of benign and malignant cases. The development and application of nomogram-based structured training programmes for radiologists could also be explored to further improve diagnostic consistency and clinical utility. While the widespread adoption of a revised classification system will require time, we hope that the findings of this study may contribute to that transition.
 
Conclusion
We developed and externally validated a nomogram-based predictive model that integrates imaging features and clinical risk factors to optimise C-TIRADS classification for thyroid nodules. The model demonstrated good discrimination and calibration across internal and external cohorts, offering a practical tool to assist radiologists in refining diagnostic assessments and improving clinical decision making. Future research incorporating additional clinical variables and prospective validation is warranted to further strengthen the model’s applicability across diverse clinical settings.
 
Author contributions
Concept or design: Y Liang, Y Zou, P He, Q Chen.
Acquisition of data: Y Liang, Y Zou, Z Zou, B Ren.
Analysis or interpretation of data: Y Liang, S Peng, Y Zou.
Drafting of the manuscript: Y Liang, Y Zou, HM Yuan, Z Zou.
Critical revision of the manuscript for important intellectual content: P He, Y Zou.
 
All authors had full access to the data, contributed to the study, approved the final version for publication, and take responsibility for its accuracy and integrity.
 
Conflicts of interest
The authors have disclosed no conflicts of interest.
 
Declaration
This manuscript was initially posted as a preprint entitled ‘Development and validation of a clinical prediction model to aid radiologists optimize thyroid C-TIRADS classification’ on Research Square (DOI: 10.21203/rs.3.rs-3831900/v1). After peer feedback and extensive revisions undertaken collaboratively by the author team, the current version has substantially evolved and markedly differs from the preprint version.
 
Funding/support
This research was supported by Sichuan Science and Technology Program (Ref Nos.: 2025ZNSFSC1751, 2026YFHZ0039), the University-Industry Collaborative Education Program (Ref No.: 250505236300920), the University-level Project of North Sichuan Medical College (Ref Nos.: CXSY24-06, CBY22-QNA48), and the Hospital-level Projects of the Affiliated Hospital of North Sichuan Medical College, China (Ref Nos.: 210930, 2023-2GC013, 2025LC010). The funders had no role in the study design, data collection/analysis/interpretation, or manuscript preparation.
 
Ethics approval
This research was approved by the Ethics Committee of Sichuan Provincial People’s Hospital (Ref No.: ER20210347) and the Ethics Committee of Affiliated Hospital of North Sichuan Medical College, China (Ref No.: 2021ER436-1). The requirement for informed patient consent was waived by both Committees due to the retrospective nature of the research.
 
Supplementary material
The supplementary material was provided by the authors, and some information may not have been peer reviewed. Accepted supplementary material will be published as submitted by the authors, without any editing or formatting. Any opinions or recommendations discussed are solely those of the author(s) and are not endorsed by the Hong Kong Academy of Medicine and the Hong Kong Medical Association. The Hong Kong Academy of Medicine and the Hong Kong Medical Association disclaim all liability and responsibility arising from any reliance placed on the content.
 
References
1. Haugen BR, Alexander EK, Bible KC, et al. 2015 American Thyroid Association management guidelines for adult patients with thyroid nodules and differentiated thyroid cancer: the American Thyroid Association guidelines task force on thyroid nodules and differentiated thyroid cancer. Thyroid 2016;26:1-133.
2. Zhou J, Song Y, Zhan W, et al. Thyroid imaging reporting and data system (TIRADS) for ultrasound features of nodules: multicentric retrospective study in China. Endocrine 2021;72:157-70.
3. Trimboli P. Complexity in the interpretation and application of multiple guidelines for thyroid nodules: the need for coordinated recommendations for “small” lesions. Rev Endocr Metab Disord 2025;26:223-7.
4. Park JY, Lee HJ, Jang HW, et al. A proposal for a thyroid imaging reporting and data system for ultrasound features of thyroid carcinoma. Thyroid 2009;19:1257-64.
5. Horvath E, Majlis S, Rossi R, et al. An ultrasonogram reporting system for thyroid nodules stratifying cancer risk for clinical management. J Clin Endocrinol Metab 2009;94:1748-51.
6. Tessler FN, Middleton WD, Grant EG, et al. ACR Thyroid Imaging, Reporting and Data System (TI-RADS): white paper of the ACR TI-RADS Committee. J Am Coll Radiol 2017;14:587-95.
7. Shin JH, Baek JH, Chung J, et al. Ultrasonography diagnosis and imaging-based management of thyroid nodules: revised Korean Society of Thyroid Radiology consensus statement and recommendations. Korean J Radiol 2016;17:370-95.
8. Russ G, Bonnema SJ, Erdogan MF, Durante C, Ngu R, Leenhardt L. European Thyroid Association guidelines for ultrasound malignancy risk stratification of thyroid nodules in adults: the EU-TIRADS. Eur Thyroid J 2017;6:225-37.
9. Chen Z, Wang JJ, Du JB, et al. Development and validation of a dynamic nomogram for predicting central lymph node metastasis in papillary thyroid carcinoma patients based on clinical and ultrasound features. Quant Imaging Med Surg 2025;15:1555-70.
10. Boucai L, Zafereo M, Cabanillas ME. Thyroid cancer: a review. JAMA 2024;331:425-35.
11. Zhang J, Xu S. High aggressiveness of papillary thyroid cancer: from clinical evidence to regulatory cellular networks. Cell Death Discov 2024;10:378.
12. Ma T, Semsarian CR, Barratt A, et al. Rethinking low-risk papillary thyroid cancers <1 cm (papillary microcarcinomas): an evidence review for recalibrating diagnostic thresholds and/or alternative labels. Thyroid 2021;31:1626-38.
13. Kwong N, Medici M, Angell TE, et al. The influence of patient age on thyroid nodule formation, multinodularity, and thyroid cancer risk. J Clin Endocrinol Metab 2015;100:4434-40.
14. Pizzato M, Li M, Vignat J, et al. The epidemiological landscape of thyroid cancer worldwide: GLOBOCAN estimates for incidence and mortality rates in 2020. Lancet Diabetes Endocrinol 2022;10:264-72.
15. Tessler FN, Middleton WD, Grant EG, Hoang JK. Re: ACR Thyroid Imaging, Reporting and Data System (TI-RADS): white paper of the ACR TI-RADS Committee. J Am Coll Radiol 2018;15(3 Pt A):381-2.
16. Angelopoulos N, Goulis DG, Chrisogonidis I, et al. Diagnostic performance of European and American College of Radiology Thyroid Imaging Reporting and Data System classification systems in thyroid nodules over 20 mm in diameter. Endocr Pract 2025;31:72-9.
17. Jin Z, Pei S, Shen H, et al. Comparative study of C-TIRADS, ACR-TIRADS, and EU-TIRADS for diagnosis and management of thyroid nodules. Acad Radiol 2023;30:2181-91.
18. Wildman-Tobriner B, Buda M, Hoang JK, et al. Using artificial intelligence to revise ACR TI-RADS risk stratification of thyroid nodules: diagnostic accuracy and utility. Radiology 2019;292:112-9.
19. Wu SH, Li MD, Tong WJ, et al. Adaptive dual-task deep learning for automated thyroid cancer triaging at screening US. Radiol Artif Intell 2025;7:e240271.
20. Trimboli P, Colombo A, Gamarra E, Ruinelli L, Leoncini A. Performance of computer scientists in the assessment of thyroid nodules using TIRADS lexicons. J Endocrinol Invest 2025;48:877-83.
21. Kobaly K, Kim CS, Mandel SJ. Contemporary management of thyroid nodules. Annu Rev Med 2022;73:517-28. Crossref
22. Xu L, Li G, Wei Q, El-Naggar AK, Sturgis EM. Family history of cancer and risk of sporadic differentiated thyroid carcinoma. Cancer 2012;118:1228-35. Crossref
23. Iglesias ML, Schmidt A, Ghuzlan AA, et al. Radiation exposure and thyroid cancer: a review. Arch Endocrinol Metab 2017;61:180-7. Crossref
24. Saenko V, Mitsutake N. Radiation-related thyroid cancer. Endocr Rev 2024;45:1-29. Crossref
25. Figge JJ, Gooding WE, Steward DL, et al. Do ultrasound patterns and clinical parameters inform the probability of thyroid cancer predicted by molecular testing in nodules with indeterminate cytology? Thyroid 2021;31:1673-82. Crossref
26. Li X, Xing M, Tu P, et al. Urinary iodine levels and thyroid disorder prevalence in the adult population of China: a large-scale population-based cross-sectional study. Sci Rep 2025;15:14273. Crossref
27. Xiao J, Xiao Q, Cong W, et al. Discriminating malignancy in thyroid nodules: the nomogram versus the Kwak and ACR TI-RADS. Otolaryngol Head Neck Surg 2020;163:1156-65. Crossref
28. Xin Y, Liu F, Shi Y, Yan X, Liu L, Zhu J. A scoring system for assessing the risk of malignant partially cystic thyroid nodules based on ultrasound features. Front Oncol 2021;11:731779. Crossref
29. Zhou T, Hu T, Ni Z, et al. Comparative analysis of machine learning-based ultrasound radiomics in predicting malignancy of partially cystic thyroid nodules. Endocrine 2024;83:118-26. Crossref
30. Bluethgen C, Van Veen D, Zakka C, et al. Best practices for large language models in radiology. Radiology 2025;315:e240528. Crossref
31. He Z, Li Y, Zeng W, et al. Can a computer-aided mass diagnosis model based on perceptive features learned from quantitative mammography radiology reports improve junior radiologists’ diagnosis performance? An observer study. Front Oncol 2021;11:773389. Crossref
32. Kim Y, Roh J, Song DE, et al. Risk factors for posttreatment recurrence in patients with intermediate-risk papillary thyroid carcinoma. Am J Surg 2020;220:642-7. Crossref
33. Zhao J, Wen J, Wang S, Yao J, Liao L, Dong J. Association between adipokines and thyroid carcinoma: a meta-analysis of case-control studies. BMC Cancer 2020;20:788. Crossref

A ten-year evaluation of the incidence of obstetric anal sphincter injury with a reduced episiotomy rate

Hong Kong Med J 2026;32:Epub 30 Jan 2026
© Hong Kong Academy of Medicine. CC BY-NC-ND 4.0
 
ORIGINAL ARTICLE
YY Lau, MB, ChB, MRCOG; TW Chau, MB, ChB; WC Tang, MB, BS; Rachel YK Cheung, MD, FHKAM (Obstetrics and Gynaecology); SM Ng, MSc; TM Tso, MSc, BN; Symphorosa SC Chan, MD, FHKAM (Obstetrics and Gynaecology)
Department of Obstetrics and Gynaecology, The Chinese University of Hong Kong, Hong Kong SAR, China
 
Corresponding author: Dr YY Lau (yanyanlau@cuhk.edu.hk)
 
 
Abstract
Introduction: The role of episiotomy in preventing obstetric anal sphincter injury (OASIS) remains controversial. Liberal use of episiotomy has been reduced locally. This study aimed to review the incidence of OASIS in our unit over the past decade given the reduced episiotomy rate.
 
Methods: A retrospective study was conducted in a single tertiary obstetrics and gynaecology unit. All singleton vaginal deliveries, including normal and instrumental deliveries, between 2012 and 2021 were included. Data were retrieved from the hospital electronic delivery database between July 2022 and June 2023. The degree of OASIS was assessed using the Abdul Sultan classification.
 
Results: In total, 43 732 deliveries were included. The episiotomy rate decreased from 62.8% in 2012 to 44.7% in 2021 (P<0.001), while the OASIS rate increased from 0.3% to 1.4% over the same period (P<0.001). Among nulliparous women, the OASIS rate was significantly lower with episiotomy than without in both normal vaginal deliveries (0.6% vs 1.7%; P<0.001) and instrumental deliveries (1.7% vs 42.9%; P<0.001). Among multiparous women, the OASIS rate was significantly lower without episiotomy than with in normal vaginal deliveries (0.3% vs 0.5%; P=0.026), but significantly lower with episiotomy than without in instrumental deliveries (0.5% vs 23.5%; P<0.001). Overall, episiotomy was a protective factor for OASIS (odds ratio=0.273, 95% confidence interval=0.208-0.358; P<0.001).
 
Conclusion: Episiotomy was protective against OASIS among nulliparous women with singleton normal vaginal delivery and instrumental delivery in an Asian population. It also conferred protection among multiparous women undergoing instrumental delivery but not in those having normal vaginal delivery.
 
 
New knowledge added by this study
  • Episiotomy is a protective factor against obstetric anal sphincter injury (OASIS) among nulliparous women undergoing singleton normal vaginal delivery and instrumental delivery in an Asian population.
  • Episiotomy also confers protection against OASIS among multiparous women undergoing instrumental delivery in an Asian population.
  • Conversely, episiotomy may increase the risk of OASIS in multiparous women undergoing normal vaginal delivery.
Implications for clinical practice or policy
  • It is recommended that women should be informed of these findings to support informed decision-making regarding episiotomy.
  • A more restrictive approach should be adopted in multiparous women undergoing normal vaginal delivery.
 
 
Introduction
Obstetric anal sphincter injury (OASIS) is a serious complication of vaginal delivery that can result in faecal incontinence, thereby impairing women’s quality of life. Reported prevalence rates of OASIS range from less than 1% to 11%.1 2 3 In the United Kingdom, the incidence tripled from 1.8% to 5.9% between 2000 and 2012, presumably due to improved detection techniques and increased awareness.4 In Hong Kong, the incidence increased from 0.04% in 2004 to 0.1% in 2009, and to 0.3% in 2014 during normal vaginal deliveries.5 Episiotomy, commonly performed during the second stage of labour to facilitate delivery and prevent excessive stretching of the perineal muscles, may increase intrapartum blood loss and perineal pain.6 The role of episiotomy in mitigating OASIS remains controversial.7 8 Consequently, the liberal use of episiotomy has declined in Hong Kong, with rates falling from 81% in 2004 to 66.2% in 2009 and 47.4% in 2014.5 Ethnic differences in pelvic floor biometry and pelvic organ mobility have been reported,8 9 and studies suggest that Asian women are more prone to OASIS.10 11 This study aimed to review the incidence of OASIS in our unit over the past decade in the context of declining episiotomy rates.
 
Methods
This study was conducted in Prince of Wales Hospital, a tertiary obstetrics and gynaecology unit with an annual delivery volume of approximately 4500 to 6000. All singleton vaginal deliveries—including spontaneous vaginal, ventouse, or forceps deliveries—between 1 January 2012 and 31 December 2021 were included. Breech and preterm deliveries were excluded. Maternal demographics were entered into the electronic record either antenatally by midwives or obstetricians if women had received antenatal care in our unit, or by midwives immediately after delivery. Maternal age and body mass index (BMI) were recorded at delivery. Macrosomia was defined as a birth weight of ≥4000 g. Most spontaneous vaginal deliveries were conducted by trained midwives or student midwives under supervision; instrumental deliveries were performed by trained obstetricians or trainees under senior supervision. When indicated, a left mediolateral episiotomy and a hands-on approach to protect the perineum were used by both midwives and doctors. Per vaginal and per rectal examinations were performed immediately after delivery. If OASIS was suspected, assessment was conducted by an obstetric specialist. The degree of OASIS was classified using the Abdul Sultan classification (Table 1).12 Delivery details were documented by midwives immediately after birth. Operative records for instrumental deliveries and OASIS repair, where applicable, were completed immediately after the procedure. Data were extracted from the hospital’s electronic delivery database between July 2022 and June 2023. Statistical analyses were performed using SPSS (Windows version 29.0; IBM Corp, Armonk [NY], United States). Descriptive analyses were used to examine demographics, mode of delivery, and the prevalences of episiotomy and OASIS. Means were compared between groups using the independent samples t test. Frequencies were compared using the Pearson Chi squared test or Fisher’s exact test, as appropriate. 
Trends were analysed using the Chi squared test for trend (Cochran–Armitage test). All risk factors were included in multivariable logistic regression analysis except epidural analgesia, nulliparity, and neonatal birth weight (justification provided in Results section). A P value of <0.05 was considered statistically significant.
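
The trend analysis described above can be illustrated with a minimal sketch of the Cochran–Armitage test. This is not the SPSS procedure used in the study: the function name `cochran_armitage_trend`, the equally spaced scores, and the yearly counts below are all assumptions made for illustration, and the counts are hypothetical rather than study data.

```python
import math

def cochran_armitage_trend(events, totals, scores=None):
    """Two-sided Cochran-Armitage test for a linear trend in proportions.

    events[i] / totals[i] is the proportion in ordered group i (e.g. a calendar year).
    Returns the z statistic and a two-sided P value from the normal approximation.
    """
    if scores is None:
        scores = list(range(len(events)))  # assumption: equally spaced scores
    n = sum(totals)
    p_bar = sum(events) / n  # pooled proportion under the null hypothesis
    # Score-weighted sum of (observed - expected) events
    t = sum(s * (r - m * p_bar) for s, r, m in zip(scores, events, totals))
    # Variance of that sum under the null hypothesis
    s1 = sum(m * s for s, m in zip(scores, totals))
    s2 = sum(m * s * s for s, m in zip(scores, totals))
    var = p_bar * (1 - p_bar) * (s2 - s1 * s1 / n)
    z = t / math.sqrt(var)
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail area
    return z, p_value

# Hypothetical counts for three consecutive years (not taken from the study)
hypothetical_events = [10, 20, 30]
hypothetical_totals = [1000, 1000, 1000]
z, p = cochran_armitage_trend(hypothetical_events, hypothetical_totals)
print(f"z = {z:.2f}, two-sided P = {p:.4f}")
```

A positive z indicates a rising proportion across the ordered groups and a negative z a falling one. Statistical packages may apply a slightly different variance formula, so results can differ in the later decimal places.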
 

Table 1. Abdul Sultan classification of obstetric anal sphincter injury12
 
Results
A total of 43 732 deliveries were included in this study. The mean ± standard deviation maternal age at delivery was 31.5 ± 4.7 years and the median parity was 0 (interquartile range, 1). Of these, 22 566 (51.6%) were nulliparous and 21 166 (48.4%) were multiparous. Among the latter, 2268 (10.7%) had previously delivered only by Caesarean section and were therefore vaginally nulliparous. Data concerning previous delivery mode were missing for 905 women (4.3%). In total, 39 603 women (90.6%) had a normal vaginal delivery, 3528 (8.1%) had a ventouse delivery, and 601 (1.4%) had a forceps delivery. Over the 10-year period from 2012 to 2021, the overall instrumental delivery rate and ventouse delivery rate declined significantly, from 13.2% to 12.0% (P<0.001) and from 11.8% to 8.6% (P<0.001), respectively [Fig 1]. Overall, 23 325 women (53.3%) underwent episiotomy, whereas 20 407 (46.7%) did not; 326 women (0.7%) sustained OASIS, whereas 43 406 (99.3%) did not. The overall episiotomy rate decreased from 62.8% to 44.7% (P<0.001), with reductions observed in both nulliparous (from 89.2% to 68.5%; P<0.001) and multiparous women (from 31.7% to 23.8%; P<0.001). Conversely, the overall OASIS rate increased from 0.3% to 1.4% (P<0.001), with increases observed in both nulliparous (from 0.4% to 2.5%; P<0.001) and multiparous women (from 0.1% to 0.5%; P<0.001) [Fig 2].
 

Figure 1. Ten-year trend in instrumental delivery (n=43 732)
 

Figure 2. Ten-year trends in obstetric anal sphincter injury and episiotomy rates (n=43 732)
 
The characteristics of the study population are summarised in Table 2. Episiotomy rates among women with and without OASIS were 51.8% and 53.3%, respectively (P=0.587). A higher proportion of women in the OASIS group were nulliparous (79.1% vs 51.4%; P<0.001) and vaginally nulliparous (85.9% vs 56.5%; P<0.001). Instrumental delivery was also more common in the OASIS group compared with the non-OASIS group (29.1% vs 9.3%; P<0.001). No statistically significant difference was observed between the type of instrumental vaginal delivery and the occurrence of OASIS (P=0.128). Women with OASIS had a lower BMI, a longer duration of labour, and delivered heavier neonates. No significant differences were observed in mean maternal age, ethnicity, gestational age, onset of labour, epidural analgesia, episiotomy, or macrosomia. All risk factors were included in the multivariable logistic regression analysis except epidural analgesia, nulliparity, and neonatal birth weight. Epidural analgesia was excluded because only one delivery with OASIS involved epidural analgesia, while nulliparity and neonatal birth weight were excluded due to their strong correlation with vaginal nulliparity and macrosomia, respectively. Macrosomia was considered to have greater clinical relevance than neonatal birth weight because a standard cut-off value exists. Multivariable logistic regression analysis revealed that vaginal nulliparity and instrumental delivery remained independent risk factors for OASIS, whereas BMI and labour duration did not. Induced labour (odds ratio [OR]=0.734, 95% confidence interval [CI]=0.577-0.934; P=0.012) and episiotomy (OR=0.273, 95% CI=0.208-0.358; P<0.001) were identified as protective factors, while macrosomia (OR=2.754, 95% CI=1.435-5.284; P<0.001) was identified as a risk factor for OASIS (Table 3). Missing data were noted for BMI in 543 cases (1.2%) and for onset of labour in 82 cases (0.2%).
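
As a reading aid for the odds ratios above, the following sketch back-calculates the log-odds coefficient, its standard error, and the Wald statistic from a published OR and 95% CI. It assumes the intervals are Wald intervals on the log-odds scale, which the article does not state; the function name `wald_from_or` is hypothetical, and the results are approximate because the published figures are rounded.

```python
import math

Z_975 = 1.959964  # 97.5th percentile of the standard normal distribution

def wald_from_or(or_point, ci_low, ci_high):
    """Recover beta, SE, Wald z and two-sided P from an OR and its 95% CI.

    Assumes a Wald interval: exp(beta - 1.96*SE) to exp(beta + 1.96*SE).
    """
    beta = math.log(or_point)  # coefficient on the log-odds scale
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * Z_975)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail area
    return beta, se, z, p

# Values as reported in the text for the multivariable model
reported = {
    "induced labour": (0.734, 0.577, 0.934),
    "episiotomy": (0.273, 0.208, 0.358),
}
for name, (or_point, low, high) in reported.items():
    beta, se, z, p = wald_from_or(or_point, low, high)
    print(f"{name}: beta={beta:.3f}, SE={se:.3f}, z={z:.2f}, approximate P={p:.3g}")
```

A negative coefficient corresponds to an OR below 1, that is, lower odds of OASIS when the factor is present.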
 

Table 2. Characteristics of the study population and comparison between women with and without obstetric anal sphincter injury (n=43 732)
 

Table 3. Simple and multivariable logistic regression of risk factors for obstetric anal sphincter injury
 
In the subgroup analysis of nulliparous women, the OASIS rate was significantly lower with episiotomy than without, both among those undergoing normal vaginal delivery (0.6% vs 1.7%; P<0.001) and among those undergoing instrumental delivery (1.7% vs 42.9%; P<0.001). Among multiparous women undergoing normal vaginal delivery, the OASIS rate was significantly lower without episiotomy than with (0.3% vs 0.5%; P=0.026), whereas among those undergoing instrumental delivery it was significantly lower with episiotomy than without (0.5% vs 23.5%; P<0.001). Among vaginally nulliparous women within the multiparous group, no statistically significant difference in OASIS rates was observed between normal vaginal deliveries with and without episiotomy; however, the OASIS rate was significantly lower among those undergoing instrumental deliveries with episiotomy compared with those without (0% vs 37.5%; P<0.001) [Table 4].
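
Because the direction of the effect differs between subgroups, a short sketch may help in reading the percentages above. It computes a crude ratio and absolute difference of the reported OASIS rates with versus without episiotomy. The function name `crude_comparison` is hypothetical, the inputs are the rounded percentages quoted in the text, and the outputs are unadjusted, so they are not equivalent to the adjusted odds ratios in Table 3.

```python
# OASIS rates (%) as quoted in the text: (with episiotomy, without episiotomy)
rates = {
    ("nulliparous", "normal vaginal"): (0.6, 1.7),
    ("nulliparous", "instrumental"): (1.7, 42.9),
    ("multiparous", "normal vaginal"): (0.5, 0.3),
    ("multiparous", "instrumental"): (0.5, 23.5),
}

def crude_comparison(with_epi, without_epi):
    """Return the crude rate ratio and the absolute difference in percentage points."""
    return with_epi / without_epi, with_epi - without_epi

for (parity, mode), (with_epi, without_epi) in rates.items():
    ratio, diff = crude_comparison(with_epi, without_epi)
    direction = "lower" if ratio < 1 else "higher"
    print(f"{parity}, {mode}: ratio={ratio:.2f}, "
          f"difference={diff:+.1f} points ({direction} with episiotomy)")
```

A ratio below 1 means the OASIS rate was lower with episiotomy. Only the multiparous normal vaginal delivery subgroup gives a ratio above 1.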
 

Table 4. Rate of obstetric anal sphincter injury according to parity, episiotomy status, and mode of vaginal delivery
 
Discussion
In recent years, many obstetric units in Hong Kong have promoted a reduction in episiotomy use. Our unit achieved substantial reductions in episiotomy rates among nulliparous and multiparous women between 2012 and 2021. Although the overall rate of OASIS remained low, considerable increases were observed in both groups during the study period. Vaginal nulliparity and operative vaginal delivery were identified as independent risk factors for OASIS, consistent with previous findings.7 11 Furthermore, episiotomy was identified as a protective factor against OASIS in multivariable logistic regression analysis (OR=0.273, 95% CI=0.208-0.358) [Table 3].
 
In nulliparous women, episiotomy was protective against OASIS in both normal and instrumental vaginal deliveries. These findings differ from those of previous large-scale studies.7 11 In a large retrospective study in the Netherlands involving over 281 000 vaginal deliveries,13 and in another study including more than 10 000 women in Australia,14 mediolateral episiotomy was shown to reduce the risk of OASIS in nulliparous women (OR=0.21 and 0.54, respectively). However, Mahgoub et al11 in France reported no association between episiotomy and OASIS. In their cohort of 42 626 women, the overall OASIS rate was 1.2% and the overall episiotomy rate was only 10%.11 Perrin et al7 reported an episiotomy rate of 63.2% in nulliparous women and an OASIS rate of 0.7%, regardless of episiotomy use. In their analysis, episiotomy was not associated with OASIS in normal vaginal delivery but appeared to be protective in nulliparous women undergoing operative vaginal delivery at term.7
 
The above studies mainly involved women in Western populations. Several studies have indicated that Asian women have a two- to nine-fold increased risk of sustaining OASIS.15 16 17 18 19 In a study conducted in Israel involving over 80 000 women, including 997 of Asian origin, the OASIS rate among Asian women was 9 times higher than that among women of Western descent (3.5% vs 0.4%; P=0.001).16 Asian women also had a higher proportion of fourth-degree tears (17.1% vs 6.6%; P=0.039), despite smaller newborns (mean birth weight: 3318 g vs 3501 g; P=0.004).16 Anatomical differences between ethnic groups may contribute to this disparity. Cheung et al9 reported that pregnant women of East Asian origin had a thicker pubovisceral muscle, a smaller levator hiatus, and reduced pelvic organ mobility compared with pregnant women of Western descent. These factors may contribute to the higher risk of OASIS.9 Moreover, Bates et al20 found that a shorter perineal length measured during the second stage of labour prior to pushing was significantly associated with OASIS. Although a study conducted in Hawaii found no significant difference in perineal body length between Western and Chinese women, measurements were taken during the first stage of labour rather than before pushing.21 Further studies are needed to determine whether perineal body length differs during the second stage of labour. The reasons for the higher OASIS rates among Asian women remain unclear but are likely to be complex and multifactorial.
 
Another notable point is the higher rate of epidural analgesia use among Western women compared with Asian women (50%-90% vs 0%-2.2%), even within the same hospital setting where epidural analgesia is offered free of charge to all women.7 11 16 20 In the present study, the rate of epidural analgesia was low throughout the study period. In this cohort, epidural analgesia was not associated with OASIS. A meta-analysis examining risk factors for OASIS found no association with epidural analgesia; however, it included only two studies.22 In contrast, Mahgoub et al11 identified epidural analgesia as a protective factor for OASIS, whereas another meta-analysis reported it as a risk factor.19 These conflicting findings suggest that the role of epidural analgesia in OASIS remains unclear.
 
There is limited literature on the role of episiotomy in normal vaginal delivery among multiparous women. In the present study, episiotomy did not protect multiparous women from OASIS, except in the context of instrumental vaginal delivery. Indeed, episiotomy may increase the risk of OASIS in multiparous women undergoing normal vaginal delivery.23 In contrast, episiotomy was protective against OASIS among multiparous women undergoing instrumental vaginal delivery (OR=0.028). This finding is supported by a Dutch study which reported five-fold and ten-fold reductions in OASIS during vacuum and forceps deliveries, respectively.24 In light of these findings, we recommend a more restrictive approach to episiotomy among multiparous women undergoing normal vaginal delivery.
 
The rising trend of OASIS over the past decade may also be attributable to improvements in clinical detection following the promotion of more thorough post-delivery assessments by both midwives and obstetricians. Kwok et al25 reported that the prevalence of occult OASIS—detected by endoanal ultrasound but not identified by clinical examination after delivery—was as high as 7.8% after normal vaginal delivery and 3.8% after instrumental delivery. Subsequently, regular OASIS workshops were introduced to train midwives and doctors in performing standardised vaginal and rectal examinations after vaginal delivery. When a major perineal tear is suspected, immediate reassessment by an obstetric specialist is conducted. This practice has been shown to improve the detection rate of OASIS.26 We also analysed trends in instrumental vaginal delivery over the 10-year period. Overall, decreasing trends were observed for both instrumental and ventouse deliveries. The rate of forceps delivery remained similar or showed a slight decrease, except in 2021. Therefore, the rising trend in OASIS is unlikely to be explained by changes in instrumental delivery rates.
 
Strengths and limitations
The strengths of this study include its large sample size, 10-year study period, and the documented reduction in episiotomy rates, which allowed evaluation of the role of episiotomy in OASIS. Our unit is a tertiary centre with the highest delivery volume in Hong Kong, and this represents the largest retrospective study to date focusing on an Asian population. However, as a retrospective study, missing data were noted during data collection and entry. In addition, several risk factors previously identified in meta-analyses—such as the duration of the second stage of labour, fetal head position at delivery, history of previous OASIS, and shoulder dystocia—were not analysed in the present study,19 27 representing a key limitation. Furthermore, some cases of OASIS may have been missed on clinical examination. High-quality research is needed to further investigate OASIS, given its substantial impact on women’s quality of life.
 
Conclusion
With a substantial reduction in episiotomy rates, a corresponding increase in the rate of OASIS was observed. Episiotomy was protective against OASIS among nulliparous women undergoing singleton normal vaginal delivery and instrumental delivery. It also conferred protection in multiparous women undergoing instrumental delivery but not in those having normal vaginal delivery. Among vaginally nulliparous women within the multiparous group, the OASIS rate was significantly higher in those undergoing instrumental deliveries without episiotomy, similar to the rate observed in nulliparous women. Conversely, the OASIS rate was higher in the episiotomy group during normal vaginal delivery, although this difference was not statistically significant and may have been influenced by the small sample size. Further high-quality research is warranted, and women should be informed of these findings to enable informed decision-making regarding episiotomy.
 
Author contributions
Concept or design: SSC Chan, RYK Cheung.
Acquisition of data: SSC Chan, RYK Cheung, TW Chau, YY Lau, SM Ng, TM Tso.
Analysis or interpretation of data: SSC Chan, YY Lau.
Drafting of the manuscript: YY Lau.
Critical revision of the manuscript for important intellectual content: All authors.
 
All authors had full access to the data, contributed to the study, approved the final version for publication, and take responsibility for its accuracy and integrity.
 
Conflicts of interest
All authors have disclosed no conflicts of interest.
 
Acknowledgement
The authors thank Ms LL Lee, our research assistant, for her assistance with data acquisition, analysis, and interpretation.
 
Declaration
Findings from this study were partially presented as an e-poster at the Royal College of Obstetricians and Gynaecologists World Congress 2024, Muscat, Oman, 15-17 October 2024.
 
Funding/support
This research received no specific grant from any funding agency in the public, commercial, or not-for-profit sectors.
 
Ethics approval
Ethics approval for this research was obtained from the Joint Chinese University of Hong Kong–New Territories East Cluster Clinical Research Ethics Committee, Hong Kong (Ref No.: 2022.259). The requirement for patient consent was waived by the Committee due to the retrospective nature of the research. The study complied with the Declaration of Helsinki and the International Council for Harmonization Guideline for Good Clinical Practice.
 
References
1. Tung CW, Cheon WC, Tong WM, Leung HY. Incidence and risk factors of obstetric anal sphincter injuries after various modes of vaginal deliveries in Chinese women. Chin Med J (Engl) 2015;128:2420-5.
2. Jangö H, Langhoff-Roos J, Rosthøj S, Sakse A. Modifiable risk factors of obstetric anal sphincter injury in primiparous women: a population-based cohort study. Am J Obstet Gynecol 2014;210:59.e1-6.
3. Hsieh WC, Liang CC, Wu D, Chang SD, Chueh HY, Chao AS. Prevalence and contributing factors of severe perineal damage following episiotomy-assisted vaginal delivery. Taiwan J Obstet Gynecol 2014;53:481-5.
4. Gurol-Urganci I, Cromwell DA, Edozien LC, et al. Third- and fourth-degree perineal tears among primiparous women in England between 2000 and 2012: time trends and risk factors. BJOG 2013;120:1516-25.
5. Hong Kong College of Obstetricians and Gynaecologists. Territory-wide Audit in Obstetrics & Gynaecology. 2014. Available from: https://www.hkcog.org.hk/hkcog/Download/Territory-wide_Audit_in_Obstetrics_Gynaecology_2014.pdf. Accessed 1 May 2020.
6. Woolley RJ. Benefits and risks of episiotomy: a review of the English-language literature since 1980. Part II. Obstet Gynecol Surv 1995;50:821-35.
7. Perrin A, Korb D, Morgan R, Sibony O. Effectiveness of episiotomy to prevent OASIS in nulliparous women at term. Int J Gynaecol Obstet 2023;162:632-8.
8. Abdool Z, Dietz HP, Lindeque BG. Ethnic differences in the levator hiatus and pelvic organ descent: a prospective observational study. Ultrasound Obstet Gynecol 2017;50:242-6.
9. Cheung RY, Shek KL, Chan SS, Chung TK, Dietz HP. Pelvic floor muscle biometry and pelvic organ mobility in East Asian and Caucasian nulliparae. Ultrasound Obstet Gynecol 2015;45:599-604.
10. Brown J, Kapurubandara S, Gibbs E, King J. The great divide: country of birth as a risk factor for obstetric anal sphincter injuries. Aust N Z J Obstet Gynaecol 2018;58:79-85.
11. Mahgoub S, Piant H, Gaudineau A, Lefebvre F, Langer B, Koch A. Risk factors for obstetric anal sphincter injuries (OASIS) and the role of episiotomy: a retrospective series of 496 cases. J Gynecol Obstet Hum Reprod 2019;48:657-62.
12. de Leeuw JW, Struijk PC, Vierhout ME, Wallenburg HC. Risk factors for third degree perineal ruptures during delivery. BJOG 2001;108:383-7.
13. Okeahialam NA, Taithongchai A, Thakar R, Sultan AH. The incidence of anal incontinence following obstetric anal sphincter injury graded using the Sultan classification: a network meta-analysis. Am J Obstet Gynecol 2023;228:675-88.e13.
14. Hauck YL, Lewis L, Nathan EA, White C, Doherty DA. Risk factors for severe perineal trauma during vaginal childbirth: a Western Australian retrospective cohort study. Women Birth 2015;28:16-20.
15. Grobman WA, Bailit JL, Rice MM, et al. Racial and ethnic disparities in maternal morbidity and obstetric care. Obstet Gynecol 2015;125:1460-7.
16. Baruch Y, Gold R, Eisenberg H, et al. High incidence of obstetric anal sphincter injuries among immigrant women of Asian ethnicity. J Clin Med 2023;12:1044.
17. D’Souza JC, Monga A, Tincello DG. Risk factors for perineal trauma in the primiparous population during non-operative vaginal delivery. Int Urogynecol J 2020;31:621-5.
18. Yeaton-Massey A, Wong L, Sparks TN, et al. Racial/ethnic variations in perineal length and association with perineal lacerations: a prospective cohort study. J Matern Fetal Neonatal Med 2015;28:320-3.
19. Hu Y, Lu H, Huang Q, et al. Risk factors for severe perineal lacerations during childbirth: a systematic review and meta-analysis of cohort studies. J Clin Nurs 2023;32:3248-65.
20. Bates LJ, Melon J, Turner R, Chan SS, Karantanis E. Prospective comparison of obstetric anal sphincter injury incidence between an Asian and Western hospital. Int Urogynecol J 2019;30:429-37.
21. Tsai PJ, Oyama IA, Hiraoka M, Minaglia S, Thomas J, Kaneshiro B. Perineal body length among different racial groups in the first stage of labor. Female Pelvic Med Reconstr Surg 2012;18:165-7.
22. Barba M, Bernasconi DP, Manodoro S, Frigerio M. Risk factors for obstetric anal sphincter injury recurrence: a systematic review and meta-analysis. Int J Gynaecol Obstet 2022;158:27-34.
23. Eggebø TM, Rygh AB, von Brandis P, Skjeldestad FE. Prevention of obstetric anal sphincter injuries with perineal support and lateral episiotomy: a historical cohort study. Acta Obstet Gynecol Scand 2024;103:488-97.
24. van Bavel J, Hukkelhoven CW, de Vries C, et al. The effectiveness of mediolateral episiotomy in preventing obstetric anal sphincter injuries during operative vaginal delivery: a ten-year analysis of a national registry. Int Urogynecol J 2018;29:407-13.
25. Kwok SP, Wan OY, Cheung RY, Lee LL, Chung JP, Chan SS. Prevalence of obstetric anal sphincter injury following vaginal delivery in primiparous women: a retrospective analysis. Hong Kong Med J 2019;25:271-8.
26. Andrews V, Sultan AH, Thakar R, Jones PW. Occult anal sphincter injuries—myth or reality? BJOG 2006;113:195-200.
27. Pergialiotis V, Bellos I, Fanaki M, Vrachnis N, Doumouchtsis SK. Risk factors for severe perineal trauma during childbirth: an updated meta-analysis. Eur J Obstet Gynecol Reprod Biol 2020;247:94-100.

Validation of diagnosis codes for pleural diseases and procedure codes for relevant respiratory procedures in a healthcare database in Hong Kong: a single tertiary centre study

Hong Kong Med J 2026;32:Epub 30 Jan 2026
© Hong Kong Academy of Medicine. CC BY-NC-ND 4.0
 
ORIGINAL ARTICLE
Ken KP Chan, MB, ChB, FRCP1,2; Timothy CC Ng, BSc1; CY Sze, BSc1; KC Ling, MPH1; Christopher Chan, MB, ChB, MRCP1; Charlotte HY Lau, MB, ChB, MRCP1; Stephanie WT Ho, MB, ChB, MRCP1; Joyce KC Ng, MB, ChB, FHKCP1; Rachel LP Lo, MB, ChB, FHKCP1; WH Yip, MB, ChB, FHKCP1; Jenny CL Ngai, MB, ChB, FRCP1; KW To, MB, ChB, FRCP1; Fanny WS Ko, MD, FRCP1; David SC Hui, MD, FRCP1
1 Department of Medicine and Therapeutics, Faculty of Medicine, The Chinese University of Hong Kong, Hong Kong SAR, China
2 Li Ka Shing Institute of Health Sciences, Faculty of Medicine, The Chinese University of Hong Kong, Hong Kong SAR, China
 
Corresponding author: Prof David SC Hui (dschui@cuhk.edu.hk)
 
 
Abstract
Introduction: There are insufficient population-based epidemiological data on various pleural diseases in Hong Kong. We aimed to validate ICD-9-CM (International Classification of Diseases, Ninth Revision, Clinical Modification) codes for pleural diseases and relevant procedures prior to conducting epidemiological analyses using local electronic health records.
 
Methods: Hospitalisation episodes coded as ‘pneumothorax’, ‘pleural effusion’, and trauma-related pleural events, as well as procedures beginning with ICD-9-CM codes 33 and 34 between 2013 and 2022, were retrieved from the Hospital Authority. Paediatric patients and uninterrupted hospitalisation episodes were excluded. The cohort was filtered to include those hospitalised at Prince of Wales Hospital (PWH). Up to 50 hospitalisation episodes per code were randomly selected for manual validation. Positive predictive values (PPVs) with 95% confidence intervals of individual codes were calculated; successful validation was defined as a PPV ≥0.700. The primary endpoint was the PPV of individual diagnosis and procedure codes.
 
Results: A total of 26 757, 218 018, 1269, 185 154, and 106 450 hospitalisation episodes with non-traumatic pneumothorax, non-traumatic pleural effusion, trauma-related pleural events, procedures with code 33, and procedures with code 34, respectively, were retrieved. Within the PWH cohort, PPVs for these diagnosis and procedure codes were 0.853 (0.787-0.904), 0.928 (0.903-0.948), 0.957 (0.907-0.981), 0.932 (0.913-0.948), and 0.933 (0.916-0.948), respectively. Procedures involving indwelling pleural catheterisation and open drainage of the pleural cavity failed validation due to frequent miscoding.
 
Conclusion: This is the first validation study of clinical codes for pleural diseases and related procedures in Hong Kong. All diagnosis codes and most procedure codes were successfully validated.
 
 
New knowledge added by this study
  • This is the first validation study of clinical codes (International Classification of Diseases, Ninth Revision, Clinical Modification) for pleural diseases and relevant procedures in Hong Kong.
  • All diagnosis codes and most procedure codes were successfully validated.
  • Duplication of codes for similar diagnoses or procedures was identified.
Implications for clinical practice or policy
  • With the emergence of new respiratory procedures, diagnosis and procedure codes should be updated regularly.
  • Removal or consolidation of duplicated subcodes in the Hospital Authority system is necessary to facilitate accurate future research and analysis using clinical codes.
  • Researchers should be reminded to search all relevant diagnosis and procedure codes to minimise missing data when identifying specific diseases or procedures.
 
 
Introduction
Pleural diseases are common respiratory conditions that often require hospital admission and have shown an increasing incidence.1 2 In the United States, approximately 1.5 million patients experience pleural effusion annually, with most cases attributed to congestive heart failure, pneumonia, and cancer.3 4 A recent multicentre, cross-sectional study in China estimated the prevalence of pleural effusion at 4684 per 1 million Chinese adults.5 In that study, the most common causes were parapneumonic effusion and empyema (25.1%), malignant neoplasms (23.7%), and tuberculosis (12.3%).5 The median hospitalisation cost was ¥15 534.5 (interquartile range, 9447.2-29 000.0).5 Additionally, an increasing trend in admissions for spontaneous pneumothorax has been observed in England, highlighting the prevalence of the disease and its associated healthcare burden.2
 
Management of pleural diseases involves various diagnostic and therapeutic procedures that extend beyond the pleural space to include the airway and lung parenchyma. Whether closed or open, these procedures substantially contribute to the overall healthcare burden. However, information about pleural diseases and related respiratory procedures in Hong Kong remains limited, highlighting the need for contemporary, population-based epidemiological data.
 
The Hospital Authority, which provides healthcare services to over 90% of Hong Kong’s population, maintains extensive healthcare databases. These include the Clinical Management System (CMS) and the Clinical Data Analysis and Reporting System (CDARS), which capture a wide range of longitudinal clinical data. Examples include hospital discharge records, diagnosis and procedure codes for each hospitalisation episode, radiological findings, and laboratory parameters, particularly blood and pleural fluid analyses. This comprehensive dataset provides valuable insights into the burden of pleural diseases and accurately represents the local population.
 
Before analysing diseases and procedures using administrative data, it is essential to validate the accuracy of diagnosis and procedure codes within the healthcare database. These codes are typically entered by attending physicians, interventionists, or surgeons performing the procedures, which suggests a high degree of reliability. However, no prior local validation study has been conducted. Therefore, we aimed to assess whether diagnosis codes for pleural diseases and procedure codes for relevant respiratory procedures are accurately recorded for each hospitalisation episode within the Hospital Authority systems.
 
Methods
This retrospective, observational validation study of diagnosis and procedure codes utilised data from a territory-wide healthcare database in Hong Kong. Clinical data were obtained from CDARS, provided by the Hospital Authority. Hospitalisation episodes with the targeted diagnosis and procedure codes between 1 January 2013 and 31 December 2022 were retrieved from the system. Each observation represented a hospitalisation episode rather than a unique patient, and no patient recruitment was involved.
 
Diagnosis and procedure codes were defined using the International Classification of Diseases, Ninth Revision, Clinical Modification (ICD-9-CM). The basic format of an ICD-9-CM code consists of three to five digits. The Hospital Authority further extends these codes with additional characters after the decimal point to specify particular diagnoses or procedures within an ICD-9-CM code subgroup (‘subcodes’). These subcodes are displayed in CDARS but are not typically accessible to frontline CMS users. All hospitalisation episodes in acute hospitals with a discharge diagnosis code of pneumothorax (codes starting with 512), pleural effusion (codes starting with 012, 197.2, 220.4, 510, or 511), traumatic pneumothorax or haemothorax (trauma-related pleural events, codes starting with 860), or procedure codes for relevant respiratory procedures (codes starting with 33 or 34) were retrieved, regardless of their position in the coding list. Hospitalisation episodes for patients younger than 18 years or from paediatric departments were excluded from subsequent validation analyses. Uninterrupted hospitalisation episodes following the index episodes, including those in acute or convalescent hospitals with the same diagnosis code of interest, were also excluded, as these may represent duplicate entries for the same clinical event. The remaining hospitalisation episodes after exclusions were grouped as the main cohort.
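The retrieval and exclusion rules above can be sketched in Python as simple prefix matching. The prefixes and the age cut-off come from the text; the function names, the dictionary layout, and the way an episode is represented are illustrative assumptions rather than the authors' implementation. Diagnosis and procedure codes are kept apart because the two code sets share leading digits.

```python
from typing import Optional

# Prefixes are taken from the Methods text; names and layout are assumptions.
DIAGNOSIS_PREFIXES = {
    "pneumothorax": ("512",),
    "pleural effusion": ("012", "197.2", "220.4", "510", "511"),
    "trauma-related pleural event": ("860",),
}
PROCEDURE_PREFIXES = ("33", "34")


def classify_diagnosis(code: str) -> Optional[str]:
    """Return the disease group for a diagnosis code, or None if not targeted."""
    for group, prefixes in DIAGNOSIS_PREFIXES.items():
        if code.startswith(prefixes):  # str.startswith accepts a tuple
            return group
    return None


def is_target_procedure(code: str) -> bool:
    """True when a procedure code starts with 33 or 34."""
    return code.startswith(PROCEDURE_PREFIXES)


def is_eligible(age_years: int, paediatric_department: bool) -> bool:
    """Apply the age and department exclusions described in the text."""
    return age_years >= 18 and not paediatric_department
```

Exclusion of uninterrupted follow-on episodes would need admission and discharge dates per patient and is not shown here.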
 
Manual verification of a proportion of the retrieved diagnosis and procedure codes, down to the subcode level, was conducted to ensure data accuracy. The main cohort was first filtered to include only hospitalisation episodes at the authors’ affiliated institution, Prince of Wales Hospital (PWH), forming the PWH cohort. A maximum of 50 hospitalisation episodes for each diagnosis or procedure code were randomly extracted from the PWH cohort to estimate the true positive predictive values (PPVs) within a 13% margin of error at the 95% confidence level. This precision level was chosen pragmatically to balance statistical rigour with the substantial manual effort required for chart review in this validation study. Prince of Wales Hospital is a tertiary care centre with a complex case mix, encompassing a wide range of pleural diseases and advanced respiratory procedures. Within the PWH cohort, the types of pleural disease (pleural effusion, pneumothorax, and trauma-related pleural events) and their underlying aetiologies (eg, non-tuberculous infection, tuberculosis, and malignancy) were determined through retrospective review of clinical notes, discharge summaries, radiological findings, and blood and pleural fluid analysis results using the CMS. Procedure codes were verified by reviewing procedure records within the corresponding hospitalisation episodes. All cases were independently reviewed by two board-certified respiratory physicians. Discrepancies were resolved through joint case review until consensus was reached. Coding accuracy was expressed as PPVs with 95% confidence intervals (95% CIs). The PPV was calculated by dividing the number of true positives (ie, hospitalisation episodes in the PWH cohort where diagnosis and procedure codes were confirmed by manual verification) by the total number of true positives and false positives (ie, episodes where codes were rejected upon manual review). The 95% CI was calculated using the exact binomial method.
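As a rough illustration of these calculations, the following Python sketch computes a PPV, an exact binomial interval, and the margin of error for a sample of 50. It rests on several assumptions: the exact method is read as the Clopper-Pearson interval, the margin of error uses the normal approximation with an expected PPV of 0.700 (the validation threshold used in this study), and all function names are hypothetical. The authors' implementation details are not stated, so published intervals may not be reproduced exactly.

```python
import math
from typing import Callable, Tuple


def ppv(true_pos: int, false_pos: int) -> float:
    """Positive predictive value: confirmed codes over all reviewed codes."""
    return true_pos / (true_pos + false_pos)


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def _solve_decreasing(f: Callable[[float], float], target: float) -> float:
    """Bisection on [0, 1] for a function that decreases as p increases."""
    lo, hi = 0.0, 1.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if f(mid) > target:
            lo = mid  # value still too large, so the root lies to the right
        else:
            hi = mid
    return (lo + hi) / 2


def exact_ci(true_pos: int, n: int, alpha: float = 0.05) -> Tuple[float, float]:
    """Clopper-Pearson interval (assumed reading of 'exact binomial')."""
    lower = 0.0 if true_pos == 0 else _solve_decreasing(
        lambda p: binom_cdf(true_pos - 1, n, p), 1 - alpha / 2)
    upper = 1.0 if true_pos == n else _solve_decreasing(
        lambda p: binom_cdf(true_pos, n, p), alpha / 2)
    return lower, upper


def margin_of_error(p: float, n: int, z: float = 1.959964) -> float:
    """Half-width of the normal-approximation interval for a proportion."""
    return z * math.sqrt(p * (1 - p) / n)


# With 50 episodes and an assumed PPV of 0.700, the half-width is about 0.127.
moe_50 = margin_of_error(0.700, 50)
```

Under these assumptions, 50 episodes per code gives a half-width close to the stated 13%; a smaller sample for rarely used codes widens the interval accordingly.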
 
We hypothesised that the PPVs for the accuracy of diagnosis and procedure codes would be equal to or greater than 0.700, a commonly used threshold for successful validation.6 7 8 The primary endpoint was the determination of PPVs for the listed diagnosis and procedure codes. All statistical analyses were performed using Python (version 3.12.6).
 
Results
A total of 26 757 non-traumatic pneumothorax, 218 018 non-traumatic pleural effusion, and 1269 trauma-related pleural events were retrieved from CDARS between 2013 and 2022. Following the exclusion of paediatric patients and uninterrupted hospitalisation episodes, 20 888 non-traumatic pneumothorax, 199 323 non-traumatic pleural effusion, and 1127 trauma-related pleural events remained in the main cohort. Of these, 2451 (11.7%), 24 938 (12.5%), and 251 (22.3%) diagnosis codes for non-traumatic pneumothorax, non-traumatic pleural effusion, and trauma-related pleural events, respectively, were identified from PWH (Fig). Additionally, 185 154 and 106 450 relevant respiratory procedures with ICD-9-CM codes starting with 33 and 34, respectively, were retrieved. After exclusions, 181 770 and 101 336 procedure codes remained, of which 16 078 (8.8%) and 17 299 (17.1%) procedure codes, respectively, were identified from PWH (Fig). Tables 1, 2, and 3 list the diagnosis codes included in the validation analysis for non-traumatic pneumothorax (Table 1), non-traumatic pleural effusion (Table 2) and trauma-related pleural events (Table 3), while Tables 4 and 5 present the procedure codes starting with ‘33’ and ‘34’, respectively; the breakdown of hospitalisation episodes retrieved using these codes, and the numbers remaining after screening, are also shown.
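The PWH shares quoted above follow directly from the reported counts. The short Python check below uses only figures from this paragraph; the dictionary keys and variable names are illustrative labels.

```python
# (episodes in main cohort after exclusions, episodes from PWH), as reported
counts = {
    "non-traumatic pneumothorax": (20888, 2451),
    "non-traumatic pleural effusion": (199323, 24938),
    "trauma-related pleural events": (1127, 251),
    "procedures, code 33": (181770, 16078),
    "procedures, code 34": (101336, 17299),
}

# Percentage of each main-cohort group contributed by PWH, to one decimal place
pwh_share = {
    group: round(100 * pwh / main, 1) for group, (main, pwh) in counts.items()
}

for group, share in pwh_share.items():
    print(f"{group}: {share}%")
```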
 

Figure. Number of diagnosis and procedure codes identified, from retrieval in the Clinical Data Analysis and Reporting System to inclusion in the Prince of Wales Hospital cohort
 

Table 1. Diagnosis codes for non-traumatic pneumothorax included in the validation analysis
 

Table 2. Diagnosis codes for non-traumatic pleural effusion included in the validation analysis
 

Table 3. Diagnosis codes for trauma-related pleural events included in the validation analysis
 

Table 4. Procedure codes starting with ‘33’ included in the validation analysis
 

Table 5. Procedure codes starting with ‘34’ included in the validation analysis
 
The overall PPVs (95% CIs) for pneumothorax, pleural effusion, trauma-related pleural events, and all diagnosis codes were 0.853 (0.787-0.904), 0.928 (0.903-0.948), 0.957 (0.907-0.981), and 0.919 (0.898-0.936), respectively. The overall PPVs (95% CIs) for procedure codes starting with 33, starting with 34, and for all procedure codes were 0.932 (0.913-0.948), 0.933 (0.916-0.948), and 0.933 (0.920-0.944), respectively.
 
The PPVs for diagnosis codes related to pneumothorax, pleural effusion, and trauma-related pleural events were all equal to or greater than 0.700, with ranges of 0.700-1.000, 0.833-1.000, and 0.857-1.000, respectively. The lowest PPV (95% CI) was observed for postoperative pneumothorax (diagnosis code 512.1.2) at 0.700 (0.560-0.812). The highest PPVs were seen for iatrogenic pneumothorax (diagnosis code 512.1.0) and postoperative haemothorax (diagnosis code 511.8.7), both at 1.000, with 95% CIs of 0.933-1.000 and 0.762-1.000, respectively. The reasons for false-positive diagnosis codes are summarised in online supplementary Tables 1 to 3, with inappropriate coding of alternative diseases being the most common cause.
 
The PPVs for procedure codes starting with 33 ranged from 0.700 to 1.000. Procedure codes starting with 34 met the PPV benchmark, except for 34.04.3 (indwelling pleural catheterisation) and 34.09.3 (drainage of the pleural cavity, open). The reasons for false-positive procedure codes are listed in online supplementary Tables 4 and 5, with inappropriate coding of alternative but similar procedures being the most common cause. The low PPV for procedure code 34.04.3 (indwelling pleural catheterisation) arose from its misuse to represent non-tunnelled pleural catheter insertion, or to document the presence of an indwelling pleural catheter (IPC) inserted during prior hospitalisations. Procedure code 34.09.3 (drainage of the pleural cavity, open) failed to meet the PPV benchmark because it was misused to represent closed pleural drainage by drain insertion, rather than an open procedure.
 
Discussion
This study is the first to validate diagnosis and procedure codes for pleural diseases using a healthcare database in Hong Kong. All diagnosis codes for pleural diseases and the majority of procedure codes for relevant respiratory procedures met the PPV benchmark of 0.700 or higher. Only procedure codes 34.04.3 (indwelling pleural catheterisation) and 34.09.3 (drainage of the pleural cavity, open) failed to meet the validation criteria.
 
In 2008, the Hong Kong Thoracic Society reported the burden of lung disease in Hong Kong using local data from various governmental sources; however, pleural diseases were not included in the report.9 Over the subsequent decade, the incidence rates of individual pleural diseases were studied in Hong Kong. However, these studies were limited in scope as they focused on single pleural diseases (eg, empyema,10 11 12 malignant mesothelioma,13 and spontaneous pneumothorax14) or were restricted to single-centre settings.10 11
 
There is a pressing need for contemporary, population-based epidemiological data covering various pleural diseases in Hong Kong. A recent local survey highlighted heterogeneous practices in the management of pleural diseases among medical clinicians and reflected a lack of awareness and dedicated service infrastructure for pleural diseases.15 Given the rapid advancements in diagnostic strategies and therapeutic options for pleural diseases,16 an accurate and up-to-date assessment of their clinical burden is crucial. Such data provide a foundation for guiding future research, benchmarking healthcare standards in Hong Kong against those of other countries, informing the allocation of future healthcare resources for pleural diseases, and estimating the workload of healthcare professionals managing these conditions. All such service developments should be based on an accurate estimation of the current burden and projected future demand. The use of existing healthcare databases offers a practical approach; however, relevant diagnosis and procedure codes must first be validated. A similar research pathway was followed by Arnold et al,17 who validated diagnosis codes prior to assessing the epidemiology of pleural empyema in English hospitals.17 18
 
Nearly all PPVs of the diagnosis and procedure codes studied exceeded the benchmark of 0.700. Notably, PPVs for procedure codes were generally higher than those for diagnosis codes. This is because diagnosis codes can be carried over from previous hospitalisation episodes, enabling attending physicians to select active or inactive diagnosis codes regardless of their relevance to the current episode. In contrast, procedure codes cannot be carried over and must be entered manually to reflect procedures performed during the corresponding hospitalisation episode. This requirement contributes to the higher accuracy for procedure codes.
 
The PPV for procedure code 34.04.3 (indwelling pleural catheterisation) was unexpectedly low due to misuse. The absence of a specific diagnosis code indicating the presence of an IPC, combined with the inclusion of the term ‘pleural’ in the code description, contributed to its incorrect use, particularly during searches for non-tunnelled pleural catheter insertion. Updated diagnosis codes to indicate the status ‘presence of IPC’, or a new procedure code for ‘pleural fluid drainage using an existing IPC’, would accurately reflect the clinical scenario. Once available, such codes should be validated before any analyses of IPC use in territory-wide healthcare databases. Alternatively, establishing a clinical registry for IPC use could facilitate more accurate tracking of patients with both malignant and benign causes of pleural effusion.
 
Some diagnosis codes (eg, hydrothorax related to dialysis [511.8.3] and hydrothorax as complication of peritoneal dialysis [511.8.8]) and procedure codes (eg, video-assisted thoracoscopy for haemostasis [34.09.4] and injection into thoracic cavity [34.92.0]) were used in other hospitals but not at PWH; therefore, they could not be validated in this study. Within the PWH cohort, alternative diagnosis or procedure codes were used and validated. However, the number of hospitalisation episodes associated with these codes was small, and their impact would be minimal in a territory-wide healthcare data analysis where similar codes are grouped together.
 
Duplication of subcodes for similar diagnoses or procedures was also noted. Several diagnoses and procedures were represented by different codes, including:
  • Hydrothorax related to dialysis (511.8.3) and hydrothorax as complication of peritoneal dialysis (511.8.8);
  • Fibreoptic bronchoscopy (33.22.0) and bronchoscopy (33.23.0);
  • Endoscopic ultrasonography of bronchus (33.23.3) and endobronchial ultrasonography (33.23.5);
  • Closed endoscopic biopsy of bronchus (33.24.0), bronchoscopic biopsy (33.24.1), fibreoptic bronchoscopy with biopsy (33.24.2), and flexible bronchoscopy with biopsy of bronchus (33.24.7);
  • Lung biopsy via endoscopy (33.27.0), bronchoscopic biopsy under fluoroscopic guidance (33.27.1), and flexible bronchoscopy with biopsy of lung (33.27.2);
  • Video-assisted thoracoscopy for haemostasis (34.09.4) and video-assisted thoracoscopy, haemostasis (34.21.5); and
  • Chemical pleurodesis (34.92.1) and pleurodesis, chemical (34.92.2).
    Researchers should be reminded to search all relevant diagnosis and procedure codes to minimise the risk of missing data for specific diseases or procedures during code searches. In the long term, reconciling similar codes may help reduce ambiguity and improve data consistency.
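One way to act on this recommendation is to expand any searched code to its full group of duplicates before querying. The Python sketch below is a minimal illustration: the groups are copied from the list above, whereas the function name and the idea of a lookup structure are assumptions, not part of the Hospital Authority systems.

```python
# Groups of subcodes reported above as describing the same diagnosis or procedure
DUPLICATE_GROUPS = [
    {"511.8.3", "511.8.8"},
    {"33.22.0", "33.23.0"},
    {"33.23.3", "33.23.5"},
    {"33.24.0", "33.24.1", "33.24.2", "33.24.7"},
    {"33.27.0", "33.27.1", "33.27.2"},
    {"34.09.4", "34.21.5"},
    {"34.92.1", "34.92.2"},
]


def expand_codes(codes):
    """Return the requested codes plus every listed duplicate of each one."""
    expanded = set(codes)
    for group in DUPLICATE_GROUPS:
        if expanded & group:  # any requested code belongs to this group
            expanded |= group
    return expanded


# A search for chemical pleurodesis then covers both subcodes.
pleurodesis_codes = expand_codes({"34.92.1"})
```

Codes outside the listed groups pass through unchanged, so the helper can be applied to any search list.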
     
    Strengths and limitations
    This study has several strengths, notably its status as the first validation study conducted using a large healthcare database in Hong Kong. It successfully validated codes for a wide range of pleural diseases and respiratory procedures, thereby laying the foundation for future epidemiological research. However, several limitations should be acknowledged. Not all codes could be adequately validated due to their small case volumes in the PWH cohort. For example, codes for Meigs’ syndrome (220.4), traumatic pneumothorax with open wound into thorax (860.1), and traumatic haemothorax with open wound into thorax (860.3) had small numbers even in the overall cohort, and some codes were duplicated. As such, future research incorporating patient searches based on these diagnosis and procedure codes should take these limitations into account. The single-centre nature of the study represents a further limitation, as disease patterns and coding practices may vary across district general hospitals.
     
    Conclusion
    This is the first validation study of diagnosis codes for pleural diseases and procedure codes for relevant respiratory procedures using a territory-wide healthcare database in Hong Kong. All diagnosis codes and the majority of procedure codes demonstrated high PPVs, indicating accurate coding. Given the emergence of new respiratory procedures, diagnosis and procedure codes should be regularly updated. The removal or consolidation of duplicated subcodes within the Hospital Authority system is also necessary to facilitate accurate future research and analysis using clinical codes. Further evaluation and harmonisation of coding practices across different hospitals would be beneficial. These measures will pave the way for future territory-wide studies and enable monitoring of the overall burden of pleural diseases in Hong Kong.
     
    Author contributions
    Concept or design: KKP Chan.
    Acquisition of data: KKP Chan, TCC Ng, CY Sze, KC Ling.
    Analysis or interpretation of data: KKP Chan, TCC Ng, CY Sze, KC Ling.
    Drafting of the manuscript: KKP Chan.
    Critical revision of the manuscript for important intellectual content: KKP Chan, TCC Ng, C Chan, CHY Lau, SWT Ho, JKC Ng, RLP Lo, WH Yip, JCL Ngai, KW To, FWS Ko, DSC Hui.
     
    All authors had full access to the data, contributed to the study, approved the final version for publication, and take responsibility for its accuracy and integrity.
     
    Conflicts of interest
    All authors have disclosed no conflicts of interest.
     
    Acknowledgement
    The authors thank Prof Terry CF Yip from the Department of Medicine and Therapeutics of The Chinese University of Hong Kong for providing statistical support.
     
    Funding/support
    This research received no specific grant from any funding agency in the public, commercial, or not-for-profit sectors.
     
    Ethics approval
    This research was approved by the Joint Chinese University of Hong Kong–New Territories East Cluster Clinical Research Ethics Committee, Hong Kong (Ref No.: 2022.031). The requirement for patient consent was waived by the Committee due to the retrospective nature of the study.
     
    Supplementary material
    The supplementary material was provided by the authors and some information may not have been peer reviewed. Accepted supplementary material will be published as submitted by the authors, without any editing or formatting. Any opinions or recommendations discussed are solely those of the author(s) and are not endorsed by the Hong Kong Academy of Medicine and the Hong Kong Medical Association. The Hong Kong Academy of Medicine and the Hong Kong Medical Association disclaim all liability and responsibility arising from any reliance placed on the content.
     
    References
    1. Bodtger U, Hallifax RJ. Epidemiology: why is pleural disease becoming more common? In: Maskell NA, Laursen CB, Lee YCG, et al, editors. Pleural Disease. Vol 87. Schweiz, Switzerland: European Respiratory Society; 2020: 1-12.
    2. Hallifax RJ, Goldacre R, Landray MJ, Rahman NM, Goldacre MJ. Trends in the incidence and recurrence of inpatient-treated spontaneous pneumothorax, 1968-2016. JAMA 2018;320:1471-80.
    3. Light RW. Pleural effusions. Med Clin North Am 2011;95:1055-70.
    4. Taghizadeh N, Fortin M, Tremblay A. US hospitalizations for malignant pleural effusions: data from the 2012 National Inpatient Sample. Chest 2017;151:845-54.
    5. Tian P, Qiu R, Wang M, et al. Prevalence, causes, and health care burden of pleural effusions among hospitalized adults in China. JAMA Netw Open 2021;4:e2120306.
    6. Kwok WC, Tam TC, Sing CW, Chan EW, Cheung CL. Validation of diagnostic coding for bronchiectasis in an electronic health record system in Hong Kong. Pharmacoepidemiol Drug Saf 2023;32:1077-82.
    7. Ye Y, Hubbard R, Li GH, et al. Validation of diagnostic coding for interstitial lung diseases in an electronic health record system in Hong Kong. Pharmacoepidemiol Drug Saf 2022;31:519-23.
    8. Kwok WC, Tam TC, Sing CW, Chan EW, Cheung CL. Validation of diagnostic coding for asthma in an electronic health record system in Hong Kong. J Asthma Allergy 2023;16:315-21.
    9. Chan-Yeung M, Lai CK, Chan KS, et al. The burden of lung disease in Hong Kong: a report from the Hong Kong Thoracic Society. Respirology 2008;13 Suppl 4:S133-65.
    10. Chan KP, Ng SS, Ling KC, et al. Phenotyping empyema by pleural fluid culture results and macroscopic appearance: an 8-year retrospective study. ERJ Open Res 2023;9:00534-2022.
    11. Tsang KY, Leung WS, Chan VL, Lin AW, Chu CM. Complicated parapneumonic effusion and empyema thoracis: microbiology and predictors of adverse outcomes. Hong Kong Med J 2007;13:178-86.
    12. Chan KP, Ma TF, Sridhar S, Lam DC, Ip MS, Ho PL. Changes in etiology and clinical outcomes of pleural empyema during the COVID-19 pandemic. Microorganisms 2023;11:303.
    13. Chang KC, Leung CC, Tam CM, Yu WC, Hui DS, Lam WK. Malignant mesothelioma in Hong Kong. Respir Med 2006;100:75-82.
    14. Chan JW, Ko FW, Ng CK, et al. Management and prevention of spontaneous pneumothorax using pleurodesis in Hong Kong. Int J Tuberc Lung Dis 2011;15:385-90.
    15. Lui MM, Yeung YC, Ngai JC, et al. Implementation of evidence on management of pleural diseases: insights from a territory-wide survey of clinicians in Hong Kong. BMC Pulm Med 2022;22:386.
    16. Lui MM, Lee YC. Twenty-five years of respirology: advances in pleural disease. Respirology 2020;25:38-40.
    17. Arnold DT, Hamilton FW, Morris TT, et al. Epidemiology of pleural empyema in English hospitals and the impact of influenza. Eur Respir J 2021;57:2003546.
    18. Hamilton F, Arnold D. Accuracy of clinical coding of pleural empyema: a validation study. J Eval Clin Pract 2020;26:79-80.

    Profiling unmet post–acute care needs of an inpatient population in Hong Kong: can real-world data and machine learning algorithms bring precision to tertiary prevention in the community?

    © Hong Kong Academy of Medicine. CC BY-NC-ND 4.0
     
    ORIGINAL ARTICLE
    Profiling unmet post–acute care needs of an inpatient population in Hong Kong: can real-world data and machine learning algorithms bring precision to tertiary prevention in the community?
    Eman Leung, PhD1,2; Jingjing Guan, PhD3; Frank Youhua Chen, PhD1; Sam CC Ching, BBA2; Hector Tsang, PhD4; Martin CS Wong, MD, FHKAM (Family Medicine)2; Olivia Lam, MPH2; Yinan He, MPH2; Sarah TY Yau, MPH2; Yilin Liu, MPH2; CB Law, MB, BS5; NY Chan, MB, BS5; YF Wong, PhD5; YH Chow, BSocSc6; CT Hung, FHKAM (Anaesthesiology)2; EK Yeoh, FHKAM (Medicine)2; Albert Lee, MD, FHKAM (Family Medicine)2,4,7
    1 Department of Management Sciences, City University of Hong Kong, Hong Kong SAR, China
    2 The Jockey Club School of Public Health and Primary Care, The Chinese University of Hong Kong, Hong Kong SAR, China
    3 EpitelligenceHK, Hong Kong SAR, China
    4 Department of Rehabilitation Science, Hong Kong Polytechnic University, Hong Kong SAR, China
    5 Kowloon West Cluster, Princess Margaret Hospital and North Lantau Hospital, Hong Kong SAR, China
    6 Kwai Tsing Safe Community and Healthy City Association, Hong Kong SAR, China
    7 Centre for Health Education and Health Promotion, The Jockey Club School of Public Health and Primary Care, The Chinese University of Hong Kong, Hong Kong SAR, China
     
    Corresponding author: Prof Albert Lee (alee@cuhk.edu.hk)
     
     
    Abstract
    Introduction: Case-mix systems aim to optimise acute care resource allocation, yet patients within the same groups often exhibit substantial variability in utilisation. This study aimed to examine how incorporating measures of clinical complexity and post–acute care utilisation—both critical to rehospitalisation risk and accurate resource planning—into case-mix stratification could improve the precision of acute care resource allocation.
     
    Methods: Through iterative applications of unsupervised and supervised machine learning models, we extracted typical patient profiles from the study populations, analysed post–acute care utilisation patterns, and assessed the 28-day rehospitalisation rates resulting from different pairings between clinical profiles and post–acute care service utilisation patterns.
     
    Results: Across various disease systems and age-groups, patients discharged without receiving algorithm-selected post–acute care (ie, No Service groups [NS groups]) showed significantly higher 28-day rehospitalisation rates relative to their corresponding segments in the same medoid case-mix groups (CMGs; pooled odds ratio [OR]=19.27; P<0.001). The NS groups also demonstrated higher rates of having two or more chronic diseases (pooled OR=1.84; P<0.001) and—for the 50-64–year-old population—resource-intensifying co-morbidities (pooled OR=1.23; P=0.05). Patients displaying higher rates of resource-intensifying co-morbidities compared with their ≥65-year-old counterparts (such as when the medoid CMG was renal failure or chronic obstructive pulmonary disease) also exhibited significantly higher 28-day rehospitalisation rates than the ≥65–year-old NS groups sharing the same medoid CMGs.
     
    Conclusion: These findings support a precision-driven approach to designing rehospitalisation prevention programmes that target individuals aged 50 to 64 years discharged with specific clinical profiles, and developing and allocating human capital for these targeted prevention programmes.
     
     
    New knowledge added by this study
    • Our novel machine learning analyses revealed that ambulatory care–sensitive conditions such as chronic obstructive pulmonary disease and general digestive symptoms were the diagnoses received by patients who were ‘typical’ (ie, the medoid) of the studied inpatient population and its subpopulations of patients with unmet post–acute care needs.
    • Higher proportions of patients aged 50 to 64 years in the subpopulations had histories of two or more chronic illnesses prior to the index hospitalisation, had resource-intensifying co-morbidities at the index hospitalisation, and were rehospitalised within 28 days after being discharged.
    Implications for clinical practice or policy
    • Tertiary prevention programmes targeting specific profiles of individuals aged 50 to 64 years who are discharged into the community can help relieve the burden on hospital services.
    • The integration of post–acute care utilisation data and clinical complexity indicators into population stratification can improve the precision of tertiary prevention planning and resource allocation across community and hospital settings.
     
     
    Introduction
    To standardise clinical practices and inform targeted policy decisions, major health systems segment their populations into case-mix groups (CMGs). With expert input and analytical methods, CMGs are designed with optimal granularity—balancing individual-level clinical care decisions and population-level acute care resource allocation1—and judicious parsimony, selecting indicators from the wealth of information extracted from patient electronic health records (see online supplementary Table 1 for a comparison of major healthcare systems’ case-mix frameworks).
     
    However, clinical case-mix systems often provide imperfect estimations of their populations’ acute care utilisation.2 3 4 It has been suggested that critical drivers of acute care admissions and 28-day rehospitalisations, such as clinical complexity,5 6 are not often included as indicators for stratifying patients. Also, the linkage between case mixes of populations and their respective post–acute care (PAC) needs has not been established, although PAC can reduce rehospitalisations and mitigate the rehospitalisation risk associated with clinical complexity.7 In fact, not only have the PAC needs of patients discharged under various case-mix classifications remained unexplored, but studies examining the effects of PAC on acute care utilisation often fail to consider the diversity of PAC service types8 9 and their differential effects on patients with distinct clinical profiles.10 11 12
     
    Therefore, this study aimed to identify the factors contributing to the discrepancy between the objectives of case-mix systems—optimising the efficient allocation of acute care resources—and the observed heterogeneity in acute care utilisation among patients within the same CMGs. Specifically, although clinical complexity and PAC utilisation influence the rehospitalisation risk of discharged patients—which in turn affects the accuracy of population-level acute care resource planning—they are not typically included in case-mix systems for patient stratification. Thus, we examined the heterogeneity and relationships among clinical complexity, PAC utilisation, and rehospitalisation risk within homogeneous patient segments. These segments were partitioned from the study population using conventional case-mix parameters and acute care utilisation metrics. Given this context, we hypothesised that among patients within the same homogeneous segments, those who did not receive effective PAC would exhibit the highest rates of 28-day rehospitalisation. Additionally, we hypothesised that greater clinical complexity would increase the likelihood of rehospitalisation occurring before receipt of any effective PAC.
     
    Methods
    Study population
    In this study of an inpatient population of 197 805 individuals (aged ≥50 years) discharged into the community, a combination of unsupervised and supervised learning algorithms was deployed (Fig 1). First, unsupervised learning algorithms were applied to identify typical patients (ie, medoids) using a comprehensive set of clinical parameters (including discharged patients’ CMGs) and acute care utilisation data.13 Patients similar to typical patients in terms of these parameters were clustered into the same segments. Each resulting segment was labelled according to the Major Clinical Category (MCC)13 assigned to its medoid. According to case-mix methodologies adopted by major healthcare systems (eg, CMG+ of Canada13), the MCC reflects the primary body system or medical specialty involved and provides a high-level overview of the patient’s condition. Within each MCC, patients are further classified into more specific CMGs based on detailed clinical and resource utilisation characteristics. We therefore expected that patients within the same segment would share the same MCC as the medoid, although their CMGs might differ, which is why the medoid’s MCC was used as the segment label. The International Classification of Diseases codes constituting each CMG and the corresponding MCC for each are shown in online supplementary Table 2.
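     
    The medoid-based segmentation described above can be illustrated with a minimal sketch. The text does not name the clustering algorithm or the distance measure, so the alternating k-medoids procedure, the Hamming distance, and the toy records below (with hypothetical field values) are all assumptions made purely for illustration.

```python
import random

def hamming(a, b):
    """Number of positions at which two equal-length records differ."""
    return sum(x != y for x, y in zip(a, b))

def assign(records, medoids):
    """Attach every record to its nearest medoid (a medoid stays in its own cluster)."""
    clusters = {m: [] for m in medoids}
    for i, rec in enumerate(records):
        if i in clusters:
            nearest = i
        else:
            nearest = min(medoids, key=lambda m: hamming(rec, records[m]))
        clusters[nearest].append(i)
    return clusters

def k_medoids(records, k, max_iter=50, seed=0):
    """Alternate between assigning records and re-centring each medoid."""
    rng = random.Random(seed)
    medoids = sorted(rng.sample(range(len(records)), k))
    for _ in range(max_iter):
        clusters = assign(records, medoids)
        # The new medoid is the member with the smallest total distance to its cluster.
        updated = sorted(
            min(members, key=lambda c: sum(hamming(records[c], records[j]) for j in members))
            for members in clusters.values()
        )
        if updated == medoids:
            break
        medoids = updated
    return medoids, assign(records, medoids)

# Hypothetical toy records: (body system, case-mix group, intervention, prior stays)
records = [
    ("resp", "copd", "no_icu", "1_stay"),
    ("resp", "copd", "no_icu", "2_stays"),
    ("resp", "copd", "icu", "1_stay"),
    ("circ", "angina", "cath", "3_stays"),
    ("circ", "angina", "cath", "4_stays"),
    ("circ", "angina", "pci", "3_stays"),
]
medoids, clusters = k_medoids(records, k=2)
for m, members in clusters.items():
    print("typical patient:", records[m], "segment members:", members)
```

    In this sketch each segment would then take its label from a field of the medoid record, mirroring the labelling of segments by the medoid’s MCC.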
     

    Figure 1. Methodology
     
    Study design
    Second, with additional features representing the types and timing of PAC service utilisation, 28-day rehospitalisation outcome–supervised machine learning algorithms (Unbiased Recursive Partitioning with Surrogate Splitting [URPSS]14) were applied to recursively partition clinically homogeneous segments into subpopulations, each characterised by homogeneous PAC utilisation. The URPSS has previously been used to compare the effects of clinical profiles and acute care utilisation on 28-day rehospitalisations with those of different PAC service types, isolating the unique contribution of patients’ clinical and acute care factors.15 In this study, we adopted a complementary approach by isolating each PAC service type’s unique contribution to 28-day rehospitalisation while adjusting for the influence of the end user’s clinical profile and acute care utilisation. To achieve this approach, we first partitioned the population into segments with homogeneous clinical and acute care utilisation profiles. Within each segment, the URPSS algorithm was then applied to infer the effects of PAC on 28-day rehospitalisation, contingent on patients’ clinical and acute care characteristics. A detailed description of the hybrid machine learning approach used to disentangle post-acute from acute influences is provided in the online Appendix.14 15 16
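     
    The idea of outcome-supervised recursive partitioning within a segment can be sketched as follows. This is a simplified stand-in rather than the URPSS itself: it uses a Pearson chi-square test with a Bonferroni adjustment in place of the conditional inference framework, omits surrogate splits, and assumes that each selected service splits off its recipients while partitioning continues among non-recipients. The feature names (sopc, nurse, rehosp) and the toy counts are hypothetical.

```python
import math

def chi2_p(rows, feature, outcome):
    """Pearson chi-square p-value (1 degree of freedom) for two binary columns."""
    n = len(rows)
    counts = [[0, 0], [0, 0]]
    for r in rows:
        counts[r[feature]][r[outcome]] += 1
    stat = 0.0
    for i in (0, 1):
        for j in (0, 1):
            expected = sum(counts[i]) * (counts[0][j] + counts[1][j]) / n
            if expected == 0:
                return 1.0  # degenerate table: no evidence of association
            stat += (counts[i][j] - expected) ** 2 / expected
    # Survival function of chi-square with 1 df
    return math.erfc(math.sqrt(stat / 2))

def partition(rows, features, outcome, alpha=0.05, path=()):
    """Return leaves as (path, rows); the last leaf received none of the selected services."""
    if features and rows:
        # Bonferroni-adjusted p-value for each remaining candidate feature
        pvals = {f: min(1.0, chi2_p(rows, f, outcome) * len(features)) for f in features}
        best = min(pvals, key=pvals.get)
        if pvals[best] < alpha:
            rest = [f for f in features if f != best]
            received = [r for r in rows if r[best] == 1]
            not_received = [r for r in rows if r[best] == 0]
            leaf = (path + ((best, 1),), received)
            return [leaf] + partition(not_received, rest, outcome, alpha, path + ((best, 0),))
    return [(path, rows)]

# Hypothetical segment of 100 discharged patients
rows = (
    [{"sopc": 1, "nurse": 0, "rehosp": int(i < 2)} for i in range(40)]
    + [{"sopc": 0, "nurse": 1, "rehosp": int(i < 6)} for i in range(30)]
    + [{"sopc": 0, "nurse": 0, "rehosp": int(i < 27)} for i in range(30)]
)
for path, members in partition(rows, ["sopc", "nurse"], "rehosp"):
    rate = sum(r["rehosp"] for r in members) / len(members)
    print(path, len(members), round(rate, 2))
```

    The order in which features are selected plays the role of the service sequence, and the final leaf corresponds to a subpopulation that received none of the selected services.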
     
    Among the different subpopulations partitioned from each segment, one inevitably remained unpartitioned by any feature representing the PAC services for which the algorithm found significant conditional inferences on 28-day rehospitalisation. We hypothesised that this unpartitioned subpopulation—representing patients whose acute care needs (as reflected by the comprehensive segmenting features of clinical and acute care utilisation parameters) were homogeneous with others in the same segment but who lacked any 28-day rehospitalisation–mitigating PAC services—would exhibit the highest clinical complexity and 28-day rehospitalisation rates. These groups of discharged patients, whose rehospitalisation risk was high but who lacked algorithm-selected PAC services, are hereafter referred to as the No Service groups (NS groups).
     
    In conjunction with the 28-day rehospitalisation rate, the prevalence of clinical complexity—reflected by the presence of two or more chronic illnesses diagnosed prior to the index hospitalisation and by acute care resource-intensifying co-morbid diagnoses at index hospitalisation5 6—was also compared between the NS groups and their corresponding segments. We hypothesised that greater clinical complexity would be associated with an increased likelihood of patients being rehospitalised before receiving any effective PAC. Comparisons were also made between populations aged 50-64 years and 65 years or above. Research has shown that adults aged 50 to 64 years face unique health challenges and experience care gaps not observed among those aged 65 years or above.16 In particular, care gaps predominantly affecting the 50-64 age-group have been linked to inaccuracies in predicting patients’ acute care needs using case-mix models,17 which were primarily developed from inpatient populations aged 65 years and older.18 19 20 21
     
    Although many comparisons could be made between the NS groups and their corresponding segments across all segments partitioned from the 50-64–year-old or ≥65–year-old populations (and between the NS groups or segments of the two populations), comparisons were restricted to the NS groups and their corresponding segments that shared the same medoid CMGs, to ensure homogeneity in clinical and acute care utilisation profiles between the subgroups being compared. Similarly, comparisons between the 50-64–year-old and ≥65–year-old NS groups or between the 50-64–year-old and ≥65–year-old segments were confined to pairs with the same medoid CMGs. The odds ratios (ORs), 95% confidence intervals (95% CIs), and P values resulting from comparisons between each same-CMG pair for clinical complexity and 28-day rehospitalisation were calculated from a subset of the descriptive statistics reported in online supplementary Tables 3 (for the 50-64–year-old age-group) and 4 (for the ≥65–year-old age-group). In addition to data on the prevalence of clinical complexity and 28-day rehospitalisations, these supplementary tables include the comprehensive set of features that: (1) constitute the CMGs adopted in this study, (2) segment the 50-64–year-old and ≥65–year-old populations, and (3) partition each segment to identify its corresponding NS groups. These features encompass diagnoses, age, sex, resource-intensive interventions received at index acute care hospitalisation, and resource-intensifying co-morbidities diagnosed at index acute care hospitalisation. Given that the contributions of these features to clinical profile variability had already been adjusted for through multiple iterations, they were unlikely to be selected by the URPSS algorithm to split a segment into subpopulations. Our focus therefore remained on demonstrating the high prevalence of clinical complexity and 28-day rehospitalisation among the NS groups, rather than on features not selected by the URPSS.
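     
    As an illustration of how an OR, its 95% CI, and a P value can be derived from descriptive counts, the following sketch uses the log (Woolf) method with a two-sided Wald test. The text does not state which interval or test was used, so this choice and the counts in the usage line are assumptions.

```python
import math

def odds_ratio(a, b, c, d, z=1.96):
    """a, b: events and non-events in group 1; c, d: events and non-events in group 2."""
    or_ = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # standard error of ln(OR)
    low = math.exp(math.log(or_) - z * se)
    high = math.exp(math.log(or_) + z * se)
    # Two-sided P value from the normal approximation to ln(OR) / se
    p = math.erfc(abs(math.log(or_)) / se / math.sqrt(2))
    return or_, low, high, p

# Hypothetical counts: 20 of 100 patients rehospitalised in one group, 10 of 100 in the other
print(odds_ratio(20, 80, 10, 90))
```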
     
    We tested our hypotheses regarding the elevated risks of the NS groups compared with their parent segments (particularly for the 50-64–year-old population) through selected paired comparisons and omnibus testing. By aggregating results across different same-CMG pairs, we followed the standard epidemiological practice of utilising all available evidence from various subgroups within a single sample to maximise the robustness and generalisability of estimates while adjusting for inherent sample stratification.22 23 Indeed, whereas analysis of an entire sample may overlook underlying confounding factors, a strong focus on stratified subgroup analyses can lead to misinterpretations that inflate the effects of confounding variables on outcomes and distort the relationships between risk factors and outcomes.24 25 To quantify the likelihood of clinical complexity and 28-day rehospitalisation rates in the NS groups versus their parent segments, we pooled ORs using the Mantel-Haenszel formula26 across same-CMG pairs within each age population and between the 50-64–year-old and ≥65–year-old populations (calculated from the ORs and associated 95% CIs and P values reported in Table 1). This approach allowed us to evaluate overall differences in co-morbidity, chronic illnesses, and 28-day rehospitalisations between age-groups and between the NS groups and their corresponding segments. The Mantel-Haenszel formula has been applied in diverse clinical contexts involving a single patient sample or population, including a targeted patient group with traumatic brain injury,27 a regional population admitted from multiple hospitals with different major diagnoses,28 and a case-control study combining matched and unmatched control groups.29 Results reported below include pooled ORs, 95% CIs, P values, and, where applicable, Q statistics with corresponding P values to indicate significant heterogeneity among pooled ORs.
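     
    For reference, the Mantel-Haenszel pooled OR across strata is the sum of a·d/n divided by the sum of b·c/n, where a, b, c, and d are the cells of each stratum’s 2×2 table and n is the stratum total. The sketch below works from raw counts; the two strata shown are hypothetical and are not taken from Table 1.

```python
def mantel_haenszel_or(tables):
    """Pooled OR over strata; each table is (a, b, c, d) with
    a, b = events and non-events in group 1 and c, d = events and non-events in group 2."""
    numerator = sum(a * d / (a + b + c + d) for a, b, c, d in tables)
    denominator = sum(b * c / (a + b + c + d) for a, b, c, d in tables)
    return numerator / denominator

# Two hypothetical same-CMG pairs with stratum-specific ORs of 2.25 and about 2.67
strata = [(20, 80, 10, 90), (40, 60, 20, 80)]
print(mantel_haenszel_or(strata))
```

    Because each stratum is weighted by its size, the pooled estimate falls between the stratum-specific ORs rather than being a simple average of them.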
     

    Table 1. Likelihood of study parameters in the No Service groups and 50-64–year-old population
     
    Results
    Below, we describe the clinical profiles of typical patients (medoids) in the 50-64–year-old and ≥65–year-old populations and their corresponding population segments. We then report the order in which the URPSS algorithm selected PAC services based on their unique statistical importance in classifying 28-day rehospitalisation. We also characterise the clinical profiles of patients who received none of the URPSS-selected PAC services (ie, the NS groups). Finally, we compare the rates of resource-intensifying co-morbidities, the presence of two or more chronic diseases, and 28-day rehospitalisations between the NS groups and their corresponding segments, as well as between the 50-64–year-old and ≥65–year-old populations.
     
    Profiles of typical patients and associated segments in the 50-64–year-old and ≥65–year-old populations
    The Calinski–Harabasz index indicated that the optimal number of segments was seven for the 50-64–year-old population and eight for the ≥65–year-old population.30 Our analyses revealed that the seven typical patients identified in the 50-64–year-old population belonged to the same MCCs as their counterparts in the ≥65–year-old population: Circulatory, Digestive, Nephrology and urology, Musculoskeletal, Respiratory, Multiple systems of diseases and disorders, and Other reasons for hospitalisation. Additionally, four MCCs shared between the two age-groups were characterised by identical CMGs: Symptom or sign of digestive system (Digestive), Malignant neoplasm of urinary system (Nephrology and urology), Chronic obstructive pulmonary disease (Respiratory), and General symptom or sign (Other reasons for hospitalisation). In the ≥65–year-old population, we identified an eighth segment, whose typical patient’s CMG was dementia, belonging to the MCC of Diseases and disorders of the mental system.
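     
    The Calinski–Harabasz index is the ratio of between-cluster to within-cluster dispersion, each divided by its degrees of freedom (k − 1 and n − k), and the number of segments is chosen where the index is highest. The following sketch computes it for numeric points with squared Euclidean distance; the study’s features are largely categorical, so the numeric toy data here are an assumption for illustration only.

```python
def centroid(points):
    """Coordinate-wise mean of a list of equal-length numeric tuples."""
    return [sum(p[d] for p in points) / len(points) for d in range(len(points[0]))]

def sq_dist(u, v):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(u, v))

def calinski_harabasz(points, labels):
    groups = {}
    for point, label in zip(points, labels):
        groups.setdefault(label, []).append(point)
    n, k = len(points), len(groups)
    overall = centroid(points)
    between = sum(len(g) * sq_dist(centroid(g), overall) for g in groups.values())
    within = sum(sq_dist(p, centroid(g)) for g in groups.values() for p in g)
    return (between / (k - 1)) / (within / (n - k))

points = [(0, 0), (0, 2), (10, 0), (10, 2)]
print(calinski_harabasz(points, [0, 0, 1, 1]))  # compact, well-separated grouping
print(calinski_harabasz(points, [0, 1, 0, 1]))  # poor grouping of the same points
```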
     
    Utilisation of post–acute care services and associated 28-day rehospitalisation rates
    Tables 2 and 3 report the type, sequence (reflecting the descending rank order of marginal contribution feature importance), and associated 28-day rehospitalisation rates of each PAC service selected by the URPSS algorithm. With areas under the receiver operating characteristic curve ranging from 0.85 to 0.93, the URPSS algorithms classified 28-day rehospitalisation outcomes in every segment partitioned from the two populations using features selected for their unique contributions to outcomes. Among all features in the pool to which the URPSSs were applied (online supplementary Table 5), only PAC-related features were selected to split segments that had previously been partitioned from the population using other features (eg, sex) that were unrelated to PAC.
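     
    The area under the receiver operating characteristic curve can be read as the probability that a randomly chosen rehospitalised patient receives a higher predicted score than a randomly chosen patient who was not rehospitalised. A minimal pairwise sketch of that definition follows; the scores and outcomes are hypothetical, and the text does not describe how the reported values were computed.

```python
def auc(scores, outcomes):
    """Pairwise (Mann-Whitney) estimate of the area under the ROC curve; ties count half."""
    positives = [s for s, y in zip(scores, outcomes) if y == 1]
    negatives = [s for s, y in zip(scores, outcomes) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in positives for q in negatives)
    return wins / (len(positives) * len(negatives))

print(auc([0.9, 0.8, 0.3, 0.2], [1, 0, 1, 0]))
```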
     

    Table 2. Sequence of services selected by Unbiased Recursive Partitioning with Surrogate Splitting and associated 28-day rehospitalisation rates in each 50-64–year-old segment
     

    Table 3. Sequence of services selected by Unbiased Recursive Partitioning with Surrogate Splitting and associated 28-day rehospitalisation rates in each ≥65–year-old segment
     
    Our analyses revealed that, compared with all other PAC services, specialist outpatient clinics (SOPCs) had the greatest marginal contribution to 28-day rehospitalisation outcomes among patients with similar clinical profiles and acute care utilisation patterns, even after adjusting for the effects of the segments’ patient clinical profiles and acute care utilisation patterns on 28-day rehospitalisations through conditional inference. Additionally, SOPCs’ contribution to 28-day rehospitalisation was not conditional on the effects of other features. Consequently, the lowest 28-day rehospitalisation rates were observed among SOPC attendees across all homogeneous population segments. Nevertheless, although SOPCs had the highest marginal contribution feature importance—and were associated with the lowest 28-day rehospitalisation rates—in all segments across both populations, the 28-day rehospitalisation rates among SOPC attendees were higher in every segment of the 50-64–year-old population compared with the corresponding segments of the ≥65–year-old population (mean difference between segments with the same MCC profiles: 9.5%).
     
    As shown in Tables 2 and 3, the 28-day rehospitalisation rates were consistently the highest among subpopulations within each segment that remained unsplit after the sequential selection and partitioning by features representing PAC services that the URPSS identified as highly important to 28-day rehospitalisation outcomes (ie, the NS groups). For example, among the 50-64–year-old population, the mean difference in 28-day rehospitalisation rates between the NS groups and those in the same segments who received SOPC care (the PAC service with the greatest feature importance) was 70.01%; the mean difference between the NS groups and their corresponding full segments was 66.69%. Similarly, among the ≥65–year-old population, the mean difference between the NS groups and patients in the same segments who received SOPC care was 76.28%; the mean difference between the NS groups and their corresponding full segments was 62.26%. Notably, whereas the NS groups consistently showed the highest 28-day rehospitalisation rates among all subpopulations, the NS groups of the 50-64–year-old population exhibited a greater mean difference in 28-day rehospitalisation rates compared with their ≥65–year-old counterparts (by a mean difference of 2.99%).
     
    Clinical complexity and 28-day rehospitalisation of the No Service groups and their corresponding segments in the populations aged 50-64 years and ≥65 years
    The above analyses identified a subpopulation (ie, the NS groups) within each segment that exhibited high 28-day rehospitalisation rates but lacked effective PAC services. To provide a more in-depth understanding of the NS groups, we compared 28-day rehospitalisation rates, the prevalence of resource-intensifying co-morbidities, and the presence of two or more chronic illnesses between the NS groups and their corresponding segments, as well as between the 50-64–year-old and ≥65–year-old populations. Not all NS groups’ typical patients shared the same CMGs as the medoids of their corresponding segments, nor were the same CMGs shared between the medoids of the 50-64–year-old and ≥65–year-old populations. Chronic obstructive pulmonary disease (COPD) was the only CMG consistently identified as a medoid CMG in both populations and their corresponding subpopulations. Therefore, a more detailed analysis was conducted on the segment and subpopulation with COPD CMGs to illustrate factors contributing to the differences between NS groups and their corresponding segments, and between the 50-64–year-old and ≥65–year-old populations.
     
    Table 1 reports the ORs (and their associated 95% CIs and P values) for resource-intensifying co-morbidities, the presence of two or more chronic illnesses, and 28-day rehospitalisations of NS groups relative to their corresponding 50-64–year-old or ≥65–year-old population segments sharing the same medoid CMGs. As shown in the table, even when diseases of different systems were considered across both populations, the NS groups exhibited significantly higher rates of 28-day rehospitalisation compared with their same-medoid-CMG segments (pooled OR=19.27, 95% CI=17.86-20.79; P<0.001); they also showed a greater prevalence of having two or more chronic illnesses (pooled OR=1.84, 95% CI=1.64-2.07; P<0.001).
     
    Although resource-intensifying co-morbidity is also a measure of clinical complexity, it was not more likely to be found among NS groups than among their same–medoid-CMG segments. Follow-up analyses revealed that the pooled OR for the ≥65–year-old population was heterogeneous (Q statistic=39.97, P<0.001), whereas the Q statistic for pooled ORs in the 50-64–year-old population was not statistically significant. Upon closer examination, the rate of resource-intensifying co-morbidity was indeed higher in NS groups of the 50-64–year-old population than in their same–medoid-CMG segments (pooled OR=1.23, 95% CI=1.00-1.52; P=0.05); it was lower in the NS group population aged ≥65 years than in their corresponding segments (pooled OR=0.76, 95% CI=0.68-0.85; P<0.001).
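     
    A heterogeneity Q statistic of this kind can be sketched as the weighted sum of squared deviations of each log OR from the pooled log OR. The version below recovers standard errors from reported 95% CIs and pools with inverse-variance weights; whether the study computed Q around the Mantel-Haenszel estimate instead is not stated, so this is an assumption, and the input values are hypothetical.

```python
import math

def cochran_q(estimates, z=1.96):
    """estimates: list of (OR, CI lower, CI upper). Returns (Q, degrees of freedom, pooled OR)."""
    log_ors = [math.log(or_) for or_, _, _ in estimates]
    # Inverse-variance weights, with the standard error recovered from the CI width on the log scale
    weights = [(2 * z / (math.log(high) - math.log(low))) ** 2 for _, low, high in estimates]
    pooled = sum(w * t for w, t in zip(weights, log_ors)) / sum(weights)
    q = sum(w * (t - pooled) ** 2 for w, t in zip(weights, log_ors))
    return q, len(estimates) - 1, math.exp(pooled)

# Two hypothetical pairs pointing in opposite directions
print(cochran_q([(2.0, 1.0, 4.0), (0.5, 0.25, 1.0)]))
```

    Under homogeneity, Q is compared against a chi-square distribution with the returned degrees of freedom; a large Q relative to that distribution signals that a single pooled OR summarises the pairs poorly.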
     
    The observation that the 50-64–year-old population exhibits higher clinical complexity and 28-day rehospitalisation rates compared with their ≥65–year-old counterparts was directly examined among same–medoid-CMG pairs of the 50-64–year-old and ≥65–year-old population segments, as well as among pairs of NS group populations aged 50-64 years and ≥65 years (Table 1). Whereas the 50-64–year-old population showed higher rates of resource-intensifying co-morbidity and 28-day rehospitalisation compared with the ≥65–year-old population at both the segment and NS-group levels, these differences were not statistically significant (pooled ORs=1.27, 95% CI=0.55-2.93, and 1.18, 95% CI=0.84-1.65, respectively). Follow-up analysis revealed substantial heterogeneity in the pooled statistics, attributable to significant variation among the pooled ORs of NS-group pairs (Q statistics=7.81-9.43; all P<0.05). Follow-up segment-level analyses also showed significantly lower prevalence of all study parameters in the 50-64–year-old population compared with the ≥65–year-old population: OR=0.56 (95% CI=0.52-0.59; P<0.001), OR=0.22 (95% CI=0.20-0.24; P<0.001), and OR=0.93 (95% CI=0.89-0.96; P<0.001) for rates of resource-intensifying co-morbidity, the presence of two or more chronic illnesses, and 28-day rehospitalisation, respectively.
     
    Given the high heterogeneity of pooled ORs for the NS-group CMG pairs, differences in the prevalence of study parameters between the 50-64–year-old and ≥65–year-old populations were examined within individual NS-group pairs. Follow-up analyses revealed that, although not all NS-group CMG pairs showed higher rates of resource-intensifying co-morbidity or 28-day rehospitalisation in the 50-64–year-old population, those that did—such as when the medoid CMG was renal failure or COPD—also showed significantly higher 28-day rehospitalisation rates compared with their ≥65–year-old counterparts sharing the same medoid CMGs. For example, in the case of renal failure, the ORs were 63.11 (95% CI=50.26-79.38; P<0.001) and 1.35 (95% CI=1.06-1.70; P=0.01) for resource-intensifying co-morbidity and 28-day rehospitalisation rates, respectively (Table 1).
     
    Finally, to consider differences in study parameter prevalence between the NS group and its corresponding segment when comparing clinical complexity and 28-day rehospitalisation outcomes between the 50-64–year-old and ≥65–year-old populations, we examined cases in which the CMG was COPD. Chronic obstructive pulmonary disease was the only CMG that served as the medoid of both the population segment and the corresponding NS group for the 50-64–year-old and ≥65–year-old populations, allowing us to adjust for differences in study parameter prevalence between the NS group and its full segment when comparing the two age-groups. Our analyses revealed that, relative to the statistics of the full segments, the ORs for resource-intensifying co-morbidity, two or more chronic illnesses, and 28-day rehospitalisation rates were significantly greater in the 50-64–year-old NS group than in the ≥65–year-old counterparts [ratios of ORs=1.50 (95% CI=1.06-2.11; P=0.02), 1.17 (95% CI=1.01-1.37; P=0.04), and 2.34 (95% CI=1.84-2.96; P<0.001), respectively].
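     
    A ratio of two ORs with its confidence interval can be obtained on the log scale by subtracting the log ORs and combining their standard errors. The sketch below assumes the two ORs are independent and recovers each standard error from its reported 95% CI; the text does not describe the exact procedure, and the inputs shown are hypothetical.

```python
import math

def ratio_of_odds_ratios(first, second, z=1.96):
    """first, second: (OR, CI lower, CI upper). Returns (ratio, CI lower, CI upper, P value)."""
    (or1, low1, high1), (or2, low2, high2) = first, second
    se1 = (math.log(high1) - math.log(low1)) / (2 * z)
    se2 = (math.log(high2) - math.log(low2)) / (2 * z)
    diff = math.log(or1) - math.log(or2)
    se = math.hypot(se1, se2)  # standard error of the difference in log ORs
    p = math.erfc(abs(diff) / se / math.sqrt(2))
    return math.exp(diff), math.exp(diff - z * se), math.exp(diff + z * se), p

# Hypothetical NS-group-versus-segment ORs for two age-groups
print(ratio_of_odds_ratios((4.0, 2.0, 8.0), (2.0, 1.0, 4.0)))
```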
     
    Discussion
    Unmet post–acute care needs and age-related disparities
    Patients aged 50 to 64 years who were discharged without receiving algorithm-selected PAC services (ie, the NS groups) were generally more likely to be rehospitalised within 28 days of discharge than their counterparts who shared similar clinical and acute care utilisation profiles but received such services. In some cases, the 50-64–year-old NS groups were rehospitalised at even higher rates than the ≥65–year-old NS groups. Under these circumstances, the 50-64–year-old NS groups also exhibited higher rates of resource-intensifying co-morbidity. This elevated co-morbidity among patients aged 50-64 years who experienced more frequent rehospitalisation than their ≥65–year-old counterparts was exemplified by NS groups whose clinical and acute care utilisation profiles resembled the CMGs of typical patients with renal failure and COPD—the same CMGs characterising typical patients in the ≥65–year-old NS groups. In the case of COPD, the rates of co-morbidity, chronic illnesses, and rehospitalisation within the full segment could be directly considered when comparing the 50-64–year-old and ≥65–year-old NS groups.
     
    Ambulatory care–sensitive case-mix profiles and preventable rehospitalisation
    Similar to COPD, the majority of typical patients’ CMGs in the full segments and NS groups identified in the present study were considered ambulatory care–sensitive conditions (ACSCs),31 for which hospitalisations are potentially avoidable through timely and effective ambulatory care. Because avoidable hospitalisations among ACSC patients could be prevented with better access to ambulatory and primary care services, it has been argued that resources should be redistributed from acute care to these services.32 Our findings provide rare empirical support for this argument. By comparing rehospitalisation rates among subpopulations of patients with homogeneous clinical profiles and acute care utilisation patterns but differing PAC assignments, we demonstrated, at a population level, the benefits of ambulatory care (eg, specialist follow-up and in-home nursing transitional care) and primary care in reducing rehospitalisation rates among typical patient profiles whose CMGs were ACSCs.
     
    Notably, even ACSCs may progress into more acute diagnoses, with a higher likelihood of co-morbidity and elevated 28-day rehospitalisation rates. For instance, whereas Angina and Arrhythmia were the CMGs of typical patient profiles in the full patient segments of the 50-64–year-old and ≥65–year-old populations, respectively, the CMG of their NS groups’ typical patient profile was Heart Failure; these patients exhibited higher rates of co-morbidities and 28-day rehospitalisation. Similarly, Digestive Malignancy was the CMG of the typical patient profile in a 50-64–year-old NS group, which showed higher rates of co-morbidities and 28-day rehospitalisation than its corresponding full patient segment, whose typical CMG was Symptom or Sign of the Digestive System.
     
    Post-discharge service gaps and policy implications
    Despite such evidence, these services remain largely unavailable for individuals in the studied populations. For example, the average wait time for SOPC appointments ranges from 9 to 111 weeks,33 in sharp contrast to the median interval between discharge and rehospitalisation among NS patients, which is 14 days. Given the constraints on healthcare professional availability in the public sector, reducing SOPC wait times may be challenging. Therefore, by quantifying the benefits of different PAC services for various patient profiles, the findings presented here suggest the need for the following policy actions: (1) procure specialist follow-up services from the private sector and ensure effective public–private service coordination within the parallel public and private tracks of the healthcare system studied; and (2) enhance the provision of less scarce, near-equivalent alternatives available in the community, rather than relying solely on medical specialists.
     
    Multi-morbidity in adults aged 50 to 64 years and the case for multidisciplinary tertiary prevention
    In addition to the higher rehospitalisation rates identified in the present study, typical patient profiles with ACSC CMGs that lacked effective PAC services also exhibited a high prevalence of co-morbidities. The rates of co-morbidities and 28-day rehospitalisations were particularly high among individuals aged 50 to 64 years who fit these patient profiles. This finding aligns with recent studies showing that younger patients with diabetes—also a chronic ACSC—have significantly greater co-morbidities and worse outcomes than their older counterparts.34 Furthermore, we found that younger patients not only have more complex health needs but also benefit less from conventional PAC services and are more likely to be rehospitalised before receiving ambulatory or primary care. This finding is consistent with current literature, which indicates that effective rehospitalisation prevention programmes for chronically ill patients with multiple health problems,35 especially younger patients, require a multidisciplinary approach to address diverse needs such as smoking cessation,36 rather than the conventional ‘assess-and-advise’ primary care model of rehospitalisation prevention.37
     
    Indeed, most evidence supporting the benefits of multidisciplinary primary care for chronic conditions is derived from intervention studies targeting diseases that also represented the CMGs of typical patients identified in our study populations—particularly those who did not receive PAC services deemed effective in reducing 28-day rehospitalisation. For example, multidisciplinary pulmonary rehabilitation programmes, which are most effective in preventing rehospitalisation among patients with COPD, include not only clinician-led physical rehabilitation but also health-related education, advice regarding exercise programmes, targeted interventions addressing cognitive and behavioural issues, and personalised care plans tailored to individual needs.38 39 Similarly, community-based cardiac rehabilitation programmes that integrate cardioprotective therapeutics with psychosocial care and lifestyle management are most effective in preventing rehospitalisation among patients with angina and arrhythmia—conditions that are often underdiagnosed in acute care settings yet associated with high rehospitalisation rates and natural progression to heart failure if left untreated.40 Furthermore, effective pain management programmes for patients with pain-related musculoskeletal conditions—such as the inflammatory and reactive arthropathy CMGs assigned to our typical patient profiles—are multidisciplinary in nature and combine physiotherapy with approaches that promote active coping and self-management.41
     
    Precision-driven tertiary prevention: case management and population stratification
    Patients with multiple chronic health conditions benefit most from multidisciplinary care but often require treatment from numerous healthcare professionals across both primary and secondary care settings. To mitigate the risk of care fragmentation and redundant patient assessments, a case management approach has been advocated as a holistic means of addressing the complex needs of such patients (Fig 2). For example, patients with COPD have diverse and evolving care needs throughout their care journey,39 requiring care that is not only multidisciplinary but also integrated through case management. In effective case management for patients with COPD, the healthcare professionals who address the most pressing needs at the initial stage of the care journey assume the role of case manager, supported by community health practitioners who coordinate other professional services as required.42
     

    Figure 2. Case management
     
    Given the complexity of multidisciplinary care needs in patients with multiple chronic conditions, and the challenge of delivering the right intervention from the right healthcare professionals to the right patients at the right time, the training and provision of case management can be enhanced through a precision-driven approach. By leveraging advanced data analytics and machine learning, such an approach can accurately identify care needs and service gaps to improve the integration of multidisciplinary care.43 44 The approach used in the present study—segmenting patient populations based on diagnostic profiles and patterns of acute and PAC service utilisation through iterative applications of unsupervised and 28-day rehospitalisation outcome–supervised machine learning algorithms—can profile unmet needs and service gaps among patient populations discharged into the community. Thus, our study adds value to a body of literature largely focused on identifying homogeneous inpatient segments solely based on diagnoses45 46 47 48 49 50 51 or cost,52 aimed at improving acute care management.
     
    Limitations
    This study has several limitations. First, the data were solely derived from public hospitals as information from private hospitals and other healthcare providers outside the public system was not accessible. However, it is worth noting that public hospitals account for over 90% of inpatient services. Second, the coding system may not capture all patient health conditions because it mainly focuses on chief complaints. Finally, the lack of socio-demographic data limits the ability to generate more precise predictions.
     
    Conclusion
    This hybrid machine learning analysis of the electronic health records of a discharged patient population showed that patients aged 50 to 64 years with typical ambulatory care–sensitive case-mix profiles who did not receive algorithm-selected PAC services had substantially higher levels of multimorbidity and increased risk of 28-day rehospitalisation compared with clinically similar peers receiving such care. Integrating PAC utilisation and clinical complexity indicators into case-mix stratification can enable precision tertiary prevention and guide the development of targeted, multidisciplinary, case-managed services in the community.
     
    Author contributions
    Concept or design: E Leung, A Lee, J Guan.
    Acquisition of data: E Leung, J Guan, SCC Ching.
    Analysis or interpretation of data: E Leung, J Guan, SCC Ching.
    Drafting of the manuscript: E Leung, A Lee, FY Chen.
    Critical revision of the manuscript for important intellectual content: All authors.
     
    All authors had full access to the data, contributed to the study, approved the final version for publication, and take responsibility for its accuracy and integrity.
     
    Conflicts of interest
    As the Chief Editor of the journal, MCS Wong was not involved in the peer review process. Other authors declared no conflicts of interest.
     
    Funding/support
    This research was supported by the Strategic Public Policy Research Funding Scheme of the Hong Kong SAR Government (Project No.: S2019.A4.015.19S) awarded to A Lee and E Leung; the Community Involvement Fund of the Home Affairs Department, Hong Kong SAR Government, through Sham Shui Po District Council (Project Nos.: 220179 and 220180) awarded to E Leung and A Lee; and the General Research Fund of the Research Grants Council of Hong Kong (Project No.: 9043763) awarded to FY Chen. The funders had no role in the study design, data collection/analysis/interpretation, or manuscript preparation.
     
    Ethics approval
    This research was approved by the Joint Chinese University of Hong Kong–New Territories East Cluster Clinical Research Ethics Committee, Hong Kong (Ref No.: SBRE-22-0386). The requirement for patient consent was waived by the Committee due to the use of unidentifiable information of participants in the research.
     
    Supplementary material
    The supplementary material was provided by the authors and some information may not have been peer reviewed. Accepted supplementary material will be published as submitted by the authors, without any editing or formatting. Any opinions or recommendations discussed are solely those of the author(s) and are not endorsed by the Hong Kong Academy of Medicine and the Hong Kong Medical Association. The Hong Kong Academy of Medicine and the Hong Kong Medical Association disclaim all liability and responsibility arising from any reliance placed on the content.
     
    References
    1. Vuik SI, Mayer E, Darzi A. A quantitative evidence base for population health: applying utilization-based cluster analysis to segment a patient population. Popul Health Metr 2016;14:44.
    2. Caldon LJ, Walters SJ, Reed JA, Murphy A, Worley A, Reed MW. Case-mix fails to explain variation in mastectomy rates: management of screen-detected breast cancer in a UK region 1997–2003. Br J Cancer 2005;92:55-9.
    3. Hof S, Fügener A, Schoenfelder J, Brunner JO. Case mix planning in hospitals: a review and future agenda. Health Care Manag Sci 2017;20:207-20.
    4. Şentürk D, Chen Y, Estes JP, et al. Impact of case-mix measurement error on estimation and inference in profiling of health care providers. Commun Stat Simul Comput 2020;49:2206-24.
    5. Tumlinson A, Altman W, Glaudemans J, Gleckman H, Grabowski DC. Post–acute care preparedness in a COVID-19 world. J Am Geriatr Soc 2020;68:1150-4.
    6. Lee MC, Wu TY, Huang SJ, Chen YM, Hsiao SH, Tsai CY. Post–acute care for frail older people decreases 90-day emergency room visits, readmissions and mortality: an interventional study. PLoS One 2023;18:e0279654.
    7. Jamei M, Nisnevich A, Wetchler E, Sudat S, Liu E. Predicting all-cause risk of 30-day hospital readmission using artificial neural networks. PLoS One 2017;12:e0181173.
    8. Siddique SM, Tipton K, Leas B, et al. Interventions to reduce hospital length of stay in high-risk populations: a systematic review. JAMA Netw Open 2021;4:e2125846.
    9. McGilton KS, Vellani S, Krassikova A, et al. Understanding transitional care programs for older adults who experience delayed discharge: a scoping review. BMC Geriatr 2021;21:210.
    10. Cao YJ, Wang Y, Mullahy J, Burns M, Liu Y, Smith M. The relative importance of hospital discharge and patient composition in changing post–acute care utilization and outcomes among Medicare beneficiaries. Health Serv Insights 2023;16:11786329231166522.
    11. White HK. Post–acute care: current state and future directions. J Am Med Dir Assoc 2019;20:392-5.
    12. Geng F, Liu Z, Yan R, Zhi M, Grabowski DC, Hu L. Post–acute care in China: development, challenges, and path forward. J Am Med Dir Assoc 2023;25:61-8.
    13. Canadian Institute for Health Information. Case Mix Decision-Support Guide: CMG+. Ottawa: Canadian Institute for Health Information; 2015.
    14. Hothorn T, Hornik K, Zeileis A. Unbiased recursive partitioning: a conditional inference framework. J Comput Graph Stat 2006;15:651-74.
    15. Guan J, Leung E, Kwok KO, Chen FY. A hybrid machine learning framework to improve prediction of all-cause rehospitalization among elderly patients in Hong Kong. BMC Med Res Methodol 2023;23:14.
    16. Choi NG, DiNitto DM, Choi BY. Unmet healthcare needs and healthcare access gaps among uninsured U.S. adults aged 50-64. Int J Environ Res Public Health 2020;17:2711.
    17. Hof S, Fügener A, Schoenfelder J, Brunner JO. Case mix planning in hospitals: a review and future agenda. Health Care Manag Sci 2017;20:207-20.
    18. Fetter RB, Shin Y, Freeman JL, Averill RF, Thompson JD. Case mix definition by diagnosis-related groups. Med Care 1980;18(2 Suppl):iii, 1-53.
    19. Centers for Medicare & Medicaid Services (CMS), HHS. Medicare program; changes to the hospital inpatient prospective payment systems and fiscal year 2008 rates. Fed Regist 2007;72:47130-8175.
    20. Fries BE, Schneider DP, Foley WJ, Gavazzi M, Burke R, Cornelius E. Refining a case-mix measure for nursing homes: Resource Utilization Groups (RUG-III). Med Care 1994;32:668-85.
    21. Pope GC, Kautter J, Ellis RP, et al. Risk adjustment of Medicare capitation payments using the CMS-HCC model. Health Care Financ Rev 2004;25:119-41.
    22. Aschengrau A, Seage GR. Essentials of Epidemiology in Public Health. 3rd edition. Burlington [MA]: Jones & Bartlett Publishers; 2013.
    23. Tripepi G, Jager KJ, Dekker FW, Zoccali C. Stratification for confounding—part 1: the Mantel-Haenszel formula. Nephron Clin Pract 2010;116:c317-21.
    24. Shrier I, Pang M. Confounding, effect modification, and the odds ratio: common misinterpretations. J Clin Epidemiol 2015;68:470-4.
    25. Knol MJ, Le Cessie S, Algra A, Vandenbroucke JP, Groenwold RH. Overestimation of risk ratios by odds ratios in trials and cohort studies: alternatives to logistic regression. CMAJ 2012;184:895-9.
    26. Mantel N, Haenszel W. Statistical aspects of the analysis of data from retrospective studies of disease. J Natl Cancer Inst 1959;22:719-48.
    27. Sudhakar SK, Sridhar S, Char S, Pandya K, Mehta K. Prevalence of comorbidities post mild traumatic brain injuries: a traumatic brain injury model systems study. Front Hum Neurosci 2023;17:1158483.
    28. Lloyd T, Deeny SR, Steventon A. Weekend admissions may be associated with poorer recording of long-term comorbidities: a prospective study of emergency admissions using administrative data. BMC Health Serv Res 2018;18:863.
    29. le Cessie S, Nagelkerke N, Rosendaal FR, van Stralen KJ, Pomp ER, van Houwelingen HC. Combining matched and unmatched control groups in case-control studies. Am J Epidemiol 2008;168:1204-10.
    30. Caliński T, Harabasz J. A dendrite method for cluster analysis. Commun Stat 1974;3:1-27.
    31. Lin PJ, Zhong Y, Fillit HM, Cohen JT, Neumann PJ. Hospitalizations for ambulatory care sensitive conditions and unplanned readmissions among Medicare beneficiaries with Alzheimer’s disease. Alzheimers Dement 2017;13:1174-8.
    32. Grabowski DC, Mor V. Nursing home care in crisis in the wake of COVID-19. JAMA 2020;324:23-4.
    33. Hospital Authority. Waiting time for stable new case booking at specialist out-patient clinics. 2024. Available from: https://www.ha.org.hk/haho/ho/sopc/dw_wait_ls.pdf. Accessed 5 Jul 2024.
    34. Hong SN, Mak IL, Chin WY, et al. Age-specific associations between the number of co-morbidities, all-cause mortality and public direct medical costs in patients with type 2 diabetes: a retrospective cohort study. Diabetes Obes Metab 2023;25:454-67.
    35. Fong BY, Law VT, Lee A. Primary Care Revisited: Interdisciplinary Perspectives for a New Era. Singapore: Springer; 2020.
    36. Al Quait A, Doherty P. Does cardiac rehabilitation favour the young over the old? Open Heart 2016;3:e000450.
    37. Lorig K, Holman H, Sobel D, Laurent D. Living a Healthy Life with Chronic Conditions: Self-management of Heart Disease, Arthritis, Diabetes, Asthma, Bronchitis, Emphysema and Others. 3rd edition. Boulder [CO]: Bull Publishing Company; 2006.
    38. Bourbeau J, Julien M, Maltais F, et al. Reduction of hospital utilization in patients with chronic obstructive pulmonary disease: a disease-specific self-management intervention. Arch Intern Med 2003;163:585-91.
    39. Cravo A, Attar D, Freeman D, Holmes S, Ip L, Singh SJ. The importance of self-management in the context of personalized care in COPD. Int J Chron Obstruct Pulmon Dis 2022;17:231-43.
    40. Dalal HM, Doherty P, Taylor RS. Cardiac rehabilitation. BMJ 2015;351:h5000.
    41. Soares JJ, Sundin O, Grossi G. The stress of musculoskeletal pain: a comparison between primary care patients in various ages. J Psychosom Res 2004;56:297-305.
    42. Tong KW, Fong KN. Community Care in Hong Kong: Current Practices, Practice-Research Studies and Future Directions. Hong Kong: City University of Hong Kong Press; 2014.
    43. Talias MA, Lamnisos D, Heraclides A. Editorial: Data science and health economics in precision public health. Front Public Health 2022;10:960282.
    44. Leung E, Lee A, Tsang H, Wong MC. Data-driven service model to profile healthcare needs and optimise the operation of community-based care: a multi-source data analysis using predictive artificial intelligence. Hong Kong Med J 2023;29:484-6.
    45. Chong JL, Lim KK, Matchar DB. Population segmentation based on healthcare needs: a systematic review. Syst Rev 2019;8:202.
    46. Mechanic R. Post–acute care—the next frontier for controlling Medicare spending. N Engl J Med 2014;370:692-4.
    47. Nnoaham KE, Cann KF. Can cluster analyses of linked healthcare data identify unique population segments in a general practice-registered population? BMC Public Health 2020;20:798.
    48. Lafortune L, Béland F, Bergman H, Ankri J. Health status transitions in community-living elderly with complex care needs: a latent class approach. BMC Geriatr 2009;9:6.
    49. Liu LF, Tian WH, Yao HP. Utilization of health care services by elderly people with National Health Insurance in Taiwan: the heterogeneous health profile approach. Health Policy 2012;108:246-55.
    50. Eissens van der Laan MR, van Offenbeek MA, Broekhuis H, Slaets JP. A person-centred segmentation study in elderly care: towards efficient demand-driven care. Soc Sci Med 2014;113:68-76.
    51. Joynt KE, Figueroa JF, Beaulieu N, Wild RC, Orav EJ, Jha AK. Segmenting high-cost Medicare patients into potentially actionable cohorts. Healthc (Amst) 2017;5:62-7.
    52. Davis AC, Shen E, Shah NR, et al. Segmentation of high-cost adults in an integrated healthcare system based on empirical clustering of acute and chronic conditions. J Gen Intern Med 2018;33:2171-9.

    Specific indicators of unsuitability for transarterial chemoembolisation in patients with intermediate-stage hepatocellular carcinoma according to thresholds of tumour burden and liver function as judged by survival benefit over sorafenib

    Hong Kong Med J 2025 Dec;31(6):453–61 | Epub 5 Dec 2025
    © Hong Kong Academy of Medicine. CC BY-NC-ND 4.0
     
    ORIGINAL ARTICLE
    LM Chen, PhD1,2,3; Simon CH Yu, MB, BS, MD1; Leung Li, MB, ChB, MD4; Edwin P Hui, MB, ChB, MD4; Winnie Yeo, MB, BS, MD4,5; Stephen L Chan, MB, BS, MD4,5
    1 Department of Imaging and Interventional Radiology, Faculty of Medicine, The Chinese University of Hong Kong, Hong Kong SAR, China
    2 Department of Medical Ultrasonics, The Sixth Affiliated Hospital, Sun Yat-sen University, Guangzhou, China
    3 Biomedical Innovation Center, The Sixth Affiliated Hospital, Sun Yat-sen University, Guangzhou, China
    4 Department of Clinical Oncology, Prince of Wales Hospital, The Chinese University of Hong Kong, Hong Kong SAR, China
    5 State Key Laboratory of Translational Oncology, China
     
    Corresponding author: Dr Simon CH Yu (simonyu@cuhk.edu.hk)
     
     
    Abstract
    Introduction: This study aimed to define specific indicators of unsuitability for transarterial chemoembolisation (TACE) in patients with intermediate-stage hepatocellular carcinoma (HCC) in Hong Kong using thresholds of tumour burden and liver function, as judged by survival benefit over sorafenib.
     
    Methods: Patients with treatment-naïve and unresectable HCC who received TACE or sorafenib from 2005 to 2019 and met the eligibility criteria were enrolled. Overall survival (OS) was compared between the TACE and sorafenib groups using the log-rank test and hazard ratios (HRs) in all subgroups classified according to baseline modified albumin–bilirubin (mALBI) grade and tumour burden, including the up-to-7, up-to-11, and N3-S5-S10 criteria.
     
    Results: Overall survival was significantly longer in TACE subgroups than in sorafenib subgroups when stratified by mALBI grade and either the up-to-7 or the up-to-11 criteria (all P<0.05). When applying the N3-S5-S10 criteria, OS did not significantly differ between the TACE and sorafenib groups in subgroups with mALBI grade 2b and tumours with number >3 and size >5 cm but ≤10 cm, or tumours with number >3 and size >10 cm (HR=0.550 and 0.965, respectively; both P>0.05). Sensitivity analysis showed non-significant survival benefits in two additional subgroups: those with mALBI grade 2b and tumours with number ≤3 and size >10 cm, and those with mALBI grade 1 or 2a and tumours with number >3 and size >10 cm (HR=0.474 and 0.418, respectively; both P>0.05).
     
    Conclusion: More precise criteria for TACE unsuitability are required. The combination of mALBI grade and the N3-S5-S10 criteria may better identify patients with intermediate-stage HCC who are unlikely to benefit from TACE. Validation in a larger cohort is warranted.
     
     
    New knowledge added by this study
    • Patients regarded as unsuitable for transarterial chemoembolisation (TACE) under existing criteria may achieve better survival outcomes with TACE than those with systemic therapy.
    • To determine true TACE unsuitability, more precise criteria based on clinical evidence demonstrating improved survival with alternative treatments are required. Modified albumin–bilirubin (mALBI) grade 2b and tumours with number >3 and size >5 cm, or tumours with number ≤3 and size >10 cm, as well as mALBI grade 1 or 2a and tumours with number >3 and size >10 cm, could serve as better indicators of TACE unsuitability in patients with intermediate-stage hepatocellular carcinoma.
    Implications for clinical practice or policy
    • Within the framework of TACE unsuitability, the use of more precise discriminatory criteria is crucial to ensure that patients are not inappropriately excluded from the potential benefits of TACE.
    • The integration of mALBI grade with the N3-S5-S10 tumour burden criteria may offer a practical framework for clinicians to individualise treatment selection, optimising outcomes by identifying patients more likely to benefit from TACE versus systemic therapy.
     
     
    Introduction
    Hepatocellular carcinoma (HCC) is one of the leading malignancies worldwide. At diagnosis, up to 30% of patients have intermediate-stage HCC according to the Barcelona Clinic Liver Cancer system.1 Transarterial chemoembolisation (TACE) has emerged as the first-line treatment for intermediate-stage HCC, supported by two randomised controlled trials2 3 and a meta-analysis4 that demonstrated superior survival outcomes compared with best supportive care or suboptimal therapies.
     
    Because patients with intermediate-stage HCC comprise a heterogeneous group characterised by a wide range of tumour burdens and liver function, the effectiveness of TACE as first-line treatment may not be universal, particularly in subgroups with high tumour burden or suboptimal liver function. To address this issue, sub-staging of intermediate-stage HCC based on tumour burden and liver function has been proposed in several criteria, including the Bolondi,5 Kinki,6 and MICAN (Modified Intermediate Stage of Liver Cancer) criteria.7 These criteria have demonstrated discriminative prognostic value in identifying subgroups of patients with intermediate-stage HCC.7 8 Given that survival outcomes of patients treated with TACE can vary across substages of intermediate-stage HCC, it is clinically essential to identify thresholds of tumour burden and liver function that preclude the use of TACE according to survival benefit.
     
    Sorafenib has been established as the standard of care for advanced HCC since 2007, based on the demonstration of its significant survival superiority over placebo.9 10 11 Subgroup analyses of clinical trials have shown that sorafenib exerts positive therapeutic efficacy in intermediate-stage HCC, with reported overall survival (OS) ranging from 14.5 to 20.6 months,9 12 13 which is comparable to the OS achieved with TACE. Sorafenib treatment can serve as a benchmark for evaluating the survival benefit of TACE. If TACE does not provide a significant survival benefit compared with sorafenib, it may not be appropriate to subject patients to TACE rather than systemic therapy, given that TACE is invasive and potentially harmful to the liver. Patients may benefit from systemic therapy before liver function becomes suboptimal.
     
    It has been hypothesised that specific baseline parameters of tumour burden and liver function, at which TACE fails to show superior survival benefit compared with sorafenib, could be defined as indicators of TACE unsuitability. This study aimed to define specific indicators of TACE unsuitability at baseline in patients with intermediate-stage HCC according to thresholds of tumour burden and liver function, as judged by the survival benefit of TACE over sorafenib.
     
    Methods
    Study design
    Due to the limited number of eligible participants, all available cases with complete clinical data were included. All patients with unresectable HCC who received TACE or sorafenib therapy from January 2005 to December 2019 at Prince of Wales Hospital were enrolled in the study, provided they met all eligibility criteria. Unresectability of intermediate-stage HCC was determined by a multidisciplinary team comprising a surgeon, an interventional radiologist, and an oncologist. Inclusion criteria were treatment-naïve, Barcelona Clinic Liver Cancer-B stage HCC diagnosed by biopsy or a typical vascular pattern on cross-sectional imaging; intrahepatic disease without vascular invasion; and an Eastern Cooperative Oncology Group performance status score of 0 or 1. Exclusion criteria included age under 18 years or Eastern Cooperative Oncology Group performance status score of 2 or above; prior treatment before initial TACE; receipt of hepatectomy, liver transplantation, or local therapy after initial TACE; and any imaging evidence from computed tomography (CT), magnetic resonance imaging, or positron emission tomography/CT showing vascular invasion by tumour (including portal vein tumour thrombus) or extrahepatic metastasis (Fig 1). To identify thresholds for TACE unsuitability, OS of patients treated with TACE was compared with that of patients treated with sorafenib within subgroups defined by baseline tumour burden and liver function. Overall survival was defined as the interval between the initiation of TACE or sorafenib and death from any cause. Patients who were alive or lost to follow-up were censored.
     

    Figure 1. Study recruitment and patient subgrouping for transarterial chemoembolisation
     
    Study participants
    In total, 420 patients were enrolled in the study: 358 received TACE and 62 received sorafenib (Table 1). The TACE group included a significantly higher proportion of older and female patients. The median tumour size was significantly larger in the sorafenib group than in the TACE group. No significant differences were observed between the two groups in terms of the modified albumin–bilirubin (mALBI) grade distribution or tumour multiplicity. Among patients initially treated with TACE, the median number of TACE sessions was two (range, 1-4); 124 patients received one session, 78 received two sessions, 53 received three sessions, and 103 received more than three sessions. After developing refractoriness to TACE, 60 patients subsequently received systemic agents; of these, 35 received sorafenib, eight received adriamycin, four received doxorubicin, six received lenvatinib, and seven received other agents.
     

    Table 1. Demographics of patients (n=420)
     
    Patient subgrouping
    Patients were classified into six subgroups according to baseline tumour burden and liver function. Tumour burden was subcategorised using the up-to-7, up-to-11, and N3-S5-S10 criteria. The up-to-7 and up-to-11 criteria were derived from the sum of the maximum tumour size (in cm) and the tumour number, with cut-off values of 7 or 11, respectively. Accordingly, patients were categorised as within or beyond the up-to-7 and up-to-11 criteria. In the N3-S5-S10 system, tumour burden was subcategorised according to the combination of tumour number and maximum tumour size; three tumour nodules and 5 cm or 10 cm in size served as the respective cut-off values. This categorisation resulted in the following six subgroups: (1) tumour number ≤3, tumour size ≤5 cm; (2) tumour number ≤3, tumour size >5 cm to ≤10 cm; (3) tumour number ≤3, tumour size >10 cm; (4) tumour number >3, tumour size ≤5 cm; (5) tumour number >3, tumour size >5 cm to ≤10 cm; and (6) tumour number >3, tumour size >10 cm (Fig 1).
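The tumour burden rules described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' code: the function names are hypothetical, and it assumes that "within" a criterion means the sum of maximum tumour size (cm) and tumour number is less than or equal to the cut-off, which the text does not state explicitly.

```python
def within_up_to(max_size_cm: float, tumour_number: int, cutoff: int) -> bool:
    """True if (largest tumour size in cm + tumour number) does not exceed the cut-off.

    Assumption: the boundary is inclusive (sum <= cutoff counts as 'within').
    Use cutoff=7 for up-to-7 and cutoff=11 for up-to-11.
    """
    return max_size_cm + tumour_number <= cutoff


def n3_s5_s10_subgroup(max_size_cm: float, tumour_number: int) -> int:
    """Return the N3-S5-S10 subgroup (1 to 6) in the order enumerated in the text."""
    if max_size_cm <= 5:
        size_band = 0  # tumour size <=5 cm
    elif max_size_cm <= 10:
        size_band = 1  # tumour size >5 cm to <=10 cm
    else:
        size_band = 2  # tumour size >10 cm
    # Subgroups 1-3 have tumour number <=3; subgroups 4-6 have tumour number >3
    first = 1 if tumour_number <= 3 else 4
    return first + size_band


if __name__ == "__main__":
    # Hypothetical patient: largest tumour 8 cm, five nodules
    print(within_up_to(8.0, 5, 7))
    print(within_up_to(8.0, 5, 11))
    print(n3_s5_s10_subgroup(8.0, 5))
```

For the hypothetical patient in the example, the sum is 8 + 5 = 13, so the tumour burden is beyond both the up-to-7 and up-to-11 criteria, and the patient falls in N3-S5-S10 subgroup 5 (tumour number >3, tumour size >5 cm to ≤10 cm).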
     
    Liver function subgroups were classified according to the mALBI grade.14 The mALBI grades were determined using the ALBI score, calculated as (log10 [bilirubin level (μmol/L)] × 0.66) + (albumin level [g/L] × –0.085). Based on three cut-off ALBI scores, grades were defined as follows: grade 1 (≤–2.60), grade 2a (>–2.60 to ≤–2.27), grade 2b (>–2.27 to ≤–1.39), and grade 3 (>–1.39). Because the sample size of patients receiving sorafenib with mALBI grade 1 or 2a was relatively small, these two subgroups were combined for analysis. Additionally, given that no patient with mALBI grade 3 received sorafenib, this subgroup was excluded from the analysis (Fig 1).
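As a worked illustration of the ALBI formula and the mALBI cut-offs given above, the following Python sketch computes the score and grade for a single patient. The function names and the example laboratory values are hypothetical; only the coefficients and cut-offs are taken from the text.

```python
import math


def albi_score(bilirubin_umol_l: float, albumin_g_l: float) -> float:
    """ALBI score = (log10 bilirubin [umol/L] x 0.66) + (albumin [g/L] x -0.085)."""
    return math.log10(bilirubin_umol_l) * 0.66 + albumin_g_l * -0.085


def malbi_grade(score: float) -> str:
    """Map an ALBI score to the modified ALBI grade using the three cut-offs."""
    if score <= -2.60:
        return "1"
    if score <= -2.27:
        return "2a"
    if score <= -1.39:
        return "2b"
    return "3"


if __name__ == "__main__":
    # Hypothetical patient: bilirubin 17 umol/L, albumin 40 g/L
    score = albi_score(17, 40)
    print(round(score, 3), malbi_grade(score))
```

For the hypothetical patient, the score is approximately –2.588, which lies between –2.60 and –2.27 and therefore corresponds to mALBI grade 2a.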
     
    Transarterial chemoembolisation
    The TACE procedures were performed using digital subtraction angiography equipment via a femoral approach under local anaesthesia.15 16 In brief, a microcatheter was used to catheterise tumour-feeding arteries at the lobar, segmental, or subsegmental level, depending on tumour size. An emulsion of cisplatin–ethiodised oil (Platosin; Pharmachemie BV, Haarlem, the Netherlands), consisting of up to 20 mg aqueous cisplatin (20 mL) and up to 20-mL ethiodised oil mixed in a 1:1 volume ratio, was administered until flow stasis occurred or a maximum dose of 40-mL emulsion was delivered. Digital subtraction angiography, with or without non-contrast multiplanar CT, was used to confirm treatment completeness. A gelatin sponge (5-10 mL) was used to embolise the feeding arteries.
     
    Postoperative monitoring included blood tests for liver function and tumour markers within 2 days, at 2 weeks, and then every 1 to 3 months, as well as CT imaging every 3 months. Systemic therapy was administered to patients with well-preserved liver function who developed TACE refractoriness, as indicated by continuous elevation of tumour markers and CT evidence of tumour progression.
     
    Systemic therapy
    According to the customary protocol at Prince of Wales Hospital, The Chinese University of Hong Kong during the study period, patients with unresectable intermediate-stage HCC and no contraindications to TACE were prioritised for TACE treatment. Patients who declined TACE were treated with sorafenib; as a result, some patients in the sorafenib group had smaller tumours or fewer tumour nodules. Sorafenib was administered orally at a prescribed dose of 400 mg twice daily. In the event of intolerable side-effects or serious adverse events, oncologists could adjust the treatment by reducing the dose or discontinuing the drug.
     
    Statistical analysis
    Categorical variables were presented as numbers (percentages), while continuous variables were summarised as median (interquartile range) or median (95% confidence interval [95% CI]), depending on the results of normality testing. The Chi-squared test was used to compare categorical data, and the Mann-Whitney U test was used for continuous data. Kaplan-Meier curves and Cox proportional hazards models were used to compare OS among subgroups. The log-rank test and hazard ratios (HRs) were used to assess survival differences between subgroups. A sensitivity analysis of survival outcomes was conducted by excluding participants who received systemic therapy after TACE. A P value <0.05 was considered statistically significant. Statistical analyses were performed using SPSS (Windows version 25.0; IBM Corp, Armonk [NY], United States).
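To make the survival methods concrete, the sketch below shows a plain-Python Kaplan-Meier median and a two-group log-rank test. It is a minimal illustration under stated assumptions, not the SPSS procedure used in the study: the function names and the toy data are hypothetical, censoring is coded as 0 (alive or lost to follow-up) and death as 1, and the P value uses the chi-squared distribution with one degree of freedom.

```python
import math


def km_median(times, events):
    """Kaplan-Meier median survival: first event time at which S(t) <= 0.5.

    times  -- follow-up durations (eg, months from treatment initiation)
    events -- 1 if death was observed, 0 if censored
    Returns None when the survival curve never falls to 0.5.
    """
    surv = 1.0
    event_times = sorted({t for t, e in zip(times, events) if e})
    for t in event_times:
        at_risk = sum(1 for u in times if u >= t)
        deaths = sum(1 for u, e in zip(times, events) if e and u == t)
        surv *= 1.0 - deaths / at_risk
        if surv <= 0.5:
            return t
    return None


def logrank(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi-squared statistic, P value) with 1 df."""
    pooled = list(zip(times_a, events_a)) + list(zip(times_b, events_b))
    observed_a = expected_a = variance = 0.0
    for t in sorted({u for u, e in pooled if e}):
        n_a = sum(1 for u in times_a if u >= t)  # at risk in group A
        n_b = sum(1 for u in times_b if u >= t)  # at risk in group B
        d_a = sum(1 for u, e in zip(times_a, events_a) if e and u == t)
        d_b = sum(1 for u, e in zip(times_b, events_b) if e and u == t)
        n, d = n_a + n_b, d_a + d_b
        observed_a += d_a
        expected_a += d * n_a / n
        if n > 1:
            # Hypergeometric variance of the number of deaths in group A
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    if variance == 0:
        return 0.0, 1.0
    chi2 = (observed_a - expected_a) ** 2 / variance
    # Upper tail of chi-squared with 1 df equals erfc(sqrt(chi2 / 2))
    return chi2, math.erfc(math.sqrt(chi2 / 2))


if __name__ == "__main__":
    # Toy data (hypothetical): months of follow-up and event indicators
    print(km_median([2, 4, 6, 8, 10], [1, 0, 1, 1, 0]))
    print(logrank([10, 12, 14, 16], [1, 1, 1, 1], [1, 2, 3, 4], [1, 1, 1, 1]))
```

The quadratic-time counting of patients at risk is chosen for readability; for cohorts of a few hundred patients, as in this study, it is fast enough.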
     
    Results
    Comparison of overall survival between transarterial chemoembolisation and sorafenib
    The median OS of all patients who received TACE was significantly longer than that of patients who received sorafenib (19.37 [16.89-21.85] months vs 5.12 [4.37-5.84] months, P<0.001; Fig 2a). When stratified by mALBI grade, patients with mALBI grade 1 or 2a had significantly longer median OS in the TACE group compared with the sorafenib group (23.83 [18.53-29.13] months vs 6.60 [3.61-9.59] months, P<0.001; Fig 2b). Similarly, patients with mALBI grade 2b had significantly longer median OS in the TACE group than in the sorafenib group (16.20 [11.91-20.49] months vs 4.39 [3.44-5.35] months, P<0.001; Fig 2c).
     

    Figure 2. Kaplan-Meier overall survival curves for patients with hepatocellular carcinoma who received transarterial chemoembolisation (TACE) and sorafenib. The median overall survival of all patients who received TACE was significantly longer than that of those who received sorafenib (a). Transarterial chemoembolisation subgroups were associated with significantly longer survival compared with sorafenib subgroups in patients with modified albumin–bilirubin (mALBI) grade 1 or 2a (b) and in those with mALBI grade 2b (c)
     
    Overall survival by modified albumin–bilirubin grade and tumour burden in sorafenib-treated patients
    The median OS of patients treated with sorafenib, stratified by mALBI grade and tumour burden, is summarised in Table 2. As the sorafenib subgroups with tumour number ≤3 had a relatively small sample size (n=8) according to the N3-S5-S10 criteria, these patients were not further subdivided based on tumour size. Instead, they were combined into a single subgroup with tumour number ≤3 to increase the sample size for comparison with the TACE group. Consequently, OS in the combined sorafenib subgroup (tumour number ≤3, any tumour size) was used for comparison with OS in the three tumour-size TACE subgroups of tumour number ≤3 (Table 2).
     

    Table 2. Overall survival of patients receiving sorafenib (n=62)
     
    The distribution of sample sizes was uneven across the sorafenib subgroups with tumour number >3 based on the N3-S5-S10 criteria, which may have introduced bias in the survival outcomes, such as a lower tumour burden being associated with worse OS. To avoid underestimation of OS in any tumour-size subgroup when comparing with the TACE subgroups, the longest OS among the subgroups with tumour number >3 was utilised as the OS value for all these subgroups in the analysis, irrespective of tumour size (Table 2). As no patients with mALBI grade 2 were present in the tumour burden subgroup defined as within up-to-7, the OS of patients with tumour burden beyond up-to-7 (Table 2) who were treated with sorafenib was used as the control.
     
    Overall survival in modified albumin–bilirubin grade 1 or 2a: transarterial chemoembolisation versus sorafenib
    Table 3 presents the median OS of patients treated with TACE or sorafenib, stratified by mALBI grade 1 or 2a and tumour burden. Across all subgroups defined by various tumour burden criteria, patients who received TACE achieved significantly longer OS than those who received sorafenib (all P<0.05), with HRs favouring TACE (ranging from 0.130 to 0.331). Sensitivity analysis showed that survival was not significantly different between TACE and sorafenib in the subgroup with tumour number >3 and tumour size >10 cm (HR=0.418 [95% CI=0.147-1.171]; P=0.097).
     

    Table 3. Overall survival of patients with liver function classified as modified albumin–bilirubin grade 1 or 2a
     
    Overall survival in modified albumin–bilirubin grade 2b: transarterial chemoembolisation versus sorafenib
    In subgroups with mALBI grade 2b, defined by either the up-to-7 or up-to-11 criteria, patients who received TACE exhibited significantly longer median OS than those who received sorafenib across all subgroups (all P<0.05; Table 4). However, when using the N3-S5-S10 criteria, TACE resulted in a significantly longer median OS than sorafenib only in the subgroups with tumour number ≤3 (any tumour size) and in the subgroup with tumour number >3 and tumour size ≤5 cm (both P<0.05; Table 4). In the subgroups with tumour number >3 and tumour size >5 cm to ≤10 cm, and those with tumour number >3 and tumour size >10 cm, although TACE subgroups demonstrated longer median OS than sorafenib subgroups (6.07 vs 3.74 months and 7.73 vs 3.74 months, respectively), the differences were not statistically significant (Table 4). Sensitivity analysis showed that survival was also not significantly different between TACE and sorafenib in the additional subgroup with tumour number ≤3 and tumour size >10 cm (HR=0.474 [95% CI=0.185-1.261]; P=0.120).
     

    Table 4. Overall survival of patients with liver function classified as modified albumin–bilirubin grade 2b
     
    Due to the small sample size, it was difficult to demonstrate a clear survival benefit of TACE over sorafenib; thus, the risk of overestimating the survival benefit of TACE, due to potential bias from more advanced disease in the sorafenib group, was likely minimised. For example, given the limited number of patients in the subgroups with tumour number >3 and tumour size >5 cm to ≤10 cm and those with tumour number >3 and tumour size >10 cm, these two subgroups were combined into one subgroup (tumour number >3 and tumour size >5 cm). In this combined subgroup, TACE (n=38) still yielded no significant survival benefit over sorafenib (n=14), with OS values of 6.07 months (4.10-8.03) and 3.74 months (1.71-5.78), respectively (HR=0.586 [95% CI=0.325-1.054]; P=0.071).
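As a rough consistency check on the combined-subgroup result above, a P value can be approximated from the reported HR and its 95% CI using a normal (Wald) approximation on the log scale. This sketch is illustrative only: the function name is hypothetical, and the published P value may come from a different test (for example, the log-rank test), so a small discrepancy is expected.

```python
import math


def wald_from_hr_ci(hr: float, ci_low: float, ci_high: float, z_crit: float = 1.959964):
    """Approximate the standard error, z statistic, and two-sided P value from an HR and its 95% CI.

    Assumes the CI is symmetric on the log scale (log HR +/- z_crit x SE).
    """
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z_crit)
    z = math.log(hr) / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return se, z, p


if __name__ == "__main__":
    # Values reported in the text for tumour number >3 and tumour size >5 cm
    se, z, p = wald_from_hr_ci(0.586, 0.325, 1.054)
    print(round(se, 3), round(z, 2), round(p, 3))
```

With the reported values, the approximation gives a P value of about 0.075, close to the reported P=0.071 and consistent with a 95% CI that crosses 1.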
     
    Discussion
    Results of subgroup analysis
    Subgroup analysis in this study revealed that, within the limitations of the data, TACE probably did not confer a statistically significant survival benefit over sorafenib for patients with mALBI grade 2b and a high tumour burden (number >3 and size >5 cm, or number ≤3 and size >10 cm), or for patients with mALBI grade 1 or 2a and tumour burden of number >3 and size >10 cm. In contrast, TACE did provide a survival benefit when the beyond up-to-7 or beyond up-to-11 criteria were applied. These findings suggest that the use of more precise criteria to define tumour burden and liver function could help identify specific subgroups unsuitable for TACE. Such criteria highlight the threshold at which TACE no longer provides a survival advantage over sorafenib, thereby indicating TACE unsuitability. These indicators would be valuable in guiding the clinical management of intermediate-stage HCC. The small sample size in the sorafenib group may have limited the statistical power to detect a survival benefit of TACE in subgroups with tumour number >3 and size >5 cm. Given that the overall results showed a consistent trend favouring TACE, validation through further studies with larger sample sizes is warranted.
     
    Sorafenib as a control
    In recent years, systemic therapy for HCC has undergone rapid development, leading to the emergence of new drugs after sorafenib. The combination of certain agents has shown significant improvements in survival compared with sorafenib alone. The IMbrave150 study demonstrated that treatment with atezolizumab plus bevacizumab resulted in a significantly longer median OS than sorafenib alone (19.2 vs 13.4 months).17 Similarly, both sintilimab plus a bevacizumab biosimilar18 and tremelimumab plus durvalumab19 provided significant survival benefits over sorafenib in patients with unresectable HCC. Nevertheless, sorafenib remains the first-line standard treatment and the most effective single agent for advanced HCC. It serves as a benchmark for newer single-agent therapies such as lenvatinib, nivolumab, and durvalumab, which have shown statistical non-inferiority in survival compared with sorafenib.19 20 21 Therefore, the use of sorafenib as the control arm versus TACE in this study is reasonable. With the rapid advancement of systemic agents, novel treatment strategies—such as switching to systemic therapy22 or initiating systemic therapy upfront followed by curative conversion23—have been advocated for patients with intermediate-stage HCC who may not benefit from TACE or repeated TACE. In such cases, it is important to define specific indicators of TACE unsuitability among patients with intermediate-stage HCC, in whom systemic therapy may potentially improve survival.
     
    Deficiencies of conventional criteria of unsuitability for transarterial chemoembolisation
    The concept of TACE unsuitability has emerged in conjunction with the development and availability of systemic therapies.24 In patients with intermediate-stage HCC, TACE unsuitability has been defined as the presence of mALBI grade 2b and tumour burden beyond the up-to-7 criteria.25 26 This definition was based on worse survival in patients with mALBI grade 2b and tumour burden beyond the up-to-7 criteria relative to patients displaying better liver function and lower tumour burden, without addressing the potential survival benefit of TACE over alternative treatment options in this subgroup. The definition therefore has two key limitations. First, it lacks clinical evidence demonstrating greater survival benefit from alternative treatments when TACE is withheld. Second, there remains controversy regarding the optimal criteria for defining high tumour burden. If the beyond up-to-7 criteria are used to define TACE unsuitability, the majority of patients with intermediate-stage HCC would be considered unsuitable, which is both unrealistic and unsupported. In the present study, 79% of patients had high tumour burden beyond up-to-7, comparable to the 70% reported by Hung et al.27
     
    Limitations of conventional sub-staging systems
    The sub-staging system using the up-to-11 criteria has shown better discriminatory power than the up-to-7 criteria for predicting survival after TACE.28 29 Nonetheless, in this study, neither the up-to-7 nor the up-to-11 criteria were able to identify TACE unsuitability. The findings indicated that both the subgroup with mALBI grade 2b and tumour burden beyond the up-to-7 criteria and the subgroup with mALBI grade 2b and tumour burden beyond the up-to-11 criteria still derived survival benefits from TACE compared with sorafenib, indicating that these subgroups should not be considered TACE unsuitable. The lack of discriminatory power may be attributed to the persistently high heterogeneity among patients classified as having high tumour burden under these two criteria. Worse survival after TACE in these subgroups, compared with patients displaying better liver function and lower tumour burden, does not justify entirely abandoning TACE in these patients.
     
    We propose using the N3-S5-S10 criteria to define tumour burden, as these criteria allow for more specific subgrouping and enable the identification of TACE unsuitability with greater precision, thereby reducing the likelihood of denying patients a potentially beneficial treatment (TACE). Our findings demonstrate that the proposed criteria can identify TACE unsuitability precisely in specific subgroups where the up-to-7 or up-to-11 criteria fail to distinguish survival differences. Based on these findings, we recommend that physicians assess intermediate-stage HCC using both the mALBI grade and the N3-S5-S10 criteria—a more rigorous framework—to determine TACE unsuitability. To our knowledge, this is the first study to demonstrate the survival benefit of TACE over sorafenib in patients with intermediate-stage HCC stratified by both liver function and tumour burden, as well as to identify TACE unsuitability within these subgroups.
     
    Limitations
    This study provided a larger sample size than previous studies comparing survival benefits between TACE and sorafenib. However, several limitations should be noted. First, the retrospective design of this study inevitably introduced patient selection bias between the TACE and sorafenib groups. Although there were significant differences in age, sex, and tumour size between the groups, such disparities in overall patient demographics might not have critically affected the validity of the survival comparisons, given that these were based on subgroup analyses. Second, the sample size was exceedingly small in some sorafenib subgroups with low tumour burden. The substantial disparity in patient numbers may have contributed to non-significant differences in OS between subgroups. We attempted to mitigate this limitation by combining subgroups with very small sample sizes. Third, some patients in the TACE group received systemic therapy after disease progression. Consequently, survival in the TACE group may have been overestimated as it reflected outcomes of TACE with or without systemic therapy, rather than TACE alone. Nonetheless, ‘TACE followed by systemic therapy’ represents standard clinical practice aimed at achieving the greatest patient benefit, and isolating a TACE-alone group for analysis would not be realistic. Notably, ‘TACE followed by systemic therapy’ accurately reflects real-world treatment practice and does not conflict with the study’s primary objective, which was to define specific indicators of TACE unsuitability at baseline rather than at the point when TACE becomes unsuitable. Finally, no power calculation was performed in the statistical analysis.
     
    Conclusion
    More precise criteria for TACE unsuitability are required. The combination of mALBI grade and N3-S5-S10 criteria may serve as a better indicator of TACE unsuitability than the beyond up-to-7 or beyond up-to-11 criteria for patients with intermediate-stage HCC. TACE likely offers no survival benefit compared with sorafenib beyond these thresholds. However, validation in a larger cohort is warranted.
     
    Author contributions
    Concept or design: SCH Yu.
    Acquisition of data: LM Chen, L Li, EP Hui, W Yeo, SL Chan.
    Analysis or interpretation of data: LM Chen, SCH Yu.
    Drafting of the manuscript: LM Chen, SCH Yu.
    Critical revision of the manuscript for important intellectual content: All authors.
     
    All authors had full access to the data, contributed to the study, approved the final version for publication, and take responsibility for its accuracy and integrity.
     
    Conflicts of interest
    All authors have disclosed no conflicts of interest.
     
    Funding/support
    This research was funded by the Vascular and Interventional Radiology Foundation, Hong Kong. The funding body was not involved in the design of the study, collection of data, analysis/interpretation of data, or writing of the manuscript.
     
    Ethics approval
    This research was approved by The Chinese University of Hong Kong–New Territories East Cluster Ethics Committee, Hong Kong (Ref No.: 2020.672). It was conducted in accordance with the Declaration of Helsinki and the International Conference on Harmonisation–Good Clinical Practice guidelines. The requirement for written informed patient consent was waived by the Committee due to the retrospective nature of the research.
     
    References
    1. Sung H, Ferlay J, Siegel RL, et al. Global Cancer Statistics 2020: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J Clin 2021;71:209-49. Crossref
    2. Llovet JM, Real MI, Montaña X, et al. Arterial embolisation or chemoembolisation versus symptomatic treatment in patients with unresectable hepatocellular carcinoma: a randomised controlled trial. Lancet 2002;359:1734-9. Crossref
    3. Lo CM, Ngan H, Tso WK, et al. Randomized controlled trial of transarterial lipiodol chemoembolization for unresectable hepatocellular carcinoma. Hepatology 2002;35:1164-71. Crossref
    4. Llovet JM, Bruix J. Systematic review of randomized trials for unresectable hepatocellular carcinoma: chemoembolization improves survival. Hepatology 2003;37:429-42. Crossref
    5. Bolondi L, Burroughs A, Dufour JF, et al. Heterogeneity of patients with intermediate (BCLC B) hepatocellular carcinoma: proposal for a subclassification to facilitate treatment decisions. Semin Liver Dis 2012;32:348-59. Crossref
    6. Kudo M, Arizumi T, Ueshima K, Sakurai T, Kitano M, Nishida N. Subclassification of BCLC B stage hepatocellular carcinoma and treatment strategies: proposal of modified Bolondi’s subclassification (Kinki criteria). Dig Dis 2015;33:751-8. Crossref
    7. Hiraoka A, Kumada T, Nouso K, et al. Proposed new sub-grouping for intermediate-stage hepatocellular carcinoma using albumin–bilirubin grade. Oncology 2016;91:153-61. Crossref
    8. Arizumi T, Ueshima K, Iwanishi M, et al. Validation of Kinki criteria, a modified substaging system, in patients with intermediate stage hepatocellular carcinoma. Dig Dis 2016;34:671-8. Crossref
    9. Bruix J, Raoul JL, Sherman M, et al. Efficacy and safety of sorafenib in patients with advanced hepatocellular carcinoma: subanalyses of a phase III trial. J Hepatol 2012;57:821-9. Crossref
    10. Cheng AL, Kang YK, Chen Z, et al. Efficacy and safety of sorafenib in patients in the Asia-Pacific region with advanced hepatocellular carcinoma: a phase III randomised, double-blind, placebo-controlled trial. Lancet Oncol 2009;10:25-34. Crossref
    11. Llovet JM, Ricci S, Mazzaferro V, et al. Sorafenib in advanced hepatocellular carcinoma. N Engl J Med 2008;359:378-90. Crossref
    12. Iavarone M, Cabibbo G, Piscaglia F, et al. Field-practice study of sorafenib therapy for hepatocellular carcinoma: a prospective multicenter study in Italy. Hepatology 2011;54:2055-63. Crossref
    13. Marrero JA, Kudo M, Venook AP, et al. Observational registry of sorafenib use in clinical practice across Child-Pugh subgroups: the GIDEON study. J Hepatol 2016;65:1140-7. Crossref
    14. Hiraoka A, Michitaka K, Kumada T, et al. Validation and potential of albumin–bilirubin grade and prognostication in a nationwide survey of 46,681 hepatocellular carcinoma patients in Japan: the need for a more detailed evaluation of hepatic function. Liver Cancer 2017;6:325-36. Crossref
    15. Yu SC, Hui JW, Hui EP, et al. Unresectable hepatocellular carcinoma: randomized controlled trial of transarterial ethanol ablation versus transcatheter arterial chemoembolization. Radiology 2014;270:607-20. Crossref
    16. Yu SC, Hui JW, Li L, et al. Comparison of chemoembolization, radioembolization, and transarterial ethanol ablation for huge hepatocellular carcinoma (≥10 cm) in tumour response and long-term survival outcome. Cardiovasc Intervent Radiol 2022;45:172-81. Crossref
    17. Cheng AL, Qin S, Ikeda M, et al. Updated efficacy and safety data from IMbrave150: atezolizumab plus bevacizumab vs. sorafenib for unresectable hepatocellular carcinoma. J Hepatol 2022;76:862-73. Crossref
    18. Ren Z, Xu J, Bai Y, et al. Sintilimab plus a bevacizumab biosimilar (IBI305) versus sorafenib in unresectable hepatocellular carcinoma (ORIENT-32): a randomised, open-label, phase 2-3 study. Lancet Oncol 2021;22:977-90. Crossref
    19. Abou-Alfa GK, Chan SL, Kudo M, et al. Phase 3 randomized, open-label, multicenter study of tremelimumab (T) and durvalumab (D) as first-line therapy in patients (pts) with unresectable hepatocellular carcinoma (uHCC): HIMALAYA. J Clin Oncol 2022;40(4_suppl):379. Crossref
    20. Yau T, Park JW, Finn RS, et al. Nivolumab versus sorafenib in advanced hepatocellular carcinoma (CheckMate 459): a randomised, multicentre, open-label, phase 3 trial. Lancet Oncol 2022;23:77-90. Crossref
    21. Kudo M, Finn RS, Qin S, et al. Lenvatinib versus sorafenib in first-line treatment of patients with unresectable hepatocellular carcinoma: a randomised phase 3 non-inferiority trial. Lancet 2018;391:1163-73. Crossref
    22. Ogasawara S, Ooka Y, Koroki K, et al. Switching to systemic therapy after locoregional treatment failure: definition and best timing. Clin Mol Hepatol 2020;26:155-62. Crossref
    23. Kudo M. A novel treatment strategy for patients with intermediate-stage HCC who are not suitable for TACE: upfront systemic therapy followed by curative conversion. Liver Cancer 2021;10:539-44. Crossref
    24. Kudo M. Extremely high objective response rate of lenvatinib: its clinical relevance and changing the treatment paradigm in hepatocellular carcinoma. Liver Cancer 2018;7:215-24. Crossref
    25. Kudo M, Han KH, Ye SL, et al. A changing paradigm for the treatment of intermediate-stage hepatocellular carcinoma: Asia-Pacific Primary Liver Cancer Expert Consensus Statements. Liver Cancer 2020;9:245-60. Crossref
    26. Kudo M, Kawamura Y, Hasegawa K, et al. Management of hepatocellular carcinoma in Japan: JSH Consensus Statements and Recommendations 2021 update. Liver Cancer 2021;10:181-223. Crossref
    27. Hung YW, Lee IC, Chi CT, et al. Redefining tumor burden in patients with intermediate-stage hepatocellular carcinoma: the seven-eleven criteria. Liver Cancer 2021;10:629-40. Crossref
    28. Kim JH, Shim JH, Lee HC, et al. New intermediate-stage subclassification for patients with hepatocellular carcinoma treated with transarterial chemoembolization. Liver Int 2017;37:1861-8. Crossref
    29. Lee IC, Hung YW, Liu CA, et al. A new ALBI-based model to predict survival after transarterial chemoembolization for BCLC stage B hepatocellular carcinoma. Liver Int 2019;39:1704-12. Crossref

    Improving efficiency and effectiveness of workplace-based assessment workshop in postgraduate medical education using a conjoint design

    Hong Kong Med J 2025 Dec;31(6):445–52 | Epub 9 Dec 2025
    © Hong Kong Academy of Medicine. CC BY-NC-ND 4.0
     
    ORIGINAL ARTICLE
    HY So, FHKAM (Anaesthesiology), MHPE1; Eddy WY Wong, FHKCORL, FRCSEd (ORL)2; Albert KM Chan, FHKCA, MHPE1; George KC Wong, MD, FCSHK1; Jessica YP Law, FHKCOG, MHQS (Harvard)3; PT Chan, FHKCOS, MMEd1; CM Ngai, FHKCORL, FRCS (Edin)2
    1 The Jockey Club Institute for Medical Education and Development, Hong Kong Academy of Medicine, Hong Kong SAR, China
    2 The Hong Kong College of Otorhinolaryngologists, Hong Kong SAR, China
    3 Department of Obstetrics and Gynaecology, Pamela Youde Nethersole Eastern Hospital, Hong Kong SAR, China
     
    Corresponding author: Dr HY So (sohingyu@fellow.hkam.hk)
     
    Abstract
    Introduction: Faculty development for trainers and nurturing feedback literacy in trainees is crucial for effective workplace-based assessments (WBAs) to support trainee competency development. Separate training sessions for trainers and trainees can be challenging when resources are limited. Combined training can optimise resources and foster mutual understanding, although such approaches face challenges related to power dynamics. This study aimed to evaluate the effectiveness of a conjoint WBA workshop in enhancing trainer engagement, improving trainee feedback literacy, and exploring the benefits and challenges of integrating trainers and trainees in a shared learning environment.
     
    Methods: A mixed-methods study was conducted with 13 trainers and five trainees from the Hong Kong College of Otorhinolaryngologists. Quantitative data were collected using the Feedback Literacy Behaviour Scale for trainees and the Continuing Professional Development–Reaction Questionnaire for trainers. Pre- and post-intervention comparisons were analysed using paired t tests. Qualitative data from focus group interviews were thematically analysed.
     
    Results: Quantitative analysis showed statistically significant increases in trainee feedback literacy (P<0.001) and improvements in trainers’ beliefs about capabilities and engagement intentions (P<0.05). The qualitative analysis supported these findings and identified three key factors: mutual understanding, clarification of the WBA purpose, and effective instructional design. Participants valued the mutual understanding fostered in the conjoint setting, which aligned expectations and created a supportive learning environment.
     
    Conclusion: Conjoint WBA workshops may effectively promote trainer engagement and trainee feedback literacy, aligning expectations and fostering a positive feedback culture. Further research is needed to explore the longitudinal impact and applicability to other specialties.
     
     
    New knowledge added by this study
    • Trainers and trainees learning together in the same workplace-based assessment (WBA) workshop facilitates effective mutual learning.
    • Despite potential power dynamics, psychological safety can be maintained in this setting.
    • Collaboration strengthens trainees’ trust in the value of WBA as a tool for learning.
    Implications for clinical practice or policy
    • Conjoint training can be considered an alternative for organising WBA workshops.
    • The Hong Kong Academy of Medicine should support further studies on this design to enhance the effectiveness of WBA workshops.
     
     
    Introduction
    Competency-based medical education (CBME) emphasises the assessment of trainees through direct observation and feedback using workplace-based assessments (WBA).1 These assessments are designed to support continuous learning and competency development through meaningful feedback.2 Effective implementation of WBA requires trainers who are willing and able to provide constructive feedback,3 4 5 and trainees who are motivated to seek and use feedback. This active engagement with feedback is the essence of feedback literacy, defined by Carless and Boud6 as “the understandings, capacities, and dispositions needed to make sense of information and use it to enhance work or learning strategies”. The construct of intention, based on the theory of planned behaviour, highlights that an individual’s willingness to perform a behaviour is influenced by their attitudes, subjective norms, and perceived behavioural control.7 Intention is emphasised as the best predictor of behaviour, especially where constraints or barriers exist. In the context of WBA, focusing on intention helps us understand the underlying motivations and readiness of trainers and trainees to engage in feedback practices. Trainers’ intentions are shaped by their beliefs about the value of feedback, the expectations of peers, and their confidence in their ability to provide that feedback. Dawson et al,8 building on the works of Carless and Boud6 and Molloy et al,9 conceptualised feedback literacy as five key skills: seeking feedback, making sense of information, using feedback, managing emotional responses, and providing feedback. Based on this framework, effective training is essential for fostering engagement and capability in meaningful feedback practices.
     
    Faculty development is often implemented to enhance trainers’ skills, whereas separate sessions aim to build feedback literacy among trainees. However, specialties with small numbers of trainers and trainees face unique challenges in implementing WBA, including limited opportunities to conduct separate training sessions. A conjoint WBA workshop, where both groups train together, may offer an innovative solution to these constraints. Potential benefits include promoting mutual understanding, aligning feedback practices, and fostering a consistent approach to WBA implementation.10 However, concerns regarding power imbalances and psychological safety in mixed-group settings could undermine its effectiveness.11 Thus far, there have been no studies regarding such conjoint workshops; the actual participant experience, including potential advantages and disadvantages, remains unexplored.
     
    Therefore, this study aimed to address the following research questions:
    1. Can conjoint training improve the intention of trainers to participate in WBA?
    2. Can conjoint training improve the feedback literacy of trainees?
    3. What are the experiences of trainers in a conjoint training setting?
    4. What are the experiences of trainees in a conjoint training setting?
     
    Methods
    This study was designed according to the requirements of the SQUIRE-EDU (Standards for QUality Improvement Reporting Excellence in Education) guidelines for educational improvement.12
     
    Study setting
    The study was conducted with trainers and trainees of the Hong Kong College of Otorhinolaryngologists (HKCORL), a specialty college under the Hong Kong Academy of Medicine. The HKCORL is responsible for training and accrediting specialists in otorhinolaryngology, and has been integrating WBAs into its training curriculum since 2021. The College currently has a total of 206 fellows, 57 of whom are trainers. In May 2023, 20 trainers participated in a WBA workshop specifically designed for them. During the first 2 years, basic surgical trainees are under the Hong Kong Intercollegiate Board of Surgical Colleges and rotate through different surgical specialties. Specialist training in otorhinolaryngology takes place only during the 4 years of higher training. Over the past 5 years, the annual intake of higher trainees has ranged from four to 11. Currently, there are 31 higher trainees, 26 of whom participated in a WBA workshop for trainees held in September 2023. Relationships among fellows and trainees are strengthened through regular training courses, academic lectures, workshops, and an annual scientific meeting, complemented by active participation from the Young Fellows Chapter to enhance engagement in College activities. Camaraderie is also fostered through sports activities and social events.
     
    Participant sampling and recruitment
    All participants in the workshop were invited by email to participate in this study on a voluntary basis. All 13 trainers and five trainees enrolled in the workshop volunteered to participate in the study. The cohort of trainers was relatively young; 11 were within 10 years of obtaining their fellowship, and seven had only 1 to 2 years of experience as specialists.
     
    Instructional design
    The 4-hour workshop was designed based on the first principles of instruction, emphasising task-centred learning as the core instructional approach.13 Participants engaged in two authentic learning tasks: procedural-based assessment and case-based discussion, each followed by guided reflection. These tasks provided opportunities to practise giving and receiving feedback, which was the main focus of the workshop.
     
    To prepare for these tasks, participants first completed a pre-course e-learning module consisting of five interactive videos (total duration: 53 minutes). These videos introduced essential concepts, including CBME, self-regulated learning, feedback literacy, and the procedures of WBA. The workshop began with an activity to establish psychological safety, following the recommendations of Rudolph et al,14 ensuring that participants felt comfortable to learn and engage openly. Subsequently, participants’ knowledge was reactivated through interactive lectures and demonstrations, effectively preparing them for the practice activities.
     
    Quantitative measures
    1. Trainee feedback literacy: The Feedback Literacy Behaviour Scale was used to assess changes in trainees’ feedback literacy. It measures five subscales: Seeking Feedback, Making Sense of Feedback, Using Feedback, Providing Feedback, and Managing Affect.8
    2. Trainer engagement in WBA: Trainers’ engagement was measured using the Continuing Professional Development (CPD)–Reaction Questionnaire, based on social cognitive theories (theory of planned behaviour and Triandis’ theory of interpersonal behaviour). It measures intention, social influence, beliefs about capabilities, beliefs about consequences, and moral norms.7 15 16
     
    Both surveys were administered before participants began their e-learning and repeated after completion of the workshop.
     
    Statistical analysis
    Paired t tests were utilised to compare pre- and post-intervention scores for both groups because this method offers more precise estimates of the effect and improved control over confounding variables compared with an unpaired t test, particularly given the small sample size. Descriptive statistics, including means, standard deviations, and Cohen’s d effect sizes, were calculated for each measure using Jamovi (desktop version 2.3.28).17
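The calculation described above can be sketched in a few lines. This is an illustration using only the Python standard library, not the Jamovi implementation: the function names and the demonstration scores are hypothetical (they are not study data), and Cohen's d is assumed to be the mean of the paired differences divided by their standard deviation. Differences are taken as pre minus post, so an improvement yields a negative d.

```python
import math
import statistics


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def two_sided_p(t_stat, df, steps=20000):
    """Two-sided P value via Simpson's rule on the t density from 0 to |t|."""
    upper = abs(t_stat)
    if upper == 0:
        return 1.0
    h = upper / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return max(0.0, 1 - 2 * area)


def paired_t_test(pre, post):
    """Paired t test and Cohen's d (mean difference / SD of differences)."""
    if len(pre) != len(post) or len(pre) < 2:
        raise ValueError("need two equal-length samples with n >= 2")
    diffs = [a - b for a, b in zip(pre, post)]  # pre minus post
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample SD, n - 1 denominator
    if sd_diff == 0:
        raise ValueError("all differences are identical; t is undefined")
    t_stat = mean_diff / (sd_diff / math.sqrt(n))
    return {
        "mean_diff": mean_diff,
        "t": t_stat,
        "df": n - 1,
        "p": two_sided_p(t_stat, n - 1),
        "cohens_d": mean_diff / sd_diff,
    }


# Hypothetical scores for five participants (illustration only)
pre_demo = [10, 12, 9, 11, 13]
post_demo = [13, 14, 12, 15, 15]
print(paired_t_test(pre_demo, post_demo))
```

Because each participant serves as their own control, the test operates on the within-person differences, which is why it can detect a change even in a very small sample.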
     
    Qualitative data collection and analysis
    Separate focus group interviews were conducted for trainers and trainees immediately after the workshop, using Cantonese. The two moderators were research staff trained by the authors. Semi-structured interviews were conducted using an interview guide created by the authors (online Appendix). The interviews were audio-recorded, anonymised, and transcribed verbatim. Transcripts were analysed using Braun and Clarke’s thematic analysis approach,18 assisted by ATLAS.ti software (version 8.4.5; ATLAS.ti Scientific Software Development, Berlin, Germany).19
     
    Member checking
    To enhance the credibility of the qualitative findings, results were sent back to participants after thematic analysis to confirm whether they agreed with the interpretation and whether they wished to share additional views.
     
    Reflexivity
    The first author, an intensivist and educationist with a Master’s degree in Health Professions Education, played a key role in designing the conjoint workshop and framing WBA as a learning tool. The second author, a consultant otorhinolaryngologist and CBME advocate, proposed the joint training concept to address challenges in organising separate trainer and trainee sessions. Support from the seventh author, president of HKCORL, was critical for workshop implementation. Other authors contributed diverse clinical and educational expertise: the third author, a consultant anaesthetist and faculty development chair of the Jockey Club Institute for Medical Education and Development of the Hong Kong Academy of Medicine; the fifth author, an obstetrics and gynaecology consultant with expertise in healthcare quality and simulation; the sixth author, an orthopaedic surgeon and former college censor; and the fourth author, a neurosurgeon experienced in WBA workshops.
     
    Their collective advocacy for CBME and WBA informed the study design and interpretation. While offering rich, multifaceted insights into WBA, this commitment may have influenced the emphasis on the conjoint workshop’s benefits, shaping research questions and conclusions accordingly.
     
    Results
    Quantitative findings
    Among the trainees, the total Feedback Literacy Score significantly increased (pre=96.8 ± 4.04, post=125.2 ± 9.93; P<0.001), associated with a large effect size (d= –3.488). There was no statistically significant difference in the subscales of the Feedback Literacy Score (Table 1).
     

    Table 1. Trainee feedback literacy scores
     
    Among the trainers, the CPD–Reaction Scores showed statistically significant improvement in intention (pre=10.27 ± 1.65, post=11.09 ± 1.88; P=0.036), beliefs about capabilities (pre=15.55 ± 2.01, post=16.73 ± 2.25; P=0.015), beliefs about consequences (pre=10.27 ± 1.65, post=11.45 ± 1.88; P=0.049), and total score (pre=60.18 ± 5.04, post=65.82 ± 5.93; P=0.008). The effect sizes were moderate to large for intention (d= –0.750), moderate for beliefs about capabilities (d= –0.543) and beliefs about consequences (d= –0.631), and large for the total score (d= –0.801) [Table 2].
     

    Table 2. Trainer Continuing Professional Development–Reaction Scores
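The magnitude labels attached to the effect sizes above can be compared with Cohen's conventional benchmarks (0.2 small, 0.5 medium, 0.8 large). The sketch below is illustrative only; the function name is hypothetical, the benchmarks are rules of thumb rather than fixed cut-offs, and the sign of d is ignored because it reflects only the direction of subtraction.

```python
def effect_size_label(d):
    """Label |d| using Cohen's conventional benchmarks (rules of thumb)."""
    size = abs(d)
    if size >= 0.8:
        return "large"
    if size >= 0.5:
        return "medium"
    if size >= 0.2:
        return "small"
    return "negligible"


# Effect sizes as reported in the Results
reported = {
    "trainee total feedback literacy": -3.488,
    "trainer intention": -0.750,
    "trainer beliefs about capabilities": -0.543,
    "trainer beliefs about consequences": -0.631,
    "trainer total score": -0.801,
}
for measure, d in reported.items():
    print(f"{measure}: d={d} -> {effect_size_label(d)}")
```

Under these benchmarks, the value for trainer intention (0.750 in absolute terms) falls just below the conventional threshold for a large effect, consistent with its description as moderate to large.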
     
    Qualitative findings
    Trainee focus group analysis
    Four themes were identified: understanding WBA assessment, enhancing feedback literacy, presence of trainers in the workshop, and workshop design and delivery. Subthemes and quotations under each theme are listed in online supplementary Table 1.
     
    Trainer focus group analysis
    Four themes were identified: perceptions of WBA, improvement in feedback skills, presence of trainees in the workshop, and workshop design and delivery. Subthemes and quotations under each theme are listed in online supplementary Table 2.
     
    Discussion
    This mixed-methods study evaluated the impact of a conjoint WBA workshop designed to enhance both trainer intention to participate in WBA and trainee feedback literacy. The quantitative and qualitative data converged to show that the conjoint workshop improved trainer intention and appreciation of feedback skills; it also enhanced trainee feedback literacy and confidence in managing feedback during their learning process. Specifically, the quantitative results showed statistically significant improvement in trainer intention to participate in WBA as measured by the CPD–Reaction Questionnaire, and in trainee feedback literacy as measured by the Feedback Literacy Behaviour Scale. Moreover, the qualitative findings suggested that trainers appreciated the use of open-ended questions and integration of feedback into micro-moments as valuable strategies, whereas trainees reported increased confidence in managing feedback and constructively applying it to their learning processes.
     
    Through analysis of the qualitative data, we also identified three key factors that contributed to these findings: mutual understanding between trainers and trainees, clarification of the purpose of WBA, and effective instructional design.
     
    Mutual understanding between trainers and trainees
    A key finding of this study was the positive reception of the mixed-group learning experience. Both trainers and trainees valued the opportunity to directly engage with each other, which fostered mutual understanding of the assessment process and reduced discrepancies in feedback practices. Notably, the absence of prominent power dynamics was striking. This may be partially attributed to the relatively young cohort of trainers, which likely fostered a more collaborative atmosphere. Although previous literature suggests that hierarchical structures can hinder open communication in feedback settings,11 the present study demonstrated that in contexts with flatter hierarchies, conjoint workshops can be highly effective. Trainees indicated that the emphasis on psychological safety during the workshop helped prepare them for meaningful participation. Adherence to the recommendations of Rudolph et al14 to establish a safe environment likely contributed to this positive outcome. The close relationships already present between trainers and trainees within this small specialty could also have contributed. Existing literature supports the importance of trainer–trainee relationships in WBA.4 20 Interactions within this psychologically safe environment facilitated a more unified understanding of assessment standards and expectations, which helped minimise discrepancies in feedback practices. This alignment fostered trust that both trainers and trainees were working towards the shared goal of using WBA for learning purposes.
     
    Our qualitative findings indicated that both groups reported a highly positive experience. The distinction lay in the focus: trainees emphasised gains in feedback literacy and confidence, whereas trainers valued new practical strategies and enhanced mutual understanding. According to the conceptual model of Castanelli et al,21 the level of trust in supervisors influences trainees’ perceptions of WBA. When trust is low, WBAs are regarded as performance evaluations, leading trainees to adopt risk-minimising strategies.22 Conversely, when trust is high, trainees perceive WBA as an assessment for learning, making them more willing to embrace vulnerability. Our findings suggest that, with appropriate measures to ensure psychological safety, a combined workshop setting may help align expectations, create a shared understanding of WBA practices, and strengthen trainees’ trust in their trainers.
     
    Clarification of the purpose of workplace-based assessment
    Both trainers and trainees recognised that WBA serves as a formative tool that guides reflective practice and enhances clinical competence. This understanding is crucial because it aligns with the principles of adult learning, particularly the notion that adults are self-directed learners who take responsibility for their own education.23 When both trainers and trainees appreciate that WBA facilitates reflective practice, they engage in self-directed learning by utilising feedback to critically analyse their clinical performance. This process empowers them to identify areas for improvement and take actionable steps towards enhancing their skills. Moreover, adults are motivated to learn when the material is directly relevant to their professional needs.23 In this context, WBA’s role in guiding clinical competence is highly pertinent because it connects seamlessly with daily practice. Thus, WBA not only fosters a culture of continuous improvement but also effectively motivates adult learners by linking assessment to professional development. However, motivation alone is insufficient. Participants also noted barriers such as time constraints in the clinical setting and the need for effective evaluation of outcomes. These issues must be addressed to ensure that motivation remains long-lasting and that trainees continue to meaningfully engage with WBAs in their everyday practice.
     
    Effective instructional design
    The workshop was designed based on the first principles of instruction, an evidence-based model that emphasises moving beyond memorisation to active knowledge application through real-world tasks.13 24 This approach encourages learners to engage in practice, which is often challenging and requires specific support. The workshop therefore provided two forms of support: cognitive and affective. Cognitive support helps learners understand key concepts through pre-course e-learning, reactivation of prior knowledge, demonstration, and facilitated reflection.13 Affective support focuses on ensuring psychological safety, which is crucial for effective engagement in practice.14 While overall improvement reflects the combined effect of e-learning and the workshop, the qualitative data indicate that the interactive, conjoint nature of the workshop itself was the primary catalyst for enhancing mutual understanding and feedback skills. Our analysis revealed that participants valued this design and highlighted two additional elements that supported their learning: cognitive aids and peer feedback.
     
    During the course, we used cognitive aids to remind participants of this six-step framework (Fig), and they found the use of such a framework effective. Workplace-based assessments consist of recurrent constituent skills—the steps to follow—and non-recurrent constituent skills (eg, how to respond in the debriefing conversation). The use of a structured framework and just-in-time information, such as cognitive aids, has been shown to effectively support the learning of recurrent skills.25
     

    Figure. Cognitive aid: the six steps of workplace-based assessments
     
    During the guided reflection, we also engaged participants in peer feedback. Our analysis showed that participants found this practice enhanced their learning. Peer feedback enhances metacognitive perceptions by encouraging learners to reflect on their understanding and performance in relation to their peers. This fosters self-awareness as learners evaluate their work against others’, facilitating deeper insights into strengths and areas for improvement.26 There is evidence demonstrating the effectiveness of peer feedback in enhancing feedback literacy.27 28
     
    Nonetheless, participants noted that the workshop could be improved by providing clearer instructions for role-playing exercises and using more medicine-related cases for demonstration. Effective instruction is important. According to cognitive load theory, ineffective guidance can increase extraneous cognitive load and impair learning, especially when the task itself is already demanding.29 We used a movie-based scenario not related to medicine to make the activity fun and interesting. However, the participants’ comment is valid, considering evidence that similarity between demonstration and practice is crucial for effective learning. When demonstrations closely resemble real-life applications, learners can better understand and apply concepts. This alignment enhances procedural knowledge, enabling learners to transition from observation to imitation and, eventually, autonomous practice. Furthermore, relevant demonstrations foster engagement and allow immediate feedback, which reinforces learning.30 31 Future workshops should focus on improving these aspects for better learning outcomes.
     
    Limitations and future directions
    This study had some limitations. The quantitative findings are constrained by the small sample size, particularly among trainees (n=5), which limits statistical power. Furthermore, although participation in the workshop was encouraged by the College, the sample may still reflect a group more engaged in training initiatives, potentially affecting generalisability. While the qualitative data provided rich insights into participants’ experiences, a larger cohort could offer a broader understanding of the impact of this educational intervention. Additionally, the study did not assess long-term changes in behaviour or practice, which are needed to determine sustained effects of the conjoint training on WBA implementation. Future studies could explore the longitudinal impact of such workshops and investigate their applicability in larger specialties where power dynamics might differ. It would also be valuable to assess the scalability of conjoint workshops in different contexts, particularly those with more complex hierarchical structures, to better understand their potential for broader implementation.
     
    Conclusion
    This study provides evidence that conjoint WBA workshops for trainers and trainees may effectively enhance trainee feedback literacy and trainer engagement in CBME. The mixed-group learning experience promoted mutual understanding and aligned feedback practices without creating significant power imbalances, fostering positive trainer–trainee interactions and enhancing trust, provided measures are taken to ensure psychological safety. Despite the positive outcomes, the study’s limitations, including its small sample size and lack of long-term follow-up, should be considered. Future research could explore the longitudinal impact of conjoint workshops and their applicability in larger specialties with more complex power dynamics.
     
    Author contributions
    Concept or design: HY So, EWY Wong.
    Acquisition of data: HY So, CM Ngai.
    Analysis or interpretation of data: HY So, AKM Chan, GKC Wong.
    Drafting of the manuscript: HY So.
    Critical revision of the manuscript for important intellectual content: All authors.
     
    All authors had full access to the data, contributed to the study, approved the final version for publication, and take responsibility for its accuracy and integrity.
     
    Conflicts of interest
    All authors have disclosed no conflicts of interest.
     
    Acknowledgement
    The authors thank Mr CF Chan and Ms Cathy Ma of the Jockey Club Institute for Medical Education and Development of Hong Kong Academy of Medicine for valuable assistance in moderating the focus group discussions and preparing the transcripts. The authors also appreciate the logistical support provided by Ms Cindy Leung of The Hong Kong College of Otorhinolaryngologists, as well as Mr CF Chan, Ms Cathy Ma, and Ms Jojo Lee of the Jockey Club Institute for Medical Education and Development of Hong Kong Academy of Medicine in organising the workshop. Additionally, the authors wish to express their heartfelt thanks to Professor Jack Pun from the Department of English at The Chinese University of Hong Kong and Professor Stanley Sau-ching Wong from the Department of Anaesthesiology at The University of Hong Kong for insightful contributions to the preparation of the manuscript.
     
    Declaration
    Findings from this study were presented at AMEE 2025 of the International Association for Health Professions Education, 23-27 August 2025, Barcelona, Spain.
     
    Funding/support
    This research received no specific grant from any funding agency in the public, commercial, or not-for-profit sectors.
     
    Ethics approval
    This research was approved by the Survey and Behavioural Research Ethics Committee of The Chinese University of Hong Kong, Hong Kong (Ref No.: SBRE-23-0855). Information sheets regarding the study were provided to all participants, and signed consent was obtained from each participant prior to the study.
     
    Supplementary material
    The supplementary material was provided by the authors and some information may not have been peer reviewed. Accepted supplementary material will be published as submitted by the authors, without any editing or formatting. Any opinions or recommendations discussed are solely those of the author(s) and are not endorsed by the Hong Kong Academy of Medicine and the Hong Kong Medical Association. The Hong Kong Academy of Medicine and the Hong Kong Medical Association disclaim all liability and responsibility arising from any reliance placed on the content.
     
    References
    1. So HY, Choi YF, Chan PT, Chan AK, Ng GW, Wong GK. Workplace-based assessments: what, why, and how to implement? Hong Kong Med J 2024;30:250-4.
    2. Holmboe ES, Sherbino J, Long DM, Swing SR, Frank JR. The role of assessment in competency-based medical education. Med Teach 2010;32:676-82.
    3. Anderson HL, Kurtz J, West DC. Implementation and use of workplace-based assessment in clinical learning environments: a scoping review. Acad Med 2021;96:S164-74.
    4. Massie J, Ali JM. Workplace-based assessment: a review of user perceptions and strategies to address the identified shortcomings. Adv Health Sci Educ Theory Pract 2016;21:455-73.
    5. Lörwald AC, Lahner FM, Mooser B, et al. Influences on the implementation of Mini-CEX and DOPS for postgraduate medical trainees’ learning: a grounded theory study. Med Teach 2019;41:448-56.
    6. Carless D, Boud D. The development of student feedback literacy: enabling uptake of feedback. Assess Eval High Educ 2018;43:1315-25.
    7. Ajzen I. The theory of planned behaviour. Organ Behav Hum Decis Processes 1991;50:179-211.
    8. Dawson P, Yan Z, Lipnevich A, Tai J, Boud D, Mahoney P. Measuring what learners do in feedback: the Feedback Literacy Behaviour Scale. Assess Eval High Educ 2023;49:348-62.
    9. Molloy E, Boud D, Henderson M. Developing a learning-centred framework for feedback literacy. Assess Eval High Educ 2020;45:527-40.
    10. Illingworth P, Chelvanayagam S. Benefits of interprofessional education in health care. Br J Nurs 2007;16:121-4.
    11. Brooks AK. Power and the production of knowledge: collective team learning in work organizations. Hum Resour Dev Q 1994;5:213-35.
    12. Ogrinc G, Armstrong GE, Dolansky MA, Singh MK, Davies L. SQUIRE-EDU (Standards for QUality Improvement Reporting Excellence in Education): publication guidelines for educational improvement. Acad Med 2019;94:1461-70.
    13. Merrill MD. First principles of instruction. In: Reigeluth CM, Carr-Chellman AA, editors. Instructional Design Theories and Models: Building a Common Knowledge Base. Vol III. New York: Routledge Publishers; 2009: 43-59.
    14. Rudolph JW, Raemer DB, Simon R. Establishing a safe container for learning in simulation: the role of the presimulation briefing. Simul Healthc 2014;9:339-49.
    15. Légaré F, Borduas F, Freitas A, et al. Development of a simple 12-item theory-based instrument to assess the impact of continuing professional development on clinical behavioral intentions. PLoS One 2014;9:e91013.
    16. Triandis HC. Values, attitudes, and interpersonal behaviour. In: Howe HE Jr, Page MM, editors. Nebraska Symposium on Motivation. Lincoln: University of Nebraska Press; 1979: 195-259.
    17. Jamovi Project. Jamovi (desktop version 2.3.28 for Mac). 2024. Available from: https://dev.jamovi.org. Accessed 25 Oct 2024.
    18. Clarke V, Braun V. Thematic analysis. In: Teo T, editor. Encyclopedia of Critical Psychology. New York: Springer; 2014: 1947-52.
    19. ATLAS.ti Scientific Software Development GmbH. ATLAS.ti (software version 8.4.5). 2024. Available from: https://atlasti.com. Accessed 25 Oct 2024.
    20. Baboolal SO, Singaram VS. Specialist training: workplace-based assessments impact on teaching, learning and feedback to support competency-based postgraduate programs. BMC Med Educ 2023;23:941.
    21. Castanelli DJ, Weller JM, Molloy E, Bearman M. Trust, power and learning in workplace-based assessment: the trainee perspective. Med Educ 2022;56:280-91.
    22. Gaunt A, Patel A, Rusius V, Royle TJ, Markham DH, Pawlikowska T. ‘Playing the game’: how do surgical trainees seek feedback using workplace-based assessment? Med Educ 2017;51:953-62.
    23. Knowles MS, Holton EF III, Swanson RA. The Adult Learner: The Definitive Classic in Adult Education and Human Resource Development, 6th ed. Amsterdam: Elsevier; 2005.
    24. Francom GM, Gardner J. What is task-centered learning? TechTrends 2014;58:27-35.
    25. van Merriënboer JJ, Kirschner PA. Ten Steps to Complex Learning: A Systematic Approach to Four-Component Instructional Design. 3rd ed. New York: Routledge Publisher; 2018.
    26. Lerchenfeldt S, Kamel-ElSayed S, Patino G, Loftus S, Thomas DM. A qualitative analysis on the effectiveness of peer feedback in team-based learning. Med Sci Educ 2023;33:893-902.
    27. Man D, Kong B, Chau MH. Developing student feedback literacy through peer review training. RELC J 2024;55:408-21.
    28. Little T, Dawson P, Boud D, Tai J. Can students’ feedback literacy be improved? A scoping review of interventions. Assess Eval High Educ 2023;49:39-52.
    29. van Merriënboer JJ, Sweller J. Cognitive load theory in health professional education: design principles and strategies. Med Educ 2010;44:85-93.
    30. McLain M. Developing perspectives on ‘the demonstration’ as a signature pedagogy in design and technology education. Int J Tech Design Educ 2021;31:3-26.
    31. Grossman R, Salas E, Pavlas D, Rosen MA. Using instructional features to enhance demonstration-based training in management education. Acad Manag Learn Educ 2012;12:219-43.

    Incidence, risk factors, and clinical outcomes of peripartum cardiomyopathy in Hong Kong

    Hong Kong Med J 2025 Dec;31(6):434–44 | Epub 27 Nov 2025
    © Hong Kong Academy of Medicine. CC BY-NC-ND 4.0
     
    ORIGINAL ARTICLE  CME
    Liliana SK Law, MB, ChB1; LT Kwong, MB, BS1; KH Siong, MB, BS1; Sani TK Wong, MB, ChB2; WL Chan, MB, ChB3; KY Tse, MB, BS4; Yannie YY Chan, MB, BS5; KS Eu, MB, BS6; CY Chow, MB, ChB7; Joan KO Wai, LMCHK8; HC Mok, MB, BS1; PL So, MB, BS1
    1 Department of Obstetrics and Gynaecology, Tuen Mun Hospital, Hong Kong SAR, China
    2 Department of Obstetrics and Gynaecology, Prince of Wales Hospital, The Chinese University of Hong Kong, Hong Kong SAR, China
    3 Department of Obstetrics and Gynaecology, Kwong Wah Hospital, Hong Kong SAR, China
    4 Department of Obstetrics and Gynaecology, Queen Elizabeth Hospital, Hong Kong SAR, China
    5 Department of Obstetrics and Gynaecology, Princess Margaret Hospital, Hong Kong SAR, China
    6 Department of Obstetrics and Gynaecology, Pamela Youde Nethersole Eastern Hospital, Hong Kong SAR, China
    7 Department of Obstetrics and Gynaecology, United Christian Hospital, Hong Kong SAR, China
    8 Department of Obstetrics and Gynaecology, Queen Mary Hospital, The University of Hong Kong, Hong Kong SAR, China
     
    Corresponding author: Dr Liliana SK Law (lawskliliana@gmail.com)
     
    Abstract
    Introduction: Peripartum cardiomyopathy (PPCM) is an uncommon but serious form of heart failure affecting women during late pregnancy or early postpartum. This territory-wide multicentre retrospective study aimed to evaluate the local incidence, risk factors, and clinical outcomes, including subsequent pregnancies, in Hong Kong.
     
    Methods: Medical records were retrospectively reviewed for women who delivered at all public hospitals between 1 January 2013 and 31 December 2022 and met the 2010 European Society of Cardiology Working Group criteria for PPCM. Regression analysis was performed to investigate maternal risk factors.
     
    Results: Thirty Asian women were diagnosed with PPCM, corresponding to an incidence of 1 in 11 179 live births. Eleven (36.7%) had antepartum onset of symptoms, and 25 (83.3%) were diagnosed after childbirth, most presenting with severe symptoms (90%). The median left ventricular ejection fraction was 30% (range, 10%-44%). Notable complications included cardiogenic shock (10%), respiratory failure (23.3%), acute renal failure (23.3%), and thromboembolism (23.3%). Most women received guideline-directed heart failure therapy. At 12 months, all-cause mortality was 6.7%, and cardiac recovery occurred in 60%. Eleven women had 13 subsequent pregnancies (three miscarriages, five terminations, and five live births). There were no maternal deaths or cases of recurrent PPCM. Genetic testing identified potentially pathogenic variants in at least 10% of women. Antenatal anaemia (adjusted odds ratio [OR]=13.04; 95% confidence interval [95% CI]=3.72-45.70) and hypertensive disorders of pregnancy (adjusted OR=38.00; 95% CI=9.66-149.52) were associated with higher odds of PPCM.
     
    Conclusion: This study highlights the substantial morbidity and mortality associated with PPCM. Genetic testing may aid in risk stratification and prognostication.
     
     
    New knowledge added by this study
    • Peripartum cardiomyopathy (PPCM) is an uncommon but potentially fatal disease in Hong Kong.
    • Genetic testing by next-generation sequencing identified 10% of women with PPCM as carriers of potential genetic variants associated with cardiomyopathy.
    • Antenatal anaemia and hypertensive disorders of pregnancy are independent clinical risk factors for PPCM.
    Implications for clinical practice or policy
    • Screening for and prevention of anaemia during pregnancy and pre-eclampsia may help reduce the incidence of PPCM.
    • The integration of genetic testing in PPCM management may support personalised medical care.
     
     
    Introduction
    Peripartum cardiomyopathy (PPCM) is a rare form of heart failure that occurs in relation to pregnancy, resulting in substantial morbidity and mortality.1 In 2010, the Heart Failure Association of the European Society of Cardiology (ESC) defined PPCM as “an idiopathic cardiomyopathy presenting with heart failure secondary to left ventricular systolic dysfunction towards the end of pregnancy or in the months following delivery, where no other cause of heart failure is found”.2 Globally, its incidence varies widely, ranging from 1 in 100 live births in Nigeria3 to 1 in 20 000 live births in Japan.4
     
    The exact pathogenesis of PPCM is not yet fully understood; the current hypothesis proposes a ‘two-hit’ model involving an initial vascular insult caused by vasculotoxic hormonal effects, including soluble FMS-like tyrosine kinase-1 and prolactin, followed by a second hit of underlying predisposition—such as genetic susceptibility and other risk factors—that limits some women’s ability to withstand this vasculotoxic insult.1 Genetic or familial predisposition to PPCM has been supported by multiple reports.5 6 7 8 Additionally, well-recognised risk factors for PPCM include advanced maternal age, African American ancestry, multiple pregnancies, hypertension, and pre-eclampsia.9
     
    Peripartum cardiomyopathy is a potentially life-threatening myocardial disease that affects women of all ethnic groups10 and can have long-term health consequences.11 Until now, there has been a lack of information regarding the clinical phenotype and outcomes of this disease in Hong Kong. The present population-based study was conducted to evaluate the local incidence, clinical presentation, management, complications, 12-month outcomes, and subsequent pregnancies in women with PPCM. Additionally, we examined potential risk factors by comparing the clinical characteristics of women with and without PPCM to provide a basis for future preventive strategies.
     
    Methods
    Study design
    This was a population-based retrospective study of all women with PPCM who delivered in public hospitals in Hong Kong between 1 January 2013 and 31 December 2022. Cases were identified through the Clinical Data Analysis and Reporting System, which captures obstetric data and hospitalisation diagnoses from eight public hospitals providing obstetric services. First, all women who delivered during the study period and had a diagnosis code for heart failure from the third trimester to 6 months postpartum were identified. Each woman’s medical record was systematically reviewed by two authors to determine whether the following criteria for PPCM were met: development of cardiac failure (with left ventricular ejection fraction [LVEF] <45% on echocardiography) during the third trimester or within 6 months postpartum without an identifiable cause. Women were excluded if LVEF was ≥45%, a recognised cause of heart failure was identified, or there was no physician-confirmed diagnosis of PPCM.
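
The case definition above can be expressed as a simple screening rule. The following is a minimal sketch only, not the authors' actual review procedure: the record structure and field names are hypothetical, and treating the 6-month postpartum limit as inclusive is an assumption.

```python
from dataclasses import dataclass
from typing import Optional

LVEF_THRESHOLD = 45.0          # per cent; cases require LVEF below this value
POSTPARTUM_WINDOW_MONTHS = 6   # assumed inclusive upper limit of the window


@dataclass
class HeartFailureRecord:
    """Hypothetical summary of one reviewed record (illustrative fields)."""
    lvef_percent: float                          # LVEF on echocardiography
    onset_in_third_trimester: bool               # onset during third trimester
    months_postpartum_at_onset: Optional[float]  # None if onset was antepartum
    other_cause_identified: bool                 # recognised cause of heart failure
    physician_confirmed_ppcm: bool               # physician-confirmed diagnosis


def meets_ppcm_criteria(record: HeartFailureRecord) -> bool:
    """Apply the inclusion and exclusion criteria described in the text."""
    postpartum_in_window = (
        record.months_postpartum_at_onset is not None
        and 0 <= record.months_postpartum_at_onset <= POSTPARTUM_WINDOW_MONTHS
    )
    in_time_window = record.onset_in_third_trimester or postpartum_in_window
    return (
        record.lvef_percent < LVEF_THRESHOLD
        and in_time_window
        and not record.other_cause_identified
        and record.physician_confirmed_ppcm
    )


# Hypothetical example: onset 1 month postpartum, LVEF 30%, no other cause.
example = HeartFailureRecord(30.0, False, 1.0, False, True)
print(meets_ppcm_criteria(example))
```

In practice, each record was reviewed by two authors; a rule of this kind could only pre-screen records and would not replace clinical review.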
     
    Clinical variable collection
    Baseline characteristics (including socio-demographics, preexisting health conditions, and obstetric history) at the time of PPCM diagnosis were obtained from medical records. Clinical presentation and initial investigations, including electrocardiography, chest radiography, echocardiography, and laboratory results, were collected. All in-hospital complications and reported outcomes during follow-up were recorded, including all-cause mortality and cardiac recovery determined by echocardiography at 12 months. Management strategies were documented, including admission to the intensive care unit or cardiac care unit, use of mechanical ventilation or circulatory support, medications prescribed at hospital discharge, pacemaker insertion, and heart transplantation. Complete recovery of cardiac function was defined as LVEF ≥50%. Some patients underwent genetic evaluation, and their reports were analysed.
     
    Obstetric outcomes at the time of the PPCM event were assessed, including hypertensive disorders of pregnancy; gestational diabetes; thyroid disease; antenatal anaemia (defined as a haemoglobin level <10.5 g/dL); use of tocolytics; placenta accreta spectrum; placental abruption; fetal growth restriction; preterm delivery; assisted vaginal delivery or caesarean section; primary postpartum haemorrhage (blood loss ≥500 mL); and caesarean hysterectomy. Neonatal outcomes were examined, including stillbirth, sex, birth weight, small for gestational age, Apgar scores, admission to the neonatal intensive care unit, and death within 28 days of life. Data from the territory-wide electronic healthcare database were also extracted regarding outcomes of subsequent pregnancies, including LVEF before, during, and after pregnancy. The interval between the PPCM pregnancy and the first subsequent pregnancy was recorded.
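
The threshold-based definitions stated in this section (complete cardiac recovery, antenatal anaemia, and primary postpartum haemorrhage) can be written as derived binary variables. This is a minimal sketch; the function and parameter names are hypothetical and are not taken from the study database.

```python
RECOVERY_LVEF_PERCENT = 50.0   # complete recovery: LVEF >= 50%
ANAEMIA_HB_G_DL = 10.5         # antenatal anaemia: haemoglobin < 10.5 g/dL
PPH_BLOOD_LOSS_ML = 500.0      # primary postpartum haemorrhage: loss >= 500 mL


def has_complete_cardiac_recovery(lvef_percent: float) -> bool:
    """True when LVEF meets the stated recovery threshold."""
    return lvef_percent >= RECOVERY_LVEF_PERCENT


def has_antenatal_anaemia(haemoglobin_g_dl: float) -> bool:
    """True when haemoglobin is below the stated anaemia threshold."""
    return haemoglobin_g_dl < ANAEMIA_HB_G_DL


def has_primary_pph(blood_loss_ml: float) -> bool:
    """True when blood loss meets the stated haemorrhage threshold."""
    return blood_loss_ml >= PPH_BLOOD_LOSS_ML


# Hypothetical values for one woman, used only to show the boundary handling.
print(has_complete_cardiac_recovery(50.0))  # threshold value counts as recovered
print(has_antenatal_anaemia(10.5))          # threshold value is not anaemia
print(has_primary_pph(500.0))               # threshold value counts as haemorrhage
```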
     
    To investigate risk factors for PPCM, women who gave birth during the same period but did not develop heart failure were selected as the control group, with a PPCM-to-control ratio of 1:4. Demographic and clinical characteristics were compared between women with and without PPCM.
     
    Statistical analysis
    Data analysis was conducted using SPSS (Windows version 26.0; IBM Corp, Armonk [NY], United States). The incidence rate was calculated by dividing the total number of PPCM cases by the total number of live births during the study period. Descriptive data for continuous variables were presented as mean ± standard deviation or median (range or interquartile range), and categorical data were presented as numbers with percentages. Comparisons between women with and without PPCM were performed using Student’s t test or the Mann-Whitney U test for continuous variables, and the Chi squared test or Fisher’s exact test for categorical variables. Risk factors associated with PPCM were assessed using univariable and multivariable logistic regression analyses, with results expressed as odds ratios (ORs) and 95% confidence intervals (95% CIs). A P value of <0.05 was considered statistically significant. The STROBE (Strengthening the Reporting of Observational Studies in Epidemiology) guidelines were followed in the preparation of this article.
     
    Results
    Incidence of peripartum cardiomyopathy in Hong Kong
    During the 10-year study period, 30 women with PPCM delivered in public hospitals (Fig 1). Over the same period, there were 335 376 live births, yielding an estimated PPCM incidence of 1 in 11 179 live births in Hong Kong.
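
The incidence estimate follows directly from the two counts reported above. As a short worked check (the variable names are illustrative; the per-100 000 figure is derived here and is not reported in the article):

```python
ppcm_cases = 30        # women with PPCM during the study period
live_births = 335_376  # live births over the same period

births_per_case = live_births / ppcm_cases
cases_per_100k = ppcm_cases / live_births * 100_000

print(f"1 in {round(births_per_case):,} live births")
print(f"{cases_per_100k:.1f} per 100,000 live births")
```

This reproduces the reported figure of 1 in 11 179 live births, equivalent to roughly 8.9 cases per 100 000 live births.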
     

    Figure 1. Identification of study population
     
    Demographics, clinical characteristics, and investigations
    Detailed characteristics are listed in Table 1. All women in this study were Asian. The mean age was 33.5 years and the median body mass index was 22.0 kg/m2. One woman had a positive family history of heart failure of unknown cause; no women had a previous history of PPCM or cardiac disease.
     

    Table 1. Maternal socio-demographic characteristics, medical history, and obstetric history (n=30)
     
    Symptoms began antepartum in 36.7% of women and postpartum in 63.3%; PPCM was predominantly diagnosed postpartum (83.3%). The median time from symptom onset to diagnosis was 3.5 days (range, 0-107). At diagnosis, 90% of women had severe symptoms (New York Heart Association functional class III/IV), most commonly comprising shortness of breath, peripheral oedema, and desaturation. Common electrocardiographic findings included sinus tachycardia and prolonged QTc interval. At the first echocardiographic assessment, the median LVEF was 30% (range, 10-44). More than half of the women had abnormal chest radiographs showing congestive lung fields, cardiomegaly, and pleural effusion (Table 2).
     

    Table 2. Clinical presentation and investigations (n=30)
     
    Complications, management, and cardiac recovery
    Detailed results are presented in Table 3. Of the 30 women with PPCM, 19 (63.3%) were managed in the intensive care unit or cardiac care unit. Cardiogenic shock occurred in 10% of cases, and respiratory failure and acute renal failure each occurred in 23.3%. Inotropic support, mechanical ventilation, extracorporeal membrane oxygenation, and renal replacement therapy were used during acute treatment.
     

    Table 3. Management, complications, and cardiac recovery during hospitalisation and follow-up (n=30)
     
    At hospital discharge, most women were prescribed angiotensin-converting enzyme inhibitors (ACEis) or angiotensin receptor blockers (ARBs) and beta-blockers. Four women received prophylactic low–molecular-weight heparin for venous thromboembolism prevention after the event; another four required warfarin for the treatment of cerebral venous thrombosis, brachial artery thromboembolism, pulmonary embolism, or deep vein thrombosis (Table 3).
     
    One woman experienced decompensated heart failure requiring an intra-aortic balloon pump and a left ventricular assist device 9 months after diagnosis, followed by heart transplantation 1 year after the event. Two women underwent implantable cardioverter-defibrillator insertion due to symptomatic premature ventricular contractions and poor LVEF recovery. Seven women (23.3%) experienced nine thromboembolic events within 1 year of the PPCM episode, including left ventricular thrombi, ischaemic stroke, and pulmonary embolism. The median follow-up duration after PPCM was 47 months (range, 3-140). At 12 months, all-cause in-hospital mortality was 6.7%; causes of death were myocardial infarction and pulmonary embolism. Overall, recovery of left ventricular function (LVEF ≥50%) occurred in 60% of women (Table 3).
     
    Antenatal co-morbidities, obstetric outcomes, and neonatal outcomes
    Prior to PPCM, 80% of women received antenatal care. Four women (13.3%) had twin pregnancies. Antenatal anaemia was present in 50% of women. Hypertensive disorders of pregnancy occurred in 56.7%, whereas gestational diabetes was noted in 13.3%. Complications related to pre-eclampsia included haemolysis, elevated liver enzymes, and low platelets syndrome in 3.3%; eclampsia in 3.3%; and placental abruption in 6.7%. No women received tocolytics during pregnancy. The median gestational age at delivery was 37 weeks (range, 28-41). The caesarean section rate was 53.3%, and the most frequent indication was unstable maternal condition (31.3%). Primary postpartum haemorrhage occurred in 30% of cases; one woman required hysterectomy for placenta accreta spectrum. Among the 34 newborns, 32 (94.1%) were born alive; two were stillborn in the third trimester (5.9%) due to placental abruption and trisomy 18. The median birth weight was 2745 g, and 11.8% of newborns were small for gestational age. Four newborns (11.8%) had an Apgar score below 7 at 5 minutes, and nine (26.5%) required admission to a neonatal intensive care unit. There were no cases of early neonatal death (Table 4).
     

    Table 4. Antenatal co-morbidities, obstetric outcomes, and neonatal outcomes
     
    Outcomes of subsequent pregnancies
    The obstetric and cardiac outcomes of the 11 women with subsequent pregnancies are shown in Figure 2. The median interval between the PPCM-affected pregnancy and the next pregnancy was 17 months (range, 4-60). There were 13 subsequent pregnancies (three miscarriages, five terminations, and five live births). Of the five terminations, two were advised due to poor cardiac condition; the remaining three were elective for maternal anxiety or social reasons. There were no maternal deaths or cases of recurrent PPCM.
     

    Figure 2. Obstetric and cardiac outcomes of subsequent pregnancies
     
    Cases with genetic testing
    Genetic analysis using a dilated cardiomyopathy (DCM) panel by next-generation sequencing was requested by physicians in three cases (online supplementary Table 1). In Case 1, a woman with a family history of heart failure, testing revealed a pathogenic variant in the FLNC gene. Case 2 involved a patient with a history of cancer-related chemotherapy who had no signs of heart failure before pregnancy but developed refractory postpartum heart failure requiring heart transplantation 1 year after PPCM diagnosis; genetic testing identified two pathogenic variants in the TTN and MYBPC3 genes. Case 3 involved a woman with chronic kidney disease who exhibited persistent left ventricular systolic dysfunction 4 years after PPCM diagnosis. Genetic evaluation was pursued due to her young-onset multisystem disease, revealing a variant in the NEXN gene. This variant, associated with autosomal dominant monogenic DCM, was absent from population databases but showed conflicting results on in silico prediction algorithms; therefore, it was classified as a variant of uncertain significance. Overall, potentially pathogenic genetic variants were identified in at least 10% of women with PPCM.
     
    Maternal factors associated with peripartum cardiomyopathy
    Compared with the control group, univariable logistic regression analysis showed that factors associated with PPCM included advanced maternal age (≥40 years), smoking, hypertensive disorders of pregnancy, and antenatal anaemia. In multivariable regression analysis, PPCM was independently associated with hypertensive disorders of pregnancy (adjusted OR=38.00; 95% CI=9.66-149.52; P<0.001) and antenatal anaemia (adjusted OR=13.04; 95% CI=3.72-45.70; P<0.001) [online supplementary Table 2].
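
    As an illustration of how the adjusted odds ratios and their confidence intervals relate to one another, the sketch below back-calculates the log-scale standard error from each reported interval. It assumes a Wald-type 95% interval of the form exp(beta ± 1.96 × SE), which is common for logistic regression but is not stated in the article; the function name `wald_check` is hypothetical.

```python
import math


def wald_check(odds_ratio: float, ci_low: float, ci_high: float) -> dict:
    """Back-calculate log-scale quantities from a reported OR and 95% CI.

    Assumes a Wald interval, exp(beta +/- 1.96 * SE); the article does not
    state which interval method was used.
    """
    z_crit = 1.96
    log_low, log_high = math.log(ci_low), math.log(ci_high)
    se = (log_high - log_low) / (2 * z_crit)
    # For a Wald interval the point estimate is the geometric mean of the bounds.
    implied_or = math.exp((log_low + log_high) / 2)
    z = math.log(odds_ratio) / se
    # Two-sided P value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return {"se": se, "implied_or": implied_or, "z": z, "p": p_value}


hypertension = wald_check(38.00, 9.66, 149.52)
anaemia = wald_check(13.04, 3.72, 45.70)

for name, res in (("hypertensive disorders", hypertension), ("anaemia", anaemia)):
    print(f"{name}: SE={res['se']:.3f}, implied OR={res['implied_or']:.2f}, "
          f"z={res['z']:.2f}, P={res['p']:.1e}")
```

    Under this assumption, the geometric mean of each pair of bounds falls very close to the reported point estimate, and both implied P values are below 0.001, in line with the reported results. The wide intervals reflect the small number of cases.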
     
    Discussion
    Time from symptom onset to diagnosis
    Over the 10-year study period, we observed a PPCM incidence of 1 in 11 179 live births in Hong Kong. Worldwide variation in PPCM incidence may relate to ethnic and socio-economic factors12; rates are expected to increase because of advancing maternal age,13 multiple pregnancies, and obesity. About one-third of our patients developed symptoms before delivery, a finding comparable to the Asia-Pacific group in the ESC EURObservational Research Programme registry.10 Overall, 30% of women were diagnosed more than 7 days after symptom onset. Among those with antepartum-onset symptoms, 54.5% were diagnosed after delivery. This diagnostic delay may be attributed to the difficulty in distinguishing PPCM from normal physiological changes of pregnancy—its symptoms often mimic those of late gestation and may only be recognised postpartum when they become more pronounced. Delayed diagnosis has been associated with lower rates of left ventricular recovery.14 Early recognition and awareness among both pregnant women and healthcare professionals are crucial to enable prompt initiation of heart failure therapy, which may improve cardiac recovery. To support early detection and facilitate timely specialist referral for diagnostic evaluation, serum biomarkers can be measured to rule out heart failure with high probability during pregnancy or the postpartum period.15
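
    For comparison with studies that report incidence per 100 000 live births, the incidence quoted at the start of this subsection can be converted as in the short sketch below. The implied denominator is only an approximation and assumes that all 30 women in the cohort were counted as incident cases, which the text does not state explicitly.

```python
# Convert "1 in N live births" into a rate per 100 000 live births.
LIVE_BIRTHS_PER_CASE = 11_179  # reported incidence: 1 in 11 179 live births
CASES = 30                     # assumption: all 30 cohort cases were counted

rate_per_100k = 100_000 / LIVE_BIRTHS_PER_CASE
implied_live_births = CASES * LIVE_BIRTHS_PER_CASE

print(f"{rate_per_100k:.1f} per 100 000 live births")
print(f"implied denominator: about {implied_live_births:,} live births")
```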
     
    Pre-eclampsia and peripartum cardiomyopathy
    In our study, approximately half of the cases involved pre-eclampsia, a finding consistent with the Asia-Pacific cohort in the ESC EURObservational Research Programme registry.10 A meta-analysis of 22 studies demonstrated a fourfold higher prevalence of pre-eclampsia among women with PPCM relative to the general obstetric population (22% vs 5%).16 Our multivariable regression analysis confirmed that hypertensive disorders of pregnancy constituted an independent risk factor for PPCM. The association between pre-eclampsia and PPCM may be explained by their shared pathophysiological mechanism: systemic vascular angiogenic imbalance.1 15 17 Pre-eclampsia and PPCM might represent a single disease spectrum with substantial overlap.17 Low-dose aspirin is generally used for the prevention of pre-eclampsia and its associated morbidity and mortality.18 Although aspirin use for PPCM prevention is not supported by evidence-based guidelines, it could theoretically provide benefit due to the shared vascular dysfunction pathways. Consequently, the use of aspirin for pre-eclampsia prevention may indirectly reduce the risk of PPCM in high-risk women.
     
    Anaemia and peripartum cardiomyopathy
    We found that antenatal anaemia was independently associated with PPCM. A systematic review and meta-analysis previously indicated that women with anaemia had up to fivefold higher odds of developing PPCM compared with women exhibiting normal haemoglobin levels.19 The precise nature of this association remains unclear; iron deficiency may contribute by impairing myocardial contractile function.20 Anaemia screening and correction during pregnancy may help reduce the risk of PPCM.
     
    Management of peripartum cardiomyopathy
    A multidisciplinary approach involving cardiologists, obstetricians, intensivists, cardiac surgeons, anaesthesiologists, neonatologists, and nurses is essential for the management of PPCM.21 In severe cases with haemodynamic instability, acute management (including immediate resuscitation and mechanical respiratory or circulatory support) may be required.15 Urgent caesarean section should be considered for advanced heart failure that persists despite optimal medical therapy. According to international consensus, the main treatment should follow guideline-directed medical therapy for heart failure with reduced ejection fraction in non-pregnant patients, while respecting contraindications for certain drugs during pregnancy.6 22 23 24 25 Standard therapies include diuretics, ACEis or ARBs, mineralocorticoid receptor antagonists, vasodilators (hydralazine/nitrates), digoxin, beta-blockers, and anticoagulants. A 2022 meta-analysis of global data demonstrated that frequent prescription of beta-blockers, ACEis/ARBs, and bromocriptine or cabergoline was associated with lower all-cause mortality and better left ventricular recovery at 12 months.26 In our study, most patients received ACEis/ARBs and beta-blockers; fewer were prescribed bromocriptine at discharge. The rationale for using dopamine agonists to inhibit prolactin secretion lies in the proposed pathophysiological mechanism involving 16-kDa prolactin, an oxidative stress-mediated cleavage product that damages cardiovascular tissue.27 Regarding prolactin inhibition in women with PPCM, a meta-analysis reported that those treated with bromocriptine had higher odds of left ventricular recovery, without a significant difference in all-cause mortality.28 However, bromocriptine use is associated with an increased risk of thromboembolic complications. The 2019 ESC–Heart Failure Association position statement issued a weak recommendation for bromocriptine use, advising that it should always be accompanied by at least prophylactic anticoagulation.15 Future randomised controlled trials and registry data with longer follow-up are needed to provide stronger evidence supporting its use. For women who do not recover from PPCM within 1 year, the American College of Cardiology/American Heart Association Joint Committee and the ESC recommend implantable cardioverter-defibrillator therapy for the primary prevention of sudden cardiac death due to ventricular tachyarrhythmia.22 29 30 Cardiac transplantation may be required for patients with refractory severe heart failure despite maximal medical therapy, as occurred in one of our cases.
     
    Cardiac recovery and mortality
    Estimates of left ventricular recovery and mortality in PPCM vary considerably across geographic regions,26 presumably due to differences in medical therapy, access to healthcare services, and follow-up duration. A 2022 meta-analysis of 4875 patients from 60 countries reported overall 12-month rates of left ventricular recovery and all-cause mortality of 58.7% and 9.8%, respectively.26 In our cohort, 60% of women achieved cardiac recovery; two patients (6.7%) died of myocardial infarction and pulmonary embolism within 12 months of diagnosis. Both had poor social support and did not adhere to treatment or attend follow-up visits, which likely contributed to their adverse outcomes. These findings highlight the need for greater public awareness, improved medication compliance, and stronger social support systems. We recommend enhanced nursing outreach and structured patient education, along with post-discharge monitoring, to optimise outcomes.
     
    Prevention of thromboembolic complications
    Thromboembolism, a potentially life-threatening complication of PPCM, affected 23.3% of women in our cohort. This high rate may be attributed to the hypercoagulable state of pregnancy, impaired circulation, and blood stasis from cardiac failure. Our incidence was higher than the reported global rate of 6.1% in a recent international study.26 Therapeutic anticoagulation is recommended for patients with intracardiac thrombus or systemic embolism. In our study, 13.3% of patients received low molecular weight heparin for thromboembolism prophylaxis. Both the AHA and ESC recommend anticoagulation in PPCM cases involving severe left ventricular dysfunction (LVEF <30% to <35%) during the peripartum period and up to 8 weeks postpartum.29 31 Despite the high thromboembolic risk in PPCM, anticoagulation remains a subject of ongoing debate.32 Our data support prophylactic anticoagulation for all women with PPCM, given the high incidence observed. Ultimately, individual assessment of thromboembolic risk—considering the extent of left ventricular dysfunction, caesarean delivery, immobility, and ventricular dilatation—may help identify patients most likely to benefit from thromboprophylaxis.
     
    Relapse of peripartum cardiomyopathy in subsequent pregnancies
    Relapse of PPCM and associated mortality in subsequent pregnancies are not uncommon; rates range from 5.3% to 29.5% and 0% to 55.5%, respectively.33 In our study, nine of 11 patients (81.8%) had confirmed recovery of cardiac function before conception. There were no maternal deaths or PPCM recurrences during pregnancy. A recent meta-analysis showed that women with persistent left ventricular dysfunction prior to a subsequent pregnancy had a higher risk of mortality and worsening function compared to women whose cardiac function had recovered.33 However, recovered left ventricular function does not guarantee an uncomplicated subsequent pregnancy.34 35 It is crucial to monitor cardiac function throughout pregnancy—and up to 6 months postpartum—to detect subclinical left ventricular dysfunction or PPCM recurrence. Women with a history of PPCM should be counselled regarding the risks of future pregnancies, including irreversible ventricular deterioration, maternal death, and fetal loss.36 Subsequent pregnancy is not recommended if LVEF fails to normalise. Contraceptive counselling should begin early after the acute event; reliable methods with minimal thromboembolic risk are preferred.37
     
    Genetic assessment
    A study has demonstrated a genetic contribution to PPCM in at least 15% of cases.38 The most commonly affected gene is TTN, which encodes the large sarcomeric protein titin.39 The relative prevalence of truncating variants in these genes is nearly identical between PPCM and DCM.39 In our study, three of 30 patients (10%) were screened for cardiomyopathy-related genes (TTN, FLNC, MYBPC3, NEXN), all of whom were in the non-recovery group, indicating that at least 10% had a genetic predisposition to PPCM. The American College of Cardiology/American Heart Association Joint Committee recommends that patients with non-ischaemic cardiomyopathy undergo genetic counselling and testing for inherited cardiomyopathies to facilitate early cardiac disease detection and timely initiation of treatments that reduce heart failure progression and sudden death risk.22 The identification of pathogenic genetic variants can provide valuable prognostic information and clarify associated risks (eg, arrhythmic complications linked to FLNC and DSP mutations), thereby guiding decisions on preventive measures, including implantable defibrillator placement and exercise recommendations. Furthermore, cascade genetic testing for relatives enables closer pregnancy monitoring, informed reproductive decisions (including prenatal or preimplantation genetic diagnosis), and lifelong cardiovascular surveillance to improve outcomes.40 The value of routine genetic testing remains limited by low penetrance, variable clinical expression, and uncertain variant significance. It may also lead to patient anxiety, potential genetic discrimination, and substantial resource implications. Careful patient selection with thorough pre- and post-test counselling is essential. Because the clinical presentation of PPCM closely resembles that of DCM, the ESC suggests that genetic testing be considered in PPCM cases with a positive family history,15 where clinically actionable findings are most likely to be identified.
     
    Limitations
    This study had several limitations. Because PPCM is a rare condition, a small sample size was inevitable. The retrospective nature of data collection over a 10-year period may have resulted in incomplete information. Outcomes could also have been influenced by variations in heart failure management over time and across hospitals. Furthermore, some PPCM cases managed in the private sector or outside Hong Kong might not have been captured. The long-term impact of PPCM on women’s overall health was not assessed. The establishment of a local PPCM registry would facilitate a better understanding of the condition, identification of outcome determinants, and optimisation of clinical care in Hong Kong.
     
    Conclusion
    Peripartum cardiomyopathy is an uncommon but potentially life-threatening medical condition affecting women worldwide. Genetic factors contribute to disease susceptibility in at least 10% of cases. Genetic testing may offer a valuable tool to guide prognosis and management in affected women.
     
    Author contributions
    Concept or design: LSK Law, LT Kwong, PL So.
    Acquisition of data: LSK Law, KH Siong, HC Mok, STK Wong, JKO Wai, CY Chow, WL Chan, KY Tse, YYY Chan, KS Eu, PL So.
    Analysis or interpretation of data: LSK Law, PL So.
    Drafting of the manuscript: LSK Law, PL So.
    Critical revision of the manuscript for important intellectual content: All authors.
     
    All authors had full access to the data, contributed to the study, approved the final version for publication, and take responsibility for its accuracy and integrity.
     
    Conflicts of interest
    All authors have disclosed no conflicts of interest.
     
    Acknowledgement
    The authors thank all staff in the Statistics Department at Tuen Mun Hospital for their assistance with data collection.
     
    Funding/support
    This research received no specific grant from any funding agency in the public, commercial, or not-for-profit sectors.
     
    Ethics approval
    This research was approved by the Central Institutional Review Board of Hospital Authority, Hong Kong (Ref No.: CIRB-2023-114-3). The requirement for informed patient consent was waived by the Board due to the retrospective nature of the research. All data used in the research were anonymised and unidentifiable.
     
    Supplementary material
    The supplementary material was provided by the authors and some information may not have been peer reviewed. Accepted supplementary material will be published as submitted by the authors, without any editing or formatting. Any opinions or recommendations discussed are solely those of the author(s) and are not endorsed by the Hong Kong Academy of Medicine and the Hong Kong Medical Association. The Hong Kong Academy of Medicine and the Hong Kong Medical Association disclaim all liability and responsibility arising from any reliance placed on the content.
     
    References
    1. Davis MB, Arany Z, McNamara DM, Goland S, Elkayam U. Peripartum cardiomyopathy: JACC state-of-the-art review. J Am Coll Cardiol 2020;75:207-21.
    2. Sliwa K, Hilfiker-Kleiner D, Petrie MC, et al. Current state of knowledge on aetiology, diagnosis, management, and therapy of peripartum cardiomyopathy: a position statement from the Heart Failure Association of the European Society of Cardiology Working Group on peripartum cardiomyopathy. Eur J Heart Fail 2010;12:767-78.
    3. Isezuo SA, Abubakar SA. Epidemiologic profile of peripartum cardiomyopathy in a tertiary care hospital. Ethn Dis 2007;17:228-33.
    4. Kamiya CA, Kitakaze M, Ishibashi-Ueda H, et al. Different characteristics of peripartum cardiomyopathy between patients complicated with and without hypertensive disorders. -Results from the Japanese Nationwide survey of peripartum cardiomyopathy-. Circ J 2011;75:1975-81.
    5. Pierce JA, Price BO, Joyce JW. Familial occurrence of postpartal heart failure. Arch Intern Med 1963;111:651-5.
    6. Morales A, Painter T, Li R, et al. Rare variant mutations in pregnancy-associated or peripartum cardiomyopathy. Circulation 2010;121:2176-82.
    7. van Spaendonck-Zwarts KY, van Tintelen JP, van Veldhuisen DJ, et al. Peripartum cardiomyopathy as a part of familial dilated cardiomyopathy. Circulation 2010;121:2169-75.
    8. van Spaendonck-Zwarts KY, Posafalvi A, van den Berg MP, et al. Titin gene mutations are common in families with both peripartum cardiomyopathy and dilated cardiomyopathy. Eur Heart J 2014;35:2165-73.
    9. Honigberg MC, Givertz MM. Peripartum cardiomyopathy. BMJ 2019;364:k5287.
    10. Sliwa K, Petrie MC, van der Meer P, et al. Clinical presentation, management, and 6-month outcomes in women with peripartum cardiomyopathy: an ESC EORP registry. Eur Heart J 2020;41:3787-97.
    11. Koerber D, Khan S, Kirubarajan A, et al. Meta-analysis of long-term (>1 year) cardiac outcomes of peripartum cardiomyopathy. Am J Cardiol 2023;194:71-7.
    12. Karaye KM, Ishaq NA, Sai’du H, et al. Disparities in clinical features and outcomes of peripartum cardiomyopathy in high versus low prevalent regions in Nigeria. ESC Heart Fail 2021;8:3257-67.
    13. Kolte D, Khera S, Aronow WS, et al. Temporal trends in incidence and outcomes of peripartum cardiomyopathy in the United States: a nationwide population-based study. J Am Heart Assoc 2014;3:e001056.
    14. Lewey J, Levine LD, Elovitz MA, Irizarry OC, Arany Z. Importance of early diagnosis in peripartum cardiomyopathy. Hypertension 2020;75:91-7.
    15. Bauersachs J, König T, van der Meer P, et al. Pathophysiology, diagnosis and management of peripartum cardiomyopathy: a position statement from the Heart Failure Association of the European Society of Cardiology Study Group on peripartum cardiomyopathy. Eur J Heart Fail 2019;21:827-43.
    16. Bello N, Rendon IS, Arany Z. The relationship between pre-eclampsia and peripartum cardiomyopathy: a systematic review and meta-analysis. J Am Coll Cardiol 2013;62:1715-23.
    17. Parikh P, Blauwet L. Peripartum cardiomyopathy and preeclampsia: overlapping diseases of pregnancy. Curr Hypertens Rep 2018;20:69.
    18. Henderson JT, Vesco KK, Senger CA, Thomas RG, Redmond N. Aspirin use to prevent preeclampsia and related morbidity and mortality: updated evidence report and systematic review for the US Preventive Services Task Force. JAMA 2021;326:1192-206.
    19. Cherubin S, Peoples T, Gillard J, Lakhal-Littleton S, Kurinczuk JJ, Nair M. Systematic review and meta-analysis of prolactin and iron deficiency in peripartum cardiomyopathy. Open Heart 2020;7:e001430.
    20. Anand IS, Gupta P. Anemia and iron deficiency in heart failure: current concepts and emerging therapies. Circulation 2018;138:80-98.
    21. Sigauke FR, Ntsinjana H, Tsabedze N. Peripartum cardiomyopathy: a comprehensive and contemporary review. Heart Fail Rev 2024;29:1261-78.
    22. Heidenreich PA, Bozkurt B, Aguilar D, et al. 2022 AHA/ACC/HFSA Guideline for the Management of Heart Failure: a report of the American College of Cardiology/American Heart Association Joint Committee on Clinical Practice Guidelines. Circulation 2022;145:e895-1032.
    23. Arany Z. Peripartum cardiomyopathy. N Engl J Med 2024;390:154-64.
    24. Azibani F, Sliwa K. Peripartum cardiomyopathy: an update. Curr Heart Fail Rep 2018;15:297-306.
    25. Maddox TM, Januzzi JL Jr, Allen LA, et al. 2024 ACC Expert Consensus Decision Pathway for treatment of heart failure with reduced ejection fraction: a report of the American College of Cardiology Solution Set Oversight Committee. J Am Coll Cardiol 2024;83:1444-88.
    26. Hoevelmann J, Engel ME, Muller E, et al. A global perspective on the management and outcomes of peripartum cardiomyopathy: a systematic review and meta-analysis. Eur J Heart Fail 2022;24:1719-36.
    27. Hilfiker-Kleiner D, Kaminski K, Podewski E, et al. A cathepsin D–cleaved 16 kDa form of prolactin mediates postpartum cardiomyopathy. Cell 2007;128:589-600.
    28. Kumar A, Ravi R, Sivakumar RK, et al. Prolactin inhibition in peripartum cardiomyopathy: systematic review and meta-analysis. Curr Probl Cardiol 2023;48:101461.
    29. Bauersachs J, Arrigo M, Hilfiker-Kleiner D, et al. Current management of patients with severe acute peripartum cardiomyopathy: practical guidance from the Heart Failure Association of the European Society of Cardiology Study Group on peripartum cardiomyopathy. Eur J Heart Fail 2016;18:1096-105.
    30. McDonagh TA, Metra M, Adamo M, et al. 2021 ESC Guidelines for the diagnosis and treatment of acute and chronic heart failure: developed by the Task Force for the diagnosis and treatment of acute and chronic heart failure of the European Society of Cardiology (ESC). With the special contribution of the Heart Failure Association (HFA) of the ESC. Eur J Heart Fail 2022;24:4-131.
    31. Bozkurt B, Colvin M, Cook J, et al. Current diagnostic and treatment strategies for specific dilated cardiomyopathies: a scientific statement from the American Heart Association. Circulation 2016;134:e579-646.
    32. Radakrishnan A, Dokko J, Pastena P, Kalogeropoulos AP. Thromboembolism in peripartum cardiomyopathy: a systematic review. J Thorac Dis 2024;16:645-60.
    33. Wijayanto MA, Myrtha R, Lukas GA, et al. Outcomes of subsequent pregnancy in women with peripartum cardiomyopathy: a systematic review and meta-analysis. Open Heart 2024;11:e002626.
    34. Pachariyanon P, Bogabathina H, Jaisingh K, Modi M, Modi K. Long-term outcomes of women with peripartum cardiomyopathy having subsequent pregnancies. J Am Coll Cardiol 2023;82:16-26.
    35. Fett JD, Shah TP, McNamara DM. Why do some recovered peripartum cardiomyopathy mothers experience heart failure with a subsequent pregnancy? Curr Treat Options Cardiovasc Med 2015;17:354.
    36. Sliwa K, van der Meer P, Petrie MC, et al. Corrigendum to ‘Risk stratification and management of women with cardiomyopathy/heart failure planning pregnancy or presenting during/after pregnancy: a position statement from the Heart Failure Association of the European Society of Cardiology Study Group on Peripartum Cardiomyopathy’ [Eur J Heart Fail 2021;23:527-540]. Eur J Heart Fail 2022;24:733.
    37. Sliwa K, Petrie MC, Hilfiker-Kleiner D, et al. Long-term prognosis, subsequent pregnancy, contraception and overall management of peripartum cardiomyopathy: practical guidance paper from the Heart Failure Association of the European Society of Cardiology Study Group on Peripartum Cardiomyopathy. Eur J Heart Fail 2018;20:951-62.
    38. Ware JS, Li J, Mazaika E, et al. Shared genetic predisposition in peripartum and dilated cardiomyopathies. N Engl J Med 2016;374:233-41.
    39. Goli R, Li J, Brandimarto J, et al. Genetic and phenotypic landscape of peripartum cardiomyopathy. Circulation 2021;143:1852-62.
    40. Arany Z. It is time to offer genetic testing to women with peripartum cardiomyopathy. Circulation 2022;146:4-5.

    Use of 18F-fluorodeoxyglucose positron emission tomography coupled with computed tomography in early breast cancer management: consensus-based local recommendations by the Hong Kong Breast Cancer Foundation PET/CT Study Group

    Hong Kong Med J 2025 Dec;31(6):426–33 | Epub 12 Nov 2025
    © Hong Kong Academy of Medicine. CC BY-NC-ND 4.0
     
    ORIGINAL ARTICLE  CME
    Carol CH Kwok, MB, ChB, FHKAM (Radiology)# † 1; Henry CY Wong, MB, BS, FHKAM (Radiology)# † 1; Catherine YH Wong, MB, BS, FHKAM (Radiology)† 2; LW Yuen, MS, MA3; CC Yau, MB, BS, FHKAM (Radiology)† 3; Polly SY Cheung, MB, BS, FHKAM (Surgery)† 3
    1 Department of Oncology, Princess Margaret Hospital, Hong Kong SAR, China
    2 Department of Nuclear Medicine, Hong Kong Sanatorium & Hospital, Hong Kong SAR, China
    3 Hong Kong Breast Cancer Foundation, Hong Kong SAR, China
    # Equal contribution
    † Members of the Hong Kong Breast Cancer Foundation PET/CT Study Group
     
    Corresponding author: Dr Carol CH Kwok (kwokch@ha.org.hk)
     
    Abstract
    Introduction: 18F-fluorodeoxyglucose positron emission tomography coupled with computed tomography (PET/CT) has been incorporated into breast cancer management. In Hong Kong, PET/CT use is increasing. This study aimed to establish consensus-based recommendations on the use of PET/CT in the management of early breast cancer.
     
    Methods: A literature search was conducted in September 2023 using the keywords “breast cancer” and “PET/CT” within PubMed to identify research articles related to the use of PET/CT in early breast cancer. Guidelines from major international cancer agencies were also reviewed. Ten recommendation statements were drafted. A two-round modified Delphi consensus process was conducted over a 3-month period (19 December 2023 to 29 February 2024).
     
    Results: A total of 76 experts consented to participate in the first round, of whom 71 completed the second round and were included as members of the expert panel, yielding a second-round response rate of 93.4%. The panel comprised oncologists (n=30, 42.3%), surgeons (n=35, 49.3%), and radiologists (including nuclear medicine radiologists) [n=6, 8.5%]. Experts from the Hospital Authority (n=37, 52.1%) and the private sector (n=32, 45.1%) were well represented. Two experts (2.8%) were from one of the two local university medical faculties. Over 75% of expert panel members had at least 15 years of clinical experience. Of the ten statements, consensus was achieved on seven in the first round and one additional statement in the second round.
     
    Conclusion: Through the consensus process, the proposed recommendations are expected to gain wider acceptance and recognition among local healthcare professionals as guidance for the use of PET/CT in early breast cancer management.
     
     
    New knowledge added by this study
    • First-of-its-kind local consensus-based recommendations on the use of positron emission tomography coupled with computed tomography (PET/CT) in early breast cancer were established.
    • The proposed recommendations were based on the largest and most up-to-date body of evidence available, which reflected updated international guideline recommendations.
    • The consensus-establishing process provided a platform for exchange and sharing among multidisciplinary teams in resolving controversial aspects of clinical practice.
    Implications for clinical practice or policy
    • Local recommendations on the use of PET/CT for early breast cancer patients have been proposed in light of the increasing availability of PET/CT facilities in Hong Kong.
    • These consensus recommendations cover important and relevant clinical settings, including screening, preoperative assessment of multifocality, axillary staging, pretreatment staging, evaluation of tumour response and axillary nodal status in the neoadjuvant setting before surgery, re-staging in recurrence, and follow-up for surveillance.
    • Through the consensus process, the proposed recommendations are expected to gain wider acceptance and recognition among local healthcare professionals as guidance on the use of PET/CT in early breast cancer management.
     
     
    Introduction
    Diagnostic imaging plays an important role in the screening, diagnosis, staging, and follow-up of patients affected by breast cancer. Mammography and breast ultrasound are the current standards of care for screening, diagnosis, and surveillance. For patients with locally advanced disease, guidelines recommend contrast-enhanced computed tomography (CT) scans and bone scans to detect distant metastases. In recent years, 18F-fluorodeoxyglucose (18F-FDG) positron emission tomography coupled with CT (PET/CT) has been introduced as an important imaging modality in oncological care. It is a powerful tool that combines the spatial resolution of a CT scan with information regarding biological processes within the scanned region. Positron emission tomography coupled with CT has the potential to identify malignant disease that may otherwise be missed or classified as benign based on size or morphological features in conventional imaging modalities.
     
    In 2021, the Hong Kong Breast Cancer Foundation (HKBCF) analysed the utilisation of PET/CT among patients enrolled in the Hong Kong Breast Cancer Registry since 2007. Among the 4154 patients studied, the utilisation rate of PET/CT was 40.4% (online supplementary Fig 1). There was an increasing trend in PET/CT scan use for breast cancer staging over the past two decades. The overall utilisation of PET/CT increased from 23.3% in 2006-2010, to 48.5% in 2011-2015, and to 61.6% in the 2016-2021 cohort across all cancer stages (online supplementary Fig 2). This trend largely reflected the increasing availability of PET/CT facilities in Hong Kong. Over the past two decades, multiple PET/CT scanning facilities have been established in both the public and private sectors, making the service more accessible. Overall, usage of PET/CT was correlated with higher pathological stages of disease. Notably, PET/CT was used in up to 13.8% of stage 0 cases and 21.0% of stage I cases (online supplementary Fig 3).
     
    Given the relatively high costs, concerns regarding radiation exposure, and the possibility of false-negative results, it is important to provide local recommendations on which groups of patients would benefit from the use of PET/CT in breast cancer. Through this study, we aimed to develop a local guideline regarding the use of PET/CT for early breast cancer to assist healthcare professionals in making evidence-based recommendations.
     
    Methods
    The objective of this study was to develop local recommendations on how to utilise PET/CT in the screening, diagnosis, staging, treatment response assessment, and surveillance of early breast cancer. A Study Group consisting of five members from the HKBCF (first, second, third, fifth, and sixth authors) was convened. Study Group members were involved in performing the literature search, constructing the Delphi survey, analysing data, interpreting findings, and providing final approval of the recommendations.
     
    To construct the survey, a literature search was performed in September 2023 by the Study Group using the keywords “breast cancer” and “PET/CT” in PubMed to identify research articles related to the use of PET/CT in early breast cancer. Systematic reviews and randomised controlled trials were prioritised to form the evidence base for the proposed statements. Guidelines from major international cancer agencies, including the National Comprehensive Cancer Network (NCCN) and the European Society for Medical Oncology, were reviewed. Ten statements were drafted based on the literature and international guidelines.
     
    Delphi consensus process
    A two-round modified Delphi consensus process was conducted over a period of 3 months (19 December 2023 to 29 February 2024). Surveys were developed using Google Forms, a web-based survey tool. Responses provided by individual participants were anonymised to protect confidentiality. This study did not involve any patients as participants. Only individuals who took part in the first round were invited to participate in the second round.
     
    Experienced physicians with an interest in breast cancer, working in the medical faculties of The University of Hong Kong and The Chinese University of Hong Kong, the Hospital Authority, and the private sector, were identified by the Study Group and invited to participate in the Delphi process. Additionally, members of the Hong Kong Breast Cancer Registry Steering Committee, the Hong Kong Breast Oncology Group, and the Hong Kong Society of Breast Surgeons were invited. Emails were sent to all potential participants by the Study Group to confirm their interest in participating.
     
    After providing informed consent, participants were directed to an online survey for completion. In the first round, participants were provided with a summary of evidence corresponding to each of the ten statements in the survey (online Appendix 1). Participants were asked to indicate the extent of their agreement or disagreement on a five-point Likert scale (‘Completely agree’, ‘Agree’, ‘Neutral’, ‘Disagree’, and ‘Completely disagree’) for each statement. Respondents who selected ‘Disagree’ or ‘Completely disagree’ were asked to provide reasons for their choice in a free-text field within the survey. In accordance with published recommendations, statements that achieved agreement (‘Completely agree’ or ‘Agree’) from more than 75% of participants were considered to have reached consensus.
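     
    The consensus rule described above can be expressed as a short calculation. The following is a minimal sketch in Python, not the Study Group's actual analysis: the function names and the example ratings are hypothetical, and only the five Likert labels and the 'more than 75%' threshold are taken from the text.

```python
from collections import Counter

# Response options counted as agreement, as described in the text.
AGREE_LEVELS = ("Completely agree", "Agree")

# Consensus requires agreement from MORE than 75% of participants.
CONSENSUS_THRESHOLD = 0.75


def agreement_proportion(responses):
    """Return the fraction of responses that are 'Completely agree' or 'Agree'."""
    if not responses:
        raise ValueError("at least one response is required")
    counts = Counter(responses)
    agreeing = sum(counts[level] for level in AGREE_LEVELS)
    return agreeing / len(responses)


def reached_consensus(responses, threshold=CONSENSUS_THRESHOLD):
    """Apply the strict 'more than 75% agreement' rule to one statement."""
    return agreement_proportion(responses) > threshold


# Hypothetical ratings for a single statement (illustrative only, not study data).
example = (
    ["Completely agree"] * 40
    + ["Agree"] * 20
    + ["Neutral"] * 10
    + ["Disagree"] * 6
)
print(round(agreement_proportion(example), 3), reached_consensus(example))
```

    Because the rule requires agreement from more than 75% of participants, a statement with exactly 75% agreement would not reach consensus under this sketch.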
     
    Following participant voting, the Study Group compiled and prepared the results from the first round. Statements that did not reach consensus were reviewed and amended based on participant feedback. For the second round, statements that did not reach consensus, or were newly created or modified based on participant feedback, were sent as a survey to the same participants. Participants were shown the results of the first round and informed where amendments had been made to statements in the second round.
     
    Consensus statement disclaimer
    The recommendations provided in this publication reflect the majority opinion of the expert panel. Although the recommendations are intended to guide clinical decision-making, they should not be regarded as the sole indications for utilising PET/CT in early breast cancer management. These consensus-based recommendations are designed to provide guidance for oncologists, surgeons, general practitioners, radiologists, and other physicians involved in the care of patients with early breast cancer. Treatment decisions for individual patients should ultimately be made at the discretion of the treating clinician, in conjunction with the patient’s unique needs and through shared decision-making.
     
    Results
    Two Delphi consensus rounds were completed. Among the 270 invited experts, 76 consented to participate in the first round, of whom 71 completed the second round and were included as members of the expert panel (online Appendix 2). The response rate for the second round was 93.4%. The panel comprised oncologists (n=30, 42.3%), surgeons (n=35, 49.3%), and radiologists (including nuclear medicine radiologists) [n=6, 8.5%]. Experts from the Hospital Authority (n=37, 52.1%) and the private sector (n=32, 45.1%) were well represented. Two experts (2.8%) were from one of the two medical faculties of the local universities. Over 75% of expert panel members had at least 15 years of clinical experience.
     
    Of the ten statements, consensus was achieved on seven in the first round. Three statements were returned to the expert panel for rating in the second round, of which one achieved consensus (Fig). The results of the final consensus on the recommendation statements after the two-round Delphi consensus process are listed in the Table.
     

    Figure. Modified Delphi process
     

    Table. Results of the final consensus on the recommendation statements after a two-round Delphi consensus process
     
    Discussion
    In recent years, driven by increasing demand and easier access to PET/CT services, there has been a substantial increase in the use of PET/CT for breast cancer patients. Currently, there are 33 PET/CT machines across public, private, and academic institutions in Hong Kong. While PET/CT has the capability to enhance the detection of occult malignant disease, it also carries the risk of identifying false-positives and incidental findings, which could lead to unnecessary investigations and potentially delay curative-intent treatments. Although the utility of PET/CT in various breast cancer settings has been widely studied, there remains a lack of large prospective randomised studies comparing it with other imaging modalities. Given that PET/CT is costly and poses concerns about increased radiation exposure compared with other imaging techniques, such as contrast-enhanced CT scans, the development of local guidance and recommendations regarding its indications is clinically relevant and essential. To our knowledge, this consensus-based guideline is the first to provide practical recommendations on the use of PET/CT for breast cancer management.
     
    Of the ten recommendation statements proposed, seven achieved consensus in the first round, suggesting that the indications for PET/CT in these areas are clear-cut and less controversial. These statements covered areas related to the screening, diagnosis, staging, and surveillance of breast cancer. Overall, the majority of local experts agreed that PET/CT should only be utilised in situations where patients have a high risk of distant metastases. This approach includes staging patients with advanced clinical stage disease or aggressive tumour biology and evaluating cancer survivors with suspicious clinical signs and symptoms suggestive of recurrence. Conversely, PET/CT should not be used in situations where the likelihood of detecting malignant disease is low, such as staging of ductal carcinoma in situ or stage I disease, screening asymptomatic women for breast cancer, and routine surveillance of cancer survivors. Increased 18F-FDG avidity of malignant cells forms the basis of 18F-FDG-PET in breast cancer imaging. Tumour characteristics that limit the sensitivity of 18F-FDG-PET in breast cancer imaging include small tumour size, low tumour grade, low proliferation, high expression of hormone receptors (particularly luminal A phenotype), and lobular histological type.1 2 3 Positron emission tomography coupled with CT therefore has limited sensitivity in detecting subcentimetre tumours,4 5 micrometastases, and small lymph node metastases in a clinically negative axilla relative to sentinel lymph node biopsy (SLNB).6 7 Additionally, the specificity of PET/CT is affected—some benign tumours and infectious or inflammatory conditions can demonstrate 18F-FDG uptake.8 Positron emission tomography coupled with CT has limited spatial resolution in assessing the multifocality of breast cancer.9
     
    In contrast to its low sensitivity for detecting axillary nodal metastases, 18F-FDG PET/CT demonstrates high sensitivity in detecting extra-axillary lymph node involvement, including internal mammary, infraclavicular, and supraclavicular nodes10 11; distant metastases; and other unsuspected synchronous malignancies during initial breast cancer staging, which can potentially lead to upstaging and ultimately modification of planned treatment.12 13 14 The detection of extra-axillary lymph node involvement aids in selecting candidates for neoadjuvant chemotherapy and may guide subsequent radiotherapy planning to ensure adequate coverage of nodal involvement sites.11 15 16 In contrast to stage 0 and stage I disease, where the likelihood of distant metastasis is low, there is a growing body of evidence that PET/CT may outperform conventional imaging (contrast-enhanced CT of the thorax, abdomen, and pelvis; and bone scan) in more advanced disease.17 18 Furthermore, high-grade and poor-risk cancer subtypes may exhibit increased 18F-FDG uptake, thereby enhancing the diagnostic yield of PET/CT in staging these tumours.19 20 21 Our recommendations align with those of the NCCN22 and the French working group,23 which recently updated their guidance in this regard.
     
    Controversies
    The two recommendation statements that did not reach consensus after the Delphi rounds related to post–neoadjuvant therapy evaluation of tumour response to guide surgery to the primary tumour and axilla. In recent years, neoadjuvant chemotherapy has been increasingly used to downstage disease, facilitate surgery, and provide an opportunity for in vivo tumour response assessment to guide individualised treatment escalation or de-escalation after surgery. This approach has become the standard of care for patients with larger tumours who wish to undergo breast-conserving therapy and for stage II and III patients with aggressive tumour biology (eg, triple-negative and human epidermal growth factor receptor 2–positive breast cancer).22 Current studies on post-neoadjuvant chemotherapy tumour response assessment have mainly focused on the prediction of pathological complete response.24 25 26 27 Previous studies have shown that magnetic resonance imaging (MRI) may exhibit higher sensitivity, whereas PET/CT demonstrates higher specificity in predicting the pathological response after neoadjuvant chemotherapy, indicating the complementary value of combining these modalities to improve diagnostic performance.28
     
    The method of assessing primary tumour response during neoadjuvant therapy has varied across clinical trials. For example, in the NeoSphere trial, which evaluated the addition of neoadjuvant pertuzumab to docetaxel and trastuzumab, clinical response was assessed via physical examination.29 Other trials have supplemented clinical assessment with diagnostic imaging during treatment. In the PREDIX HER2 trial, which compared neoadjuvant docetaxel, trastuzumab and pertuzumab versus trastuzumab emtansine, investigators routinely utilised mammography, ultrasound, or MRI after the second, fourth, and sixth cycles for response assessment.30 Positron emission tomography coupled with CT was performed at baseline, then repeated after the second and final cycles at the investigators’ discretion.30 Currently, international guidelines vary in their recommendations of preferred assessment modality. The 2024 European Society for Medical Oncology guideline31 recommends the use of MRI to assess local response if pretreatment MRI data are available. The NCCN guidelines22 suggest that assessment should include physical examination and imaging studies, with the choice of imaging modality determined by a multidisciplinary team. The differing opinions within our expert panel reflect these variations in existing evidence and guidelines. Clinicians should individualise their assessment strategy based on the patient’s clinical status and access to imaging modalities.
     
    It has long been the standard of care to offer axillary lymph node dissection to patients with a clinically positive axillary lymph node to ensure adequate tumour clearance. However, given the introduction of neoadjuvant systemic therapies, ongoing studies are evaluating alternative approaches to axillary management to reduce the risk of arm lymphoedema. In patients who have converted from clinically node-positive to clinically node-negative disease after systemic therapy, SLNB and targeted axillary lymph node dissection are currently recommended by international guidelines (instead of routine axillary lymph node dissection).22 Our Delphi study surveyed the views of local experts on whether PET/CT should be recommended as an additional imaging modality to screen for occult residual axillary disease. While recognising that PET/CT may yield false-positive results, some experts reported using PET/CT to guide whether axillary lymph node dissection could be undertaken directly without a positive SLNB, particularly in patients with initially bulky axillary disease. This approach aligns with the latest NCCN guidelines,22 which caution against the use of SLNB in pre-chemotherapy clinical N2 stage disease. The statement that PET/CT is not recommended to guide the decision for axillary lymph node dissection in patients with clinically node-positive disease who become node-negative on clinical examination and ultrasound and/or MRI after neoadjuvant systemic therapy remains open. Further studies regarding the accuracy of PET/CT in this context may help resolve the controversy. The management approach for the axilla after neoadjuvant therapy is constantly evolving. For example, axillary radiation is currently being tested as an alternative to axillary lymph node dissection in the ongoing Alliance A011202 randomised trial among patients with a positive SLNB.32 The timing and role of PET/CT will need to be re-evaluated within this ever-changing paradigm of axillary management in the neoadjuvant setting.
     
    Positron emission tomography coupled with CT is often presumed to involve high radiation exposure. However, when used appropriately for breast cancer staging with low-dose, non-contrast CT, the radiation exposure can be considerably lower than that of whole-body, high-resolution contrast CT combined with a bone scan. Previous international guidelines have suggested that PET/CT can be performed in situations where standard staging studies are equivocal or suspicious.22 31 Such a sequential approach may not be cost-effective in the clinical scenarios outlined by our expert panel and may expose patients to unnecessary radiation from multiple whole-body imaging examinations. The use of PET/CT as a one-stop assessment enables quicker evaluation of disease status and can facilitate earlier initiation of appropriate treatment.33
     
    Strengths and limitations
    A strength of our Delphi consensus study is that it involved a large group of experienced specialists representing multiple disciplines and both the public and private sectors. This consensus exercise provided a valuable platform in which clinical experiences, practices, ideas, and opinions were shared and exchanged anonymously. It also helped resolve controversial issues and achieve consensus, particularly in areas where high-level evidence is absent. Recommendations that have achieved consensus should receive wider acceptance and recognition when incorporated into clinical practice.
     
    However, our study had notable limitations. First, expert panellists were invited by the Study Group, and thus the consensus results may not fully reflect the views of all local practitioners involved in treating breast cancer patients. Nevertheless, our sample size of more than 70 participants is considered large for Delphi studies, and we achieved balanced representation of participants from various backgrounds. Second, the initial statements were devised based on recently published articles selected by the Study Group, which could introduce bias compared with a formal systematic review. However, the Study Group prioritised reviewing meta-analyses and randomised controlled trials when drafting the initial statements to ensure they reflected the most up-to-date, high-level evidence.
     
    Conclusion
    Based on the results of this Delphi consensus study, the HKBCF PET/CT Study Group provides recommendations on the use of PET/CT for early breast cancer in areas of screening, diagnosis, staging, and surveillance. These recommendations are intended to guide the appropriate use of PET/CT in the local population across both public and private healthcare settings. Breast cancer management is rapidly advancing, and the management paradigm is continually evolving as new evidence becomes available. As technology progresses, more innovative imaging modalities, such as PET/MRI and PET scans with new radiotracers, are expected to play an increasing role.14 34 35 The Study Group will review and update these recommendation guidelines at regular intervals based on emerging evidence, particularly in relation to response assessment during and after neoadjuvant systemic therapy.
     
    Author contributions
    Concept or design: PSY Cheung, CC Yau, CCH Kwok, HCY Wong, CYH Wong.
    Acquisition of data: CCH Kwok, HCY Wong.
    Analysis or interpretation of data: HCY Wong, CCH Kwok, LW Yuen.
    Drafting of the manuscript: CCH Kwok, HCY Wong.
    Critical revision of the manuscript for important intellectual content: CCH Kwok, HCY Wong, CYH Wong, CC Yau, PSY Cheung.
     
    All authors had full access to the data, contributed to the study, approved the final version for publication, and take responsibility for its accuracy and integrity.
     
    Conflicts of interest
    All authors have disclosed no conflicts of interest.
     
    Acknowledgement
    The authors thank all participants who contributed to this research.
     
    Funding/support
    This research received no specific grant from any funding agency in the public, commercial, or not-for-profit sectors.
     
    Ethics approval
    This research was approved by the Breast Cancer Research Centre Research Committee of the Hong Kong Breast Cancer Foundation. The requirement for informed consent from patients was waived by the Committee as patient data collection by the Hong Kong Breast Cancer Registry was approved by respective participating hospitals and centres. The present study did not involve patient participation, and no new patient data were collected.
     
    Supplementary material
    The supplementary material was provided by the authors and some information may not have been peer reviewed. Accepted supplementary material will be published as submitted by the authors, without any editing or formatting. Any opinions or recommendations discussed are solely those of the author(s) and are not endorsed by the Hong Kong Academy of Medicine and the Hong Kong Medical Association. The Hong Kong Academy of Medicine and the Hong Kong Medical Association disclaim all liability and responsibility arising from any reliance placed on the content.
     
    References
    1. Groheux D, Giacchetti S, Moretti JL, et al. Correlation of high 18F-FDG uptake to clinical, pathological and biological prognostic factors in breast cancer. Eur J Nucl Med Mol Imaging 2011;38:426-35.
    2. Buck A, Schirrmeister H, Kühn T, et al. FDG uptake in breast cancer: correlation with biological and clinical prognostic parameters. Eur J Nucl Med Mol Imaging 2002;29:1317-23.
    3. Humbert O, Berriolo-Riedinger A, Cochet A, et al. Prognostic relevance at 5 years of the early monitoring of neoadjuvant chemotherapy using 18F-FDG PET in luminal HER2-negative breast cancer. Eur J Nucl Med Mol Imaging 2014;41:416-27.
    4. Avril N, Rosé CA, Schelling M, et al. Breast imaging with positron emission tomography and fluorine-18 fluorodeoxyglucose: use and limitations. J Clin Oncol 2000;18:3495-502.
    5. Kumar R, Chauhan A, Zhuang H, Chandra P, Schnall M, Alavi A. Clinicopathologic factors associated with false negative FDG-PET in primary breast cancer. Breast Cancer Res Treat 2006;98:267-74.
    6. Peare R, Staff RT, Heys SD. The use of FDG-PET in assessing axillary lymph node status in breast cancer: a systematic review and meta-analysis of the literature. Breast Cancer Res Treat 2010;123:281-90.
    7. Cooper KL, Harnan S, Meng Y, et al. Positron emission tomography (PET) for assessment of axillary lymph node status in early breast cancer: a systematic review and meta-analysis. Eur J Surg Oncol 2011;37:187-98.
    8. Adejolu M, Huo L, Rohren E, Santiago L, Yang WT. False-positive lesions mimicking breast cancer on FDG PET and PET/CT. AJR Am J Roentgenol 2012;198:W304-14.
    9. Ergul N, Kadioglu H, Yildiz S, et al. Assessment of multifocality and axillary nodal involvement in early-stage breast cancer patients using 18F-FDG PET/CT compared to contrast-enhanced and diffusion-weighted magnetic resonance imaging and sentinel node biopsy. Acta Radiol 2015;56:917-23.
    10. Aukema TS, Straver ME, Peeters MJ, et al. Detection of extra-axillary lymph node involvement with FDG PET/CT in patients with stage II–III breast cancer. Eur J Cancer 2010;46:3205-10.
    11. Seo MJ, Lee JJ, Kim HO, et al. Detection of internal mammary lymph node metastasis with 18F-fluorodeoxyglucose positron emission tomography/computed tomography in patients with stage III breast cancer. Eur J Nucl Med Mol Imaging 2014;41:438-45.
    12. Rong J, Wang S, Ding Q, Yun M, Zheng Z, Ye S. Comparison of 18FDG PET-CT and bone scintigraphy for detection of bone metastases in breast cancer patients. A meta-analysis. Surg Oncol 2013;22:86-91.
    13. Sun Z, Yi YL, Liu Y, Xiong JP, He CZ. Comparison of whole-body PET/PET-CT and conventional imaging procedures for distant metastasis staging in patients with breast cancer: a meta-analysis. Eur J Gynaecol Oncol 2015;36:672-6.
    14. Han S, Choi JY. Impact of 18F-FDG PET, PET/CT, and PET/MRI on staging and management as an initial staging modality in breast cancer: a systematic review and meta-analysis. Clin Nucl Med 2021;46:271-82.
    15. Groheux D, Espié M, Giacchetti S, Hindié E. Performance of FDG PET/CT in the clinical management of breast cancer. Radiology 2013;266:388-405.
    16. Borm KJ, Voppichler J, Düsberg M, et al. FDG/PET-CT–based lymph node atlas in breast cancer patients. Int J Radiat Oncol Biol Phys 2019;103:574-82.
    17. Caresia Aroztegui AP, García Vicente AM, Alvarez Ruiz S, et al. 18F-FDG PET/CT in breast cancer: evidence-based recommendations in initial staging. Tumor Biol 2017;39:1010428317728285.
    18. Dayes IS, Metser U, Hodgson N, et al. Impact of 18F-labeled fluorodeoxyglucose positron emission tomography–computed tomography versus conventional staging in patients with locally advanced breast cancer. J Clin Oncol 2023;41:3909-16.
    19. de Mooij CM, Ploumen RA, Nelemans PJ, Mottaghy FM, Smidt ML, van Nijnatten TJ. The influence of receptor expression and clinical subtypes on baseline [18F]FDG uptake in breast cancer: systematic review and meta-analysis. EJNMMI Res 2023;13:5.
    20. Basu S, Chen W, Tchou J, et al. Comparison of triple-negative and estrogen receptor–positive/progesterone receptor–positive/HER2-negative breast carcinoma using quantitative fluorine-18 fluorodeoxyglucose/positron emission tomography imaging parameters: a potentially useful method for disease characterization. Cancer 2008;112:995-1000.
    21. Ulaner GA, Castillo R, Goldman DA, et al. 18F-FDG-PET/CT for systemic staging of newly diagnosed triple-negative breast cancer. Eur J Nucl Med Mol Imaging 2016;43:1937-44.
    22. Gradishar WJ, Moran MS, Abraham J, et al. NCCN Guidelines® Breast Cancer Version 4.2023. J Natl Compr Canc Netw 2023;21:594-608.
    23. Groheux D, Hindie E. Breast cancer: initial workup and staging with FDG PET/CT. Clin Transl Imaging 2021;9:221-31.
    24. Elsayed B, Alksas A, Shehata M, et al. Exploring neoadjuvant chemotherapy, predictive models, radiomic, and pathological markers in breast cancer: a comprehensive review. Cancers 2023;15:5288.
    25. Imbriaco M, Ponsiglione A. Predicting pathologic complete response after neoadjuvant chemotherapy. Radiology 2021;299:301-2.
    26. Romeo V, Accardo G, Perillo T, et al. Assessment and prediction of response to neoadjuvant chemotherapy in breast cancer: a comparison of imaging modalities and future perspectives. Cancers (Basel) 2021;13:3521.
    27. Lafci O, Resch D, Santonocito A, Clauser P, Helbich T, Baltzer PA. Role of imaging-based response assessment for adapting neoadjuvant systemic therapy for breast cancer: a systematic review. Eur J Radiol 2025;187:112105.
    28. Caracciolo M, Castello A, Urso L, et al. Comparison of MRI vs. [18F]FDG PET/CT for treatment response evaluation of primary breast cancer after neoadjuvant chemotherapy: literature review and future perspectives. J Clin Med 2023;12:5355.
    29. Gianni L, Pienkowski T, Im YH, et al. Efficacy and safety of neoadjuvant pertuzumab and trastuzumab in women with locally advanced, inflammatory, or early HER2-positive breast cancer (NeoSphere): a randomised multicentre, open-label, phase 2 trial. Lancet Oncol 2012;13:25-32.
    30. Hatschek T, Foukakis T, Bjöhle J, et al. Neoadjuvant trastuzumab, pertuzumab, and docetaxel vs trastuzumab emtansine in patients with ERBB2-positive breast cancer: a phase 2 randomized clinical trial. JAMA Oncol 2021;7:1360-7.
    31. Loibl S, André F, Bachelot T, et al. Early breast cancer: ESMO Clinical Practice Guideline for diagnosis, treatment and follow-up. Ann Oncol 2024;35:159-82.
    32. National Library of Medicine, National Center for Biotechnology Information, US. Comparison of axillary lymph node dissection with axillary radiation for patients with node-positive breast cancer treated with chemotherapy. Available from: https://clinicaltrials.gov/study/NCT01901094. Accessed 13 Jan 2025.
    33. Hyland CJ, Varghese F, Yau C, et al. Use of 18F-FDG PET/CT as an initial staging procedure for stage II–III breast cancer: a multicenter value analysis. J Natl Compr Canc Netw 2020;18:1510-7.
    34. Ming Y, Wu N, Qian T, et al. Progress and future trends in PET/CT and PET/MRI molecular imaging approaches for breast cancer. Front Oncol 2020;10:1301.
    35. Zhang-Yin J. State of the art in 2022 PET/CT in breast cancer: a review. J Clin Med 2023;12:968.

    Parental depression in the relationship between parental stress and child health among low-income families in Hong Kong

    Hong Kong Med J 2025 Oct;31(5):374–83 | Epub 23 Sep 2025
    © Hong Kong Academy of Medicine. CC BY-NC-ND 4.0
     
    ORIGINAL ARTICLE
    Parental depression in the relationship between parental stress and child health among low-income families in Hong Kong
    Esther YT Yu, FRACGP, FHKAM (Family Medicine)1; Eric YF Wan, PhD, CStat1,2; Rosa SM Wong, PhD3; Ivy L Mak, PhD1; Kiki SN Liu, PhD1; Caitlin HN Yeung, MB, BS, MPH1; Patrick Ip, FRCPCH, FHKAM (Paediatrics)4,5; Agnes FY Tiwari, PhD, FAAN6; Weng Y Chin, FRACGP1; Emily TY Tse, FRACGP, FHKAM (Family Medicine)1; Carlos KH Wong, PhD1,2,7; Vivian Y Guo, PhD8; Cindy LK Lam, MD, FHKAM (Family Medicine)1
    1 Department of Family Medicine and Primary Care, The University of Hong Kong, Hong Kong SAR, China
    2 Department of Pharmacology and Pharmacy, The University of Hong Kong, Hong Kong SAR, China
    3 Department of Special Education and Counselling, The Education University of Hong Kong, Hong Kong SAR, China
    4 Department of Paediatrics and Adolescent Medicine, The University of Hong Kong, Hong Kong SAR, China
    5 Department of Paediatrics and Adolescent Medicine, Hong Kong Children’s Hospital, Hong Kong SAR, China
    6 School of Nursing, Hong Kong Sanatorium & Hospital, Hong Kong SAR, China
    7 Laboratory of Data Discovery for Health Limited, Hong Kong Science Park, Hong Kong SAR, China
    8 Department of Epidemiology, School of Public Health, Sun Yat-sen University, Guangzhou, China
     
    Corresponding author: Dr Eric YF Wan (yfwan@hku.hk)
     
    Abstract
    Introduction: Low-income families face increased exposure to stressors, including material hardship and limited social support, which contribute to poor health outcomes. The poor health and behavioural problems in children from these families may exacerbate parental stress. This study explored the bidirectional relationship between parental stress and child health, along with its mediators and moderators, among low-income families in Hong Kong.
     
    Methods: In total, 217 families were recruited from two less affluent communities between 2016 and 2017; they were followed up at 12 and 24 months. Each parent-child pair was assessed using parent-completed questionnaires on socio-demographics, medical history, parental stress, health-related quality of life, child health and behaviour, family harmony, parenting style, and neighbourhood cohesion.
     
    Results: Thirty-eight parents (17.5%) reported significantly higher levels of stress than the control group. These individuals were more likely to be single parents (41.2% vs 18.5%), victims of intimate partner abuse (23.7% vs 10.9%), have a household income below 50% of the Hong Kong population median (50.0% vs 29.9%), and be diagnosed with mental illnesses (23.7% vs 5.1%). A bidirectional inverse relationship was observed between parental stress and child health at respective time points, with cross-effects from baseline child health to later parental stress, and from baseline parental stress to later child health. The relationship was mediated by the level of parental depression.
     
    Conclusion: Parental stress both precedes and results from child health and behavioural problems, with reciprocal short-term and long-term effects. Screening and intervention for parental depression are needed to mitigate the impacts of stress on health among parents and children.
     
     
    New knowledge added by this study
    • Single parents, victims of intimate partner abuse, individuals with mental illnesses, and/or those living in poverty reported significantly higher levels of stress compared to other low-income parents in Hong Kong.
    • A bidirectional inverse relationship was observed between general parental stress and child health over a 24-month period among low-income families in Hong Kong.
    • Parental depression mediated the relationship between parental stress and child health.
    Implications for clinical practice or policy
    • Active screening for parental depression among at-risk parents in low-income communities is urgently needed to enable early intervention and reduce long-term negative impacts on child health.
     
     
    Introduction
    Low-income families face increased exposure to stressors,1 2 such as material hardship, dispossession, limited social support,3 4 trauma, and violence,1 5 which subsequently affect family relationships and the physical and mental health of parents,6 7 8 contributing to household-wide feelings of stigma, isolation, and exclusion. These stressors are particularly relevant to Hong Kong, where approximately one-fifth of the population lives below the poverty line.9 Adults from low-income families in Hong Kong have reported significantly lower health-related quality of life (HRQOL) than age- and sex-matched individuals from the general population; low income is significantly associated with poorer mental health.10
     
    Stressors may persist across the life course and affect the next generation, resulting in intergenerational socio-economic inequality and health disparities. Early caregiving experiences have been linked to later-life child health outcomes through physiological stress responses.11 Moreover, poor mental health in parents may lead to family disharmony and maladaptive parenting practices, which can increase a child’s risk of adverse health outcomes.7 8 Specifically, children of parents with depression tend to exhibit more difficult temperaments and diminished psychosocial functioning.12 13 Children from low-income families in Hong Kong have reported poorer health and more behavioural problems relative to population norms for similar age-groups.14 15 Without adequate parental care and guidance, such children may be more vulnerable to academic difficulties and behavioural problems, thereby exacerbating parental stress. A bidirectional relationship between parental stress and child health has been documented in Western studies6 8 but not within the Chinese context.
     
    Stress coping can be mediated or moderated by various social factors.16 17 For instance, stressed parents may contribute to family disharmony, which mediates diminished child health. Neighbourhood cohesion may moderate this relationship by alleviating parental stress and enhancing children’s well-being. The identification of mediators and moderators that may influence the relationship between parental stress and child health enables development of targeted interventions and policy recommendations. Despite strong associations of parental depression with stress18 and child health,12 13 its mediating role in this relationship remains unclear. A recent study demonstrated mediation between parental stress and parent-infant bonding,19 but evidence concerning overall child health is lacking. This study aimed to explore whether a bidirectional relationship exists between parental stress and child health and to identify its mediators and moderators, with the goal of promoting health among parents and children from low-income families in Hong Kong. We hypothesised that parental stress precedes and results from child health, with mediating and moderating effects exerted by factors illustrated in Figure 1.
     

    Figure 1. Study concept map based on existing knowledge of the associations of parental, child and family factors with parental stress and child health
     
    Methods
    Study design
    This prospective cohort study involved 217 parent-child pairs in which at least one parent was the primary caregiver and at least one parent was employed, with a monthly household income lower than 75% of the Hong Kong median at baseline. This income criterion included working poor families who lived above the poverty line (50% of the population median) and received limited government support. Families were recruited by research staff when attending health assessments during our previous cohort study20 performed in two less affluent Hong Kong communities between May 2016 and October 2017. Parents unable to communicate in Chinese, as well as children born prematurely and/or with congenital deformities, were excluded. All parents provided written informed consent for themselves and their child to participate in the study. Sample size was determined based on the need to detect a difference in Child Health Questionnaire (CHQ) scores between children of parents with high and low stress levels, classified according to the Depression Anxiety Stress Scales (DASS) stress subscale scores. Our previous cohort study showed that average CHQ general health perceptions subscale scores in children of parents with high and low DASS stress subscale scores were 59 (standard deviation [SD]=17) and 65 (SD=16), respectively20 (effect size=0.4). A sample size of 200 (100 per group) parent-child pairs was required to detect a difference of 6 points in CHQ general health perceptions subscale score between groups using an independent t test with 80% power and a 5% level of significance.
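
The sample size reasoning above can be approximated with the standard normal-approximation formula for comparing two means. The sketch below is illustrative only: the function name `n_per_group` is hypothetical, and it does not reproduce the authors' software, which may apply a t-distribution correction that slightly increases the result.

```python
import math
from statistics import NormalDist


def n_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate participants per group for a two-sided, two-sample comparison of means.

    Uses the normal approximation n = 2 * (z_{1-alpha/2} + z_{power})^2 / d^2,
    where d is the standardised effect size (difference divided by SD).
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for a two-sided test
    z_power = NormalDist().inv_cdf(power)          # quantile matching the desired power
    return math.ceil(2 * (z_alpha + z_power) ** 2 / effect_size ** 2)


# Effect size of 0.4, 80% power, 5% significance level (values taken from the text)
print(n_per_group(0.4))
```

With an effect size of 0.4, this approximation gives 99 pairs per group, close to the 100 per group stated above.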
     
    Data collection
    Each parent-child pair was invited to complete a comprehensive questionnaire survey at three time points (ie, baseline, 12 months, and 24 months) covering parental stress, HRQOL, and mental health; child’s general health, HRQOL, and behaviour; family harmony; parenting style; and neighbourhood cohesion, as reported by the parent. Potential confounders were recorded at baseline, including parental age, gender, education level, marital status, employment status, household income, smoking habits, and alcohol consumption, as well as the child’s age, gender, estimated intelligence quotient, and special education needs. Physical and mental co-morbidities in parents and children were recorded at all three time points.
     
    Study instruments
    Exposure
    Parental stress was measured using the stress subscale of the DASS–21 items questionnaire.21 A cut-off score of ≥15 indicated the presence of significant parental stress.21 The scale has been validated in a Chinese population.22
     
    Primary outcome
    Child health was measured using the general health perceptions subscale score from the CHQ–Parent Form 50.23 A higher score indicates better perceived physical and psychological HRQOL in the child based on parental proxy report. The Chinese version has demonstrated good psychometric properties in local Chinese children.20
     
    Potential mediators/moderators
    The Patient Health Questionnaire–9 (PHQ-9)24 was used to screen for parental depression. A cut-off score of ≥10 was regarded as clinically significant depression. The Chinese version of the PHQ-9 was validated and used in our previous study.20 Family harmony was measured using the Family Harmony Scale–Short Form (FHS-5).25 Higher single-factor harmony scores reflect greater harmony. The Chinese version has demonstrated good psychometric properties in local Chinese households.25 Parent-child interaction was assessed using the Child Physical Assault and Neglect subscales of the Parent–Child Conflict Tactics Scale (CTSPC).26 Higher scores indicate higher frequencies of respective issues in the past 12 months. The translated traditional Chinese version has demonstrated good psychometric properties.27 Parenting style was assessed using the Authoritative Parenting Style subscale of the short version of the Parenting Style and Dimensions Questionnaire.28 A higher score indicates a stronger tendency towards authoritative parenting. The questionnaire has been validated in the Chinese cultural setting.29 Neighbourhood support was measured using the Neighbourhood Collective Efficacy Scale.30 Higher scores indicate greater neighbourhood cohesion. The scale has been tested in Chinese in a local study.31
     
    Data analysis
    Baseline characteristics of parent-child pairs and their households were summarised using descriptive statistics. Differences between groups according to parental stress level were assessed using independent t tests for continuous variables and the Chi squared test for categorical variables.
     
    The longitudinal bidirectional relationship between parental stress and child health was assessed using a cross-lagged panel model. Multiple indicators were utilised to evaluate model goodness-of-fit. A statistically non-significant Chi squared P value, Comparative Fit Index and Tucker-Lewis Index >0.95, root mean square error of approximation ≤0.05, and standardised root mean square residual <0.08 were considered indicative of desirable goodness-of-fit. The final model was selected using root mean square error of approximation–based forward stepwise selection.
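
The logic of a cross-lagged design can be illustrated with a simplified two-wave sketch. This is not the structural equation model fitted in lavaan: it uses two separate least-squares regressions on simulated data, and every name and path value below (`ols2`, `simulate`, `cross_lagged`, the coefficients 0.5, -0.3 and -0.4) is a hypothetical assumption chosen for illustration.

```python
import random


def ols2(y, x1, x2):
    """Slopes of y on x1 and x2 (intercept included) via centred sums of squares."""
    n = len(y)
    my, m1, m2 = sum(y) / n, sum(x1) / n, sum(x2) / n
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s1y = sum((a - m1) * (c - my) for a, c in zip(x1, y))
    s2y = sum((b - m2) * (c - my) for b, c in zip(x2, y))
    det = s11 * s22 - s12 ** 2
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det


def simulate(n=2000, seed=1):
    """Simulated standardised scores at two waves with known (assumed) paths."""
    rng = random.Random(seed)
    stress1 = [rng.gauss(0, 1) for _ in range(n)]
    health1 = [-0.3 * s + rng.gauss(0, 1) for s in stress1]
    stress2 = [0.5 * s - 0.3 * h + rng.gauss(0, 1) for s, h in zip(stress1, health1)]
    health2 = [0.5 * h - 0.4 * s + rng.gauss(0, 1) for s, h in zip(stress1, health1)]
    return stress1, health1, stress2, health2


def cross_lagged(stress1, health1, stress2, health2):
    """Autoregressive and cross-lagged slopes from two separate regressions."""
    stress_stability, health_to_stress = ols2(stress2, stress1, health1)
    health_stability, stress_to_health = ols2(health2, health1, stress1)
    return {
        "stress1->stress2": stress_stability,
        "health1->stress2": health_to_stress,
        "health1->health2": health_stability,
        "stress1->health2": stress_to_health,
    }


paths = cross_lagged(*simulate())
for name, value in paths.items():
    print(f"{name}: {value:.2f}")
```

Each later outcome is regressed on its own earlier value (the autoregressive path) and on the earlier value of the other variable (the cross-lagged path). A bidirectional relationship corresponds to both cross-lagged paths differing from zero.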
     
    A mediation model was used to evaluate candidate mediators. Model estimates were obtained using 5000 bootstrapping samples. A statistically significant indirect effect, along with a reduced direct effect magnitude relative to the total effect, indicated that a given mediator explained the relationship between parental stress and child health.32 A multi-mediator model was constructed; differences in indirect effects between mediators were estimated via pairwise comparison.
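
The bootstrapped indirect effect described above can be sketched for a single mediator as follows. This is a minimal illustration on simulated data, not the authors' lavaan model: the function names, the simulated path values (0.5, -0.4, -0.1), the sample size of 500, and the use of 1000 rather than 5000 resamples (to keep the example quick) are all assumptions.

```python
import random


def simple_slope(y, x):
    """Slope of y on x from a one-predictor least-squares fit."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def partial_slope(y, x, m):
    """Slope of y on m, adjusted for x, from a two-predictor least-squares fit."""
    n = len(y)
    mx, mm, my = sum(x) / n, sum(m) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    smm = sum((b - mm) ** 2 for b in m)
    sxm = sum((a - mx) * (b - mm) for a, b in zip(x, m))
    sxy = sum((a - mx) * (c - my) for a, c in zip(x, y))
    smy = sum((b - mm) * (c - my) for b, c in zip(m, y))
    return (sxx * smy - sxm * sxy) / (sxx * smm - sxm ** 2)


def indirect_effect(x, m, y):
    """Product of path a (x -> m) and path b (m -> y, adjusted for x)."""
    return simple_slope(m, x) * partial_slope(y, x, m)


def bootstrap_ci(x, m, y, n_boot=1000, seed=0):
    """Percentile 95% interval for the indirect effect from case resampling."""
    rng = random.Random(seed)
    n = len(x)
    estimates = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample pairs with replacement
        estimates.append(indirect_effect([x[i] for i in idx],
                                         [m[i] for i in idx],
                                         [y[i] for i in idx]))
    estimates.sort()
    return estimates[int(0.025 * n_boot)], estimates[int(0.975 * n_boot) - 1]


# Simulated data: stress raises the mediator, and the mediator lowers child health
rng = random.Random(42)
stress = [rng.gauss(0, 1) for _ in range(500)]
mediator = [0.5 * s + rng.gauss(0, 1) for s in stress]
health = [-0.1 * s - 0.4 * d + rng.gauss(0, 1) for s, d in zip(stress, mediator)]

point = indirect_effect(stress, mediator, health)
ci_low, ci_high = bootstrap_ci(stress, mediator, health)
print(f"indirect effect = {point:.3f}, 95% CI {ci_low:.3f} to {ci_high:.3f}")
```

An interval that excludes zero corresponds to a statistically significant indirect effect in the sense used above.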
     
    Potential moderating effects of neighbourhood cohesion and parenting style on the relationship between parental stress and child health were examined by multivariable linear regression. A statistically significant interaction term coefficient indicated a moderation effect. All variables were centred to a mean of zero to reduce multicollinearity related to interaction terms. Confounders were included to improve model goodness-of-fit; R² and adjusted R² values were used to evaluate model performance.
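
The effect of mean-centring on interaction terms can be shown with a short simulation. The sketch below is illustrative only: the variable names, the simulated score distributions, and the helper functions `pearson` and `centre` are assumptions and do not come from the study data.

```python
import random


def pearson(u, v):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    return suv / (suu * svv) ** 0.5


def centre(values):
    """Subtract the mean so that the variable has a mean of zero."""
    mean = sum(values) / len(values)
    return [a - mean for a in values]


# Hypothetical, independent scores on arbitrary positive scales
rng = random.Random(0)
stress = [rng.gauss(10, 2) for _ in range(500)]
cohesion = [rng.gauss(20, 3) for _ in range(500)]

# Correlation between a predictor and its interaction term, before centring
raw_r = pearson(stress, [s * c for s, c in zip(stress, cohesion)])

# The same correlation after centring both predictors
stress_c, cohesion_c = centre(stress), centre(cohesion)
centred_r = pearson(stress_c, [s * c for s, c in zip(stress_c, cohesion_c)])

print(f"raw: {raw_r:.2f}, centred: {centred_r:.2f}")
```

When the predictors have non-zero means, the product term carries much of the information in each predictor, so the two are strongly correlated. Centring removes most of that overlap without changing the interaction coefficient itself.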
     
    All descriptive analyses were performed using Stata 16 (StataCorp LLC, College Station [TX], US); all model analyses were carried out using the lavaan package33 version 0.6-6, in R version 4.0.1 (R Foundation for Statistical Computing, Vienna, Austria). Data completion rates are presented in online supplementary Table 1. Complete case analyses were conducted. All tests were two-tailed; P values <0.05 were considered statistically significant.
     
    Results
    Among the 217 parent-child pairs recruited at baseline, 175 (80.6%) and 184 (84.8%) pairs attended the 12-month and 24-month follow-ups, respectively (online supplementary Fig 1). Their characteristics at each of the three time points are detailed in Table 1.
     

    Table 1. Socio-demographics, co-morbidities, and outcome measures
     
    Baseline characteristics of parent-child pairs
    At baseline, the ages of parents and children (mean ± SD) were 42.4 ± 6.2 years and 10.7 ± 2.0 years, respectively. Approximately half of the children were girls (47.5%), whereas the parents involved were predominantly mothers (91.7%). The majority (75.2%) of parents had completed secondary education. Approximately 39.8% of primary parents were employed, and 57.2% of families had a monthly household income below 75% of the 2016 Hong Kong median (ie, HK$25 000).34
     
    Thirty-eight parents (17.5%) experienced significant stress, indicated by a DASS stress subscale score of 15 or above at baseline. Considerable differences were evident in their baseline characteristics compared with parents who were not stressed. Stressed parents were more likely to be single parents (41.2% vs 18.5%) and to have a household income below 50% of the Hong Kong median (50.0% vs 29.9%). A greater proportion of stressed parents reported being victims of intimate partner abuse (23.7% vs 10.9%). Diagnosed mental illnesses (23.7% vs 5.1%) and depression, indicated by a PHQ-9 score ≥10 (21.1% vs 2.4%), were more prevalent among these parents (Table 2). Both their physical and mental HRQOL were significantly worse (physical component score=42.5 ± 9.9 vs 49.1 ± 8.2; mental component score=38.1 ± 10.0 vs 55.5 ± 8.7; P<0.001).
     

    Table 2. Baseline characteristics stratified by parental stress group (n=217)
     
    Compared with children of parents who were not stressed, children of stressed parents were younger (age=10.0 ± 2.6 years vs 10.8 ± 1.8 years; P=0.020) and had worse general health and HRQOL, as reflected by lower scores in every subscale of the CHQ–Parent Form 50 except bodily pain and self-esteem. In particular, large differences were observed in four subscales: parental impact—emotional, parental impact—time, family activities, and family cohesion.
     
    Moreover, stressed parents reported lower scores in family harmony (FHS-5) and neighbourhood cohesion (Neighbourhood Collective Efficacy Scale). Although parenting style did not differ significantly, stressed parents showed a greater tendency for physical punishment, as reflected by higher scores on the CTSPC–physical assault subscale, and for neglect, as indicated by higher CTSPC–neglect subscale scores, compared with parents who were not stressed (Table 2).
     
    Relationship between parental stress and child health over time
    Figure 2 shows the cross-lagged panel model examining the bidirectional relationship between parental stress and child health. A bidirectional relationship between child health and parental stress was confirmed. Significant associations were observed between parental stress and child health at each time point (estimates: baseline=-0.22, 12 months=-0.21, 24 months=-0.47); between baseline child health and parental stress at 12 months (estimate=-0.40) and 24 months (estimate=-0.42); and between baseline parental stress and child health at 12 months (estimate=-0.57) and 24 months (estimate=-0.10).
     

    Figure 2. Cross-lagged panel model between parental stress and child health
     
    Mediators and moderators of the parent-child health relationship over time
    The multi-mediation model results generated by bootstrapping are illustrated in Figure 3; the model estimates and goodness-of-fit statistics are presented in online supplementary Table 2. The total effect of the relationship between parental stress and child health was reduced when mediators were included in the model. Significant positive associations of parental stress were observed with the PHQ-9 score, as well as the physical assault and neglect subscales of the CTSPC. A significant negative association was noted between parental stress and the FHS-5 score. Among mediators, only the PHQ-9 exerted a significant negative effect on child health.
     

    Figure 3. Multi-mediation model between parental stress and child health
     
    Table 3 presents the moderation model. Neither neighbourhood cohesion nor parenting style demonstrated a moderating effect on the relationship between parental stress and child health. Estimates for the interaction terms were negligible. The R² values were around 0.21, and the adjusted R² values were lower (0.11-0.13), indicating modest explanatory power of the model after adjusting for confounders.
     

    Table 3. Moderation effects of the relationship between parental stress and child health
     
    Discussion
    Our study demonstrated that a substantial proportion of low-income parents experienced stress (17.5%), which was associated with multiple stressors including poverty, marital problems, intimate partner abuse, family disharmony, and reduced neighbourhood support. According to parental report, children of stressed parents had worse general health and HRQOL, as well as more behavioural problems. A short-term and long-term bidirectional inverse relationship between parental stress and child health was confirmed; this relationship was partially mediated by the level of parental depression.
     
    Compared with the general Hong Kong population, the parent-child pairs in this study were more exposed to various known stressors in addition to low income. The prevalences of single-parent families (22.3% vs 9.8%35) and intimate partner abuse (13.2% vs 7.2%36) were higher, and more parents reported regular alcohol consumption (17.4% vs 8.7%37). Therefore, it is not surprising that a considerably greater proportion of parents in this study experienced elevated levels of stress (17.5% vs 5.2%38) and depression (5.9% vs 1.2%37). The persistently high level of parental stress observed during the study period may be attributed partly to ongoing exposure to various stressors over time and partly to constant exposure to chronic stressors. Both scenarios highlight the urgent need to ensure assessment and intervention for these disadvantaged parents.
     
    Previous studies have demonstrated bidirectional interactions between parental stress and child health in relation to both internalising and externalising behaviours.6 8 Increases in behavioural problems have been shown to raise parental stress over time, which in turn exacerbates behavioural issues in children.39 Our study adds to this body of evidence by confirming significant bidirectional effects between general parental stress and child health at each time point. Cross-effects were observed from baseline child health to later parental stress, and from baseline parental stress to later child health at both 12 and 24 months. These findings suggest that parental stress both precedes and results from child health, with reciprocal short-term and long-term influences.40
     
    In our attempt to identify pathways through which parental stress affects child health, we observed that only parental depression significantly mediated the relationship. This result is consistent with previous findings that maternal depression and perceived stress directly and negatively influence child development.41 One possible explanation is that depressed mothers may lack the energy or capacity to provide adequate care and support for their child’s health. Research into this mediation effect remains limited; however, one recent study reported similar outcomes regarding the indirect impact of work-related stress on child health, mediated by maternal depression.42 The implementation of screening and intervention for parental depression is both imperative and urgent to counteract the adverse effects of stress on parental and child health. Medical and social service providers should collaborate to actively screen at-risk parents from low-income families in the community. Early intervention through lifestyle-based care (such as physical activity, relaxation techniques, and mindfulness-based therapies) can help to prevent43 44 and manage45 46 depression, thus mitigating long-term negative impacts on child health.
     
    However, it must be noted that parents with depression may be biased towards over-reporting their child’s problems,47 compared with other informants such as teachers and the children themselves.48 Further research is warranted to identify individual and family characteristics that may influence discrepancies between informants. Other potential factors examined in previous studies, such as household structure (dual- vs multi-generational), parental rearing behaviours, and confidant and affective social support, might also contribute to the relationship between parental stress and child health; they should be explored in future studies with larger sample sizes.
     
    Strengths and limitations
    This is one of the first studies to examine the longitudinal relationship between general parental stress and child health, enabling assessment of possible causal relationships between the two outcomes. Specifically, we recruited vulnerable families with substantial socio-economic disadvantages who experience high levels of stress and would benefit most from future interventions. Furthermore, a high response rate was maintained throughout the study, ensuring adequate power for the analyses.
     
    However, the findings of our study must be interpreted in light of the following limitations. First, although we conducted a comprehensive analysis of factors related to parental stress and child health, the outcomes were based on self-reported assessments, which are susceptible to respondent bias. Only three measurements, taken 1 year apart, were performed in this study due to concerns regarding practicality and the burden on participating families. Therefore, caution should be exercised in generalising the results with respect to longitudinal trends, given that substantial intra-individual fluctuations may have occurred but were not captured in this study. Second, both parental stress and child health were assessed using parent-report questionnaires, which may contribute to increased shared method variance. Additionally, aspects of the child’s health or behaviour considered problematic by the parent may not align with assessments made by other individuals (eg, teachers). As mentioned earlier, parents with depression may be biased towards over-reporting problems and are more likely to report behavioural issues in their child compared with other informants.47 48 The validity of parent-perceived measures of child health—particularly in relation to parental depression—and their agreement with other caregivers should be examined in future trials specifically designed for this purpose. Third, there were unmeasured confounders in this observational study, such as exercise and social functioning. Moreover, certain socio-demographic factors, including marital and employment statuses, were assumed to be static throughout the study. It remains uncertain whether changes in these factors, if any, may have influenced the observed results. Additional information regarding participant characteristics, observational measures of child behaviour, or objective indicators of child health (eg, cortisol levels) could improve the reliability of the findings.
     
    Conclusion
    This study showed that a substantial proportion of parents from low-income families in Hong Kong experienced general stress due to multiple stressors, which was negatively associated with their child’s health. A bidirectional relationship was observed between parental stress and child health over time, which may be partly mediated by parental depression. Prompt screening and appropriate intervention are necessary to prevent adverse health outcomes for parents and children in low-income families.
     
    Author contributions
    Concept or design: EYT Yu, RSM Wong, AFY Tiwari, CKH Wong, VY Guo, CLK Lam.
    Acquisition of data: RSM Wong, KSN Liu.
    Analysis or interpretation of data: EYT Yu, EYF Wan, RSM Wong, IL Mak, AFY Tiwari, CKH Wong, VY Guo, CLK Lam.
    Drafting of the manuscript: EYT Yu, RSM Wong, IL Mak, CHN Yeung.
    Critical revision of the manuscript for important intellectual content: All authors.
     
    All authors had full access to the data, contributed to the study, approved the final version for publication, and take responsibility for its accuracy and integrity.
     
    Conflicts of interest
    As advisors of the journal, EYT Yu and CKH Wong were not involved in the peer review process. Other authors have disclosed no conflicts of interest.
     
    Acknowledgement
    The authors are grateful for the support from Kerry Group Kuok Foundation (Hong Kong) Limited in conducting this study on participants of the Trekkers Family Enhancement Scheme. The authors’ sincere gratitude goes to the Neighbourhood Advice-Action Council, Hong Kong Sheng Kung Hui Lady MacLehose Centre, and Shek Lei Community Hall for their assistance in participant recruitment and provision of venues for data collection, respectively. The authors thank the Social Science Research Centre of The University of Hong Kong (HKU) for their timely completion of the telephone surveys, and the Department of Paediatrics and Adolescent Medicine of HKU for performing the assays for DNA extraction and telomere length measurement. The authors also thank their research staff for their hard work in data collection and analysis.
     
    Declaration
    The study results were disseminated through a poster presentation at the Health Research Symposium 2021 (23 November 2021, hybrid conference), entitled “In-depth exploration of a bidirectional parent-child health relationship and its mediating and moderating factors among low-income families in Hong Kong”.
     
    Funding/support
    This research was supported by the Health and Medical Research Fund of the Health Bureau, Hong Kong SAR Government (Ref No.: HMRF 14151571). The funder had no role in the study design, data collection/analysis/interpretation, or manuscript preparation.
     
    Ethics approval
    This research was approved by the Institutional Review Board of The University of Hong Kong/Hospital Authority Hong Kong West Cluster, Hong Kong (Ref No.: UW 16-415). Informed consent was obtained from participants when baseline data were collected.
     
    Supplementary material
    The supplementary material was provided by the authors and some information may not have been peer reviewed. Accepted supplementary material will be published as submitted by the authors, without any editing or formatting. Any opinions or recommendations discussed are solely those of the author(s) and are not endorsed by the Hong Kong Academy of Medicine and the Hong Kong Medical Association. The Hong Kong Academy of Medicine and the Hong Kong Medical Association disclaim all liability and responsibility arising from any reliance placed on the content.
     
    References
    1. Santiago CD, Kaltman S, Miranda J. Poverty and mental health: how do low-income adults and children fare in psychotherapy? J Clin Psychol 2013;69:115-26.
    2. Smith MV, Mazure CM. Mental health and wealth: depression, gender, poverty, and parenting. Annu Rev Clin Psychol 2021;17:181-205.
    3. Evans GW, Kim P. Childhood poverty, chronic stress, self-regulation, and coping. Child Dev Perspect 2013;7:43-8.
    4. Adjei NK, Jonsson KR, Straatmann VS, et al. Impact of poverty and adversity on perceived family support in adolescence: findings from the UK Millennium Cohort Study. Eur Child Adolesc Psychiatry 2024;33:3123-32.
    5. Alto ME, Warmingham JM, Handley ED, Manly JT, Cicchetti D, Toth SL. The association between patterns of trauma exposure, family dysfunction, and psychopathology among adolescent females with depressive symptoms from low-income contexts. Child Maltreat 2023;28:130-40.
    6. van Dijk W, de Moor MH, Oosterman M, Huizink AC, Matvienko-Sikar K. Longitudinal relations between parenting stress and child internalizing and externalizing behaviors: testing within-person changes, bidirectionality and mediating mechanisms. Front Behav Neurosci 2022;16:942363.
    7. Neece CL, Green SA, Baker BL. Parenting stress and child behavior problems: a transactional relationship across time. Am J Intellect Dev Disabil 2012;117:48-66.
    8. Stone LL, Mares SH, Otten R, Engels RC, Janssens JM. The co-development of parenting stress and childhood internalizing and externalizing problems. J Psychopathol Behav Assess 2016;38:76-86.
    9. Economic Analysis Division Economic Analysis and Business Facilitation Unit Financial Secretary’s Office; Census and Statistics Department, Hong Kong SAR Government. Hong Kong Poverty Situation Report 2013. Oct 2014. Available from: https://www.commissiononpoverty.gov.hk/eng/pdf/poverty_report13_rev2.pdf. Accessed 31 Jul 2023.
    10. Lam CL, Guo VY, Wong CK, Yu EY, Fung CS. Poverty and health-related quality of life of people living in Hong Kong: comparison of individuals from low-income families and the general population. J Public Health (Oxf) 2017;39:258-65.
    11. Luecken LJ, Lemery KS. Early caregiving and physiological stress responses. Clin Psychol Rev 2004;24:171-91.
    12. Hanington L, Ramchandani P, Stein A. Parental depression and child temperament: assessing child to parent effects in a longitudinal population study. Infant Behav Dev 2010;33:88-95.
    13. Associations between depression in parents and parenting, child health, and child psychological functioning. In: England MJ, Sim LJ, editors. Depression in Parents, Parenting, and Children: Opportunities to Improve Identification, Treatment, and Prevention. Washington (DC): National Academies Press (US); 2009: 119-82.
    14. Lee SL, Cheung YF, Wong HS, Leung TH, Lam T, Lau YL. Chronic health problems and health-related quality of life in Chinese children and adolescents: a population-based study in Hong Kong. BMJ Open 2013;3:e001183.
    15. Chan KL, Lo CK, Ho FK, Chen Q, Chen M, Ip P. Modifiable factors for the trajectory of health-related quality of life among youth growing up in poverty: a prospective cohort study. Int J Environ Res Public Health 2021;18:9221.
    16. Asok A, Bernard K, Roth TL, Rosen JB, Dozier M. Parental responsiveness moderates the association between early-life stress and reduced telomere length. Dev Psychopathol 2013;25:577-85.
    17. Evans GW, Kim P, Ting AH, Tesher HB, Shannis D. Cumulative risk, maternal responsiveness, and allostatic load among young adolescents. Dev Psychol 2007;43:341-51.
    18. Hammen C. Stress and depression. Annu Rev Clin Psychol 2005;1:293-319.
    19. Power C, Weise V, Mack JT, Karl M, Garthus-Niegel S. Does parental mental health mediate the association between parents’ perceived stress and parent-infant bonding during the early COVID-19 pandemic? Early Hum Dev 2024;189:105931.
    20. Fung CS, Yu EY, Guo VY, et al. Development of a Health Empowerment Programme to improve the health of working poor families: protocol for a prospective cohort study in Hong Kong. BMJ Open 2016;6:e010015.
    21. Lovibond SH, Lovibond PF; Psychology Foundation of Australia. Manual for the Depression Anxiety Stress Scales. Sydney: Sydney Psychology Foundation; 1995.
    22. Wang K, Shi HS, Geng FL, et al. Cross-cultural validation of the Depression Anxiety Stress Scale–21 in China. Psychol Assess 2016;28:e88-100.
    23. Landgraf JM. Child Health Questionnaire (CHQ). In: Maggino F, editor. Encyclopedia of Quality of Life and Well-being Research. Cham: Springer; 2020: 1-6.
    24. Kroenke K, Spitzer RL, Williams JB. The PHQ-9: validity of a brief depression severity measure. J Gen Intern Med 2001;16:606-13.
    25. Kavikondala S, Stewart SM, Ni MY, et al. Structure and validity of Family Harmony Scale: an instrument for measuring harmony. Psychol Assess 2016;28:307-18.
    26. Straus MA. Measuring intrafamily conflict and violence: the Conflict Tactics (CT) Scales. J Marriage Fam 1979;41:75-88.
    27. Chan KL, Brownridge DA, Fong DY, Tiwari A, Leung WC, Ho PC. Violence against pregnant women can increase the risk of child abuse: a longitudinal study. Child Abuse Negl 2012;36:275-84.
    28. Robinson CC, Mandleco B, Olsen SF, Hart CH. The Parenting Styles and Dimensions Questionnaire (PSDQ). In: Perlmutter BF, Touliatos J, Holden GW, editors. Handbook of Family Measurement Techniques: Vol 3. Instruments & Index. Thousand Oaks: Sage; 2001: 319-21.
    29. Wu P, Robinson CC, Yang C, et al. Similarities and differences in mothers’ parenting of preschoolers in China and the United States. Int J Behav Dev 2002;26:481-91.
    30. Sampson RJ, Raudenbush SW, Earls F. Neighborhoods and violent crime: a multilevel study of collective efficacy. Science 1997;277:918-24.
    31. Chou KL. Perceived discrimination and depression among new migrants to Hong Kong: the moderating role of social support and neighborhood collective efficacy. J Affect Disord 2012;138:63-70.
    32. Baron RM, Kenny DA. The moderator–mediator variable distinction in social psychological research: conceptual, strategic, and statistical considerations. J Pers Soc Psychol 1986;51:1173-82.
    33. Rosseel Y. lavaan: an R package for structural equation modeling. J Stat Softw 2012;48:1-36.
    34. Census and Statistics Department, Hong Kong SAR Government. Hong Kong 2016 Population By-census–Thematic Report: Household Income Distribution in Hong Kong. Jun 2017. Available from: https://www.censtatd.gov.hk/en/data/stat_report/product/B1120096/att/B11200962016XXXXB0100.pdf. Accessed 25 Aug 2025.
    35. Census and Statistics Department, Hong Kong SAR Government. 2021 Population Census—Thematic Report: Children. Feb 2023. Available from: https://www.census2021.gov.hk/doc/pub/21c-Children.pdf. Accessed 25 Aug 2025.
    36. Chan KL. Intimate partner violence in Hong Kong. In: Chan KL, editor. Preventing Family Violence: A Multidisciplinary Approach. Hong Kong: Hong Kong University Press; 2012: 19-58.
    37. Non-Communicable Disease Branch, Centre for Health Protection, Hong Kong SAR Government. Report of Population Health Survey 2020-22 (Part I); 2022. Available from: https://www.chp.gov.hk/files/pdf/dh_phs_2020-22_part_1_report_eng_rectified.pdf. Accessed 31 Jul 2023.
    38. Chan SM, Wong H, Chung RY, Au-Yeung TC. Association of living density with anxiety and stress: a cross-sectional population study in Hong Kong. Health Soc Care Community 2021;29:1019-29.
    39. Baker BL, McIntyre LL, Blacher J, Crnic K, Edelbrock C, Low C. Pre-school children with and without developmental delay: behaviour problems and parenting stress over time. J Intellect Disabil Res 2003;47:217-30.
    40. Motrico E, Bina R, Kassianos AP, et al. Effectiveness of interventions to prevent perinatal depression: an umbrella review of systematic reviews and meta-analysis. Gen Hosp Psychiatry 2023;82:47-61.
    41. Vameghi R, Amir Ali Akbari S, Sajedi F, Sajjadi H, Alavi Majd H. Path analysis association between domestic violence, anxiety, depression and perceived stress in mothers and children’s development. Iran J Child Neurol 2016;10:36-48.
    42. Xu L, Xu J. The impact of maternal occupation on children’s health: a mediation analysis using the parametric G-formula. Soc Sci Med 2024;343:116602.
    43. Bellón JÁ, Conejo-Cerón S, Sánchez-Calderón A, et al. Effectiveness of exercise-based interventions in reducing depressive symptoms in people without clinical depression: systematic review and meta-analysis of randomised controlled trials. Br J Psychiatry 2021;219:578-87.
    44. Newland P, Bettencourt BA. Effectiveness of mindfulness-based art therapy for symptoms of anxiety, depression, and fatigue: a systematic review and meta-analysis. Complement Ther Clin Pract 2020;41:101246.
    45. Marx W, Manger SH, Blencowe M, et al. Clinical guidelines for the use of lifestyle-based mental health care in major depressive disorder: World Federation of Societies for Biological Psychiatry (WFSBP) and Australasian Society of Lifestyle Medicine (ASLM) taskforce. World J Biol Psychiatry 2023;24:333-86.
    46. Recchia F, Leung CK, Chin EC, et al. Comparative effectiveness of exercise, antidepressants and their combination in treating non-severe depression: a systematic review and network meta-analysis of randomised controlled trials. Br J Sports Med 2022;56:1375-80.
    47. Chi TC, Hinshaw SP. Mother–child relationships of children with ADHD: the role of maternal depressive symptoms and depression-related distortions. J Abnorm Child Psychol 2002;30:387-400.
    48. Richters JE. Depressed mothers as informants about their children: a critical review of the evidence for distortion. Psychol Bull 1992;112:485-99.

    Clinical and imaging patterns of child abuse in Hong Kong: a 10-year review from a tertiary centre

    Hong Kong Med J 2025 Oct;31(5):347–54 | Epub 19 Sep 2025
    © Hong Kong Academy of Medicine. CC BY-NC-ND 4.0
     
    ORIGINAL ARTICLE  CME
    Catherine YM Young, MB, BS, FRCR1; CH Yiu1; Kathleen CH Tsoi, MB, ChB, MRCPCH2; Dorothy FY Chan, MB, ChB, FRCPCH2; Ki Wang, MB, BS, FRCR1; Winnie CW Chu, MB, ChB, MD1
    1 Department of Imaging and Interventional Radiology, Prince of Wales Hospital, The Chinese University of Hong Kong, Hong Kong SAR, China
    2 Department of Paediatrics, Prince of Wales Hospital, The Chinese University of Hong Kong, Hong Kong SAR, China
     
    Corresponding author: Dr Catherine YM Young (youngymc@connect.hku.hk)
     
     
    Abstract
    Introduction: Child abuse, a pressing medical and social issue in Hong Kong, requires high vigilance for prompt identification and early management. The Mandatory Reporting of Child Abuse Ordinance has recently been gazetted, establishing a mandatory obligation for suspected injury reporting to protect children’s rights. This study aimed to describe the incidence and patterns of child abuse in Hong Kong to draw attention to this key issue.
     
    Methods: A retrospective review of all reported child abuse cases admitted to Prince of Wales Hospital over a 10-year period (2014-2023) was performed.
     
    Results: In total, 503 cases of child abuse were retrieved from the hospital’s electronic system, revealing an increasing trend over the years. Of these, 341 cases (67.8%) were attributed to physical abuse. Most cases involved trivial soft tissue injuries, apart from two limb fracture cases, which represented 0.4% of all reported child abuse cases (n=503) and 0.6% of all reported physical child abuse cases (n=341). Abusive head trauma (n=3) constituted 0.6% of all reported child abuse cases and 0.9% of all reported physical child abuse cases. Two cases of severe abusive head trauma required paediatric intensive care, and one case warranting neurosurgical intervention subsequently exhibited gross motor delay.
     
    Conclusion: Most child abuse cases in Hong Kong present with minor clinical manifestations. Imaging evidence of skeletal or neurological injury is present in a small proportion of patients. Abusive head injury is uncommon but carries far-reaching consequences; early recognition is essential to protect affected children from further harm. Paediatric radiologists play a pivotal role in making the diagnosis.
     
     
    New knowledge added by this study
    • Fractures resulting from non-accidental injury are less common in Hong Kong, which has a predominantly Chinese population, than in Western countries; the fracture patterns differ.
    • The overall incidence of abusive head trauma is low; however, a substantial proportion of patients with non-accidental injury who undergo further neuroimaging display positive findings.
    Implications for clinical practice or policy
    • Interpretation of plain radiographs in cases of non-accidental injury should not solely rely on classical textbook fracture patterns; correlations with a compatible clinical history are particularly important.
    • Neuroimaging is essential for children under 1 year of age with clinical suspicion of non-accidental injury, particularly those showing abnormal neurological signs, to detect abusive head trauma.
     
     
    Introduction
    Child abuse is a prevalent yet frequently overlooked condition in paediatric patients worldwide, affecting between 4% and 16% of the paediatric population.1 It may manifest as physical abuse, neglect, sexual abuse, or psychological abuse,2 all of which carry substantial long-term medical and psychological consequences. Clinical presentation is often vague, requiring a high degree of clinical suspicion by both clinicians and radiologists to ensure early activation of child protection services. Multidisciplinary input is needed for timely intervention and prevention of recurrence.
     
    While clinical evaluation is crucial for identifying apparent or superficial injuries, radiological imaging also plays a vital role in detecting old or clinically occult injuries. John Caffey, a paediatric radiologist, was among the first to describe the association between long bone fractures and chronic subdural haematoma in infants, introducing the concept of non-accidental injury.3 Since then, a growing body of literature has emerged concerning the radiological features of non-accidental injury, contributing to increased global awareness. Various guidelines have also been developed, including those by The Royal College of Radiologists4 and the American College of Radiology,5 which recommend appropriate imaging modalities in suspected cases to protect children’s welfare while balancing the risks of radiation exposure.
     
    Various retrospective studies in Western populations have examined the epidemiology, injury patterns, and outcomes of non-accidental paediatric injuries in their respective regions6 7 8 9; however, limited research has been conducted in Asia, particularly within Hong Kong. This study aimed to describe the incidence, clinical presentation, imaging features, and treatment outcomes of child abuse in a tertiary regional hospital in Hong Kong, with the goal of raising awareness of this commonly overlooked condition.
     
    Methods
    This retrospective study included all reported cases of child abuse involving paediatric patients (aged 0-18 years) admitted to Prince of Wales Hospital, a tertiary regional hospital in Hong Kong, over a 10-year period (from January 2014 to December 2023). All suspected or confirmed cases of child abuse were identified from the Clinical Data Analysis and Reporting System, an electronic health registry managed by the Hospital Authority of Hong Kong. The search utilised key terms under the International Classification of Diseases, Ninth Revision coding, including “Child maltreatment syndrome”, “Child and adult battering and other maltreatment”, “Child abuse”, and “Child maltreatment syndrome, shaken infant syndrome”. Clinical records of all reported cases were reviewed. Cases were excluded if they were inappropriately categorised (aged >18 years), erroneously reported as unrelated to child abuse, or duplicate entries of the same episode (Fig 1).
     

    Figure 1. Patient recruitment
     
    Clinical data including patient demographics (age at presentation and sex), clinical presentation, type of abuse, imaging performed, multidisciplinary case conferences (MDCCs) held, management strategies, and any long-term adverse outcomes were reviewed from electronic patient records and case notes. Relevant imaging studies were reviewed by the primary investigator (5 years of radiology experience) and cross-checked against the original reports. In cases of discrepancy, images were re-interpreted through consensus reading with an experienced paediatric radiologist (20 years of radiology experience).
     
    Results
    Patient demographics and clinical presentation
    In total, 503 reported cases of child abuse were included in the study. The number of reported cases showed an upward trend over the 10-year period, from 23 cases in 2014 to 50 cases in 2023 (Fig 2).
     

    Figure 2. Trend of reported child abuse cases at Prince of Wales Hospital from 2014 to 2023
     
    The case distribution is presented in Table 1. The cohort comprised 265 (52.7%) girls and 238 (47.3%) boys. The mean age was 8.25 years (range, 0-17), with 55 cases (10.9%) involving infants under 1 year of age. Physical abuse was the most common type at presentation, accounting for 341 cases (67.8%). The vast majority (>99%) of patients presented with erythematous marks, bruises, or lacerations. Other presenting symptoms included seizures, loss of consciousness, and vomiting. Sexual abuse was the second most common type (n=87, 17.3%), followed by child neglect (n=75, 14.9%).
     

    Table 1. Distribution of various types of reported child abuse by age and sex (n=503)
     
    More than half of the cases (n=263, 52.3%) were admitted via the Accident and Emergency (A&E) Department. The vast majority of these patients presented directly to our hospital, and only two were transferred from adjacent acute hospitals: one involving abusive head trauma requiring neurosurgical intervention, and another with a suspected vaginal tear necessitating input from obstetricians and gynaecologists. Most of these patients (254 cases, 96.6%) were referred due to clinical suspicion of abuse raised by non-offending parents (n=137), social workers (n=78), the patients themselves (n=22), or witnesses (n=17). In the remaining nine cases (3.4%), suspicion was first raised by medical staff either in the Emergency Department/General Outpatient Clinic (n=4) or after admission (n=5). Although medical staff identified a relatively small proportion of these cases, many were severe, including three abusive head trauma cases initially presenting with seizures. In such cases, abuse was only suspected after imaging.
     
    The remaining 240 cases (47.7%) were admitted through other channels, including referral by social workers (n=203), neonatal admission (n=28), abnormalities identified by medical staff during follow-up or screening (n=8), and sibling screening (n=1).
     
    Imaging modalities and findings
    Imaging was performed for 100 patients (19.9%), including 86 cases with skeletal imaging, 24 with neurological imaging, and one with abdominal imaging. Among the 24 patients who underwent neuroimaging, 10 also had skeletal imaging, while 14 received neuroimaging only.
     
    Of the 86 patients who underwent skeletal imaging, 77 had plain radiographs of the targeted region as initial screening, and nine received a complete skeletal survey. Most patients had minor soft tissue injuries. Fractures were identified in two patients: a supracondylar fracture in a 3-year-old boy and a foot fracture in a 13-year-old girl, representing 2.3% of all skeletal imaging cases (Fig 3). Both fractures were detected on dedicated radiographs directed at regions of pain, as indicated by the patients. In another case, initial radiographs in a 13-year-old boy showed no obvious fracture, but magnetic resonance imaging (MRI) for persistent wrist pain subsequently revealed a mild ligamentous sprain.
     

    Figure 3. (a) Anteroposterior plain radiograph of the right elbow showing a linear transverse supracondylar fracture of the right humerus (arrow). (b) Anteroposterior plain radiograph of the left fifth toe showing cortical buckling over the lateral aspect of the shaft of the left fifth metatarsal bone (arrow)
     
    Computed tomography (CT) was the initial imaging modality in 24 cases evaluated for suspected intracranial injury; five cases (20.8%) showed positive findings. Three cases (12.5%) demonstrated alarming features suggestive of shaken baby syndrome on initial brain CT scans, including subdural haemorrhage (n=3) and cerebral oedema (n=1), prompting further evaluation by MRI. Shaken baby syndrome was confirmed in all three cases on MRI, which showed subdural haemorrhage (n=3) and brain parenchymal injuries, including diffuse axonal injury (n=3) and hypoxic-ischaemic injury (n=2) [Fig 4]. These patients, aged between 2 and 7 months, presented with non-specific symptoms such as seizures (n=3), vomiting (n=2), and loss of consciousness (n=1). Fundoscopic examination confirmed multilayered retinal haemorrhages in all three cases, whereas skeletal surveys were unremarkable (Table 2). The remaining two CT-positive cases included one with a scalp haematoma and another with a mildly depressed parietal skull fracture; both lacked intracranial findings.
     

    Figure 4. Representative case of shaken baby syndrome. (a, b) Computed tomography of the brain shows mixed-density subdural haematoma along bilateral cerebral convexities, extending into the interhemispheric space (white arrows in [a]). A large hypodense area with loss of grey-white differentiation in the right parieto-occipital region (black arrows) suggests cerebral oedema or hypoxic-ischaemic injury. (c-h) Magnetic resonance imaging of the brain confirms subdural collections of varying intensities over bilateral cerebral convexities and the interhemispheric space (white arrows in [c] and [d]), as well as a large area of restricted diffusion in the right parieto-occipital lobe (black arrows in [e] to [h]), consistent with hypoxic-ischaemic injury. Restricted diffusion in the splenium of the corpus callosum (white arrowheads in [g] and [h]) indicates diffuse axonal injury. (c) T1-weighted imaging. (d) T2-weighted imaging. (e, g) Diffusion-weighted imaging. (f, h) Apparent diffusion coefficient mapping
     

    Table 2. Clinical presentation, radiological findings, and clinical outcomes of the three cases of shaken baby syndrome
     
    Ultrasound of the abdomen and pelvis was performed in one patient with persistent abdominal pain; no clinically significant solid organ injury was identified.
     
    Multidisciplinary case conference assessment and long-term adverse outcomes
    Overall, 44 cases (8.7%) were dismissed for various reasons, such as cross-border status, family refusal, or discharge against medical advice. Of the remaining 459 cases (91.3%) evaluated by MDCC, documentation was not retrievable from clinical records in 45 cases (8.9%).
     
    Among the 414 cases with available MDCC documentation or conclusions, child abuse was confirmed in 199 cases (48.1%), comprising physical abuse (n=95), child neglect (n=63), and sexual abuse (n=41). Another 84 cases (20.3%) were categorised as high-risk, involving suspected physical abuse (n=81) or sexual abuse (n=3). Child abuse was not established in the remaining 131 cases (31.6%); these were considered to have low or moderate risk of recurrence.
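
The percentages in this subsection rest on two different denominators: the full cohort (n=503) for dismissed cases and missing documentation, and the documented subset (n=414) for the MDCC conclusions. The short Python sketch below retraces that arithmetic from the counts reported above. The variable and function names are illustrative only and do not come from the study's own analysis.

```python
# Counts are taken from the Results text; all names here are illustrative.
TOTAL_CASES = 503
dismissed = 44        # MDCC dismissed (cross-border status, refusal, etc.)
notes_missing = 45    # evaluated by MDCC, but documentation not retrievable

evaluated = TOTAL_CASES - dismissed       # cases evaluated by MDCC
documented = evaluated - notes_missing    # cases with MDCC documentation

# MDCC conclusions among documented cases
outcomes = {"confirmed": 199, "high_risk": 84, "not_established": 131}


def pct(count, denominator):
    """Return a percentage rounded to one decimal place."""
    return round(100 * count / denominator, 1)


# The three outcome groups should account for every documented case.
assert sum(outcomes.values()) == documented

# Percentages of the full cohort (n=503)
flow_pct = {
    "dismissed": pct(dismissed, TOTAL_CASES),
    "evaluated": pct(evaluated, TOTAL_CASES),
    "notes_missing": pct(notes_missing, TOTAL_CASES),
}

# Percentages of the documented subset (n=414)
outcome_pct = {name: pct(n, documented) for name, n in outcomes.items()}

print(flow_pct)
print(outcome_pct)
```

Keeping the denominator explicit in each call makes it harder to mix cohort-level and subset-level rates when the figures are quoted elsewhere.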
     
    Of the 89 cases in which MDCC was dismissed or notes were unavailable, most (n=63, 70.8%) had presented with suspected physical abuse, followed by sexual abuse (n=22, 24.7%) and neglect (n=4, 4.5%). All cases were deemed minor, with no clinically or radiologically significant findings. No specific treatment or long-term follow-up was required.
     
    The majority of cases exhibited minor severity and were managed conservatively without long-term adverse outcomes.
     
    A long arm cast was applied for one patient with a supracondylar fracture, whereas a resting splint was prescribed for another patient with a ligamentous wrist sprain. Both patients recovered uneventfully after short-term follow-up (1 year) by the orthopaedics team, with no residual impact on daily functioning.
     
    Two patients with severe abusive head trauma required admission to the paediatric intensive care unit. One of these patients warranted multiple neurosurgical interventions, including bilateral burr hole drainage and placement of a ventriculoperitoneal shunt. The remaining two cases of abusive head trauma were managed conservatively. At the most recent follow-up, one patient—the most severely affected—demonstrated gross motor delay at 19 months of age. All other patients showed no neurological deficits or developmental delay to date. No mortality was recorded in this cohort.
     
    Repeated admissions for suspected child abuse were identified in 22 cases. Of these, 16 were recurrent, established cases of child abuse. In 14 of these 16 cases, the type of abuse remained consistent across episodes, whereas two cases involved different types of abuse in separate incidents. Four cases were initially classified as established child abuse, but subsequent admissions were considered non-established, with recurrence risk ranging from low to high. Two cases were categorised as non-established child abuse on both occasions but were considered to have moderate or high risk of recurrence.
     
    Discussion
    This retrospective 10-year study documented a marked rise in reported child maltreatment cases, from 23 in 2014 to 50 in 2023, emphasising that child abuse remains an ongoing medical and social concern. This issue persists despite concerted efforts by the government and various organisations to provide social support to new mothers and at-risk families to prevent child maltreatment.
     
    Types of child abuse
    Physical abuse was the most common type of presentation in our study, consistent with data from the Child Protection Registry10 and similar findings from Singapore.11 The high prevalence of physical abuse in Hong Kong may reflect cultural differences in parenting practices, such that corporal punishment remains more commonly accepted in Chinese households than in Western contexts.12 Over 50% of families in Hong Kong use physical punishment as part of child-rearing.13 In moments of anger or impulsiveness, the line between ineffective parenting and child abuse may easily be crossed.
     
    Pattern of injury and imaging findings
    The majority of cases in our study were considered mild in nature, with no serious long-term consequences after clinical evaluation and appropriate imaging. Fractures were infrequent, comprising 0.4% of all reported child abuse cases and 0.6% of all reported physical child abuse cases. These rates are slightly lower than those reported in previous Asian studies, which revealed fractures in 1% of all reported physical child abuse cases11 and 3.6% to 7% of all reported child abuse cases.14 15 The present rates are substantially lower than the 28% observed in a Western population.6 The fracture detection rate among patients who underwent imaging in our study (2.3%) was also considerably lower than that in Western populations (24%-32%).7 8 Compared with a previous Hong Kong study in 2005,15 our findings suggest a decline in the overall fracture rate despite an overall increase in reported child maltreatment cases, implying a trend towards milder injuries in recent years. This trend may reflect increased societal awareness of the consequences of severe child abuse, potentially leading parents to move away from traditional forms of physical punishment (eg, caning) and towards less injurious methods, such as striking with the hand. Greater awareness may also facilitate earlier detection and reporting, thereby preventing escalation.
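
The three fracture rates quoted in this paragraph describe the same two fractures measured against different denominators, which matters when comparing them with published figures. As a worked check using only the counts reported in the Results (503 reported cases, 341 physical abuse cases, 86 patients with skeletal imaging):

```latex
\begin{align*}
\text{all reported cases:} \quad & \frac{2}{503} \approx 0.4\% \\
\text{physical abuse cases:} \quad & \frac{2}{341} \approx 0.6\% \\
\text{patients with skeletal imaging:} \quad & \frac{2}{86} \approx 2.3\%
\end{align*}
```

Each rate should therefore be compared only with a published rate that uses the matching denominator.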
     
    No fractures were identified on skeletal surveys in the few cases of confirmed shaken baby syndrome in our cohort. One case of parietal bone fracture was documented; the parietal bone is among the most commonly fractured skull bones, according to current literature.14 16 The other identified fractures (supracondylar and foot fractures) do not reflect the classical abuse-specific fracture types described in the literature, such as posteromedial rib fractures or metaphyseal corner fractures.16 However, these findings align with previous studies in Singapore, where the humerus was the most frequently fractured bone.11 14 Our results also differ from the findings of Fong et al,15 who reported that forearm and rib fractures were most common in Hong Kong. With the exception of rib fractures, the fracture sites reported in these Asian series, including ours, are not typically associated with non-accidental injury. This highlights potential differences in injury severity and fracture patterns between Asian and Western populations and underscores the importance of maintaining clinical suspicion for non-accidental injury, even in the absence of classical fracture sites or textbook imaging findings.16
     
    Abusive head trauma is the leading cause of morbidity and mortality among children subjected to abuse, with an estimated morbidity rate of up to 80% and a mortality rate ranging from 15% to 30%.17 18 Despite the deceptively low overall occurrence of abusive head trauma in our study (0.6% of all reported child abuse cases and 0.9% of all reported physical child abuse cases), compared with Western counterparts (up to 40%-50%),6 9 it is notable that 20.8% of our cases that underwent neuroimaging showed positive findings, and shaken baby syndrome was confirmed in 12.5% via MRI. All confirmed cases involved infants under 1 year of age, whose relatively oedematous brains, immature intracranial vasculature, and poor neck muscle control render them more susceptible to the effects of abusive head trauma.19 It is therefore imperative that neuroimaging be performed for all children under 1 year of age with suspected non-accidental injury, particularly those with abnormal neurological signs, such as seizures or coma.4 Bilateral subdural haemorrhages of varying densities, focal and diffuse brain parenchymal injuries (eg, diffuse axonal injury or cerebral oedema), and multilayered retinal haemorrhages on fundoscopy, as demonstrated in our study, are consistent with cardinal features of abusive head trauma described in the literature.17 20 Our study also revealed more favourable morbidity (33%) and mortality (0%) outcomes compared with current literature reports,2 17 possibly due to the relatively small number of cases.
     
    Current practice in the management of cases of suspected child abuse
    At present, suspected child maltreatment presents to our hospital via two main pathways: attendance at the A&E Department for suspicious injuries, and referral by social workers who observe unusual behaviour or injuries.21 For cases requiring inpatient care, the paediatric team conducts history taking and physical examination, documents findings (including clinical photographs), and manages the injuries.21 Relevant parties—such as social workers, clinical psychologists, and police officers—are informed as necessary.21 Minor cases may be assessed and discharged directly from the A&E Department.21 An MDCC is typically convened within 10 days of presentation, involving doctors, social workers, school personnel, clinical psychologists, and police officers to determine the nature of the incident, assess the risk of future maltreatment, and recommend preventive measures.21
     
    Radiologists play an active role in the multidisciplinary management of child abuse—not only in assessing the full extent of injuries but also in detecting subtle, suspicious findings, alerting the clinical team, and proactively contributing to early intervention and the reduction of long-term adverse outcomes. The reporting of suspicious injuries is currently conducted on a voluntary basis, guided by recommendations from the Social Welfare Department.22 However, the recently gazetted Mandatory Reporting of Child Abuse Ordinance,23 which becomes effective in January 2026, will impose a legal obligation on professionals to report suspected injuries, thereby strengthening safeguards for children.
     
    Strengths and limitations
    To the best of our knowledge, this is the largest retrospective study to investigate the clinical and radiological features of child abuse in a regional hospital in Hong Kong over the past decade. It provides an updated local overview while drawing comparisons with Western data to highlight distinguishing features and emphasise the need for greater attention to this critical issue.
     
    This study had several limitations. First, it was a retrospective analysis based on voluntarily reported cases, and some instances of child abuse may have been under-recognised or underreported by attending clinicians. A small number of cases also lacked accessible MDCC notes or conclusions due to record loss over time. Second, our dataset includes only admitted cases from a single regional hospital, which may have introduced selection bias because minor cases discharged directly from A&E were excluded. The generalisability of our findings is limited, given that the distribution of child maltreatment cases varies substantially across Hong Kong districts. Sha Tin accounted for approximately 6.2% of all reported child maltreatment cases from 2014 to 2023, whereas Yuen Long accounted for 12%.24 Variations in demographic and socio-economic backgrounds across districts may also influence clinical presentation and severity of injuries; further investigation is warranted. Third, despite the large cohort of child abuse cases included in our series, the proportion of positive imaging findings remains relatively small. Larger-scale studies are needed to better characterise local injury patterns. Finally, due to the extended retrospective recruitment period, follow-up durations varied widely, from 15 months in recent cases to 9 years in earlier cases. Consequently, the long-term effects of abusive head trauma may not yet be evident in patients with shorter follow-up, highlighting the need for further longitudinal assessment into later childhood.
     
    Conclusion
    This study provides an updated overview of the clinical and radiological features of child abuse in Hong Kong, revealing patterns that differ from those described in Western literature. Although most cases involved only minor clinical manifestations, a small proportion of patients exhibited positive imaging findings of skeletal or neurological injury, which may carry serious long-term consequences. Radiologists play a critical role in the multidisciplinary management of child abuse, both in flagging suspicious injuries to alert clinicians and in evaluating the full extent of trauma to protect children from further harm.
     
    Author contributions
    Concept or design: CYM Young, WCW Chu.
    Acquisition of data: All authors.
    Analysis or interpretation of data: CYM Young, WCW Chu.
    Drafting of the manuscript: CYM Young.
    Critical revision of the manuscript for important intellectual content: All authors.
     
    All authors had full access to the data, contributed to the study, approved the final version for publication, and take responsibility for its accuracy and integrity.
     
    Conflicts of interest
    All authors have disclosed no conflicts of interest.
     
    Funding/support
    This research received no specific grant from any funding agency in the public, commercial, or not-for-profit sectors.
     
    Ethics approval
    This research was conducted in accordance with the Declaration of Helsinki. Ethics approval was obtained from the Joint Chinese University of Hong Kong–New Territories East Cluster Clinical Research Ethics Committee, Hong Kong (Ref No.: 2024.071). The requirement for informed patient consent was waived by the Committee due to the retrospective design of the research.
     
    References
    1. Gilbert R, Widom CS, Browne K, Fergusson D, Webb E, Janson S. Burden and consequences of child maltreatment in high-income countries. Lancet 2009;373:68-81.
    2. Guastaferro K, Shipe SL. Child maltreatment types by age: implications for prevention. Int J Environ Res Public Health 2023;21:20.
    3. Caffey J. Multiple fractures in the long bones of infants suffering from chronic subdural hematoma. Am J Roentgenol Radium Ther 1946;56:163-73.
    4. The Society and College of Radiographers; The Royal College of Radiologists. The Radiological Investigation of Suspected Physical Abuse in Children (Revised First Edition). London: The Royal College of Radiologists; 2018. Available from: https://www.rcr.ac.uk/media/nznl1mv4/rcr-publications_the-radiological-investigation-of-suspected-physical-abuse-in-children-revised-first-edition_november-2018.pdf. Accessed 1 Oct 2024.
    5. Wootton-Gorges SL, Soares BP, Alazraki AL, et al. ACR Appropriateness Criteria® suspected physical abuse—child. J Am Coll Radiol 2017;14:S338-49.
    6. Ward A, Iocono JA, Brown S, Ashley P, Draus JM Jr. Non-accidental trauma injury patterns and outcomes: a single institutional experience. Am Surg 2015;81:835-8.
    7. Day F, Clegg S, McPhillips M, Mok J. A retrospective case series of skeletal surveys in children with suspected non-accidental injury. J Clin Forensic Med 2006;13:55-9.
    8. Loos MH, Ahmed T, Bakx R, van Rijn RR. Prevalence and distribution of occult fractures on skeletal surveys in children with suspected non-accidental trauma imaged or reviewed in a tertiary Dutch hospital. Pediatr Surg Int 2020;36:1009-17.
    9. Rosenfeld EH, Johnson B, Wesson DE, Shah SR, Vogel AM, Naik-Mathuria B. Understanding non-accidental trauma in the United States: a national trauma databank study. J Pediatr Surg 2020;55:693-7.
    10. Social Welfare Department, Hong Kong SAR Government. Child Protection Registry Statistical Report 2023. 2024. Available from: https://www.swd.gov.hk/storage/asset/section/654/Annual%20CPR%20Report%202023_Biligual_Final.pdf. Accessed 1 Oct 2024.
    11. Chew YR, Cheng MH, Goh MC, Shen L, Wong PC, Ganapathy S. Five-year review of patients presenting with non-accidental injury to a children’s emergency unit in Singapore. Ann Acad Med Singap 2018;47:413-9.
    12. Liu W, Guo S, Qiu G, Zhang SX. Corporal punishment and adolescent aggression: an examination of multiple intervening mechanisms and the moderating effects of parental responsiveness and demandingness. Child Abuse Negl 2021;115:105027.
    13. Tang CS. Corporal punishment and physical maltreatment against children: a community study on Chinese parents in Hong Kong. Child Abuse Negl 2006;30:893-907.
    14. Gera SK, Raveendran R, Mahadev A. Pattern of fractures in non-accidental injuries in the pediatric population in Singapore. Clin Orthop Surg 2014;6:432-8.
    15. Fong CM, Cheung HM, Lau PY. Fractures associated with non-accidental injury—an orthopaedic perspective in a local regional hospital. Hong Kong Med J 2005;11:445-51.
    16. Offiah A, van Rijn RR, Perez-Rossello JM, Kleinman PK. Skeletal imaging of child abuse (non-accidental injury). Pediatr Radiol 2009;39:461-70.
    17. Sidpra J, Chhabda S, Oates AJ, Bhatia A, Blaser SI, Mankad K. Abusive head trauma: neuroimaging mimics and diagnostic complexities. Pediatr Radiol 2021;51:947-65.
    18. Karibe H, Kameyama M, Hayashi T, Narisawa A, Tominaga T. Acute subdural hematoma in infants with abusive head trauma: a literature review. Neurol Med Chir (Tokyo) 2016;56:264-73.
    19. Hung KL. Pediatric abusive head trauma. Biomed J 2020;43:240-50.
    20. Sun DT, Zhu XL, Poon WS. Non-accidental subdural haemorrhage in Hong Kong: incidence, clinical features, management and outcome. Childs Nerv Syst 2006;22:593-8.
    21. So EC, Chan D. Management of Child Maltreatment (Abuse). Hong Kong: Hospital Authority New Territories East Cluster Prince of Wales Hospital Department of Paediatrics; 2024.
    22. Social Welfare Department, Hong Kong SAR Government. Protecting Children from Maltreatment Procedural Guide for Multi-disciplinary Co-operation (Revised 2020). Jan 2020. Available from: https://www.swd.gov.hk/storage/asset/section/652/en/Procedural_Guide_Core_Procedures_(Revised_2020)_Eng_2Nov2021.pdf. Accessed 1 Oct 2024.
    23. Legislative Council, Hong Kong SAR Government. Mandatory Reporting of Child Abuse Ordinance. 2024. Available from: https://www.legco.gov.hk/yr2024/english/ord/2024ord023-e.pdf. Accessed 29 Oct 2024.
    24. Social Welfare Department, Hong Kong SAR Government. Statistics on child protection, spouse/cohabitant battering and sexual violence cases captured by the Child Protection Registry (CPR) and the Central Information System on Spouse/Cohabitant Battering Cases and Sexual Violence Cases (CISSCBSV). Social Welfare Department; 2025. Available from: https://data.gov.hk/en-data/dataset/hk-swd-fcw-ca-scb-sv-stat/resource/6229e2b4-73d0-4285-a892-838c683c9966. Accessed 8 Aug 2025.
