Peritoneal cytology predicting distant metastasis in uterine carcinosarcoma: machine learning model development and validation
World Journal of Surgical Oncology volume 23, Article number: 167 (2025)
Abstract
Objective
This study develops and validates a machine learning model using peritoneal cytology to predict distant metastasis in uterine carcinosarcoma, aiding clinical decision-making.
Methods
This study utilized detailed clinical data and peritoneal cytology findings from uterine carcinosarcoma patients in the SEER database. Eight machine learning algorithms—Logistic Regression, SVM, GBM, Neural Network, RandomForest, KNN, AdaBoost, and LightGBM—were applied to predict distant metastasis. Model performance was assessed using AUC, calibration curves, DCA, confusion matrices, sensitivity, and specificity. The Logistic Regression model was visualized with a nomogram, and its results were analyzed. SHAP values were used to interpret the best-performing machine learning model.
Results
Peritoneal cytology, T stage, age, and tumor size were key factors influencing distant metastasis in uterine carcinosarcoma patients. Peritoneal cytology had significant weight in the prediction models. The logistic regression model demonstrated excellent predictive performance with an AUC of 0.882 in the training set and 0.881 in the internal test set. The model was visualized and interpreted using a nomogram. In comprehensive evaluations, GBM was identified as the best-performing model and was explained using SHAP values. Additionally, calibration and DCA curves indicated that both models have significant potential clinical utility.
Conclusion
This study introduces the first effective tool for predicting distant metastasis in uterine carcinosarcoma patients by integrating peritoneal cytology features into model construction. It aids in early identification of high-risk patients, enhancing follow-up and monitoring during tumor development, and supports the optimization of personalized treatment strategies.
Introduction
Uterine Carcinosarcoma (UCS), also known as malignant mixed Müllerian tumor, is a rare gynecological malignancy with a poor prognosis, accounting for approximately 5% of all uterine tumors [1]. UCS is characterized by a high rate of lymphatic diffusion and significant tendencies for peritoneal and hematogenous metastases [2]. Literature reports that up to 30–40% of UCS patients present with lymph node metastasis at initial diagnosis, while about 10% exhibit visceral metastases, particularly pulmonary involvement [3]. Consequently, the five-year survival rate for patients with locally advanced or metastatic disease typically does not exceed 10%–30% [4].
Due to its rarity, specific treatment guidelines for UCS are limited. The prevailing theory is the "conversion hypothesis," suggesting that UCS may originate from an endometrial tumor clone and subsequently undergo metaplastic differentiation [5]. Therefore, current management largely follows the guidelines for endometrial cancer. Despite constituting only a small fraction of endometrial cancers, UCS exhibits a higher risk of distant metastasis and recurrence, leading to poorer patient outcomes. Early identification of high-risk patients with distant metastases and implementation of targeted comprehensive treatment strategies are crucial for improving prognosis.
Cytoreductive surgery is the primary treatment for UCS patients [6, 7]. Comprehensive surgical staging, including abdominal lavage, hysterectomy, salpingo-oophorectomy, and lymphadenectomy, is recommended for all operable patients [8]. Peritoneal cytology, which involves analyzing exfoliated cancer cells from intraoperative peritoneal lavage fluid or aspiration samples, helps detect free cancer cells in the peritoneal cavity. Preoperative fine needle aspiration can also provide quick and safe cytological assessments with minimal patient discomfort. This technique identifies minimal metastatic lesions not yet visible as masses or nodules, enabling early detection of potential peritoneal metastases beyond the uterus. Early detection of subclinical metastases, often overlooked in imaging studies, is particularly valuable.
Initially, positive peritoneal cytology was classified as stage IIIA under the International Federation of Gynecology and Obstetrics (FIGO) 1988 staging criteria for endometrial cancer. However, in 2009, FIGO revised its guidelines [9] to exclude peritoneal cytology from the staging system due to controversies regarding its prognostic significance [10, 11]. This change resulted in a decline in peritoneal cytology sampling during hysterectomies between 2010 and 2017 [12]. Despite this, several international authorities, including the European Society of Medical Oncology (ESMO), European Society of Gynaecological Oncology (ESGO), European Society for Radiotherapy & Oncology (ESTRO), Japanese Society of Gynecologic Oncology (JSGO), National Comprehensive Cancer Network (NCCN), and American Joint Committee on Cancer (AJCC), continue to support collecting peritoneal cytology samples during surgery and including them in pathology reports [13,14,15]. Similarly, the FIGO Gynecologic Oncology Committee recommends collecting peritoneal cytology samples, even though it has been removed from formal staging criteria, emphasizing that "positive cytology must be reported separately without affecting staging" [16]. Additionally, the 2021 ESGO/ESTRO/ESP guidelines highlight that malignant peritoneal cytology is associated with lower survival rates [17]. Recent studies have further shown that positive peritoneal cytology has significant prognostic implications, particularly for non-endometrioid types of endometrial cancer [18,19,20]. Thus, although peritoneal cytology is not used for formal staging, its presence provides important information for evaluating disease progression and patient prognosis, especially in specific types of endometrial cancer, serving as a valuable supplement to staging.
This study aims to develop and validate a predictive model that integrates peritoneal cytology findings to forecast the development of distant metastasis in UCS patients. By combining peritoneal cytology with other clinicopathological features, we seek to provide clinicians with an effective tool for identifying high-risk populations, optimizing medical resource allocation, and supporting personalized treatment strategies. Additionally, given the current lack of international consensus on the role of peritoneal cytology in UCS, our work may provide valuable data to support future updates to diagnostic and treatment guidelines.
Materials and methods
Data preparation
Data for this study were sourced from the Surveillance, Epidemiology, and End Results (SEER) Program public database, using SEER*Stat software version 8.4.4 for data extraction. Our analysis centered on patients diagnosed with UCS across 17 Registries between 2000 and 2021, with data submission in November 2023 [21]. Cases were screened using the "site code ICD-O-3 / WHO 2008", specifying the uterus as the site of origin (codes C54.0-C54.9, C55.9), and identified by the malignant tissue type defined as carcinosarcoma according to the "ICD-O-3 Histology/Behavior" codes (8950/3, 8951/3, 8980/3, 8981/3). For each patient, demographic and clinical variables were extracted, encompassing age, race, marital status, median family income, rural–urban continuum code, time to diagnosis, interval from diagnosis to treatment initiation, tumor dimensions, peritoneal cytology results, histological grade, and TNM staging (AJCC 7th edition). Distant metastasis was assessed using the combined staging data within the SEER database. Inclusion criteria comprised histopathologically confirmed disease, single primary tumors, and diagnoses made between 2010 and 2021. Patients were excluded if they had missing data regarding peritoneal cytology or distant metastasis. Ultimately, a cohort of 3,434 UCS patients met the criteria for detailed analysis. This study adheres to the Declaration of Helsinki principles. Given that the SEER data are de-identified and available for research purposes, local ethics committee approval was not required.
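The screening rules above can be expressed as a simple record filter. The following is a minimal sketch, not the authors' extraction procedure (which used SEER*Stat): the record layout and field names such as `site`, `histology`, and `single_primary` are hypothetical and do not correspond to actual SEER column names, and the toy records are invented for illustration.

```python
# Codes taken from the text; field names below are hypothetical, not SEER column names.
SITE_CODES = {f"C54.{i}" for i in range(10)} | {"C55.9"}
HISTOLOGY_CODES = {"8950/3", "8951/3", "8980/3", "8981/3"}


def is_eligible(rec):
    """Return True if a record meets the inclusion and exclusion criteria in the text."""
    year = rec.get("year")
    return (
        rec.get("site") in SITE_CODES
        and rec.get("histology") in HISTOLOGY_CODES
        and rec.get("histologically_confirmed") is True
        and rec.get("single_primary") is True
        and year is not None
        and 2010 <= year <= 2021
        # Exclude records with missing cytology or metastasis status.
        and rec.get("peritoneal_cytology") is not None
        and rec.get("distant_metastasis") is not None
    )


# Invented toy records for illustration only.
records = [
    {"site": "C54.1", "histology": "8980/3", "histologically_confirmed": True,
     "single_primary": True, "year": 2015,
     "peritoneal_cytology": "positive", "distant_metastasis": 1},
    {"site": "C54.1", "histology": "8980/3", "histologically_confirmed": True,
     "single_primary": True, "year": 2005,  # diagnosed before 2010
     "peritoneal_cytology": "negative", "distant_metastasis": 0},
    {"site": "C55.9", "histology": "8950/3", "histologically_confirmed": True,
     "single_primary": True, "year": 2018,
     "peritoneal_cytology": None, "distant_metastasis": 0},  # cytology missing
]

cohort = [r for r in records if is_eligible(r)]
print(len(cohort))
```

Keeping the code sets as named constants makes the selection criteria easy to audit against the text.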
Data processing
In this study, any data entries with more than 35% missing parameters were excluded from the analysis. The remaining features underwent preprocessing through multiple imputation (MI) facilitated by a multi-classification regression model [22, 23]. Patient data were randomly partitioned into a training set and an internal test set at a ratio of 7:3, where the former was used for model development and the latter for validation and evaluation. Continuous variables are reported as mean ± standard deviation (SD) if normally distributed, as assessed by the Kolmogorov–Smirnov test, or as median (interquartile range) if not, with comparisons made using the t-test or Mann–Whitney U test, respectively. Categorical variables are summarized as counts and frequencies, with comparisons conducted via chi-square or Fisher's exact tests. All statistical analyses were two-tailed, with P < 0.05 denoting statistical significance. To address the imbalance in the dataset due to the low incidence of distant metastases, we applied two techniques during the machine learning phase: resampling and weighted processing. For resampling, we used the SMOTENC method [24] to increase minority class samples in the training set and retained the original data for the test set to assess model generalization. For weighting, we assigned higher weights to minority class samples and lower weights to majority class samples, based on the inverse of their proportions [25], ensuring the model's attention to the minority class during training while allowing unbiased evaluation on an untouched test set. Both approaches were compared against untreated data post-modeling to evaluate their effectiveness.
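The 7:3 partition and the inverse-proportion weighting described above can be illustrated with a short sketch. This is an assumption-laden illustration rather than the authors' R code: the function names, the fixed seed, and the toy labels are all hypothetical, and SMOTENC resampling is not reproduced here.

```python
import random
from collections import Counter


def split_train_test(n_samples, train_fraction=0.7, seed=0):
    """Randomly partition sample indices into training and test index lists."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    cut = round(train_fraction * n_samples)
    return indices[:cut], indices[cut:]


def inverse_proportion_weights(labels):
    """Weight each class by the inverse of its proportion in the data."""
    counts = Counter(labels)
    n = len(labels)
    return {cls: n / count for cls, count in counts.items()}


# Toy labels: 1 = distant metastasis (minority), 0 = no distant metastasis.
labels = [1] * 2 + [0] * 8

train_idx, test_idx = split_train_test(len(labels))
weights = inverse_proportion_weights(labels)
print(len(train_idx), len(test_idx))
print(weights)
```

In practice the weights would be computed on the training labels only, so that the test set stays untouched, as the text emphasizes.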
Factor screening
To evaluate the correlations among features in the training set, we applied Spearman correlation analysis, with a correlation coefficient threshold set at 0.7. A coefficient below this threshold suggests an absence of significant multicollinearity among characteristic variables. Initially, we generated a correlation heatmap based on the Spearman coefficients to visualize the degree of association between each pair of variables. To delve deeper into the structural relationships among these variables, we conducted cluster analysis using [1 - abs(spearman_cor)] as the distance measure. This approach allows for an equal emphasis on both negative and positive correlations, ensuring a balanced evaluation of variable similarity. The clustering results were represented as a dendrogram, highlighting the hierarchical structure of feature relationships. When two or more features exhibit high correlation, they tend to provide similar information. Including all such highly correlated features in a predictive model can unnecessarily increase model complexity without significantly enhancing its performance. Following this, we used the occurrence of distant metastasis as the outcome variable and performed univariate analysis to identify predictors significantly associated with distant metastasis (P < 0.05) within the training set. The variables that met this significance threshold were subsequently included in multivariate logistic regression analysis. The final feature set for the machine learning model was then determined based on the results of this multivariate logistic regression analysis (P < 0.05), ensuring only the most relevant predictors were selected.
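The correlation screen and the [1 - abs(spearman_cor)] distance can be sketched in a few lines. The example below is an illustrative implementation of Spearman's coefficient (Pearson correlation of tie-averaged ranks), not the authors' code; the feature values are invented and the helper names are hypothetical.

```python
def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def correlation_distance(x, y):
    """Distance used for clustering: 1 - |rho|, so sign is ignored."""
    return 1 - abs(spearman(x, y))


# Invented toy features for illustration only.
tumor_size = [20, 35, 50, 70, 90]
t_stage = [1, 1, 2, 3, 3]

rho = spearman(tumor_size, t_stage)
distance = correlation_distance(tumor_size, t_stage)
highly_correlated = abs(rho) > 0.7  # threshold from the text
print(round(rho, 3), round(distance, 3), highly_correlated)
```

Because the distance is 1 - |rho|, a coefficient above the 0.7 threshold corresponds to a distance below 0.3, which is why strongly positive and strongly negative pairs cluster together.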
Model construction and evaluation
Based on the feature selection methodology described above, we applied eight machine learning algorithms to develop predictive models for distant metastasis in UCS: Logistic Regression, Support Vector Machine (SVM), Gradient Boosting Machine (GBM), Neural Network (NeuralNet), Random Forest, K-Nearest Neighbors (KNN), AdaBoost, and LightGBM. Logistic Regression, a linear model suitable for linearly separable features, offers simplicity and interpretability; SVM maximizes the margin between classes by identifying the optimal hyperplane, providing robust classification especially in high-dimensional spaces [26]; GBM iteratively builds weak classifiers to enhance predictive power, capturing nonlinear relationships and interaction effects while optimizing the loss function via gradient descent [27]; Neural Networks emulate the human brain's structure through multiple layers of neurons, enabling the learning of complex features and modeling of nonlinear relationships; Random Forest integrates multiple decision trees to improve stability and accuracy, reducing overfitting and enhancing generalization [28]; KNN, an instance-based learning method, predicts categories by calculating distances between new samples and existing ones [29]; AdaBoost improves predictive ability by iteratively adjusting sample weights, focusing more on misclassified instances [30]; and LightGBM, an efficient gradient boosting framework, accelerates model training using histograms and feature parallelization [31]. During training, these algorithms underwent tenfold cross-validation on the training set to obtain a robust estimate of model performance. In evaluating the predictive models, we adopted a comprehensive assessment system designed to measure both the classification ability and the clinical utility of the models.
The primary evaluation tool was the area under the receiver operating characteristic curve (ROC-AUC), which assessed the model's overall discrimination capability. Additionally, calibration curves were used to verify the accuracy of the predicted probabilities, ensuring that the model’s predictions were well-calibrated. Decision Curve Analysis (DCA) evaluated the practical application value of the model from the perspective of clinical benefit, providing insights into its real-world applicability. Moreover, we utilized confusion matrices to transparently present the model prediction results, determining the optimal threshold for the test set based on model accuracy. Various performance metrics were computed, including accuracy, sensitivity, specificity, precision, and F1 score, to offer detailed insights into the model's classification performance. This comprehensive approach ensured a thorough understanding of the model's effectiveness and reliability in clinical settings.
These metrics are computed from four basic values: true positives (TP), false positives (FP), true negatives (TN), and false negatives (FN).
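For reference, the standard definitions of these metrics in terms of TP, FP, TN, and FN can be written as a short function. This is a generic sketch with invented counts, not output from the study; the function name is hypothetical and no guard against empty denominators is included.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute standard binary classification metrics from confusion-matrix counts."""
    total = tp + fp + tn + fn
    sensitivity = tp / (tp + fn)  # recall: share of true cases that are detected
    specificity = tn / (tn + fp)  # share of non-cases correctly ruled out
    precision = tp / (tp + fp)    # share of positive predictions that are correct
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
    }


# Invented counts for illustration only.
metrics = classification_metrics(tp=40, fp=10, tn=40, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

The F1 score is the harmonic mean of precision and sensitivity, so it drops sharply when either one is low, which is why it is often preferred for imbalanced outcomes.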
Model interpretation
To enhance the usability and interpretability of our logistic regression model, we used a nomogram for intuitive visualization and result interpretation. This tool not only simplifies model application but also clarifies the contribution of each covariate to the overall prediction score, providing an easy method to estimate individual patient probabilities of distant metastasis. The odds (Odds = P / (1 - P)) derived from the nomogram reflect the likelihood of distant metastasis relative to no occurrence. Based on comprehensive performance evaluations, the GBM model was identified as the optimal predictor for distant metastasis in UCS patients, with features ranked by their importance. SHapley Additive exPlanations (SHAP) plots were used to visualize feature contributions, allowing for quantitative analysis of each variable's impact on distant metastasis risk [32]. Positive SHAP values indicate risk factors, while negative values suggest protective factors. We provided personalized explanations for two randomly selected patients regarding their likelihood of developing distant metastases based on model predictions.
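The relationship between a logistic model's linear predictor, the predicted probability P, and the odds P / (1 - P) can be made concrete as follows. This is a minimal sketch: the intercept and coefficients are hypothetical placeholders chosen for illustration and are not the fitted values from this study.

```python
import math


def probability_from_linear_predictor(lp):
    """Logistic function: map a linear predictor to a probability."""
    return 1 / (1 + math.exp(-lp))


def odds(p):
    """Odds of the event: P / (1 - P)."""
    return p / (1 - p)


# Hypothetical coefficients (NOT the study's fitted values).
intercept = -2.0
coefficients = {"positive_cytology": 1.2, "t_stage_beyond_uterus": 1.5}

# Hypothetical patient with both risk indicators present.
patient = {"positive_cytology": 1, "t_stage_beyond_uterus": 1}

lp = intercept + sum(coefficients[name] * patient[name] for name in coefficients)
p = probability_from_linear_predictor(lp)
print(f"linear predictor = {lp:.2f}, probability = {p:.3f}, odds = {odds(p):.3f}")
```

For a logistic model the odds equal exp(linear predictor), so each covariate multiplies the odds by a fixed factor; a nomogram presents the same additive structure graphically as points per covariate.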
All statistical analyses were conducted using R software (version 4.4.1), and the corresponding analysis code is available upon request from the authors.
Results
Baseline information and correlation analysis
In this study, we analyzed data from a total of 3,434 UCS patients to investigate the relationship between peritoneal cytology findings and distant metastasis, revealing a significant association (χ2 = 123.45, p < 0.0001) (Table 1), suggesting that peritoneal cytology may serve as an important reference indicator for primary clinical screening. Patients were then divided into a training set and an internal test set in a 7:3 ratio, with baseline characteristics summarized in Table 2; in the training set, statistically significant differences (p < 0.05) were observed between patients with and without distant metastasis regarding peritoneal cytology, differentiation grade, T stage, N stage, time from diagnosis to treatment, tumor size, and age (Table 3). Figure 1 presents the Spearman correlation analysis of the features in the training set, where a darker color indicates a higher correlation and a coefficient above the 0.7 threshold suggests a strong association (Fig. 1A). The results of the hierarchical clustering indicate that there is no significant multicollinearity among the feature variables in the training set (Fig. 1B).
Fig. 1 Multicollinearity analysis. A Spearman correlation analysis of the feature indicators in the training set; a coefficient greater than 0.7 indicates a strong correlation between variables. B Hierarchical cluster analysis using [1 - abs(spearman_cor)] as the distance metric; a distance below 0.3 (equivalent to an absolute coefficient above 0.7) indicates a strong correlation between variables
Univariate and multivariable logistic regression
In univariate logistic regression (LR) analysis, positive peritoneal cytology, differentiation grade G3, T stage, N stage, and tumor size were identified as risk factors for distant metastasis in UCS patients (all odds ratios [OR] > 1, 95% confidence intervals [CI] > 1, p < 0.05). Conversely, age and time from diagnosis to treatment were found to be protective factors (OR and 95% CI < 1, p < 0.05). Multivariate LR analysis further revealed that positive peritoneal cytology, T stage, and tumor size remained independent risk factors for distant metastasis (OR and 95% CI > 1, p < 0.05), while age was confirmed as an independent protective factor (OR and 95% CI < 1, p < 0.05) (Table 4).
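The criteria "OR and 95% CI > 1" and "OR and 95% CI < 1" refer to the odds ratio and its whole confidence interval lying on one side of 1. As a minimal sketch of how these quantities follow from a logistic regression coefficient and its standard error (using the usual Wald interval), with hypothetical numbers that are not taken from Table 4:

```python
import math


def odds_ratio_with_ci(beta, se, z=1.96):
    """Odds ratio and Wald 95% confidence interval from a logistic coefficient."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


def classify_factor(lower, upper):
    """Label a factor by where its confidence interval sits relative to 1."""
    if lower > 1:
        return "risk factor"
    if upper < 1:
        return "protective factor"
    return "not significant"


# Hypothetical coefficient and standard error (NOT values from the study).
or_value, ci_low, ci_high = odds_ratio_with_ci(beta=math.log(2), se=0.1)
print(f"OR = {or_value:.2f}, 95% CI {ci_low:.2f} to {ci_high:.2f}:",
      classify_factor(ci_low, ci_high))
```

An interval that includes 1 is compatible with no association, which is why such variables are not carried forward as independent predictors.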
Model building and performance evaluation
In constructing machine learning models to predict distant metastasis, we selected key features based on multivariate logistic regression (LR): peritoneal cytology status, T stage, tumor size, and age. Multiple classification algorithms were trained using tenfold cross-validation. All models showed good discrimination, with area under the receiver operating characteristic curve (ROC-AUC) values exceeding 0.7 (Fig. 2). Calibration curves (Fig. 3), decision curve analysis (DCA) (Fig. 4), accuracy, recall (sensitivity) (Table 5), and other metrics also performed well, indicating the effectiveness of these models for predicting distant metastasis. Notably, while the Random Forest model showed strong performance on the training set, significant overfitting was observed in the internal test set, leading us to exclude it as the best candidate. Using AUC as the primary evaluation criterion, Logistic Regression achieved the highest AUC values of 0.882 on the training set and 0.881 on the internal test set, demonstrating robust discrimination. For imbalanced datasets, however, the F1 score is more informative than ROC-AUC. A "baseline model" that always predicts distant metastasis achieved an F1 score of 0.332. Our models significantly outperformed this baseline, with GBM and AdaBoost showing superior F1 scores compared to Logistic Regression. However, because AdaBoost's lower recall could lead to missed diagnoses, we selected the Gradient Boosting Machine (GBM) model as optimal. GBM achieved an F1 score of 0.630, demonstrating superior generalization and clinical applicability. Feature importance rankings (Fig. 5) highlighted peritoneal cytology as a critical feature for improving model performance in both LR and GBM models. To further validate the model's performance and ensure transparency, we constructed confusion matrices for both models (Fig. 6). Comparisons revealed that the GBM model performed better at distinguishing cases of distant metastases.
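The ROC-AUC used above has a simple probabilistic reading: it is the chance that a randomly chosen patient with distant metastasis receives a higher predicted score than a randomly chosen patient without it. The sketch below computes AUC directly from that definition by comparing all positive and negative pairs; it is an illustration with invented scores, not the evaluation code used in the study.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a correct pair.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Invented labels and predicted probabilities for illustration only.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
auc = roc_auc(labels, scores)
print(auc)
```

Because AUC depends only on the ranking of scores, it does not change with the classification threshold or with class prevalence, which is one reason the text supplements it with the F1 score for this imbalanced outcome.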
In a "baseline model" that always predicts distant metastasis, the precision is approximately 0.197. Although our developed models significantly outperform this baseline model, to further optimize precision, we thoroughly investigated the impact of data imbalance on model performance and adopted two main strategies: “SMOTE-NC for Synthetic Sampling” and “Adjusting Sample Weights”. Comparing these approaches using the GBM model, our results are summarized in Table 6. While these techniques improved model accuracy on the internal test set, they did so at the expense of sensitivity (recall), which is crucial in clinical settings. For a preliminary screening tool aimed at identifying UCS patients at risk for distant metastasis, missing actual cases of distant metastasis is clinically more serious than reducing overall precision. Therefore, we opted to use the GBM model trained on unbalanced data as the optimal model for practical application, ensuring higher sensitivity and minimizing missed diagnoses. This decision was partly due to our sufficiently large sample size, which helped ensure the accuracy of our results.
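The baseline figures quoted above follow from the definitions of precision and recall. A model that labels every patient as positive has recall 1 and precision equal to the prevalence of distant metastasis, so its F1 score is 2p / (1 + p). The sketch below assumes a prevalence of 0.197, taken from the baseline precision stated in the text; the function name is hypothetical.

```python
def always_positive_baseline(prevalence):
    """Precision, recall, and F1 for a model that predicts every case as positive."""
    precision = prevalence  # all true cases are flagged, plus all non-cases
    recall = 1.0            # no true case is ever missed
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


precision, recall, f1 = always_positive_baseline(0.197)
print(f"precision = {precision:.3f}, recall = {recall:.1f}, F1 = {f1:.3f}")
```

With a prevalence of about 0.197 this gives an F1 of roughly 0.33, close to the baseline F1 reported in the text, and well below the 0.630 reported for GBM.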
Model interpretability
During model interpretation, the logistic regression model was visualized and applied using a nomogram (Fig. 7). For the optimal GBM model, we constructed a feature importance ranking based on SHAP values, revealing that in UCS patients the key factors influencing distant metastasis were T stage, peritoneal cytology, tumor size, and age; in the visualization (Fig. 8B), yellow indicates risk factors for distant metastasis, and purple highlights protective factors. To further elucidate the model's predictions, we randomly selected two UCS patients, one at low risk and one at high risk of distant metastasis. Fig. 8A illustrates the low-risk case, a patient aged 67 years with a tumor size of 55 mm, T stage beyond the uterus (T3/T4), and negative peritoneal cytology, where negative peritoneal cytology served as an important protective factor against distant metastasis. Fig. 8C depicts the high-risk case, a patient aged 62 years with a tumor size of 70 mm, T stage beyond the uterus (T3/T4), and positive peritoneal cytology, which was identified as a critical risk factor for distant metastasis.
Fig. 8 SHAP plots of the GBM model. B Importance ranking of the characteristic variables for distant metastasis in UCS patients, based on the GBM model. In these plots, yellow marks a variable acting as a risk factor for distant metastasis, while purple marks a variable acting as a protective factor. A Distribution of SHAP values for the low-risk patient. C Distribution of SHAP values for the high-risk patient
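To illustrate what a SHAP value represents, the sketch below computes exact Shapley values for a tiny model by averaging each feature's marginal contribution over all feature orderings. It is a simplified illustration under stated assumptions: the scoring function, its weights, and the feature names are hypothetical stand-ins rather than the study's GBM, and a single reference patient is used as the baseline, whereas SHAP tooling for tree models typically relies on faster algorithms and a background dataset.

```python
from itertools import permutations


def shapley_values(predict, instance, baseline):
    """Exact Shapley values: average marginal contribution over all orderings."""
    names = list(instance)
    totals = {name: 0.0 for name in names}
    orderings = list(permutations(names))
    for ordering in orderings:
        current = dict(baseline)  # start from the reference patient
        previous = predict(current)
        for name in ordering:
            current[name] = instance[name]  # switch this feature to the patient's value
            value = predict(current)
            totals[name] += value - previous
            previous = value
    return {name: total / len(orderings) for name, total in totals.items()}


def predict(x):
    """Hypothetical risk score (NOT the study's model)."""
    return (0.1
            + 0.3 * x["positive_cytology"]
            + 0.2 * x["t_stage_beyond_uterus"]
            - 0.05 * x["older_age"])


patient = {"positive_cytology": 1, "t_stage_beyond_uterus": 1, "older_age": 1}
reference = {"positive_cytology": 0, "t_stage_beyond_uterus": 0, "older_age": 0}

values = shapley_values(predict, patient, reference)
for name, value in values.items():
    direction = "raises" if value > 0 else "lowers"
    print(f"{name}: {value:+.3f} ({direction} predicted risk)")
```

The contributions sum to the difference between the patient's prediction and the reference prediction, which is the property that lets a SHAP plot decompose an individual risk estimate into per-feature pushes toward higher (positive) or lower (negative) risk.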
Discussion
Uterine carcinosarcoma (UCS) is a highly malignant gynecological tumor characterized by complex and aggressive biology, with a propensity for early and distant metastases. Currently, there is a lack of reliable indicators or prediction models to assess the risk of distant metastasis in UCS patients. This study successfully developed and validated a predictive model that incorporates peritoneal cytology features to evaluate distant metastasis risk in UCS patients. The model exhibited strong discrimination and calibration capabilities, highlighting its potential as a valuable clinical tool.
In our study, peritoneal cytology emerged as a critical feature influencing distant metastasis in UCS patients. This is likely because UCS, being a highly malignant tumor, exhibits strong invasive and early metastatic tendencies. When peritoneal cytology results are positive, it indicates that tumor cells have acquired the ability to breach the basement membrane and enter the abdominal cavity, suggesting their invasive biological behavior and potential for distant metastasis [33, 34]. Compared to other invasive diagnostic methods such as laparoscopic biopsy, peritoneal cytology offers a relatively non-invasive approach to rapidly and safely obtain cytological results, thereby minimizing patient discomfort and complications. Unlike imaging examinations, which can be subjective and depend on the expertise of the imaging physician, equipment quality, and the size of metastatic tumors, peritoneal cytology results are typically included in routine pathology reports, providing easily accessible and highly standardized data. This reliable data foundation is crucial for constructing and validating predictive models. Additionally, the application of genetic analysis in peritoneal cytology holds significant promise. Advanced technologies such as high-throughput sequencing, liquid biopsy, and multi-omics integration offer more detailed and comprehensive information compared to traditional pathology [35]. These innovations are expected to enhance the prediction of distant metastasis and improve clinical management outcomes, while also opening new avenues for genetic research and personalized medicine.
In our current study, the results also demonstrate that T stage significantly contributes to distant metastasis in UCS patients. Specifically, diagnoses of T3 or T4 stages indicate extensive local invasion beyond the uterus, often involving lymphatic and blood vessels [36]. This facilitates tumor cells entering the circulation, thereby promoting distant metastasis. Additionally, our results show a positive correlation between tumor size and distant metastasis risk in UCS patients [37]. Larger tumors tend to exhibit higher cell proliferation rates and greater aggressiveness [38], increasing the likelihood of local and distant spread. The hypoxic environment within growing tumors may activate pro-metastatic signaling pathways [39], further enhancing metastatic potential. These findings underscore the need for more aggressive evaluation and management strategies for high-risk patients, including wider surgical resection and postoperative adjuvant treatments. Interestingly, we observed an inverse but weak relationship between age and distant metastasis in UCS patients. In certain specific tumor types, younger patients' tumors exhibited unique biological properties [40,41,42] such as higher proliferation rates, greater invasiveness, and metastatic ability [43, 44], potentially due to specific gene mutations or regulatory mechanisms [45]. This suggests that molecular typing distribution and associated biological behaviors may vary with age, highlighting the importance of understanding age-related differences in UCS biology. The 2023 FIGO update on endometrial cancer staging and molecular typing reflects deeper insights into the complexity and potential biological behavior of these tumors [46]. Further research is needed to explore age-related biological differences and their implications for clinical management.
In our current study, some limitations must be acknowledged. First, this is a retrospective analysis that poses challenges such as data quality issues, information bias and selection bias, so future studies should adopt a prospective design to overcome these limitations and provide more reliable data support. Second, the performance of our machine learning model may be affected by geographical and hospital differences, and patient characteristics and treatment patterns may vary significantly between regions and medical institutions. To ensure the robustness and generalization ability of the model, we need more institutions to participate in the external validation. To this end, in follow-up studies, we plan to incorporate a more diverse and broader multicenter dataset for validation and testing of models, aiming to address potential variations in model performance across different populations and healthcare settings. In addition, it is an important direction for future research to explore the mechanism of occurrence and development through peritoneal cytology examination combined with genomics and proteomics, and to look for more specific and sensitive predictors.
In conclusion, based on a large-scale multicenter dataset, our prediction model provides new ideas and technical support for predicting distant metastasis in UCS patients. In practice, the model's predictions can help develop personalized follow-up plans, especially for patients predicted to be at high risk of metastasis, for whom more frequent and targeted monitoring can support early detection of potential problems and timely intervention. This may not only improve patients' quality of life but also enable a more reasonable allocation of medical resources and more efficient medical services.
Conclusion
This study introduces the first effective tool for predicting distant metastasis in uterine carcinosarcoma patients by integrating peritoneal cytology features into model construction. It aids in early identification of high-risk patients, enhancing follow-up and monitoring during tumor development, and supports the optimization of personalized treatment strategies.
Data availability
Data is provided within the manuscript or supplementary information files.
Abbreviations
- UCS: Uterine Carcinosarcoma
- SEER: Surveillance, Epidemiology, and End Results Program
- SMOTENC: Synthetic Minority Over-sampling Technique for Nominal and Continuous
- SVM: Support Vector Machine
- GBM: Gradient Boosting Machine
- KNN: K-Nearest Neighbors
- LR: Logistic regression
- ROC: Receiver Operating Characteristic
- AUC: Area Under the Curve
- DCA: Decision Curve Analysis
- SHAP: SHapley Additive exPlanations
- FIGO: International Federation of Gynecology and Obstetrics
- ESMO: European Society of Medical Oncology
- ESGO: European Society of Gynaecological Oncology
- ESTRO: European Society for Radiotherapy & Oncology
- JSGO: Japanese Society of Gynecologic Oncology
- NCCN: National Comprehensive Cancer Network
- AJCC: American Joint Committee on Cancer
- SD: Standard deviation
References
Matsuo K, Ross MS, Machida H, Blake EA, Roman LD. Trends of uterine carcinosarcoma in the United States. J Gynecol Oncol. 2018;29(2):e22.
Pradhan TS, Stevens EE, Ablavsky M, Salame G, Lee YC, Abulafia O. FIGO staging for carcinosarcoma: can the revised staging system predict overall survival? Gynecol Oncol. 2011;123(2):221–4.
Ravishankar P, Smith DA, Avril S, Kikano E, Ramaiya NH. Uterine carcinosarcoma: a primer for radiologists. Abdom Radiol (NY). 2019;44(8):2874–85.
Bansal N, Herzog TJ, Seshan VE, Schiff PB, Burke WM, Cohen CJ, Wright JD. Uterine carcinosarcomas and grade 3 endometrioid cancers: evidence for distinct tumor behavior. Obstet Gynecol. 2008;112(1):64–70.
de Jong RA, Nijman HW, Wijbrandi TF, Reyners AK, Boezen HM, Hollema H. Molecular markers and clinical behavior of uterine carcinosarcomas: focus on the epithelial tumor component. Mod Pathol. 2011;24(10):1368–79.
Gracia M, Yildirim Y, Macuks R, Mancari R, Achimas-Cadariu P, Polterauer S, Iacoponi S, Zapardiel I; SARCUT Study Group. Influence of clinical and surgical factors on uterine carcinosarcoma survival. Cancers (Basel). 2023;15(5):1463.
Koh WJ, Abu-Rustum NR, Bean S, Bradley K, Campos SM, Cho KR, Chon HS, Chu C, Cohn D, Crispens MA, Damast S, Dorigo O, Eifel PJ, Fisher CM, Frederick P, Gaffney DK, George S, Han E, Higgins S, Huh WK, Lurain JR 3rd, Mariani A, Mutch D, Nagel C, Nekhlyudov L, Fader AN, Remmenga SW, Reynolds RK, Tillmanns T, Ueda S, Wyse E, Yashar CM, McMillian NR, Scavone JL. Uterine Neoplasms, Version 1.2018, NCCN Clinical Practice Guidelines in Oncology. J Natl Compr Canc Netw. 2018;16(2):170–99.
Rungruang B, Olawaiye AB. Comprehensive surgical staging for endometrial cancer. Rev Obstet Gynecol. 2012;5(1):28–34.
Pecorelli S. Revised FIGO staging for carcinoma of the vulva, cervix, and endometrium. Int J Gynaecol Obstet. 2009;105(2):103–4.
Tebeu PM, Popowski Y, Verkooijen HM, Bouchardy C, Ludicke F, Usel M, Major AL. Positive peritoneal cytology in early-stage endometrial cancer does not influence prognosis. Br J Cancer. 2004;91(4):720–4.
Saga Y, Imai M, Jobo T, Kuramoto H, Takahashi K, Konno R, Ohwada M, Suzuki M. Is peritoneal cytology a prognostic factor of endometrial cancer confined to the uterus? Gynecol Oncol. 2006;103(1):277–80.
Matsuo K, Klar M, Harter P, Miller H, Nusbaum DJ, Matsuzaki S, Roman LD, Wright JD. Trends in peritoneal cytology evaluation at hysterectomy for endometrial cancer in the United States. Gynecol Oncol. 2021;161(3):710–9.
National Comprehensive Cancer Network (NCCN). NCCN Clinical Practice Guidelines in Oncology.Uterine Neoplasms Version 1 2022. Fort Washington, PA: NCCN; 2021. Available from: https://www.nccn.org/professionals/physician_gls/pdf/uterine.pdf.
Colombo N, Creutzberg C, Amant F, et al. ESMO-ESGO-ESTRO Consensus Conference On Endometrial Cancer: diagnosis, treatment and follow-up. Ann Oncol. 2016;27(1):16–41.
Yamagami W, Mikami M, Nagase S, et al. Japan Society of Gynecologic Oncology 2018 guidelines for treatment of uterine body neoplasms. J Gynecol Oncol. 2020;31(1):e18.
Amant F, Mirza MR, Koskas M, Creutzberg CL. Cancer of the corpus uteri. Int J Gynaecol Obstet. 2018;143(Suppl 2):37–50.
Concin N, Matias-Guiu X, Vergote I, et al. ESGO/ESTRO/ESP guidelines for the management of patients with endometrial carcinoma. Int J Gynecol Cancer. 2021;31(1):12–39.
Matsuo K, Matsuzaki S, Nusbaum DJ, et al. Significance of Malignant Peritoneal Cytology on Survival of Women with Uterine Sarcoma. Ann Surg Oncol. 2021;28(3):1740–8.
Matsuo K, Nusbaum DJ, Matsuzaki S, et al. Malignant peritoneal cytology and increased mortality risk in stage I non-endometrioid endometrial cancer. Gynecol Oncol. 2020;159(1):43–51.
Sakai K, Yamagami W, Takahashi F, et al. Prognostic impact of peritoneal cytology on treating endometrial cancer using data from the Japan Society of Obstetrics and Gynecology cancer registry. J Gynecol Oncol. 2024;36:e41.
Surveillance, Epidemiology, and End Results (SEER) Program (www.seer.cancer.gov) SEER*Stat Database: Incidence - SEER Research Data, 17 Registries, Nov 2023 Sub (2000-2021) - Linked To County Attributes - Time Dependent (1990-2022) Income/Rurality, 1969-2022 Counties, National Cancer Institute, DCCPS, Surveillance Research Program, released April 2024, based on the November 2023 submission.
Rui F, Yeo YH, Xu L, Zheng Q, Xu X, Ni W, Tan Y, Zeng QL, He Z, Tian X, Xue Q, Qiu Y, Zhu C, Ding W, Wang J, Huang R, Xu Y, Chen Y, Fan J, Fan Z, Qi X, Huang DQ, Xie Q, Shi J, Wu C, Li J. Development of a machine learning-based model to predict hepatic inflammation in chronic hepatitis B patients with concurrent hepatic steatosis: a cohort study. EClinicalMedicine. 2024;16(68):102419.
Barakat MS, Field M, Ghose A, et al. The effect of imputing missing clinical attribute values on training lung cancer survival prediction model performance. Health Inf Sci Syst. 2017;5(1):16.
Fonseca J, Bacao F. Geometric SMOTE for imbalanced datasets with nominal and continuous features. Expert Systems with Applications. 2023;234:121053.
Yang K, Yu Z, Chen CLP, Cao W, You J, Wong H-S. Incremental Weighted Ensemble Broad Learning System for Imbalanced Data. IEEE Transactions on Knowledge and Data Engineering. 2022;34(12):5809–24.
Huang S, Cai N, Pacheco PP, Narrandes S, Wang Y, Xu W. Applications of Support Vector Machine (SVM) Learning in Cancer Genomics. Cancer Genomics Proteomics. 2018;15(1):41–51.
Dash TK, Chakraborty C, Mahapatra S, Panda G. Gradient boosting machine and efficient combination of features for speech-based detection of COVID-19. IEEE J Biomed Health Inform. 2022;26(11):5364–71.
Breiman L. Random forests. Mach Learn. 2001;45(1):5–32.
Lu X, Chen Y, Zhang G, Zeng X, Lai L, Qu C. Application of interpretable machine learning algorithms to predict acute kidney injury in patients with cerebral infarction in ICU. J Stroke Cerebrovasc Dis. 2024;33(7):107729.
Natras R, Soja B, Schmidt M. Ensemble Machine Learning of Random Forest, AdaBoost and XGBoost for Vertical Total Electron Content Forecasting. Remote Sens. 2022;14:3547.
Chae M, Yoon H, Lee H, Choi J. Hearing Recovery Prediction for Patients with Chronic Otitis Media Who Underwent Canal-Wall-Down Mastoidectomy. J Clin Med. 2024;13(6):1557.
Nohara Y, Matsumoto K, Soejima H, Nakashima N. Explanation of machine learning models using shapley additive explanation and application for real data in hospital. Comput Methods Progr Biomed. 2022;214:106584.
Takenaka M, Kamii M, Iida Y, Yanaihara N, Suzuki J, Takahashi K, Yanagida S, Saito M, Takano H, Yamada K, Okamoto A. Re-thinking the prognostic significance of positive peritoneal cytology in endometrial cancer. Gynecol Oncol. 2021;161(1):135–42.
Matsuo K, Matsuzaki S, Miller H, et al. Clinico-pathological significance of suspicious peritoneal cytology in endometrial cancer. J Surg Oncol. 2021;124(4):687–98.
Villiger AS, Zurbriggen S, Imboden S, Solass W, Christe L, Saner FAM, Gmür A, Rau TT, Mueller MD, Siegenthaler F. Reviving peritoneal cytology: Exploring its role in endometrial cancer molecular classification. Gynecol Oncol. 2024;182:148–55.
Jónsdóttir B, Marcickiewicz J, Borgfeldt C, Bjurberg M, Dahm-Kähler P, Flöter-Rådestad A, Hellman K, Holmberg E, Kjølhede P, Rosenberg P, Tholander B, Åvall-Lundqvist E, Stålberg K, Högberg T. Preoperative and intraoperative assessment of myometrial invasion in endometrial cancer-A Swedish Gynecologic Cancer Group (SweGCG) study. Acta Obstet Gynecol Scand. 2021;100(8):1526–33.
Gracia M, Yildirim Y, Macuks R, Mancari R, Achimas-Cadariu P, Polterauer S, Iacoponi S, Zapardiel I. SARCUT Study Group Influence of Clinical and Surgical Factors on Uterine Carcinosarcoma Survival. Cancers (Basel). 2023;15(5):1463.
Gao J, Ao Y, Wang S, Chen Z, Zhang Y, Ding J, Jiang J. WHO histological classification and tumor size are predictors of the locally aggressive behavior of thymic epithelial tumors. Lung Cancer. 2024;187:107446.
Baek S, Yu SE, Deng YH, Lee YJ, Lee DG, Kim S, Yoon S, Kim HS, Park J, Lee CH, Lee JB, Kong HJ, Kang SG, Shin YM, Sung HJ. Quenching Epigenetic Drug Resistance Using Antihypoxic Microparticles in Glioblastoma Patient-Derived Chips. Adv Healthc Mater. 2022;11(8):e2102226.
Chen MT, Sun HF, Zhao Y, Fu WY, Yang LP, Gao SP, Li LD, Jiang HL, Jin W. Comparison of patterns and prognosis among distant metastatic breast cancer patients by age groups: a SEER population-based analysis. Sci Rep. 2017;7(1):9254.
Huang H, Xu S, Wang X, Liu S, Liu J. Patient Age Is Significantly Related to Distant Metastasis of Papillary Thyroid Microcarcinoma. Front Endocrinol (Lausanne). 2021;12:748238.
Purushotham A, Shamil E, Cariati M, Agbaje O, Muhidin A, Gillett C, Mera A, Sivanadiyan K, Harries M, Sullivan R, Pinder SE, Garmo H, Holmberg L. Age at diagnosis and distant metastasis in breast cancer – A surprising inverse relationship. European Journal of Cancer. 2014;50(10):1697–705.
Azim HA Jr, Nguyen B, Brohée S, Zoppoli G, Sotiriou C. Genomic aberrations in young and elderly breast cancer patients. BMC Med. 2015;13:266.
Buza N, Baine I, Hui P. Precision genotyping diagnosis of lung tumors with trophoblastic morphology in young women. Mod Pathol. 2019;32(9):1271–80.
Cuicui ZHAO, Hong LIU. Molecular Biological Pathogenesis of Young Breast Cancer[J]. Cancer Research on Prevention and Treatment. 2020;47(3):213–7.
Matias-Guiu X, Lax S, Raspollini MR, Palacios J, Zheng W, Liu C, de Brot L, Lordello L, Hardisson D, Gaffney D, Mutch D, Scambia G, Creutzberg CL, Fotopoulou C, Berek JS, Concin N. FIGO 2023 staging for endometrial cancer, when, if it is not now? Eur J Cancer. 2024;213:115115.
Acknowledgements
We extend our gratitude to the participants and researchers involved in the SEER database and all related open data and studies, whose contributions have been instrumental to our research.
Conflict of interest
The authors declare no competing interests.
Funding
This study was funded by the Fujian Provincial Science and Technology Innovation Joint Fund Project (No. 2023Y9454), the Fujian Provincial Natural Science Foundation Project (No. 2024J011087), and the Fujian Provincial Health Commission Science and Technology Plan Project (No. 2024CXA032).
Author information
Contributions
All authors made substantial contributions to the interpretation of data and critically revised the manuscript. All authors approved the final version submitted and agreed to be accountable for their contributions. Specific contributions are as follows. Qiaoming Lin: Writing – original draft, review & editing, data analysis, visualization, methodology, conceptualization. Qi Guan: Writing – original draft, methodology, investigation, data curation. Danru Chen: Software, methodology, validation, investigation. Lilan Li: Formal analysis, original draft. Yibin Lin: Writing – review & editing, validation, supervision, project administration, data curation, conceptualization. All authors contributed to the revision of the paper, interpreted the results, and read and approved the final manuscript. The submitting author reports no conflicts of interest.
Ethics declarations
Ethics approval and consent to participate
This study utilized publicly available abstracted data, and therefore did not require additional ethical approval.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Lin, Q., Guan, Q., Chen, D. et al. Peritoneal cytology predicting distant metastasis in uterine carcinosarcoma: machine learning model development and validation. World J Surg Onc 23, 167 (2025). https://doiorg.publicaciones.saludcastillayleon.es/10.1186/s12957-025-03771-9
DOI: https://doiorg.publicaciones.saludcastillayleon.es/10.1186/s12957-025-03771-9