Peritoneal cytology predicting distant metastasis in uterine carcinosarcoma: machine learning model development and validation

Abstract

Objective

This study develops and validates a machine learning model using peritoneal cytology to predict distant metastasis in uterine carcinosarcoma, aiding clinical decision-making.

Methods

This study utilized detailed clinical data and peritoneal cytology findings from uterine carcinosarcoma patients in the SEER database. Eight machine learning algorithms (Logistic Regression, SVM, GBM, Neural Network, Random Forest, KNN, AdaBoost, and LightGBM) were applied to predict distant metastasis. Model performance was assessed using AUC, calibration curves, DCA, confusion matrices, sensitivity, and specificity. The Logistic Regression model was visualized with a nomogram, and its results were analyzed. SHAP values were used to interpret the best-performing machine learning model.

Results

Peritoneal cytology, T stage, age, and tumor size were key factors influencing distant metastasis in uterine carcinosarcoma patients. Peritoneal cytology had significant weight in the prediction models. The logistic regression model demonstrated excellent predictive performance with an AUC of 0.882 in the training set and 0.881 in the internal test set. The model was visualized and interpreted using a nomogram. In comprehensive evaluations, GBM was identified as the best-performing model and was explained using SHAP values. Additionally, calibration and DCA curves indicated that both models have significant potential clinical utility.

Conclusion

This study introduces the first effective tool for predicting distant metastasis in uterine carcinosarcoma patients by integrating peritoneal cytology features into model construction. It aids in early identification of high-risk patients, enhancing follow-up and monitoring during tumor development, and supports the optimization of personalized treatment strategies.

Introduction

Uterine carcinosarcoma (UCS), also known as malignant mixed Müllerian tumor, is a rare gynecological malignancy with a poor prognosis, accounting for approximately 5% of all uterine tumors [1]. UCS is characterized by a high rate of lymphatic spread and a marked tendency toward peritoneal and hematogenous metastasis [2]. The literature reports that up to 30–40% of UCS patients present with lymph node metastasis at initial diagnosis, while about 10% exhibit visceral metastases, particularly pulmonary involvement [3]. Consequently, the five-year survival rate for patients with locally advanced or metastatic disease is typically only 10–30% [4].

Due to its rarity, specific treatment guidelines for UCS are limited. The prevailing theory is the "conversion hypothesis," suggesting that UCS may originate from an endometrial tumor clone and subsequently undergo metaplastic differentiation [5]. Therefore, current management largely follows the guidelines for endometrial cancer. Despite constituting only a small fraction of endometrial cancers, UCS exhibits a higher risk of distant metastasis and recurrence, leading to poorer patient outcomes. Early identification of high-risk patients with distant metastases and implementation of targeted comprehensive treatment strategies are crucial for improving prognosis.

Cytoreductive surgery is the primary treatment for UCS patients [6, 7]. Comprehensive surgical staging, including abdominal lavage, hysterectomy, salpingo-oophorectomy, and lymphadenectomy, is recommended for all operable patients [8]. Peritoneal cytology, which involves analyzing exfoliated cancer cells from intraoperative peritoneal lavage fluid or aspiration samples, helps detect free cancer cells in the peritoneal cavity. Preoperative fine needle aspiration can also provide quick and safe cytological assessments with minimal patient discomfort. This technique identifies minimal metastatic lesions not yet visible as masses or nodules, enabling early detection of potential peritoneal metastases beyond the uterus. Early detection of subclinical metastases, often overlooked in imaging studies, is particularly valuable.

Initially, positive peritoneal cytology was classified as stage IIIA under the International Federation of Gynecology and Obstetrics (FIGO) 1988 staging criteria for endometrial cancer. However, in 2009, FIGO revised its guidelines [9] to exclude peritoneal cytology from the staging system due to controversies regarding its prognostic significance [10, 11]. This change resulted in a decline in peritoneal cytology sampling during hysterectomies between 2010 and 2017 [12]. Despite this, several international authorities, including the European Society of Medical Oncology (ESMO), European Society of Gynaecological Oncology (ESGO), European Society for Radiotherapy & Oncology (ESTRO), Japanese Society of Gynecologic Oncology (JSGO), National Comprehensive Cancer Network (NCCN), and American Joint Committee on Cancer (AJCC), continue to support collecting peritoneal cytology samples during surgery and including them in pathology reports [13,14,15]. Similarly, the FIGO Gynecologic Oncology Committee recommends collecting peritoneal cytology samples, even though cytology has been removed from the formal staging criteria, emphasizing that "positive cytology must be reported separately without affecting staging" [16]. Additionally, the 2021 ESGO/ESTRO/ESP guidelines highlight that malignant peritoneal cytology is associated with lower survival rates [17]. Recent studies have further shown that positive peritoneal cytology has significant prognostic implications, particularly for non-endometrioid types of endometrial cancer [18,19,20]. Thus, although peritoneal cytology is not used for formal staging, a positive result provides important information for evaluating disease progression and patient prognosis, especially in specific types of endometrial cancer, serving as a valuable supplement to staging.

This study aims to develop and validate a predictive model that integrates peritoneal cytology findings to forecast the development of distant metastasis in UCS patients. By combining peritoneal cytology with other clinicopathological features, we seek to provide clinicians with an effective tool for identifying high-risk populations, optimizing medical resource allocation, and supporting personalized treatment strategies. Additionally, given the current lack of international consensus on the role of peritoneal cytology in UCS, our work may provide valuable data to support future updates to diagnostic and treatment guidelines.

Materials and methods

Data preparation

Data for this study were obtained from the public Surveillance, Epidemiology, and End Results (SEER) Program database, using SEER*Stat software version 8.4.4 for data extraction. Our analysis centered on patients diagnosed with UCS across 17 registries between 2000 and 2021, based on the November 2023 data submission [21]. Cases were screened using the "site code ICD-O-3 / WHO 2008", specifying the uterus as the site of origin (codes C54.0-C54.9, C55.9), and identified by the malignant tissue type defined as carcinosarcoma according to the "ICD-O-3 Histology/Behavior" codes (8950/3, 8951/3, 8980/3, 8981/3). For each patient, demographic and clinical variables were extracted, encompassing age, race, marital status, median family income, rural–urban continuum code, time to diagnosis, interval from diagnosis to treatment initiation, tumor dimensions, peritoneal cytology results, histological grade, and TNM staging (AJCC 7th edition). Distant metastasis was assessed using the combined staging data within the SEER database. Inclusion criteria comprised histopathologically confirmed disease, single primary tumors, and diagnosis between 2010 and 2021. Patients were excluded if they had missing data regarding peritoneal cytology and distant metastasis. Ultimately, a cohort of 3,434 UCS patients met the criteria for detailed analysis. This study adheres to the principles of the Declaration of Helsinki. Given that the SEER data are de-identified and available for research purposes, local ethics committee approval was not required.

Data processing

In this study, any data entries with more than 35% missing parameters were excluded from the analysis. Missing values in the remaining features were filled by multiple imputation (MI) using a multi-classification regression model [22, 23]. Patient data were randomly partitioned into a training set and an internal test set at a ratio of 7:3; the former was used for model development and the latter for validation and evaluation. Continuous variables are reported as mean ± standard deviation (SD) if normally distributed, as assessed by the Kolmogorov–Smirnov test, or as median (interquartile range) if not, with comparisons made using the t-test or Mann–Whitney U test, respectively. Categorical variables are summarized as counts and frequencies, with comparisons conducted via chi-square or Fisher's exact tests. All statistical tests were two-tailed, with P < 0.05 denoting statistical significance.

To address the class imbalance caused by the low incidence of distant metastasis, we applied two techniques during the machine learning phase: resampling and sample weighting. For resampling, we used the SMOTENC method [24] to increase the number of minority class samples in the training set, while the test set was left unchanged to assess model generalization. For weighting, we assigned higher weights to minority class samples and lower weights to majority class samples, based on the inverse of their proportions [25], so that the model paid more attention to the minority class during training while still being evaluated without bias on the untouched test set. After modeling, both approaches were compared against the untreated data to evaluate their effectiveness.
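The 7:3 random partition and the inverse-proportion weighting described above can be sketched as follows. This is a minimal illustration in Python rather than the authors' R code; the function names, the random seed, the normalisation of the weights, and the toy labels are all assumptions made for the example.

```python
import random
from collections import Counter


def split_train_test(n_rows, train_fraction=0.7, seed=42):
    """Randomly partition row indices into training and test index lists."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    cut = round(n_rows * train_fraction)
    return indices[:cut], indices[cut:]


def inverse_proportion_weights(labels):
    """Return one weight per class, proportional to 1 / class proportion.

    The weights are normalised (an assumption of this sketch) so that every
    class carries the same total weight and the sample weights sum to n.
    """
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {cls: n / (k * count) for cls, count in counts.items()}


# Hypothetical toy labels: 1 = distant metastasis (minority), 0 = none.
labels = [1] * 20 + [0] * 80

train_idx, test_idx = split_train_test(len(labels))
train_labels = [labels[i] for i in train_idx]

# Weights are derived from the training set only; the test set is untouched.
weights = inverse_proportion_weights(train_labels)
sample_weights = [weights[y] for y in train_labels]
```

Computing the weights from the training labels alone mirrors the text's point that the test set stays unmodified, so evaluation reflects the original class distribution.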

Factor screening

To evaluate the correlations among features in the training set, we applied Spearman correlation analysis with a correlation coefficient threshold of 0.7. Coefficients below this threshold suggest an absence of significant multicollinearity among the feature variables. We first generated a correlation heatmap based on the Spearman coefficients to visualize the degree of association between each pair of variables. To examine the structural relationships among these variables further, we conducted cluster analysis using [1 - abs(spearman_cor)] as the distance measure. This distance treats negative and positive correlations equally, ensuring a balanced evaluation of variable similarity. The clustering results were represented as a dendrogram, highlighting the hierarchical structure of feature relationships. When two or more features are highly correlated, they tend to provide similar information, and including all of them in a predictive model can increase model complexity without meaningfully improving performance.

Next, using the occurrence of distant metastasis as the outcome variable, we performed univariate analysis within the training set to identify predictors significantly associated with distant metastasis (P < 0.05). The variables that met this threshold were then entered into multivariate logistic regression analysis. The final feature set for the machine learning models was determined from the results of this multivariate analysis (P < 0.05), so that only the most relevant predictors were retained.
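The Spearman-based collinearity screen described above can be illustrated with a short sketch. It is written in plain Python for readability and is not the authors' R code; the helper names and the toy feature values are hypothetical, and only the 0.7 threshold and the [1 - abs(spearman_cor)] distance come from the text.

```python
def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # positions are 0-based, ranks 1-based
        for pos in range(start, end + 1):
            ranks[order[pos]] = mean_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; assumes neither input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


def correlation_distance(x, y):
    """Distance used for clustering: 1 - |rho|, so the sign is ignored."""
    return 1.0 - abs(spearman(x, y))


def flag_collinear_pairs(features, threshold=0.7):
    """Return feature pairs whose absolute Spearman correlation exceeds threshold."""
    names = list(features)
    flagged = []
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            if abs(spearman(features[first], features[second])) > threshold:
                flagged.append((first, second))
    return flagged


# Hypothetical toy data, not values from the SEER cohort.
features = {
    "tumor_size": [20, 35, 50, 65, 80, 95],
    "t_stage": [1, 1, 2, 2, 3, 3],
    "age": [70, 55, 62, 48, 66, 59],
}
flagged = flag_collinear_pairs(features)
```

Because the distance is 1 - |rho|, a correlation threshold of 0.7 corresponds to a distance of 0.3: pairs closer than 0.3 in the dendrogram are the ones that would be treated as redundant.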

Model construction and evaluation

Based on the feature selection described above, we applied eight machine learning algorithms to develop predictive models for distant metastasis in UCS: Logistic Regression, Support Vector Machine (SVM), Gradient Boosting Machine (GBM), Neural Network (NeuralNet), Random Forest, K-Nearest Neighbors (KNN), AdaBoost, and LightGBM. Logistic Regression is a linear model that offers simplicity and interpretability and suits linearly separable features. SVM maximizes the margin between classes by identifying the optimal hyperplane, providing robust classification, especially in high-dimensional spaces [26]. GBM iteratively builds weak classifiers to enhance predictive power, capturing nonlinear relationships and interaction effects while optimizing the loss function via gradient descent [27]. Neural Networks emulate the human brain's structure through multiple layers of neurons, enabling the learning of complex features and the modeling of nonlinear relationships. Random Forest integrates multiple decision trees to improve stability and accuracy, reducing overfitting and enhancing generalization [28]. KNN, an instance-based learning method, predicts categories by calculating distances between new samples and existing ones [29]. AdaBoost improves predictive ability by iteratively adjusting sample weights, focusing more on misclassified instances [30]. LightGBM, an efficient gradient boosting framework, accelerates model training using histograms and feature parallelization [31]. During training, these algorithms underwent tenfold cross-validation on the training set to obtain a robust estimate of model performance.

Model performance was evaluated with a set of complementary measures covering both classification ability and clinical utility. The primary measure was the area under the receiver operating characteristic curve (ROC-AUC), which assesses overall discrimination. Calibration curves were used to check how closely the predicted probabilities matched the observed outcomes. Decision Curve Analysis (DCA) evaluated the practical value of each model from the perspective of clinical net benefit. We also used confusion matrices to present the prediction results transparently, determining the optimal threshold for the test set based on model accuracy. Accuracy, sensitivity, specificity, precision, and F1 score were computed as follows:

$$\text{Accuracy}=\frac{TP+TN}{TP+TN+FP+FN}$$
$$\text{Sensitivity} = \frac{TP}{TP+FN}$$
$$\text{Specificity} = \frac{TN}{TN+FP}$$
$$\text{Precision} = \frac{TP}{TP+FP}$$
$$\text{F1 Score}=2\times \frac{\text{Precision}\times \text{Sensitivity}}{\text{Precision}+\text{Sensitivity}}$$

These metrics are computed from four counts: true positives (TP), false positives (FP), true negatives (TN), and false negatives (FN).
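As a worked illustration of the formulas above, the following Python sketch computes the five metrics from the four confusion-matrix counts. The function names and the example counts are hypothetical and are not taken from the study's confusion matrices; returning 0.0 for an empty denominator is a convention chosen for this sketch.

```python
def safe_divide(numerator, denominator):
    """Return numerator / denominator, or 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def classification_metrics(tp, fp, tn, fn):
    """Compute the metrics defined above from confusion-matrix counts."""
    accuracy = safe_divide(tp + tn, tp + tn + fp + fn)
    sensitivity = safe_divide(tp, tp + fn)  # also called recall
    specificity = safe_divide(tn, tn + fp)
    precision = safe_divide(tp, tp + fp)
    f1 = safe_divide(2 * precision * sensitivity, precision + sensitivity)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
    }


# Hypothetical counts for 500 patients, 100 of whom have distant metastasis.
metrics = classification_metrics(tp=60, fp=40, tn=360, fn=40)
```

In this toy case accuracy is dominated by the 400 negative patients, while sensitivity, precision, and F1 depend only on how the positive class is handled, which is why the latter are emphasised for imbalanced outcomes.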

Model interpretation

To enhance the usability and interpretability of our logistic regression model, we used a nomogram for intuitive visualization and interpretation of results. This tool not only simplifies application of the model but also clarifies the contribution of each covariate to the overall prediction score, providing an easy way to estimate an individual patient's probability of distant metastasis. The odds (Odds = P / (1 - P)) derived from the nomogram reflect the likelihood of distant metastasis relative to no metastasis. Based on the comprehensive performance evaluation, the GBM model was identified as the optimal predictor of distant metastasis in UCS patients, and its features were ranked by importance. SHapley Additive exPlanations (SHAP) plots were used to visualize feature contributions, allowing quantitative analysis of each variable's impact on distant metastasis risk [32]. Positive SHAP values indicate risk factors, while negative values suggest protective factors. We also provided personalized explanations for two randomly selected patients regarding their predicted likelihood of developing distant metastasis.
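The relationship between a predicted probability and the odds quoted above can be made concrete with a small Python sketch. The function names are hypothetical, and the sketch relies only on the standard logistic regression identity that the odds equal the exponential of the linear predictor; it does not reproduce the study's fitted coefficients.

```python
import math


def probability_from_logit(linear_predictor):
    """Logistic function: map a linear predictor (log-odds) to a probability."""
    return 1.0 / (1.0 + math.exp(-linear_predictor))


def odds_from_probability(p):
    """Odds = P / (1 - P), as defined in the text; requires 0 <= p < 1."""
    return p / (1.0 - p)


def probability_from_odds(odds):
    """Inverse of odds_from_probability."""
    return odds / (1.0 + odds)


# A predicted probability of 0.20 means metastasis is one quarter as likely
# as no metastasis.
example_odds = odds_from_probability(0.20)

# For a logistic model, the odds equal exp(linear predictor).
log_odds = 0.7  # hypothetical value of the linear predictor
round_trip = odds_from_probability(probability_from_logit(log_odds))
```

Reading a nomogram amounts to summing per-covariate points into the linear predictor and then applying the logistic function, so the same conversion applies to any total score.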

All statistical analyses were conducted using R software (version 4.4.1), and the corresponding analysis code is available upon request from the authors.

Results

Baseline information and correlation analysis

We analyzed data from 3,434 UCS patients to investigate the relationship between peritoneal cytology findings and distant metastasis. The two were significantly associated (χ² = 123.45, p < 0.0001) (Table 1), suggesting that peritoneal cytology may serve as an important reference indicator for primary clinical screening. Patients were then divided into a training set and an internal test set at a 7:3 ratio, with baseline characteristics summarized in Table 2. In the training set, statistically significant differences (p < 0.05) were observed between patients with and without distant metastasis in peritoneal cytology, differentiation grade, T stage, N stage, time from diagnosis to treatment, tumor size, and age (Table 3). Figure 1 presents the Spearman correlation analysis of the features in the training set, where a darker color indicates a higher correlation and a coefficient above 0.7 indicates a strong association (Fig. 1A).

Table 1 Peritoneal cytology and distant metastasis grouped chi-square test
Table 2 Baseline characteristics of the cohort
Table 3 Difference analysis of the training set

The results of the hierarchical clustering indicate that there is no significant multicollinearity among the feature variables in the training set (Fig. 1B).

Fig. 1 Multicollinearity. (A) Spearman correlation among the features in the training set; a coefficient above 0.7 indicates a strong correlation. (B) Hierarchical cluster analysis using [1 - abs(spearman_cor)] as the distance metric; a distance below 0.3 (that is, an absolute correlation above 0.7) indicates a strong correlation

Univariate and multivariable logistic regression

In univariate logistic regression (LR) analysis, positive peritoneal cytology, differentiation grade G3, T stage, N stage, and tumor size were identified as risk factors for distant metastasis in UCS patients (all odds ratios [OR] > 1, with 95% confidence intervals [CI] entirely above 1, p < 0.05). Conversely, age and time from diagnosis to treatment were found to be protective factors (OR and 95% CI below 1, p < 0.05). Multivariate LR analysis further revealed that positive peritoneal cytology, T stage, and tumor size remained independent risk factors for distant metastasis (OR and 95% CI above 1, p < 0.05), while age was confirmed as an independent protective factor (OR and 95% CI below 1, p < 0.05) (Table 4).

Table 4 Univariate logistic regression analysis and multivariate logistic regression analysis

Model building and performance evaluation

To construct machine learning models for predicting distant metastasis, we selected the key features identified by multivariate logistic regression (LR): peritoneal cytology status, T stage, tumor size, and age. The classification algorithms were trained with tenfold cross-validation. All models showed good predictive power, with area under the receiver operating characteristic curve (ROC-AUC) values exceeding 0.7 (Fig. 2). Calibration curves (Fig. 3), decision curve analysis (DCA) (Fig. 4), accuracy, recall (sensitivity) (Table 5), and other metrics also performed well, indicating the effectiveness of these models for predicting distant metastasis. Notably, while the Random Forest model showed strong performance on the training set, its performance dropped markedly on the internal test set, indicating overfitting, so we excluded it as a candidate for the best model. Using AUC as the primary evaluation criterion, Logistic Regression achieved the highest AUC values of 0.882 on the training set and 0.881 on the internal test set, demonstrating robust discrimination. For imbalanced datasets, however, the F1 score is often more informative than ROC-AUC. A "baseline model" that always predicts distant metastasis achieved an F1 score of 0.332. Our models significantly outperformed this baseline, with GBM and AdaBoost showing higher F1 scores than Logistic Regression. Because AdaBoost's lower recall could lead to missed diagnoses, we selected the Gradient Boosting Machine (GBM) model as optimal; it achieved an F1 score of 0.630, demonstrating superior generalization and clinical applicability. Feature importance rankings (Fig. 5) highlighted peritoneal cytology as a critical feature in both the LR and GBM models. To further validate model performance and ensure transparency, we constructed confusion matrices for both models (Fig. 6). Comparisons revealed that the GBM model performed better at distinguishing cases of distant metastasis.

Table 5 Model Performance Index Evaluation
Fig. 2 ROC-AUC curves for the training set (A) and the test set (B)

Fig. 3 Calibration curves for the training set (A) and the test set (B)

Fig. 4 DCA curves for the training set (A) and the test set (B)

Fig. 5 Ranking of feature importance for the logistic regression model (A) and the GBM model (B)

Fig. 6 Confusion matrices for the logistic regression model (A: training set, B: test set) and the GBM model (C: training set, D: test set)

In a "baseline model" that always predicts distant metastasis, the precision is approximately 0.197. Although our models significantly outperform this baseline, we further investigated the impact of data imbalance on model performance to improve precision, adopting two main strategies: synthetic sampling with SMOTENC and adjustment of sample weights. The results of comparing these approaches using the GBM model are summarized in Table 6. While these techniques improved model accuracy on the internal test set, they did so at the expense of sensitivity (recall), which is crucial in clinical settings. For a preliminary screening tool aimed at identifying UCS patients at risk for distant metastasis, missing actual cases of distant metastasis is clinically more serious than a reduction in overall precision. Therefore, we chose the GBM model trained on the unbalanced data as the optimal model for practical application, ensuring higher sensitivity and minimizing missed diagnoses. This decision was supported in part by our sufficiently large sample size, which helped ensure the accuracy of our results.
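The baseline figures quoted in this section follow directly from the class prevalence, as the Python sketch below shows. A classifier that labels every patient as positive has a recall of 1 and a precision equal to the prevalence p, so its F1 score is 2p / (1 + p). The function names are hypothetical; the only input taken from the text is the precision of about 0.197.

```python
def always_positive_baseline(prevalence):
    """Precision, recall, and F1 of a classifier that predicts positive for everyone."""
    recall = 1.0            # no true case is ever missed
    precision = prevalence  # TP / (TP + FP) = positives / all patients
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


def prevalence_from_baseline_f1(f1):
    """Invert F1 = 2p / (1 + p) to recover the prevalence p."""
    return f1 / (2 - f1)


precision, recall, f1 = always_positive_baseline(0.197)
implied_prevalence = prevalence_from_baseline_f1(0.332)
```

With a prevalence of 0.197 this gives an F1 of about 0.329, close to the 0.332 reported for the same baseline earlier in this section; an F1 of exactly 0.332 would correspond to a prevalence of about 0.199. The small gap may reflect rounding or the two figures being computed on different subsets, which the text does not specify.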

Table 6 Comparison of before and after processing of data imbalances

Model interpretability

For model interpretation, the logistic regression model was visualized and applied using a nomogram (Fig. 7). For the optimal GBM model, we constructed a feature importance ranking based on SHAP values, which showed that the key factors influencing distant metastasis in UCS patients were T stage, peritoneal cytology, tumor size, and age. In the visualization (Fig. 8B), yellow indicates risk factors for distant metastasis and purple indicates protective factors. To further illustrate the model's predictions, we randomly selected two UCS patients, one at low risk and one at high risk of distant metastasis. Fig. 8A shows the low-risk example, a patient aged 67 years with a tumor size of 55 mm, T stage beyond the uterus (T3/T4), and negative peritoneal cytology; here, negative peritoneal cytology served as an important protective factor against distant metastasis. Fig. 8C shows the high-risk example, a patient aged 62 years with a tumor size of 70 mm, T stage beyond the uterus (T3/T4), and positive peritoneal cytology, which was identified as a critical risk factor for distant metastasis.
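To show what a per-patient SHAP explanation quantifies, the sketch below computes exact Shapley values for a small toy scoring function. It is an illustration only: the scoring function and its coefficients are invented, the reference patient is hypothetical, and features outside a coalition are simply set to that single reference value, whereas SHAP implementations for tree models typically average over background data with specialised algorithms. The patient's feature values are those of the high-risk example described above.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, instance, baseline):
    """Exact Shapley values of model(instance) relative to model(baseline).

    Features outside a coalition take their baseline value.
    """
    n = len(instance)

    def value(coalition):
        mixed = [instance[i] if i in coalition else baseline[i] for i in range(n)]
        return model(mixed)

    contributions = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):  # coalition sizes 0 .. n-1
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                members = set(subset)
                total += weight * (value(members | {i}) - value(members))
        contributions.append(total)
    return contributions


def toy_log_odds(x):
    """Hypothetical risk score with made-up coefficients, NOT the fitted GBM."""
    cytology_positive, t_stage_advanced, tumor_size_mm, age_years = x
    return (-2.0 + 1.5 * cytology_positive + 1.0 * t_stage_advanced
            + 0.01 * tumor_size_mm - 0.02 * age_years
            + 0.5 * cytology_positive * t_stage_advanced)


# Feature order: positive cytology, T3/T4 stage, tumor size (mm), age (years).
patient = [1, 1, 70, 62]     # high-risk example from the text
reference = [0, 0, 50, 65]   # hypothetical reference patient
phi = shapley_values(toy_log_odds, patient, reference)
```

The contributions always sum to the difference between the patient's score and the reference score, which is the property that lets a SHAP plot decompose one prediction into per-feature pushes toward higher risk (positive values) or lower risk (negative values).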

Fig. 7 Nomogram of the logistic regression model

Fig. 8 SHAP plots of the GBM model. Panel B ranks the importance of the feature variables for distant metastasis in UCS patients according to the GBM model. In these plots, yellow marks a variable acting as a risk factor for distant metastasis, while purple marks a protective factor. Panel A shows the distribution of SHAP values for the low-risk group and panel C for the high-risk group

Discussion

Uterine carcinosarcoma (UCS) is a highly malignant gynecological tumor characterized by complex and aggressive biology, with a propensity for early and distant metastases. Currently, there is a lack of reliable indicators or prediction models to assess the risk of distant metastasis in UCS patients. This study successfully developed and validated a predictive model that incorporates peritoneal cytology features to evaluate distant metastasis risk in UCS patients. The model exhibited strong discrimination and calibration capabilities, highlighting its potential as a valuable clinical tool.

In our study, peritoneal cytology emerged as a critical feature influencing distant metastasis in UCS patients. This is likely because UCS, being a highly malignant tumor, exhibits strong invasive and early metastatic tendencies. A positive peritoneal cytology result indicates that tumor cells have acquired the ability to breach the basement membrane and enter the abdominal cavity, suggesting invasive biological behavior and potential for distant metastasis [33, 34]. Compared to other invasive diagnostic methods such as laparoscopic biopsy, peritoneal cytology offers a relatively non-invasive way to obtain cytological results rapidly and safely, thereby minimizing patient discomfort and complications. Unlike imaging examinations, which can be subjective and depend on the expertise of the imaging physician, equipment quality, and the size of metastatic tumors, peritoneal cytology results are typically included in routine pathology reports, providing easily accessible and highly standardized data. This reliable data foundation is crucial for constructing and validating predictive models. Additionally, the application of genetic analysis in peritoneal cytology holds significant promise. Advanced technologies such as high-throughput sequencing, liquid biopsy, and multi-omics integration offer more detailed and comprehensive information than traditional pathology [35]. These innovations are expected to enhance the prediction of distant metastasis and improve clinical management outcomes, while also opening new avenues for genetic research and personalized medicine.

Our results also show that T stage contributes significantly to distant metastasis in UCS patients. Specifically, T3 or T4 disease indicates extensive local invasion beyond the uterus, often involving lymphatic and blood vessels [36]. This facilitates the entry of tumor cells into the circulation, thereby promoting distant metastasis. Additionally, our results show a positive correlation between tumor size and distant metastasis risk in UCS patients [37]. Larger tumors tend to exhibit higher cell proliferation rates and greater aggressiveness [38], increasing the likelihood of local and distant spread. The hypoxic environment within growing tumors may activate pro-metastatic signaling pathways [39], further enhancing metastatic potential. These findings underscore the need for more aggressive evaluation and management strategies for high-risk patients, including wider surgical resection and postoperative adjuvant treatments.

Interestingly, we observed an inverse but weak relationship between age and distant metastasis in UCS patients. In certain tumor types, tumors in younger patients exhibit distinct biological properties [40,41,42], such as higher proliferation rates, greater invasiveness, and greater metastatic ability [43, 44], potentially due to specific gene mutations or regulatory mechanisms [45]. This suggests that the distribution of molecular types and the associated biological behaviors may vary with age, highlighting the importance of understanding age-related differences in UCS biology. The 2023 FIGO update on endometrial cancer staging and molecular typing reflects deeper insights into the complexity and potential biological behavior of these tumors [46]. Further research is needed to explore age-related biological differences and their implications for clinical management.

In our current study, some limitations must be acknowledged. First, this is a retrospective analysis that poses challenges such as data quality issues, information bias and selection bias, so future studies should adopt a prospective design to overcome these limitations and provide more reliable data support. Second, the performance of our machine learning model may be affected by geographical and hospital differences, and patient characteristics and treatment patterns may vary significantly between regions and medical institutions. To ensure the robustness and generalization ability of the model, we need more institutions to participate in the external validation. To this end, in follow-up studies, we plan to incorporate a more diverse and broader multicenter dataset for validation and testing of models, aiming to address potential variations in model performance across different populations and healthcare settings. In addition, it is an important direction for future research to explore the mechanism of occurrence and development through peritoneal cytology examination combined with genomics and proteomics, and to look for more specific and sensitive predictors.

In conclusion, based on a large-scale multicenter data set, our prediction model provides new ideas and technical support for predicting distant metastasis in UCS patients. In practice, the model's predictions can help develop personalized follow-up plans, especially for patients predicted to be at high risk of metastasis, for whom more frequent and targeted monitoring is suggested to ensure early detection of potential problems and timely intervention. This may not only improve patients' quality of life but also enable a more reasonable allocation of medical resources and more efficient delivery of medical services.

Conclusion

This study introduces the first effective tool for predicting distant metastasis in uterine carcinosarcoma patients by integrating peritoneal cytology features into model construction. It aids in early identification of high-risk patients, enhancing follow-up and monitoring during tumor development, and supports the optimization of personalized treatment strategies.

Data availability

Data is provided within the manuscript or supplementary information files.

Abbreviations

UCS: Uterine Carcinosarcoma
SEER: Surveillance, Epidemiology, and End Results Program
SMOTENC: Synthetic Minority Over-sampling Technique for Nominal and Continuous
SVM: Support Vector Machine
GBM: Gradient Boosting Machine
KNN: K-Nearest Neighbors
LR: Logistic regression
ROC: Receiver Operating Characteristic
AUC: Area Under the Curve
DCA: Decision Curve Analysis
SHAP: SHapley Additive exPlanations
FIGO: International Federation of Gynecology and Obstetrics
ESMO: European Society of Medical Oncology
ESGO: European Society of Gynaecological Oncology
ESTRO: European Society for Radiotherapy & Oncology
JSGO: Japanese Society of Gynecologic Oncology
NCCN: National Comprehensive Cancer Network
AJCC: American Joint Committee on Cancer
SD: Standard deviation

References

  1. Matsuo K, Ross MS, Machida H, Blake EA, Roman LD. Trends of uterine carcinosarcoma in the United States. J Gynecol Oncol. 2018;29(2):e22.

    Article  PubMed  PubMed Central  Google Scholar 

  2. Pradhan TS, Stevens EE, Ablavsky M, Salame G, Lee YC, Abulafia O. FIGO staging for carcinosarcoma: can the revised staging system predict overall survival? Gynecol Oncol. 2011;123(2):221–4.

  3. Ravishankar P, Smith DA, Avril S, Kikano E, Ramaiya NH. Uterine carcinosarcoma: a primer for radiologists. Abdom Radiol (NY). 2019;44(8):2874–85.

  4. Bansal N, Herzog TJ, Seshan VE, Schiff PB, Burke WM, Cohen CJ, Wright JD. Uterine carcinosarcomas and grade 3 endometrioid cancers: evidence for distinct tumor behavior. Obstet Gynecol. 2008;112(1):64–70.

  5. de Jong RA, Nijman HW, Wijbrandi TF, Reyners AK, Boezen HM, Hollema H. Molecular markers and clinical behavior of uterine carcinosarcomas: focus on the epithelial tumor component. Mod Pathol. 2011;24(10):1368–79.

  6. Gracia M, Yildirim Y, Macuks R, Mancari R, Achimas-Cadariu P, Polterauer S, Iacoponi S, Zapardiel I; SARCUT Study Group. Influence of Clinical and Surgical Factors on Uterine Carcinosarcoma Survival. Cancers (Basel). 2023;15(5):1463.

  7. Koh WJ, Abu-Rustum NR, Bean S, Bradley K, Campos SM, Cho KR, Chon HS, Chu C, Cohn D, Crispens MA, Damast S, Dorigo O, Eifel PJ, Fisher CM, Frederick P, Gaffney DK, George S, Han E, Higgins S, Huh WK, Lurain JR 3rd, Mariani A, Mutch D, Nagel C, Nekhlyudov L, Fader AN, Remmenga SW, Reynolds RK, Tillmanns T, Ueda S, Wyse E, Yashar CM, McMillian NR, Scavone JL. Uterine Neoplasms, Version 1.2018, NCCN Clinical Practice Guidelines in Oncology. J Natl Compr Canc Netw. 2018;16(2):170–99.

  8. Rungruang B, Olawaiye AB. Comprehensive surgical staging for endometrial cancer. Rev Obstet Gynecol. 2012;5(1):28–34.

  9. Pecorelli S. Revised FIGO staging for carcinoma of the vulva, cervix, and endometrium. Int J Gynaecol Obstet. 2009;105(2):103–4.

  10. Tebeu PM, Popowski Y, Verkooijen HM, Bouchardy C, Ludicke F, Usel M, Major AL. Positive peritoneal cytology in early-stage endometrial cancer does not influence prognosis. Br J Cancer. 2004;91(4):720–4.

  11. Saga Y, Imai M, Jobo T, Kuramoto H, Takahashi K, Konno R, Ohwada M, Suzuki M. Is peritoneal cytology a prognostic factor of endometrial cancer confined to the uterus? Gynecol Oncol. 2006;103(1):277–80.

  12. Matsuo K, Klar M, Harter P, Miller H, Nusbaum DJ, Matsuzaki S, Roman LD, Wright JD. Trends in peritoneal cytology evaluation at hysterectomy for endometrial cancer in the United States. Gynecol Oncol. 2021;161(3):710–9.

  13. National Comprehensive Cancer Network (NCCN). NCCN Clinical Practice Guidelines in Oncology. Uterine Neoplasms, Version 1.2022. Fort Washington, PA: NCCN; 2021. Available from: https://www.nccn.org/professionals/physician_gls/pdf/uterine.pdf.

  14. Colombo N, Creutzberg C, Amant F, et al. ESMO-ESGO-ESTRO Consensus Conference On Endometrial Cancer: diagnosis, treatment and follow-up. Ann Oncol. 2016;27(1):16–41.

  15. Yamagami W, Mikami M, Nagase S, et al. Japan Society of Gynecologic Oncology 2018 guidelines for treatment of uterine body neoplasms. J Gynecol Oncol. 2020;31(1):e18.

  16. Amant F, Mirza MR, Koskas M, Creutzberg CL. Cancer of the corpus uteri. Int J Gynaecol Obstet. 2018;143(Suppl 2):37–50.

  17. Concin N, Matias-Guiu X, Vergote I, et al. ESGO/ESTRO/ESP guidelines for the management of patients with endometrial carcinoma. Int J Gynecol Cancer. 2021;31(1):12–39.

  18. Matsuo K, Matsuzaki S, Nusbaum DJ, et al. Significance of Malignant Peritoneal Cytology on Survival of Women with Uterine Sarcoma. Ann Surg Oncol. 2021;28(3):1740–8.

  19. Matsuo K, Nusbaum DJ, Matsuzaki S, et al. Malignant peritoneal cytology and increased mortality risk in stage I non-endometrioid endometrial cancer. Gynecol Oncol. 2020;159(1):43–51.

  20. Sakai K, Yamagami W, Takahashi F, et al. Prognostic impact of peritoneal cytology on treating endometrial cancer using data from the Japan Society of Obstetrics and Gynecology cancer registry. J Gynecol Oncol. 2024;36:e41.

  21. Surveillance, Epidemiology, and End Results (SEER) Program (www.seer.cancer.gov) SEER*Stat Database: Incidence - SEER Research Data, 17 Registries, Nov 2023 Sub (2000-2021) - Linked To County Attributes - Time Dependent (1990-2022) Income/Rurality, 1969-2022 Counties, National Cancer Institute, DCCPS, Surveillance Research Program, released April 2024, based on the November 2023 submission.

  22. Rui F, Yeo YH, Xu L, Zheng Q, Xu X, Ni W, Tan Y, Zeng QL, He Z, Tian X, Xue Q, Qiu Y, Zhu C, Ding W, Wang J, Huang R, Xu Y, Chen Y, Fan J, Fan Z, Qi X, Huang DQ, Xie Q, Shi J, Wu C, Li J. Development of a machine learning-based model to predict hepatic inflammation in chronic hepatitis B patients with concurrent hepatic steatosis: a cohort study. EClinicalMedicine. 2024;16(68):102419.

  23. Barakat MS, Field M, Ghose A, et al. The effect of imputing missing clinical attribute values on training lung cancer survival prediction model performance. Health Inf Sci Syst. 2017;5(1):16.

  24. Fonseca J, Bacao F. Geometric SMOTE for imbalanced datasets with nominal and continuous features. Expert Systems with Applications. 2023;234:121053.

  25. Yang K, Yu Z, Chen CLP, Cao W, You J, Wong H-S. Incremental Weighted Ensemble Broad Learning System for Imbalanced Data. IEEE Transactions on Knowledge and Data Engineering. 2022;34(12):5809–24.

  26. Huang S, Cai N, Pacheco PP, Narrandes S, Wang Y, Xu W. Applications of Support Vector Machine (SVM) Learning in Cancer Genomics. Cancer Genomics Proteomics. 2018;15(1):41–51.

  27. Dash TK, Chakraborty C, Mahapatra S, Panda G. Gradient boosting machine and efficient combination of features for speech-based detection of COVID-19. IEEE J Biomed Health Inform. 2022;26(11):5364–71.

  28. Breiman L. Random forests. Mach Learn. 2001;45(1):5–32.

  29. Lu X, Chen Y, Zhang G, Zeng X, Lai L, Qu C. Application of interpretable machine learning algorithms to predict acute kidney injury in patients with cerebral infarction in ICU. J Stroke Cerebrovasc Dis. 2024;33(7):107729.

  30. Natras R, Soja B, Schmidt M. Ensemble Machine Learning of Random Forest, AdaBoost and XGBoost for Vertical Total Electron Content Forecasting. Remote Sens. 2022;14:3547.

  31. Chae M, Yoon H, Lee H, Choi J. Hearing Recovery Prediction for Patients with Chronic Otitis Media Who Underwent Canal-Wall-Down Mastoidectomy. J Clin Med. 2024;13(6):1557.

  32. Nohara Y, Matsumoto K, Soejima H, Nakashima N. Explanation of machine learning models using shapley additive explanation and application for real data in hospital. Comput Methods Progr Biomed. 2022;214:106584.

  33. Takenaka M, Kamii M, Iida Y, Yanaihara N, Suzuki J, Takahashi K, Yanagida S, Saito M, Takano H, Yamada K, Okamoto A. Re-thinking the prognostic significance of positive peritoneal cytology in endometrial cancer. Gynecol Oncol. 2021;161(1):135–42.

  34. Matsuo K, Matsuzaki S, Miller H, et al. Clinico-pathological significance of suspicious peritoneal cytology in endometrial cancer. J Surg Oncol. 2021;124(4):687–98.

  35. Villiger AS, Zurbriggen S, Imboden S, Solass W, Christe L, Saner FAM, Gmür A, Rau TT, Mueller MD, Siegenthaler F. Reviving peritoneal cytology: Exploring its role in endometrial cancer molecular classification. Gynecol Oncol. 2024;182:148–55.

  36. Jónsdóttir B, Marcickiewicz J, Borgfeldt C, Bjurberg M, Dahm-Kähler P, Flöter-Rådestad A, Hellman K, Holmberg E, Kjølhede P, Rosenberg P, Tholander B, Åvall-Lundqvist E, Stålberg K, Högberg T. Preoperative and intraoperative assessment of myometrial invasion in endometrial cancer-A Swedish Gynecologic Cancer Group (SweGCG) study. Acta Obstet Gynecol Scand. 2021;100(8):1526–33.

  37. Gracia M, Yildirim Y, Macuks R, Mancari R, Achimas-Cadariu P, Polterauer S, Iacoponi S, Zapardiel I; SARCUT Study Group. Influence of Clinical and Surgical Factors on Uterine Carcinosarcoma Survival. Cancers (Basel). 2023;15(5):1463.

  38. Gao J, Ao Y, Wang S, Chen Z, Zhang Y, Ding J, Jiang J. WHO histological classification and tumor size are predictors of the locally aggressive behavior of thymic epithelial tumors. Lung Cancer. 2024;187:107446.

  39. Baek S, Yu SE, Deng YH, Lee YJ, Lee DG, Kim S, Yoon S, Kim HS, Park J, Lee CH, Lee JB, Kong HJ, Kang SG, Shin YM, Sung HJ. Quenching Epigenetic Drug Resistance Using Antihypoxic Microparticles in Glioblastoma Patient-Derived Chips. Adv Healthc Mater. 2022;11(8):e2102226.

  40. Chen MT, Sun HF, Zhao Y, Fu WY, Yang LP, Gao SP, Li LD, Jiang HL, Jin W. Comparison of patterns and prognosis among distant metastatic breast cancer patients by age groups: a SEER population-based analysis. Sci Rep. 2017;7(1):9254.

  41. Huang H, Xu S, Wang X, Liu S, Liu J. Patient Age Is Significantly Related to Distant Metastasis of Papillary Thyroid Microcarcinoma. Front Endocrinol (Lausanne). 2021;12:748238.

  42. Purushotham A, Shamil E, Cariati M, Agbaje O, Muhidin A, Gillett C, Mera A, Sivanadiyan K, Harries M, Sullivan R, Pinder SE, Garmo H, Holmberg L. Age at diagnosis and distant metastasis in breast cancer – A surprising inverse relationship. European Journal of Cancer. 2014;50(10):1697–705.

  43. Azim HA Jr, Nguyen B, Brohée S, Zoppoli G, Sotiriou C. Genomic aberrations in young and elderly breast cancer patients. BMC Med. 2015;13:266.

  44. Buza N, Baine I, Hui P. Precision genotyping diagnosis of lung tumors with trophoblastic morphology in young women. Mod Pathol. 2019;32(9):1271–80.

  45. Zhao C, Liu H. Molecular Biological Pathogenesis of Young Breast Cancer. Cancer Research on Prevention and Treatment. 2020;47(3):213–7.

  46. Matias-Guiu X, Lax S, Raspollini MR, Palacios J, Zheng W, Liu C, de Brot L, Lordello L, Hardisson D, Gaffney D, Mutch D, Scambia G, Creutzberg CL, Fotopoulou C, Berek JS, Concin N. FIGO 2023 staging for endometrial cancer, when, if it is not now? Eur J Cancer. 2024;213:115115.

Acknowledgements

We extend our gratitude to the participants and researchers involved in the SEER database and all related open data and studies, whose contributions have been instrumental to our research.

Conflict of interest

The authors declare no competing interests.

Funding

This study was funded by the Fujian Provincial Science and Technology Innovation Joint Fund Project (No. 2023Y9454), the Fujian Provincial Natural Science Foundation Project (No. 2024J011087), and the Fujian Provincial Health Commission Science and Technology Plan Project (No. 2024CXA032).

Author information

Contributions

All authors made substantial contributions to the interpretation of data and critically revised the manuscript. All authors read and approved the final version submitted and agreed to be accountable for their contributions. Specific contributions are as follows. Qiaoming Lin: writing (original draft, review and editing), data analysis, visualization, methodology, conceptualization. Qi Guan: writing (original draft), methodology, investigation, data curation. Danru Chen: software, methodology, validation, investigation. Lilan Li: formal analysis, writing (original draft). Yibin Lin: writing (review and editing), validation, supervision, project administration, data curation, conceptualization. The submitting author reports no conflicts of interest.

Corresponding author

Correspondence to Lin Yibin.

Ethics declarations

Ethics approval and consent to participate

This study utilized publicly available abstracted data, and therefore did not require additional ethical approval.

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

Lin, Q., Guan, Q., Chen, D. et al. Peritoneal cytology predicting distant metastasis in uterine carcinosarcoma: machine learning model development and validation. World J Surg Onc 23, 167 (2025). https://doiorg.publicaciones.saludcastillayleon.es/10.1186/s12957-025-03771-9

  • DOI: https://doiorg.publicaciones.saludcastillayleon.es/10.1186/s12957-025-03771-9

Keywords