- Research
- Open access
- Published:
Constructing and identifying an eighteen-gene tumor microenvironment prognostic model for non-small cell lung cancer
World Journal of Surgical Oncology volume 22, Article number: 319 (2024)
Abstract
Background
The tumor microenvironment (TME) plays a crucial role in tumorigenesis and tumor progression. This study aimed to identify novel TME-related biomarkers and develop a prognostic model for patients with non-small-cell lung cancer (NSCLC).
Methods
After downloading and preprocessing data from The Cancer Genome Atlas (TCGA) data portal and Gene Expression Omnibus (GEO) datasets, we classified the molecular subtypes using the “NMF” R package. We performed survival analysis and quantified immune scores between clusters. A Cox proportional hazards model was then constructed, and its formula was produced. We assessed model performance and clinical utility. A prediction nomogram was also constructed and validated. Additionally, we explored the potential regulatory mechanisms of our TME gene signature using Gene Set Enrichment Analysis (GSEA).
Results
From data processing and univariate Cox regression analysis, 57 TME-related prognostic genes were identified, and two significantly distinct clusters were established. Using Cox regression and Lasso regression, an 18-gene TME-related prognostic model was developed. Patients were stratified into high- and low-risk groups based on the risk score, with survival analysis showing that the low-risk group had significantly better outcomes than the high-risk group (P < 0.01). ROC curve analysis demonstrated strong predictive performance, with 1-year, 3-year, and 5-year AUC values ranging from 0.654 to 0.702 across different cohorts. The model accurately predicted survival outcomes across subgroups with varying clinical features, and its predictive accuracy was validated through a nomogram.
Conclusions
We developed a prognostic model based on TME-related genes in NSCLC. Our 18-gene TME signature can effectively predict the prognosis of NSCLC with high accuracy.
Background
Lung cancer, also known as pulmonary carcinoma, is the most common cancer and the leading cause of cancer-related deaths worldwide [1]. Generally, lung cancer is divided into small-cell lung cancer (SCLC) and non-small-cell lung cancer (NSCLC) [2], which account for approximately 15–20% and 80–85% of all lung cancers, respectively [3]. Despite advancements in NSCLC treatment, survival rates and overall cure rates remain low [4]. Therefore, it is essential to identify new biomarkers and understand the molecular mechanisms to improve the prognosis and treatment of NSCLC.
Immunotherapy has revolutionized the treatment of various cancers, including NSCLC [5]. Over the past decade, there has been modest progress in treating advanced NSCLC [6]. Currently, PD-1 monoclonal antibodies such as nivolumab and pembrolizumab have been approved for the treatment of advanced NSCLC, providing better prognostic outcomes compared to traditional chemotherapy [7, 8]. However, the therapeutic efficacy of immunotherapy in patients with lung cancer is significantly influenced by their distinct molecular subtypes [9]. Different molecular subtypes exhibit varying levels of immune cell infiltration, PD-L1 expression, and tumor mutation burden (TMB), which can directly affect the response to immune checkpoint inhibitors [10, 11]. For instance, patients with high TMB or elevated PD-L1 expression tend to show better responses to immunotherapy [11, 12]. Therefore, understanding and investigating the molecular subtypes of lung cancer is crucial for optimizing immunotherapeutic strategies and improving patient outcomes.
The tumor microenvironment (TME) consists of diverse cell populations, signaling molecules, and complex stroma, including the extracellular matrix, vasculature, fibroblasts, and immune cells [13, 14]. The TME plays a crucial role in tumorigenesis and tumor progression [15]. Key cancer hallmarks, such as tumor proliferation, invasion, metastasis, chemotherapeutic resistance, and angiogenesis, are influenced by the cellular components of the TME [16]. Recent advancements in TME research have highlighted its complexity and identified novel therapeutic targets [17]. For instance, a study by Yu et al. demonstrated that inhibiting METTL3 led to a significant enhancement of the inflammatory microenvironment [18]. Another study by Liu et al. reported that endoplasmic reticulum oxidoreductase-1α (ERO1A) mediates an immunosuppressive TME by disrupting the balance between the IRE1α and PERK signaling pathways, thereby weakening the response to PD-1 blockade [19]. These findings underscore the importance of identifying additional TME-related molecules and elucidating their mechanisms to advance lung cancer treatment.
In this study, we developed and validated a gene signature based on TME-related genes by integrating multiple gene expression datasets. Subsequently, we constructed a new nomogram to predict lung cancer outcomes.
Materials and methods
Data download and preprocessing
RNA sequencing data for lung adenocarcinoma (TCGA-LUAD) and lung squamous cell carcinoma (TCGA-LUSC) were obtained from the Cancer Genome Atlas (TCGA) data portal ( GDC Analysis Center (cancer.gov)). Clinical data for LUAD and LUSC (XML files) were also downloaded from the TCGA data portal. A total of 1037 tumor samples and 108 normal samples (TCGA-NSCLC) were included in this study, comprising 594 LUAD samples (535 tumor samples and 59 normal samples) and 551 LUSC samples (502 tumor samples and 49 normal samples). Missing data was minimal and did not significantly affect our analysis. For any missing values in clinical data, we applied listwise deletion. Additionally, external validation datasets with clinical information were downloaded from the NCBI Gene Expression Omnibus (GEO; Home - GEO - NCBI (nih.gov)). A set of 4061 TME-related genes sourced from published literature [20] were utilized for further analysis (Table S1).
Classification of molecular subtypes
Expression levels of the 4061 TME genes from TCGA were subjected to differential expression analysis using the “limma” R package. Univariate survival analysis was conducted using the “survival” R package to identify prognostic genes for NSCLC (p < 0.01) [21]. Molecular subtyping was performed using the “NMF” R package (nonnegative matrix factorization, NMF) with default parameters. Samples were clustered into two groups (C1 and C2) based on biological correlation coefficients and internal features of the gene expression matrix in a 50-iteration process using the NMF method (“brunet”) [22]. Clusters k range from 2 to 10 in number (Figure S1).
The choice of k = 2 clusters was determined based on cophenetic values and residual sum of squares (RSS), and other cluster validation metrics (Figure S2). We evaluated various cluster numbers and found that k = 2 provided the most stable and interpretable results, balancing model complexity and explanatory power. (Figure S2).
Comparison of survival analysis and immune scores between clusters
Survival analysis for overall survival (OS) and progression-free survival (PFS) was conducted using the TCGA database and the Pan-cancer TCGA dataset obtained from the UCSC cancer genome browser. Immune cell enrichment was quantified using the Microenvironment Cell Populations (MCP)-counter algorithm, comparing immune scores between C1 and C2 across various cell types including B cells, CD8 + T cells, cytotoxic lymphocytes, endothelial cells, fibroblasts, monocytic cells, myeloid dendritic cells, neutrophils, NK cells, and T cells using the “MCPcounter” R package [23, 24].
Sample preparation and prognostic model construction
The TCGA samples were partitioned into training and testing cohorts using the “caret” R package [25]. The training cohort facilitated model development, while the testing cohort validated the model. “caret” focuses on simplifying and optimizing models, a standard approach in predictive modeling [20]. After identifying differentially expressed genes (DEGs) associated with the TME, the expression data were integrated with clinical survival information. This process resulted in a final matrix containing 993 samples and 1027 genes. Using the “caret” R package, these 993 samples were randomly partitioned into training and testing cohorts with a 7:3 ratio. To minimize selection bias and address potential confounding effects and multicollinearity, we iteratively performed model training and validation 1000 times [26]. For each iteration, the dataset was randomly split into training and validation sets using the “createDataPartition” function from the “caret” package, ensuring a 7:3 ratio [25]. To further ensure no significant differences in clinical characteristics (age, gender, T stage, M stage, N stage, and histologic diagnosis) between the randomized profiles, we applied Chi-square tests for categorical variables and Student’s t-tests for continuous variables. Additionally, multicollinearity was assessed using the variance inflation factor (VIF), and variables with high collinearity (VIF > 5) were excluded from further analysis to avoid redundant information and ensure model robustness [27].
Univariate Cox regression was conducted using the “survival” package on the training cohorts. To refine the model by reducing the number of genes, we employed lasso regression with the “glmnet” package. LASSO regression, a regularization technique, effectively selects features by shrinking coefficients towards zero [28]. Subsequently, we constructed a Cox proportional hazards model and derived its formula. Based on risk scores, the cohorts were stratified into high-risk and low-risk groups for further analysis.
Prognostic model evaluation and subgroup analysis
The performance of the prognostic model was assessed across four datasets: training cohort, testing cohort, all TCGA samples, and GSE14814 datasets. Kaplan-Meier survival curves were plotted, and significance was assessed using the log-rank test. Time-dependent receiver operating characteristic (ROC) curves, decision curve analysis (DCA), and calibration curves were generated using the “survival”, “survminer”, and “timeROC” R packages [29,30,31]. Subgroup and stratified analyses were conducted based on clinical features extracted from TCGA using the “survival” and “survminer” packages in R. All statistical analyses were performed using R statistical software (version 4.1.0).
Construction and validation of the prediction nomogram
Clinical characteristics and risk scores (RS) were used to construct a nomogram predicting survival probabilities. A nomogram is a user-friendly tool that effectively visualizes the risk model results and provides accurate prognostic predictions [32]. The independent prognostic analysis was performed with the TCGA cohort using the “survival” package. Then, the nomogram was generated by the “regplot” and “rms” packages of R version 4.1.3. Prognostic utility was assessed using the receiver operating characteristic (ROC) curve and decision curve analysis (DCA), which were created using the R packages “timeROC” and “ggDCA”.
Model validation by comparison with previous studies
The performance of our prognostic model was compared with previously reported gene models using survival and ROC curve analyses conducted with the “limma”, “survival”, “survminer”, and “timeROC” packages in R [33]. Concordance index (C-index) calculations were performed using the “survcomp” package to further assess and compare predictive accuracy [34]. C-indices range from 0.5 to 1, where a C-index of 1 indicates perfect prediction accuracy, and a C-index of 0.5 suggests predictions are no better than random chance.
Gene set enrichment analysis (GSEA)
To explore regulatory mechanisms associated with our TME gene signature, functional analysis was conducted using Gene Set Enrichment Analysis (GSEA). The “c2.cp.kegg.v7.4.symbols.gmt” dataset was downloaded from GSEA (http://www.gsea-msigdb.org/gsea/index.jsp), and subsequent analyses were performed using the “limma”, “org.Hs.eg.db”, “clusterProfiler”, and “enrichplot” packages in R version 4.1.3.
Results
Identification of molecular subtypes and different patterns of immune scores in NSCLC
After extracting the expression levels of 4061 TME-related genes from the TCGA expression profile, 1418 DEGs were identified and visualized using volcano plots and heatmaps with “ggplot2” and “pheatmap” in R version 4.1.3 (Figures S3 and S4). By intersecting the TCGA and GSE14814 datasets, 11,737 common genes were identified, which were then used to intersect with the 1418 TME-related DEGs. After merging with clinical survival data, a matrix comprising 1027 genes and 993 samples was obtained for further analysis. Univariate Cox regression analysis identified 57 TME-related prognostic genes, which were used to build two clusters (C1 and C2) as previously described (Fig. 1a). TCGA-NSCLC samples were divided into C1 and C2 groups (Table S2).
Comparison of Two Clusters (C1 and C2): a. Consensus map showing clustering via NMF algorithm (k = 2). b. Overall survival (OS) comparison between C1 and C2 (P < 0.001). c. Progression-free survival (PFS) comparison between C1 and C2 (P = 0.355). Immune scores comparison between C1 and C2 for various cell types: B cells, CD8 + T cells, cytotoxic lymphocytes, endothelial cells, fibroblasts, monocytic cells, myeloid dendritic cells, neutrophils, NK cells, and T cells
Kaplan-Meier analysis revealed significant differences in overall survival (OS) between the groups (Fig. 1b, P < 0.001), though disease-free survival (DFS) did not differ significantly (Fig. 1c, P = 0.355). Immune scores, including cytotoxic lymphocytes, endothelial cells, monocytic cells, myeloid dendritic cells, neutrophils, and T cells, differed significantly between the subtypes. However, no differences were observed in B cells, CD8 + T cells, fibroblasts, and NK cells (Fig. 1d-m).
Construction of a prognostic model based on TME-related genes
The training dataset included 697 samples, and the testing dataset included 296 samples. To ensure group randomization, chi-squared tests were conducted for several clinical characteristics in both cohorts. The results showed no significant differences in age (p = 0.6337), gender (p = 0.1717), T stage (p = 0.8846), M stage (p = 0.0631), N stage (p = 0.7485), and histologic diagnosis (p = 0.2165) between the two cohorts (Table 1).
Univariate Cox analysis on the training dataset identified 72 prognostic TME-related genes (p < 0.05). Using lasso regression, this number was reduced to 25 genes. The optimal model performance was achieved when log lambda was − 4.25 (Fig. 2a-b). Multivariate Cox regression analysis further reduced this to 18 genes. The prognostic model formula is:
Construction and Validation of Prognostic Models: a. Log value of independent variable lambda for lasso regression. b. Trajectory of each independent variable during lasso regression. c. Kaplan-Meier (KM) curves for TCGA training cohort. (d) Kaplan-Meier (KM) curves for TCGA testing cohort. (e) Kaplan-Meier (KM) curves for all TCGA samples. (f) Kaplan-Meier (KM) curves for GSE14814 datasets
Risk Score (RS)=(HHIPL2\(\:\times\:\)0.0899) +(SOX9\(\:\times\:\)0.0870) +(DDX11\(\:\times\:\)0.2085) +(WNT7A\(\:\times\:\) 0.1330) + (WNT10B\(\:\times\:\) 0.1586) - (GNG7\(\:\times\:\)0.3403) - (SIX1\(\:\times\:\)0.2060) -(FZD3\(\:\times\:\)0.3038) + (PABPC1\(\:\times\:\)0.2631)- (FOXM1\(\:\times\:\)0.2148) + (ALDOA\(\:\times\:\)0.2598)- (CEP55\(\:\times\:\)0.2290) + (INTS7\(\:\times\:\)0.5316) + (FADD\(\:\times\:\)0.2216) +(CDC25C\(\:\times\:\)0.2940) -(CD69\(\:\times\:\)0.1723) +(EPHB2\(\:\times\:\)0.1084) + (PTGIS\(\:\times\:\)0.3527)
Patients from the training cohort, testing cohort, all TCGA samples, and GSE14814 datasets were divided into low-risk and high-risk groups according to the median cutoff value of the risk score. Survival analysis showed that low-risk groups had significantly better prognoses than high-risk groups across all datasets (TCGA-Train P < 0.01, TCGA-Test P < 0.01, TCGA-All P < 0.01, GEO P < 0.05) (Fig. 2c-f).
ROC curve analysis showed the 1-year, 3-year, and 5-year AUCs for TCGA-train were 0.685, 0.702, and 0.671, respectively; for TCGA-test, they were 0.674, 0.690, and 0.658; and for GSE14814, they were 0.654, 0.656, and 0.587 (Fig. 3a-d).
To validate the model’s accuracy, ROC curve analysis was performed using the area under the curve (AUC) as the assessment metric. As shown in Fig. 3a-b, the 1-year, 3-year and 5-year AUCs of the TCGA-train and TCGA-test cohort were 0.685, 0.702, 0.671 and 0.634, 0.666, 0.636, respectively. In the TCGA-all cohort, the 1-year, 3-year and 5-year AUCs was 0.674, 0.690 and 0.658(Fig. 3c). Additionally, the 1-year, 3-year and 5-year AUCs of the GSE14814 datasets was 0.654, 0.656, 0.587, respectively (Fig. 3d).
Assessment of the prognostic model in subgroups with different clinical features
Samples were divided into subgroups based on age (≤ 60 years, > 60 years), gender (female, male), pathology type (adenocarcinoma, squamous cell carcinoma), M stage (M0, M1), T stage (T1, T2, T3, T4), N stage (N0, N1, N2, N3), and TNM stage (Stage I, II, III, IV). Concurrently, the high-risk and low-risk groups were generated according to the median cutoff value of the risk score. As shown in Fig. 4, statistically significant differences between the two groups were determined in most subgroups. There were no differences between groups in Stage M1, T4, N3 and Stage III (Fig. 4h, l, p and t), which might relate to the small sample sizes. The above results suggest that the prognostic model was able to predict survival outcome in subgroups presenting different clinical features.
Predictive value of the 18-gene prognostic model
To further evaluate the predictive value of the 18 gene prognostic model, we performed the univariable and multivariable Cox regression analyses using the TCGA data. Results of the univariable Cox regression analyses are presented in Table 2. Our initial analysis suggests that TNM Stage, T Stage, M Stage, N Stage and RS were all independent risk factors (p < 0.05), while age, gender and histologic diagnosis indicate no significance (p > 0.05). The multivariate analysis confirmed RS as an independent risk factor with HR = 1.4 (p < 0.05). A nomogram was constructed based on these variables (Fig. 5a). Each variant was scored, and a final score was calculated. The total points correspond to the projected 1-, 3-, and 5-year overall survival rates on the scales below.
Predictive Value of the 18 TME-Related Genes Prognostic Model: a. Nomogram predicting 1-, 3-, and 5-year overall survival (OS) probabilities. b. Calibration curve comparing predicted and actual 1-, 3-, and 5-year OS. c-d. ROC curves for 1- and 3-year OS comparing the nomogram with other clinical variables. e-f. Decision curve analysis (DCA) curves evaluating clinical benefit for 1- and 3-year OS
Calibration curves for 1-, 3-, and 5-year OS estimates demonstrated good model calibration, closely aligning with the ideal prediction diagonal [35] (Fig. 5b). To further validate the nomogram’s predictive accuracy, its 1- and 3-year ROC curves were compared to other clinical variables, including RS, age, gender, TNM stage, T stage, M stage, and N stage. As shown in Fig. 5c-d, the nomogram’s 1-year AUC of 0.692 and 3-year AUC of 0.705 were higher than those of other variables, indicating better accuracy. Additionally, decision curve analysis (DCA) was performed to evaluate the model’s utility (Fig. 5e-f). The x-axis represents the risk threshold, and the y-axis represents net benefit. In this context, “none” assumes all samples are negative, while “all” assumes all samples are positive. The curve furthest from the “all” line indicates the highest prognostic accuracy. In our study, the 1- and 3-year nomogram curves demonstrated the highest accuracy compared to other variables.
The 18-gene prognostic model as a new prognostic model in NSCLC
To further evaluate the prognostic value of our 18-gene signature compared to previously reported models, we included four additional risk models in our study: an eight-gene model [36], a six-gene model [37], and two four-gene models [38, 39]. After extracting the gene lists, we calculated the risk scores (RS) for each of the four gene signatures using the TCGA-all dataset. Subsequently, samples were divided into low-risk and high-risk groups based on the median RS. Survival curves and ROC curves were then plotted using the identical methods. As shown in Figs. 6a-d, the predicted low-risk group consistently had better survival outcomes than the high-risk group across all four gene signatures, with statistically significant differences (P < 0.05). ROC curves in Fig. 6e-h show that the AUC values of the four reported models were lower compared to our model, suggesting superior predictive accuracy of the 18-gene signature .
Comparison of the 18-Gene Risk Model with Other Reported Models: Survival curves (a-d) and ROC curves (e-h) comparing our TME-related gene signature with four other reported gene-signatures. i. Concordance index (C-index) comparison of the four other reported gene-signatures and our TME-related gene signature. j. Restricted mean survival (RMS) time curve comparison of the four other reported gene-signatures and our TME-related gene signature
To further assess and compare the performance of our model with other reported gene signatures, we calculated the C-index. The TME signature demonstrated the highest C-index (C-index = 0.646), as shown in Fig. 6i. The prognostic prediction efficacy at different time points was evaluated using the RMS time. Our TME signature exhibited the best predictive performance over five years compared to the other four models (Fig. 6j). These findings indicate that our 18-gene TME signature can predict NSCLC prognosis with high accuracy.
Potential mechanisms of the 18-gene TME signature
To further inquiry the potential mechanisms of our 18-gene TME signature, GSEA analysis of the high and low-risk group was performed (Fig. 7). We found the following pathways were enriched extremely in high-risk group: CELL CYCLE (NES = 2.424093156, q = 6.67E-08); DNA REPLICATION (NES = 2.221993401, q = 0.000375682); FOCAL ADHESION (NES = 1.771479233, q = 0.001881526); MELANOMA (NES = 2.040675565, q = 0.000762428); PATHWAYS IN CANCER (NES = 1.753271013, q = 0.000557878). Meanwhile, the following pathways were enriched extremely in low-risk group: ALLOGRAFT REJECTION (NES=-1.736225789, q = 0.020058683); ASTHMA (NES=-2.076787601, q = 0.000385714); ETHER LIPID METABOLISM (NES=-1.698555353, q = 0.04501316); INTESTINAL IMMUNE NETWORK FOR IGA PRODUCTION (NES=-1.937898798, q = 0.001881526); SYSTEMIC LUPUS ERYTHEMATOSUS (NES=-2.181660497, q = 5.35E-08).
Discussion
Genetic alterations are closely associated with cancer development, and molecular changes significantly influence cancer prognosis [40]. In recent years, numerous molecular alterations have been identified and characterized, such as FGFR1 amplifications, BRAF mutations, ROS1 rearrangements, HER2 insertions, DDR2 mutations, PIK3CA mutations, and RET rearrangements [41,42,43,44,45,46]. Targeted therapies have significantly improved the prognosis of lung cancer patients and have the potential to transform cancer into a chronic condition. Therefore, the search for prognostic biomarkers remains crucial.
In this study, we identified 57 tumor microenvironment (TME)-related prognostic genes and established a robust prognostic model for NSCLC based on an 18-gene TME signature. Our model, comprising genes such as HHIPL2, SOX9, DDX11, WNT7A, WNT10B, GNG7, SIX1, FZD3, PABPC1, FOXM1, ALDOA, CEP55, INTS7, FADD, CDC25C, CD69, EPHB2, and PTGIS, was validated across multiple cohorts. Several of these genes have been previously investigated: HHIPL2 is differentially expressed in gastric cancer tissues [47], SOX9 is associated with the prognosis of esophageal and oral squamous cell carcinoma [48, 49], and DDX11 may have an oncogenic role [50]. Additionally, Wnt7A is secreted by aggressive breast tumor cells [51], and WNT10B is critical for gastric cancer progression [52]. GNG7 might be used as a diagnostic and prognostic biomarker for pancreatic adenocarcinoma [53] and FOXM1 had been turned out to be a bane mediating tumorigenesis [54]. While our data mining suggests a close relationship between these genes and the TME, systematic functional validation and mechanistic exploration of these genes are lacking. Future research will focus on these aspects to further elucidate their roles and mechanisms in NSCLC and the TME.To assess the model performance, four data cohorts and different subgroups were incorporated in further analysis. Results suggested that the prognostic model was able to predict survival outcome in different data cohort and subgroups presenting different clinical features. Univariate and multivariate Cox regression analyses confirmed that the risk score (RS) was an independent prognostic factor. To address the potential impact of tumor heterogeneity and intratumoral variations, we validated our model across multiple datasets to minimize the effects of dataset-specific variability. Additionally, we performed subgroup analyses to assess model performance in different patient populations. A nomogram was constructed, a widely used biostatistical tool for predicting disease outcomes [55,56,57]. For instance, Chen et al. developed a nomogram with a collagen signature for predicting peritoneal metastasis in gastric cancer [58], and Niu et al. constructed a prognostic model for ATG patients [59]. Our study shares similarities with these approaches, highlighting the utility of prognostic models in clinical settings while considering tumor heterogeneity.
To assess the potential clinical benefit of our prognostic model, we employed Decision Curve Analysis (DCA) [60]. DCA evaluates the net benefit of a model by considering the trade-offs between true positive and false positive rates across different threshold probabilities [61]. Our analysis revealed that our 18-gene TME signature model consistently provided a higher net benefit compared to standard clinical approaches for predicting 1- and 3-year overall survival (OS). This indicates that the model can enhance clinical decision-making by offering more accurate risk stratification, ultimately supporting better-tailored therapeutic strategies and potentially improving patient outcomes. The higher net benefit of our model suggests that it is a valuable tool for clinicians when considering treatment options for NSCLC patients.
Several prognostic models for lung cancer have been developed, but differences in patient populations and data processing methods across studies need to be considered when comparing models. For instance, Ma et al. created an eight-gene signature (INSL4, SCN7A, STAP1, P2RX1, IKZF3, MS4A1, KLRB1, and ACSM5) for personalized prognosis and therapy in LUAD patients, based on a cohort with different clinical characteristics and treatment backgrounds from our study [36]. Similarly, Song et al. identified four immune-related genes (MAL, MS4A1, OAS1, and WFDC2) using datasets that employed distinct normalization techniques and treatment protocols, which may have influenced the reported predictive accuracy [39]. Sun et al. developed a four-gene signature (ARNTL2, ECT2, PPIA, and TUBA4A) for lung adenocarcinoma basing on a cohort with unique inclusion criteria [38], while a six-gene model (CD200, CHI3L2, CNTN1, CTSL, FYB1, and SLC52A1) was identified as potential biomarkers in advanced NSCLC patients who received therapeutic interventions distinct [37]. In our study, we utilized the TCGA-all dataset, which underwent its own specific preprocessing and standardization. These methodological differences may account for the observed discrepancies in predictive performance between models.
In comparing our 18-gene model to previously reported prognostic models, we observed significant advantages in its prognostic accuracy as evidenced by its higher C-index and AUC values. Specifically, our model achieved a C-index of 0.646, demonstrating robust predictive capability. In contrast, Ma et al.‘s eight-gene model [36] reported a C-index of 0.620, and Song et al.‘s four-gene model [39] exhibited a C-index of 0.605. These findings clearly highlight the superior overall prognostic accuracy of our 18-gene TME signature. Additionally, our model’s performance in ROC curve analysis showed AUC values ranging from 0.654 to 0.702 across various time points, while the AUC values for the other models consistently fell below 0.650 (as shown in Fig. 6e-h). This discrepancy further reinforces the predictive accuracy and clinical utility of our 18-gene model. Therefore, our 18-gene TME signature demonstrated superior prognostic accuracy, as evidenced by its higher C-index and AUC values across multiple time points, suggesting its robustness and potential utility in a variety of clinical settings. These findings highlight the importance of further research to assess how differences in patient populations and data processing techniques might impact the generalizability of prognostic models across different cohorts. Furthermore, our 18-gene TME signature, distinct from previous models, shows high prognostic accuracy, suggesting directions for further investigation.
To further explore the molecular mechanisms of our model, the GSEA analysis were performed. We found several potential mechanisms associated with the 18-TME genes prognostic model. The cell-cycle is a highly conserved, continuous metabolic process regulating duplication of genetic material, cell growth and cell division [62]. DNA replication is fundamental for life activities, which occurs through a series of events of molecular events [63]. Elevated invasiveness is the most important feature of melanoma which is also closely in relation to tumor microenvironment [64]. In this study, MELANOMA is enriched extremely in high-risk group of our model and this is also addressed that melanoma is an immune responsive tumor [65]. It is well known that allograft rejection, asthma and systemic lupus erythematosus (SLE) strongly linked with immune dysfunction. In this study, they were enriched extremely in low-risk group of our model. All the above results confirmed that our 18-gene model had a profound association with immune function.
Despite its strengths, this study has several limitations that should be acknowledged. All analyses were derived from public databases at the transcriptome level, which necessitates further validation at both the mRNA and protein levels to confirm our findings. Additionally, the TME-related genes were sourced from existing literature, which may not be exhaustive and could introduce bias. The generalizability of our prognostic model across different patient populations, including variations in ethnicity and stages of NSCLC, remains to be established. Furthermore, potential biases inherent in utilizing public databases such as TCGA, including differences in patient demographics and treatment histories, may impact the applicability of our model. Finally, many genes within our 18-gene model require further functional validation to elucidate their roles in NSCLC. In summary, while we developed a highly accurate prognostic model for NSCLC, these limitations highlight important directions for future research.
Conclusions
In this study, we developed a prognostic model based on TME-related genes for NSCLC. This model accurately predicted survival outcomes across subgroups with various clinical features. Univariable and multivariable Cox regression analyses confirmed that the RS is an independent risk factor. We then constructed a nomogram to predict 1-, 3-, and 5-year overall survival rates, demonstrating high predictive accuracy compared to other clinical variables. Additionally, we compared our 18-gene signature with four previously reported risk models and found that our model had superior prognostic accuracy for NSCLC. Finally, GSEA analysis provided insights into the potential mechanisms underlying our 18-gene TME signature.
Data availability
No datasets were generated or analysed during the current study.
Abbreviations
- TME:
-
Tumor Microenvironments
- NSCLC:
-
Non-Small-Cell Lung Cancer
- GSEA:
-
Gene Set Enrichment Analysis
- SCLC:
-
Small-Cell Lung Cancer
- LUAD:
-
Lung Adenocarcinoma
- LUSC:
-
Lung squamous cell carcinoma
- TCGA:
-
The Cancer Genome Atlas data portal
- GEO:
-
Gene Expression Omnibus
- NMF:
-
Nonnegative Matrix Factorization
- OS:
-
Overall Survival
- PFS:
-
Progression-Free Survival
- MCP:
-
Microenvironment Cell Populations
- ROC:
-
Receiver Operating Characteristics
- DCA:
-
Decision Curve Analysis
- RS:
-
Risk Score
- DEGs:
-
Differential Expression Genes
References
Siegel RL, Miller KD, Fuchs HE, Jemal A. Cancer statistics, 2021. CA Cancer J Clin. 2021;71:7–33.
Tan X, Tong L, Li L, et al. Loss of Smad4 promotes aggressive lung cancer metastasis by de-repression of PAK3 via miRNA regulation. Nat Commun. 2021;12:4853.
Zhan P, Zhang B, Xi GM, et al. PRC1 contributes to tumorigenesis of lung adenocarcinoma in association with the Wnt/beta-catenin signaling pathway. Mol Cancer. 2017;16:108.
Li B, Zhu L, Lu C, et al. circNDUFB2 inhibits non-small cell lung cancer progression via destabilizing IGF2BPs and activating anti-tumor immunity. Nat Commun. 2021;12:295.
Haratani K, Hayashi H, Chiba Y, et al. Association of Immune-related adverse events with Nivolumab Efficacy in Non-small-cell Lung Cancer. JAMA Oncol. 2018;4:374–8.
Gettinger SN, Horn L, Gandhi L, et al. Overall survival and Long-Term Safety of Nivolumab (Anti-programmed Death 1 antibody, BMS-936558, ONO-4538) in patients with previously treated Advanced Non-small-cell Lung Cancer. J Clin Oncol. 2015;33:2004–12.
Herbst RS, Baas P, Kim DW, et al. Pembrolizumab versus Docetaxel for previously treated, PD-L1-positive, advanced non-small-cell lung cancer (KEYNOTE-010): a randomised controlled trial. Lancet. 2016;387:1540–50.
Reck M, Rodriguez-Abreu D, Robinson AG, et al. Pembrolizumab versus Chemotherapy for PD-L1-Positive non-small-cell Lung Cancer. N Engl J Med. 2016;375:1823–33.
Li C, Pan J, Luo J, Chen X. Prognostic characterization of immune molecular subtypes in non-small cell lung cancer to immunotherapy. BMC Pulm Med. 2021;21:389.
Nabet BY, Hamidi H, Lee MC, et al. Immune heterogeneity in small-cell lung cancer and vulnerability to immune checkpoint blockade. Cancer Cell. 2024;42:429–e443424.
Wang Q, Li M, Yang M, et al. Analysis of immune-related signatures of lung adenocarcinoma identified two distinct subtypes: implications for immune checkpoint blockade therapy. Aging. 2020;12:3312–39.
Marabelle A, Fakih M, Lopez J, et al. Association of tumour mutational burden with outcomes in patients with advanced solid tumours treated with pembrolizumab: prospective biomarker analysis of the multicohort, open-label, phase 2 KEYNOTE-158 study. Lancet Oncol. 2020;21:1353–65.
Bader JE, Voss K, Rathmell JC. Targeting metabolism to improve the Tumor Microenvironment for Cancer Immunotherapy. Mol Cell. 2020;78:1019–33.
Seebacher NA, Krchniakova M, Stacy AE, et al. Tumour Microenvironment stress promotes the development of Drug Resistance. Antioxidants (Basel; 2021. p. 10.
Katheder NS, Rusten TE. Microenvironment and tumors-a nurturing relationship. Autophagy. 2017;13:1241–3.
Lee SS, Cheah YK. (2019) The Interplay between MicroRNAs and Cellular Components of Tumour Microenvironment (TME) on Non-Small-Cell Lung Cancer (NSCLC) Progression. J Immunol Res 2019:3046379.
Kang JH, Zappasodi R. Modulating Treg stability to improve cancer immunotherapy. Trends Cancer. 2023;9:911–27.
Yu H, Liu J, Bu X, et al. Targeting METTL3 reprograms the tumor microenvironment to improve cancer immunotherapy. Cell Chem Biol. 2024;31:776–91. e777.
Liu L, Li S, Qu Y, et al. Ablation of ERO1A induces lethal endoplasmic reticulum stress responses and immunogenic cell death to activate anti-tumor immunity. Cell Rep Med. 2023;4:101206.
Zheng M, Long J, Chelariu-Raicu A et al. (2021) Identification of a Novel Tumor Microenvironment Prognostic signature for Advanced-Stage Serous Ovarian Cancer. Cancers (Basel) 13.
Sarkar K, Chowdhury R, Dasgupta A. Analysis of Survival Data: challenges and Algorithm-based model selection. J Clin Diagn Res. 2017;11:LC14–20.
Gaujoux R, Seoighe C. A flexible R package for nonnegative matrix factorization. BMC Bioinformatics. 2010;11:367.
Becht E, Giraldo NA, Lacroix L, et al. Estimating the population abundance of tissue-infiltrating immune and stromal cell populations using gene expression. Genome Biol. 2016;17:218.
Woolston A, Khan K, Spain G, et al. Genomic and Transcriptomic Determinants of Therapy Resistance and Immune Landscape Evolution during Anti-EGFR Treatment in Colorectal Cancer. Cancer Cell. 2019;36:35–50. e39.
Wu Y, Fu L, Wang B, et al. Construction of a prognostic risk assessment model for lung adenocarcinoma based on integrin beta family-related genes. J Clin Lab Anal. 2022;36:e24419.
Chen X, Gole J, Gore A, et al. Non-invasive early detection of cancer four years before conventional diagnosis using a blood test. Nat Commun. 2020;11:3475.
Christiaens A, Boland B, Germanidis M, et al. Poor health status, inappropriate glucose-lowering therapy and high one-year mortality in geriatric patients with type 2 diabetes. BMC Geriatr. 2020;20:367.
Alhamzawi R, Ali HTM. The bayesian adaptive lasso regression. Math Biosci. 2018;303:75–82.
Luo Y, Zhang G. Identification of a Necroptosis-Related Prognostic Index and Associated Regulatory Axis in kidney renal clear cell carcinoma. Int J Gen Med. 2022;15:5407–23.
Cao X, He J, Chen A et al. (2023) Comprehensive Analysis of Necroptosis Landscape in skin cutaneous melanoma for appealing its implications in Prognosis Estimation and Microenvironment Status. J Pers Med 13.
Tao S, Tao K, Cai X. (2022) Necroptosis-Associated lncRNA Prognostic Model and Clustering Analysis: Prognosis Prediction and Tumor-Infiltrating Lymphocytes in Breast Cancer. J Oncol 2022:7099930.
Zheng M, Mullikin H, Hester A et al. (2020) Development and validation of a Novel 11-Gene prognostic model for Serous Ovarian Carcinomas based on lipid metabolism expression Profile. Int J Mol Sci 21.
Liu Y, Wang L, Fang L, et al. A Multi-center Validated Subtyping Model of Esophageal Cancer based on three metabolism-related genes. Front Oncol. 2021;11:772145.
Shi D, Mu S, Pu F, et al. Integrative analysis of immune-related multi-omics profiles identifies distinct prognosis and tumor microenvironment patterns in osteosarcoma. Mol Oncol. 2022;16:2174–94.
Ferrone CR, Kattan MW, Tomlinson JS, et al. Validation of a postresection pancreatic adenocarcinoma nomogram for disease-specific survival. J Clin Oncol. 2005;23:7529–35.
Ma C, Luo H, Cao J, et al. Identification of a Novel Tumor Microenvironment-Associated eight-gene signature for Prognosis Prediction in Lung Adenocarcinoma. Front Mol Biosci. 2020;7:571641.
Zhang X, Shi X, Zhao H et al. (2021) Identification and Validation of a Tumor Microenvironment-Related Gene Signature for Prognostic Prediction in Advanced-Stage Non-Small-Cell Lung Cancer. Biomed Res Int 2021:8864436.
Sun S, Guo W, Wang Z, et al. Development and validation of an immune-related prognostic signature in lung adenocarcinoma. Cancer Med. 2020;9:5960–75.
Song C, Guo Z, Yu D, et al. A prognostic Nomogram Combining Immune-related gene signature and clinical factors predicts survival in patients with lung adenocarcinoma. Front Oncol. 2020;10:1300.
Gopal RK, Kubler K, Calvo SE, et al. Widespread chromosomal losses and mitochondrial DNA alterations as genetic drivers in Hurthle Cell Carcinoma. Cancer Cell. 2018;34:242–e255245.
Wagner AH, Devarakonda S, Skidmore ZL, et al. Recurrent WNT pathway alterations are frequent in relapsed small cell lung cancer. Nat Commun. 2018;9:3787.
Paik PK, Arcila ME, Fara M, et al. Clinical characteristics of patients with lung adenocarcinomas harboring BRAF mutations. J Clin Oncol. 2011;29:2046–51.
Dziadziuszko R, Krebs MG, De Braud F, et al. Updated Integrated Analysis of the efficacy and safety of Entrectinib in locally Advanced or Metastatic ROS1 Fusion-positive non-small-cell Lung Cancer. J Clin Oncol. 2021;39:1253–63.
Li BT, Michelini F, Misale S, et al. HER2-Mediated internalization of cytotoxic agents in ERBB2 amplified or mutant lung cancers. Cancer Discov. 2020;10:674–87.
Hammerman PS, Sos ML, Ramos AH, et al. Mutations in the DDR2 kinase gene identify a novel therapeutic target in squamous cell lung cancer. Cancer Discov. 2011;1:78–89.
Gautschi O, Milia J, Filleron T, et al. Targeting RET in patients with RET-Rearranged lung cancers: results from the Global, Multicenter RET Registry. J Clin Oncol. 2017;35:1403–10.
Junnila S, Kokkola A, Karjalainen-Lindsberg ML, et al. Genome-wide gene copy number and expression analysis of primary gastric tumors and gastric cancer cell lines. BMC Cancer. 2010;10:73.
Sumita Y, Yamazaki M, Maruyama S, et al. Cytoplasmic expression of SOX9 as a poor prognostic factor for oral squamous cell carcinoma. Oncol Rep. 2018;40:2487–96.
Higo N, Okumura H, Uchikado Y, et al. Expression of SOX9 is related to prognosis in patients with oesophageal squamous cell carcinoma. Vivo. 2018;32:835–8.
Mahtab M, Boavida A, Santos D, Pisani FM. (2021) The Genome Stability maintenance DNA helicase DDX11 and its role in Cancer. Genes (Basel) 12.
Avgustinova A, Iravani M, Robertson D, et al. Tumour cell-derived Wnt7a recruits and activates fibroblasts to promote tumour aggressiveness. Nat Commun. 2016;7:10305.
Wu XD, Bie QL, Zhang B, et al. Wnt10B is critical for the progression of gastric cancer. Oncol Lett. 2017;13:4231–7.
Zhang Y, Yang J, Wang X, Li X. GNG7 and ADCY1 as diagnostic and prognostic biomarkers for pancreatic adenocarcinoma through bioinformatic-based analyses. Sci Rep. 2021;11:20441.
Kalathil D, John S, Nair AS. FOXM1 and Cancer: Faulty Cellular Signaling derails Homeostasis. Front Oncol. 2020;10:626836.
Dabi Y, Willecocq C, Ballester M, et al. Identification of a low risk population for parametrial invasion in patients with early-stage cervical cancer. J Transl Med. 2018;16:163.
Hu SP, Chen L, Lin CY, et al. The Prognostic Value of Preoperative Geriatric Nutritional Risk Index in patients with pancreatic ductal adenocarcinoma. Cancer Manag Res. 2020;12:385–95.
He Y, Mao M, Shi W, et al. Development and validation of a prognostic nomogram in gastric cancer with hepatitis B virus infection. J Transl Med. 2019;17:98.
Chen D, Liu Z, Liu W, et al. Predicting postoperative peritoneal metastasis in gastric cancer with serosal invasion using a collagen nomogram. Nat Commun. 2021;12:179.
Niu X, Yang Y, Zhou X, et al. A prognostic nomogram for patients with newly diagnosed adult thalamic glioma in a surgical cohort. Neuro Oncol. 2021;23:337–8.
Zhang Z, Wang Z, Yan M, et al. Radiomics and Dosiomics signature from whole lung predicts Radiation Pneumonitis: a Model Development Study with prospective external validation and decision-curve analysis. Int J Radiat Oncol Biol Phys. 2023;115:746–58.
Hailemeskel HS, Tiruneh SA. Development of a Nomogram for Clinical Risk Prediction of Preterm Neonate Death in Ethiopia. Front Pediatr. 2022;10:877200.
Suski JM, Braun M, Strmiska V, Sicinski P. Targeting cell-cycle machinery in cancer. Cancer Cell. 2021;39:759–78.
Emerson DJ, Zhao PA, Cook AL, et al. Cohesin-mediated loop anchors confine the locations of human replication origins. Nature. 2022;606:812–9.
Simiczyjew A, Dratkiewicz E, Mazurkiewicz J et al. (2020) The influence of Tumor Microenvironment on Immune escape of Melanoma. Int J Mol Sci 21.
Huang AC, Zappasodi R. A decade of checkpoint blockade immunotherapy in melanoma: understanding the molecular basis for immune sensitivity and resistance. Nat Immunol. 2022;23:660–70.
Funding
This work was supported by the Intramural Fund, Youth Talent Project, (Grant No. Not applicable).
Author information
Authors and Affiliations
Contributions
This study was conceived by Zaishan Li and Baoling Liu. Data collection was conducted by Zhenzhen Meng and Lin Xiao. Data analysis was performed by Jiahui Du and Dazhi Jiang. Baoling Liu drafted the manuscript. All authors critically reviewed and approved the final manuscript.
Corresponding author
Ethics declarations
Ethics approval and consent to participate
No ethical approval or patient consent was required for this study because all data used in the research were obtained from publicly available databases.
Consent for publication
Not applicable.
Generative AI and AI-assisted technologies in the writing process
We did not use artificial intelligence or AI-assisted technologies in the writing process.
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Electronic supplementary material
Below is the link to the electronic supplementary material.
12957_2024_3588_MOESM3_ESM.jpg
Supplementary Material 3: Figure S1. Consensus map illustrating clustering results via non-negative matrix factorization (NMF) for clusters k=2-10.
12957_2024_3588_MOESM4_ESM.jpg
Supplementary Material 4: Figure S2. Representation of cophenetic correlation coefficient and rss used to assess stability and performance of NMF-derived clusters.
12957_2024_3588_MOESM6_ESM.jpg
Supplementary Material 6: Figure S4. Heatmaps depicting expression patterns of the 1418 differential expression genes (DEGs).
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Li, Z., Meng, Z., Xiao, L. et al. Constructing and identifying an eighteen-gene tumor microenvironment prognostic model for non-small cell lung cancer. World J Surg Onc 22, 319 (2024). https://doiorg.publicaciones.saludcastillayleon.es/10.1186/s12957-024-03588-y
Received:
Accepted:
Published:
DOI: https://doiorg.publicaciones.saludcastillayleon.es/10.1186/s12957-024-03588-y