A combination of immune cell types identified through ensemble machine learning strategy detects altered profile in recurrent pregnancy loss: a pilot study

Objective: To compare the immunologic profiles of peripheral and menstrual blood (MB) of women who experience recurrent pregnancy loss and women without pregnancy complications. Design: Explorative case-control study. Cross-sectional assessment of flow cytometry-derived immunologic profiles. Setting: Academic medical center. Patient(s): Women who experienced more than 2 consecutive miscarriages. Intervention(s): None. Main Outcome Measure(s): Flow cytometry-based immune profiles of uterine and systemic immunity (recurrent pregnancy loss, n = 18; control, n = 14) assessed by machine learning classifiers in an ensemble strategy, followed by recursive feature selection. Result(s): In peripheral blood, the combination of 4 cell types (nonswitched memory B cells, CD8+ T cells, CD56brightCD16− natural killer [NKbright] cells, and CD4+ effector T cells) classified samples correctly to their respective cohort. The identified classifying cell types in peripheral blood differed from the results observed in MB, where a combination of 6 cell types (Ki67+CD8+ T cells, human leukocyte antigen-DR+ regulatory T cells, CD27+ B cells, NKbright cells, regulatory T cells, and CD24HiCD38Hi B cells) plus age allowed for assigning samples correctly to their respective cohort. Based on the combination of these features, the average area under the curve of a receiver operating characteristic curve and the associated accuracy were >0.8 for both sample sources. Conclusion(s): A combination of immune subsets for cohort classification allows for robust identification of immune parameters with possible diagnostic value. The noninvasive source of MB holds several opportunities to assess and monitor reproductive health. (Fertil Steril Sci 2022;3:166-73. © 2022 by American Society for Reproductive Medicine.)

Only 30% of pregnancies progress from conception to live birth (1). Spontaneous pregnancy loss poses a great medical and emotional burden. One percent to 2% of women experience recurrent pregnancy loss (RPL), defined as the consecutive loss of 2 or more pregnancies before 20 weeks of gestation (2,3).
In the absence of embryonal chromosome abnormalities, fractions of RPL can be attributed to genetic (2%-5%), anatomic (10%-15%), endocrine (17%-20%), or autoimmune-related (20%) factors, but over 40% of cases remain unexplained (4,5). These unexplained cases may be linked to an inadequate maternal immune response during key processes of implantation and placentation at the fetal-maternal interface (6)(7)(8). As the mechanisms behind this dysregulation are unknown, it is unclear whether they are confined to the uterus or manifest systemically.
Most investigations to identify dysregulation of immunity in RPL have focused on specific immune cell populations isolated from peripheral blood (PB). This sample source is readily available to study large populations and, because of its limited invasiveness, allows inclusion of large control cohorts. Altered levels of circulating T (helper) cell profiles and regulatory T (Treg) cells and ratios of natural killer (NK) cell subsets were observed in patients with RPL, but none of these cell types were observed individually at levels that allowed robust distinction from women without RPL (9)(10)(11)(12)(13). Large variance, due to interpopulation differences, the time point of sampling, or the measurement itself, and the complexity of relationships between immunologic parameters hamper the detection of cohort differences by commonly applied univariate approaches (14). Furthermore, it has to be considered that endometrial and decidual immune cells are highly specialized to enable adequate interaction with trophoblast cells, through mechanisms independent from PB-derived cells (15)(16)(17)(18)(19). Consequently, PB cannot serve to detect dysregulation of uterine immunity (20). Menstrual blood (MB) offers access to uterine immune cells that display tissue-specific markers and NK cell profiles characteristic of decidua (21)(22)(23)(24)(25)(26). This noninvasive sample source has not been widely applied so far, yet it allows large-scale studies to understand reproductive disorders.
Here, we evaluated the immunophenotypic profiles of PB and MB using an ensemble machine learning strategy. Our pilot data show unbiased multivariate detection of immunologic differences in women experiencing RPL.

Sample Collection
Peripheral blood and MB were collected (between March 2017 and December 2020) from normotensive women aged 18-46 years without known disorders of reproduction and women experiencing RPL, defined as the consecutive (idiopathic) loss of 3 or more pregnancies before 20 weeks of gestation without known causes of miscarriages, such as the presence of antithyroid, antiphospholipid, and antinuclear autoantibodies; endocrine dysfunction; uterine malformation; hemostatic disorder; and abnormal karyotype (2,3). The characteristics of women included in this study are shown in Table 1. The exclusion criteria were the use of immunosuppressive drugs, biologicals, or antidepressants, autoimmune diseases, diabetes mellitus, smoking, human immunodeficiency virus positivity, current use of an intrauterine device or hormonal contraceptives, and irregular menstrual cycles. For intracellular staining, cells were isolated by means of density gradient centrifugation (Lymphoprep; Axis-Shield PoC AS, Oslo, Norway). Menstrual blood was processed as described previously (25,27). Briefly, MB-derived cells were obtained after filtering (70 µm) and granulocyte depletion using RosetteSep (STEMCELL Technologies Inc., Vancouver, Canada) followed by density gradient centrifugation (Lymphoprep).

Flow Cytometry
For surface staining, a minimum of 250,000 PB or MB cells were stained using fluorochrome-conjugated monoclonal antibodies (moAbs) for 20 minutes at room temperature in  Kaluza (v2.1; Beckman Coulter Inc., Brea, CA). Gate settings (Supplemental Fig. 1, available online) were based on a fluorescence minus one strategy.

Data Analysis
Data were processed using R v.4.0.2 and the ggpubr, ggplot2, ggsignif, and tidyr packages. The nonparametric Mann-Whitney U test was used. P values of < .05 were considered statistically significant. An ensemble feature selection was used to detect features allowing for cohort classification. This strategy was previously designed and validated to overcome the bias of using a single machine learning algorithm, thus allowing for a more robust selection of classifying features (28,29). Eight classifiers (bagging, gradient boosting, logistic regression, passive-aggressive regression, random forest, ridge regression, stochastic gradient descent [on linear models], and support vector machine [with a linear kernel] classifiers) were run in 10-fold and used to score features on their importance for classification. Scoring of the individual algorithms was combined in an ensemble ranking: for bagging, gradient boosting, and random forest analysis, which work with classification trees, the features used in the trees' splits were counted and ranked by frequency; for passive-aggressive regression, logistic regression, ridge regression, stochastic gradient descent, and support vector machine classifiers, feature importance was assigned by the value of the coefficient associated with each feature. The ranking of each classifier was scored based on the number of times a feature appeared within the top classifying features. A detailed description of the ranking used for the ensemble strategy has previously been presented (29). To reduce the number of features to the ones that allow for optimal classification, classifiers were run repeatedly with the top 80% of features in a recursive feature selection approach. All classifiers were subjected to stratified fivefold and tenfold cross-validation. Having determined which features allow for the most robust classification, the set of parameters was used to run the individual classifying algorithms, combined with 10-fold cross-validation.
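The ensemble ranking described above can be illustrated with a minimal Python sketch. It assumes that each classifier yields one importance score per feature (a split count for tree-based models, a coefficient for linear models), that coefficients are compared by absolute value, and that a feature earns one vote each time it appears in a classifier's top k. The exact weighting is described in reference 29 and may differ; the function names, the `top_k` parameter, and the toy scores are hypothetical and are not taken from the study.

```python
from collections import Counter


def rank_features(importances):
    """Order feature names from most to least important for one classifier.

    `importances` maps feature name -> score. The absolute value is used so
    that large negative coefficients still count as influential (assumption).
    """
    return sorted(importances, key=lambda name: (-abs(importances[name]), name))


def ensemble_ranking(per_classifier_importances, top_k):
    """Combine per-classifier rankings by counting top-k appearances."""
    votes = Counter()
    all_features = set()
    for importances in per_classifier_importances.values():
        all_features.update(importances)
        for name in rank_features(importances)[:top_k]:
            votes[name] += 1
    # Most votes first; ties are broken alphabetically for reproducibility.
    return sorted(all_features, key=lambda name: (-votes[name], name))


# Hypothetical toy scores, not data from the study.
scores = {
    "random_forest": {"NKbright": 12, "Treg": 7, "CD27+ B": 3, "age": 1},
    "logistic_regression": {"NKbright": -0.9, "Treg": 0.2, "CD27+ B": 0.6, "age": 0.1},
    "linear_svm": {"NKbright": 0.8, "Treg": -0.7, "CD27+ B": 0.5, "age": 0.05},
}
ranking = ensemble_ranking(scores, top_k=2)
print(ranking)
```

Counting votes rather than averaging raw scores keeps tree split counts and linear coefficients, which live on different scales, from dominating one another.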

RESULTS
As immune responses are never limited to a single cell type, minor changes in the frequency of a specific subset can affect neighboring cells through cell contact or secretion of soluble factors. While a subtle change in the numbers or characteristics of a given immune cell population may fall within a physiological range, hampering its detection, machine learning can identify a change in overall patterns and the underlying cell types involved. To collect a dataset suited for multivariate analysis, we established a phenotypic flow cytometry-based overview of immune cell frequencies (Fig. 1A) of PB and MB (n = 15 and n = 18, respectively) of women who experienced at least 3 consecutive unexplained miscarriages (patient characteristics, Table 1). The analysis covered the total leukocyte populations, T, B, and NK cell subsets using 5 established staining panels (30) (Supplemental Table 1). In total, 63 immune subsets, age, and cytomegalovirus (CMV) status were assessed, which resulted in 65 features that were taken into account for further analysis (Supplemental Fig. 1 and Supplemental Table 2, available online). Data were compared with a control cohort of parous women who did not experience RPL (PB, n = 13; MB, n = 14). We used machine learning-based cohort classification to identify immune cell subsets that discriminate RPL from control, on the basis of either the MB or PB profiles. To achieve this, we employed an ensemble strategy because it allows for robust feature selection in a low-sample-size setting (31). Through combining 8 distinct classification algorithms, the ensemble overcomes any possible bias of its individual classification algorithms (28,29). The outcomes of the individual algorithms were weighted and combined into a single ensemble ranking (29).
The 80% top features of this list were then used to run the individual algorithms, including 10-fold cross-validation to ensure generality of the results, and the average classification accuracy was calculated. By repeatedly reducing the list of top-contributing features by 20%, the optimal number of features to achieve robust classification was determined (Fig. 1B) (Fig. 1E), with accuracy values of 0.87 ± 0.16 and 0.84 ± 0.14, respectively (Supplemental Table 3, available online).
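The repeated 20% reduction can be sketched as a simple loop. The sketch below makes several assumptions: the ranking is fixed rather than recomputed in every round, subset sizes are rounded down, and `toy_evaluate` is a hypothetical stand-in for the mean cross-validated accuracy of the classifiers. None of these names come from the original analysis.

```python
def reduction_schedule(n_features, keep_percent=80):
    """Subset sizes visited when repeatedly keeping the top `keep_percent`%."""
    if not 0 < keep_percent < 100:
        raise ValueError("keep_percent must be between 0 and 100 (exclusive)")
    sizes = []
    n = n_features
    while n >= 1:
        sizes.append(n)
        n = n * keep_percent // 100  # rounding down is an assumption
    return sizes


def select_best_subset(ranked_features, evaluate, keep_percent=80):
    """Return (features, score) for the best-scoring top-n prefix."""
    best_features, best_score = None, float("-inf")
    for n in reduction_schedule(len(ranked_features), keep_percent):
        subset = ranked_features[:n]
        score = evaluate(subset)
        # `>=` prefers the smaller subset when two sizes score equally.
        if score >= best_score:
            best_features, best_score = subset, score
    return best_features, best_score


sizes = reduction_schedule(65)
print(sizes)


def toy_evaluate(subset):
    """Hypothetical scorer that peaks at 4 features (illustration only)."""
    return 0.9 - 0.01 * abs(len(subset) - 4)


ranked = [f"feature_{i}" for i in range(65)]
best, best_score = select_best_subset(ranked, toy_evaluate)
print(len(best), round(best_score, 2))
```

Under this rounding assumption, a schedule that starts from 65 features passes through subset sizes of 7 and 4, which is at least compatible with the feature counts reported for MB (6 cell types plus age) and PB (4 cell types).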
Reducing the number of features included in a multivariate approach allows for more robust outcomes as features of high variance, with minor distinctive value, can be excluded. However, this approach may mask notable differences of an individual feature because only the most contributing node in a network of related variables is considered. Thus, we additionally assessed all studied features by classical univariate analysis. No differences in the total leukocyte populations or frequently studied T helper cell subsets were detected (Supplemental Fig. 3). Whereas no altered NK cell subset frequencies were observed in PB of patients with RPL, MB revealed decreased fractions of NKbright cells, and an increased percentage of CD16+CD56+ NKdim cells (Fig. 2A and B). The frequencies of nonswitched memory (IgD+CD27+) B cells were the only assessed cell type that was significantly different in both PB and MB (Fig. 2C). The overall increased abundance of CD27 on CD19+ cells was only observed in MB (Fig. 2D).
Latent CMV infection is known to selectively affect the expansion of leukocyte subsets, especially regarding subsets of NK cells, T cells, and B cells (32)(33)(34)(35). Of all assessed features, a difference based on CMV status was only observed for CD27+ B cells in MB of patients with RPL (Fig. 2D). Patients with RPL who were CMV seropositive showed the highest frequencies of CD27-expressing B cells.
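For readers unfamiliar with the univariate test used here, the following Python sketch computes a two-sided Mann-Whitney U test with midranks for ties and a normal approximation with continuity correction. The study ran its statistics in R; this code is only illustrative, an exact test may be preferable for cohorts of this size, and the frequencies below are hypothetical values rather than study data.

```python
import math


def midranks(values):
    """Return 1-based ranks (ties share the average rank) and tie-group sizes."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    tie_sizes = []
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1  # mean of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        tie_sizes.append(j - i + 1)
        i = j + 1
    return ranks, tie_sizes


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test; returns (U, approximate p value)."""
    n1, n2 = len(x), len(y)
    if n1 == 0 or n2 == 0:
        raise ValueError("both samples must be non-empty")
    n = n1 + n2
    ranks, tie_sizes = midranks(list(x) + list(y))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    u = min(u1, n1 * n2 - u1)
    # Variance of U under the null hypothesis, corrected for ties.
    tie_term = sum(t**3 - t for t in tie_sizes) / (n * (n - 1))
    sd = math.sqrt(n1 * n2 / 12 * ((n + 1) - tie_term))
    if sd == 0:
        return u, 1.0
    z = max(0.0, abs(u1 - n1 * n2 / 2) - 0.5) / sd
    return u, math.erfc(z / math.sqrt(2))


# Hypothetical subset frequencies (% of parent population), not study data.
rpl = [1.2, 2.5, 3.3]
control = [4.1, 5.0, 6.2]
u_stat, p_value = mann_whitney_u(rpl, control)
print(u_stat, round(p_value, 3))
```

With only three values per group, even complete separation (U = 0) does not reach P < .05 in this approximation, which illustrates why univariate tests struggle in low-sample-size settings.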

DISCUSSION
The adaptations of the maternal immune system during pregnancy occur not only systemically but also locally, where immune cells and soluble factors contribute to the contact of trophoblast cells and the maternal mucosa. Subtle changes in the immunologic profiles of both compartments may contribute to the etiology of RPL.
To create a better understanding of dysregulated immunity in RPL, we assessed to what extent the joint assessment of leukocyte subsets may reveal differences in PB or MB of affected women. Using an ensemble machine learning strategy and recursive feature selection, we were able to identify 4 cell subsets of PB and 6 cell subsets and age for MB, which, when combined, robustly allowed for cohort classification for either PB or MB. Based on general definitions describing the discriminative power of a diagnostic test, both PB and MB had ''excellent'' distinctive value (36,37). Thus, RPL is associated with immune adaptations that can be traced back systemically and in the uterus.
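The discriminative value referred to above is summarized by the area under the receiver operating characteristic curve (AUC). As a minimal sketch, the AUC can be computed as the fraction of case-control pairs in which the case receives the higher classifier score, with ties counting as half. The labels and scores below are hypothetical and only illustrate the calculation.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a positive sample outscores a negative one."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(positives) * len(negatives))


# Hypothetical classifier scores: 1 = RPL, 0 = control (not study data).
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2]
auc = roc_auc(labels, scores)
print(round(auc, 3))
```

In a cross-validated setting, this value would be computed on each held-out fold and then averaged, in line with the average AUC reported for both sample sources.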
Previous approaches of detecting immunologic alterations in RPL mostly considered either PB or MB and assessed a concise number of immune cell subsets or even focused on a single cell type. To allow for a more open assessment of the involved immunologic profile, we analyzed 63 subsets that we consider to allow for a broad screening of phenotypes (30). Classic univariate approaches often do not suffice when studying immunologic data (14). A large overlap between frequencies in RPL and control cohort measured in PB and MB hampers the individual discriminative value of a cell type. Furthermore, differences in cell types of low abundance and high variation are challenging to detect through the univariate analysis but could be revealed using the presented approach. As the differences in the area under the curve of the different individual classifiers show, algorithms of multivariate approaches also present with varying power to classify a sample to its cohort. The ensemble machine learning strategy overcomes the bias associated with choosing a single machine learning algorithm.
The power of combining classifiers, even in this low-sample-size setting, allowed detection of the discriminative value of low-frequency cell types in MB that have a possible regulatory phenotype. CD8+ T cells positive for Ki67 (indicating proliferation) were identified as the most classifying feature in MB samples. In healthy pregnancy, decidual CD8+ T cells are known to take longer than systemic cells to initiate proliferation upon stimulation, possibly through the interaction of their coinhibitory molecules with the decidual microenvironment (38). Also involved in local regulation (19,27,39,40), altered levels of Treg (overall cell population and HLA-DR+), and CD24HiCD38Hi B cells contributed to robust RPL classification, fitting with the theory that pregnancy is negatively affected when the local mechanisms of tolerance fail (41,42). This focus on immune regulatory subsets was not observed in PB, except for NKbright cells, which contribute to cohort classification in both PB and uterine samples. Of note, age contributed as a risk factor in uterine, but not systemic, immunity. The observed differences in classifying leukocyte subsets, depending on the sample source, affirm that systemic and local immunity of reproduction demand independent consideration.
Circulatory nonswitched memory B cells contributed strongly to cohort classification of PB, which is in line with previously shown higher levels of circulatory nonswitched memory B cells of patients with RPL (13,43). The presented data highlight that B cells deserve consideration in the context of RPL. Memory B cells are altered in several autoimmune diseases, such as systemic lupus erythematosus, systemic sclerosis, and antiphospholipid syndrome, known to be associated with poor reproductive outcome (44). B cell-related pregnancy complications may also result from incorrect activation through soluble factors that regulate B cell activity, as serum from women experiencing spontaneous abortion failed to induce normal levels of IL-10 production by B cells (45). Of note, we observed higher levels of CD27+ B cells in CMV-seropositive patients with RPL. It has previously been shown that CMV affects the frequencies of CD27+ memory B cells in women but not in men (46). Thus, CMV status is important to include in similar approaches studying leukocytes because of its known effect of selectively expanding specific subsets (32,35).
Recurrent pregnancy loss is a multifactorial condition (47). A small sample size is a limitation of the current study because we cannot take into account subgroups of different disease etiologies. Nevertheless, we were able to show that investigating immunity by a multivariate approach in RPL is worthwhile. The applied flow cytometry-based profiling of MB holds the possibility to detect locally involved rare cell types (48)(49)(50)(51). Compared with the trending, high-dimensional approaches using mass spectrometry or single-cell sequencing, flow cytometry is a high-throughput process and relatively cheap to perform and, thus, allows analysis of large cohorts. More women experiencing RPL have to be enrolled to account for the heterogeneity of this cohort. Furthermore, even though the discriminative value of PB and MB was similar, it is still valuable to expand on the assessment of both sample sources simultaneously. Not only could different immune cell subsets be altered that would further stratify the group of patients with RPL, but also these pathology profiles may be distinct to PB or MB. Combined with the inclusion of a validation cohort, this will ensure generality of results to enable translation toward diagnostic tests. This is of special interest for RPL, a condition that not only needs proper diagnosis but would benefit from repeated sampling to assess the success of intervention strategies. To both ends, classification needs to be robust and accurate and deserves further assessment. This pilot study highlights the need for a multivariate approach of PB and MB to better understand the disease and its underlying pathologies.
Taken together, patients with RPL present with a dysregulated immune environment, systemically and within the uterus. The cohort classification of possible diagnostic value cannot rely on individual immune cell frequencies but rather depends on a combination of immune cell subsets. The noninvasive source of MB, which allows investigation of important regulatory mechanisms, holds several opportunities for the assessment and monitoring of reproductive health.