Deep learning application of bone marrow aspiration cell discrimination in patients with myelodysplastic syndromes

In this study, a DL-based algorithm for classifying dysplastic cells in bone marrow aspirates from patients with MDS was developed and validated. The algorithm performed favorably in classifying dysplastic and normal cells across the three cell lineages: erythropoiesis, granulopoiesis, and megakaryocytopoiesis. The overall AUC ranged from 0.945 to 0.996.

Automatic cell detection and classification in bone marrow aspirates is not easily handled by AI because of limitations of the sample itself, including the variability of whole slides. Furthermore, AI research in this area has not been as active as in other fields12. Mori et al. analyzed a DL-based dysplasia assessment system specialized for detecting decreased granules, one of the dysplastic features of patients with MDS, and reported an AUC of 0.944 and an accuracy of 97.2%13. This was the first cell discrimination analysis of MDS using bone marrow smear samples. Our study extended previous studies to include dyserythropoiesis, dysmegakaryopoiesis, and dysgranulopoiesis. In addition, types of dysgranulopoiesis other than decreased granules, such as nuclear hyposegmentation and abnormally large size, were included, so that an improved algorithm could be developed. Furthermore, similar to the study by Mori et al., GD showed the most favorable performance among the three dysplastic cell lineages, with the highest values for sensitivity, specificity, AUC, precision, PPV, NPV, and F1 score. Granulocytic cells account for the majority of nucleated cells in the bone marrow of most patients, and dysgranulopoiesis is more specific for diagnosing MDS than dyserythropoiesis15. From this perspective, GD can act as a key factor in the development of a DL-based MDS diagnostic algorithm.

In this study, the specificity, AUC, and precision were high, but the sensitivity, F1 score, and AP were relatively low and did not reach expert reading ability. The model in this study is a multiclass classifier trained on imbalanced data; therefore, among the performance measures, the F1 score, defined as the harmonic mean of precision and recall, may be the most suitable for interpretation. By F1 score, performance in this study ranged from 0.643 to 0.938. The gradient-weighted class activation mapping (Grad-CAM) heat map technique was applied to investigate the causes of false positives and false negatives16. This technique highlights the region of a bone marrow nucleated cell image that the CNN relied on, bringing into focus the region most significant for the prediction, which aided interpretation of the algorithm. Through Grad-CAM, images correctly predicted as DE were found to be centered on the nucleus, which is the key to detecting dyserythropoiesis. For correctly predicted GD and MD images, the Grad-CAM heat maps highlighted the nucleus and cytoplasm, consistent with their dysplastic features. In contrast, when DE was incorrectly predicted as NE, the heat map focused on the cytoplasm instead of the nucleus. DE with megaloblastic changes was difficult to classify and tended to be predicted as NE. GD with decreased granules was sometimes read as NE or DE because of the hypogranular cytoplasm, and when hypogranularity was severe, it was also read as "other". In GD, nuclear hyposegmentation was harder to differentiate from erythroid cells than the pseudo-Pelger-Huet anomaly shape and/or decreased granules.
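The reason the F1 score is more informative than accuracy on imbalanced classes can be illustrated with a minimal sketch. The counts below are hypothetical one-vs-rest values for a rare class, chosen only for illustration; they are not results from this study, and the helper names are ours.

```python
def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Return (precision, recall, F1) from one-vs-rest confusion counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


def accuracy(tp: int, fp: int, fn: int, tn: int) -> float:
    """Fraction of all cells that were classified correctly."""
    return (tp + tn) / (tp + fp + fn + tn)


# Hypothetical counts: 60 cells of a rare dysplastic class among 1000 cells.
tp, fp, fn, tn = 30, 10, 30, 930

precision, recall, f1 = precision_recall_f1(tp, fp, fn)
acc = accuracy(tp, fp, fn, tn)

print(f"precision={precision:.3f} recall={recall:.3f} F1={f1:.3f} accuracy={acc:.3f}")
```

With these illustrative counts, accuracy is dominated by the many true negatives, while the F1 score reflects that half of the rare class was missed.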

Although DL-based dysplastic cell detection has not yet shown performance that can replace hematologists, it is important as an adjunct tool for bone marrow-based diagnosis. Until now, most blood cell differential studies have been performed on peripheral blood or bone marrow biopsies10,14,17,18. In recent years, however, algorithms for differentiating normal bone marrow cells have been developed and published, and research on dysplasia has begun, providing a basis for AI detection of MDS9,11,19. Because a bone marrow aspirate slide contains many nucleated cells and the region to be read is large, introducing DL for primary classification has several advantages. For example, it can reduce the turnaround time of test reports and allow more cells to be counted, which increases the accuracy of the calculated percentages of nucleated cells. Furthermore, instead of diagnosing dysplasia by subjective expert judgment, further standardization can be achieved through AI. Recently, an article on whole-slide image detection was published, which is expected to broaden DL-based reading and access to digital images20. In addition to identifying normal bone marrow cells in general, further studies are needed to address DL for each disease. MDS has characteristic cellular morphologies and properties4. It is necessary to build a database that includes cell images and genomic data for the various dysplasias and to develop a new approach to classifying diseases and predicting prognosis. In this study, the InceptionV3 architecture, a commonly used deep learning network, was used, and it may be extended in future studies. Further follow-up to this study is needed, such as the investigation of fully automated diagnostic approaches at the disease level for each patient and the application to pathomics of dysplastic cells.

One limitation of this study is that the detailed morphological manifestations of dysplasia in each cell lineage could not be trained separately. However, it is inferred that effective differentiation was possible because a sufficient number of normal cells was ensured. If dysplasia is classified according to its detailed features in the future, higher performance is expected. In addition, the number of cells in each class could not be balanced. Granulopoiesis had a relatively large number of cells compared with the other lineages, which may have contributed to its better performance. Therefore, when interpreting this study, the performance for each lineage should be judged with these numerical differences in mind. Finally, only cell-based performance was tested in this study, and disease-level diagnostic performance still needs to be developed for real-world clinical application. Through follow-up research, we intend to develop an algorithm that analyzes the percentage of dysplastic cells in each lineage among all nucleated cells in the bone marrow and is useful as a tool for diagnosing MDS.
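The planned per-lineage analysis could be approached along the following lines. This is only a sketch under stated assumptions: the lineage names, the `(lineage, is_dysplastic)` representation of a classifier's output, and the choice of denominator (cells within the same lineage) are all hypothetical and not taken from the study.

```python
from collections import defaultdict


def dysplastic_percentages(cells):
    """Return {lineage: percent of that lineage's cells predicted dysplastic}.

    `cells` is an iterable of (lineage, is_dysplastic) pairs, a hypothetical
    representation of per-cell classifier predictions.
    """
    totals = defaultdict(int)
    dysplastic = defaultdict(int)
    for lineage, is_dysplastic in cells:
        totals[lineage] += 1
        if is_dysplastic:
            dysplastic[lineage] += 1
    return {
        lineage: 100.0 * dysplastic[lineage] / totals[lineage]
        for lineage in totals
    }


# Illustrative predictions for 20 cells (made-up data, not study results).
predicted_cells = (
    [("erythroid", True)] * 2 + [("erythroid", False)] * 6
    + [("granulocytic", True)] * 1 + [("granulocytic", False)] * 9
    + [("megakaryocytic", False)] * 2
)

percentages = dysplastic_percentages(predicted_cells)
for lineage, pct in percentages.items():
    print(f"{lineage}: {pct:.1f}% dysplastic")
```

Because the denominator is the number of cells counted in each lineage, a lineage with few counted cells yields an unstable percentage, which is one reason counting more cells per slide matters.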

In this study, we developed a classification algorithm that can distinguish between normal and dysplastic cells of three lineages in bone marrow aspirate smears from patients with MDS. The algorithm could be used as an auxiliary tool for diagnosing patients with MDS and is expected to help shorten the time required for diagnosis from bone marrow aspirates and standardize visual reading.
