A machine learning-based classification approach on Parkinson’s disease diffusion tensor imaging datasets

Introduction The presence of motor signs and symptoms in Parkinson’s disease (PD) is the result of a long-lasting prodromal phase with an advancing neurodegenerative process. The identification of PD patients in an early phase is, however, crucial for developing disease-modifying drugs. The objective of our study is to investigate whether Diffusion Tensor Imaging (DTI) of the Substantia nigra (SN) analyzed by machine learning (ML) algorithms can be used to identify PD patients. Methods Our study proposes the use of computer-aided algorithms and a highly reproducible approach (in contrast to manual SN segmentation) to increase the reliability and accuracy of DTI metrics used for classification. Results The results of our study do not confirm the feasibility of the DTI approach, neither on a whole-brain level, nor in ROI-labelled analyses, nor when focusing on the SN only. Conclusions Our study did not provide any evidence to support the hypothesis that DTI-based analysis, in particular of the SN, could be used to identify PD patients correctly.


Background
Diffusion tensor imaging (DTI) has been proposed for analyzing the microstructural integrity not only of white but also of grey matter. However, the use of DTI to observe, e. g., subcortical grey matter changes is currently under debate [15]. Whether microstructural alterations of the whole brain, region of interest (ROI)-labeled grey matter, or the substantia nigra (SN) can be detected by applying diffusion metrics in Parkinson's disease (PD) patients is still unclear. The significance of several previous DTI studies in PD is limited by small sample sizes and by the fact that specific regions of interest were delineated manually for the extraction of diffusion metrics. This practice also hindered the translation into clinical practice [19]. Besides, studies that were able to demonstrate significant group differences have also shown a relevant overlap of diffusion metrics between PD patients and healthy controls, which undermines the potential diagnostic use. Machine learning-based (ML) models might help to detect subtle alterations of diffusion metrics, by their multivariate nature and by the integration of different imaging modalities, and subsequently to improve their diagnostic use. Our study hypothesizes that ML algorithms and the application of a suitable sub-cortical atlas for the elderly population can be used to distinguish between PD patients and age- and gender-matched healthy controls in a standardized and therefore potentially more sensitive manner [5]. Computing algorithms like binary support vector machines (bSVM) or multiple-kernel learning (MKL) provide suitable and promising tools to address classification problems based on neuroimaging data [18]. Advancements in the multivariate interpretation of neuroimaging data have already proven useful in a plethora of neuropsychiatric [16] and neurodegenerative diseases [11,12].
Besides, the application of machine-learning algorithms to Parkinson's disease datasets has offered unique advancements in interpreting distinct neuroimaging modalities [3,4,20,23]. MKL also offers the opportunity to concatenate different imaging modalities. This is of particular interest as distinct diffusion metrics are meant to reflect different histopathological hallmarks of neurodegeneration [22].
Methods
[…] et al. [1]. Local diffusion homogeneity (LDH) is another diffusion metric that is specifically relevant to assess tissue homogeneity based on neighboring voxels [9]. We computed LDH for 6, 18, and 26 neighboring voxels using Spearman's rank correlation coefficient (06LDHs, 18LDHs, and 26LDHs) and Kendall's coefficient of concordance (06LDHk, 18LDHk, and 26LDHk) [9]. Voxelwise whole-brain analysis was performed using the FMRIB58_FA template. We performed ROI-labeled analyses based on the well-established AAL atlas [21]. To further increase the signal-to-noise ratio, we additionally performed classification after masking of the SN using the ATAG atlas for the elderly population [10]. The datasets were classified through bSVMs (for single modalities) as well as MKL (for concatenated modalities). We used ten-fold cross-validation (CV) and nested (leave one subject out) hyperparameter optimization as implemented in the PRoNTo-Toolbox (v2.1) [18]. The determination of relevant bSVM and MKL parameters (such as the applied L1 regularization method or the nested hyperparameter optimization) follows standard practice and is extensively described in the publications of Schrouff et al. [17,18]. Age, gender, and total intracranial volume were used as covariates. Balanced Accuracy (BA) and the area under the receiver-operating characteristic curve (ROC-AUC) were calculated to assess classification performance and were compared to random permutation testing (against 10,000 permutations).
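To illustrate the concordance-based LDH variant, the following minimal Python sketch computes Kendall's coefficient of concordance (W) for one voxel neighborhood. It is not the implementation used in this study. It assumes that each voxel is represented by its series of diffusion-weighted signals across gradient directions, and it omits the tie correction; the function names `rank` and `kendalls_w` and the toy values are hypothetical.

```python
from typing import List, Sequence


def rank(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        average = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = average
        start = end + 1
    return ranks


def kendalls_w(series: Sequence[Sequence[float]]) -> float:
    """Kendall's W (without tie correction) for m series of n measurements each.

    Assumption: one series per voxel (center voxel plus its neighbors),
    one measurement per diffusion-weighting direction.
    """
    m = len(series)
    n = len(series[0])
    ranked = [rank(s) for s in series]
    # Sum of ranks that each measurement receives across all voxels.
    rank_sums = [sum(r[j] for r in ranked) for j in range(n)]
    mean_sum = m * (n + 1) / 2
    s = sum((total - mean_sum) ** 2 for total in rank_sums)
    return 12 * s / (m ** 2 * (n ** 3 - n))


# Toy neighborhood: a center voxel and two neighbors, five directions each.
neighborhood = [
    [0.91, 0.40, 0.62, 0.75, 0.33],
    [0.88, 0.45, 0.60, 0.79, 0.30],
    [0.52, 0.47, 0.81, 0.66, 0.35],
]
print(kendalls_w(neighborhood))
```

W ranges from 0 (no agreement between the voxels' rankings) to 1 (identical rankings), so homogeneous tissue is expected to yield values close to 1 under these assumptions.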

Results
The application of the bSVM to the various types of diffusion metrics yielded no significant classification performance in terms of BA or ROC-AUC for voxel-wise whole-brain or AAL-based ROI-labeled analyses (data not shown here). As most studies suggest, diffusion metrics are most likely altered in the SN of PD patients, making the SN the region of highest interest to increase the signal-to-noise ratio for classification [19]. Therefore, further analyses focused on the diffusion metrics of the masked SN and are reported in the following (see Fig. 1). […] 06LDHs + 18LDHs + 26LDHs (BA: 56.15%, ROC-AUC: 0.60); 06LDHk + 18LDHk + 26LDHk (BA: 58.12%, ROC-AUC: 0.52). An overview of the diagnostic performances is displayed in Table 1. The comparison to random permutation testing showed that the classifications mentioned above did not outperform pure chance. Additionally, the calculated weight maps indicate a random weighting distribution of voxels within the SN used for the respective classifications (see Fig. 2), which is in contrast to previously reported changes of the dorsolateral portion of the SN (i. e., the nigrosome-1) [13].
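The logic of comparing BA against permutation testing can be sketched as follows. This is a simplified illustration and not the PRoNTo implementation: a one-dimensional nearest-centroid classifier with leave-one-out CV stands in for the bSVM, the p-value convention (hits + 1) / (permutations + 1) is an assumption, and all function names and data are hypothetical.

```python
import random
from typing import List, Sequence


def balanced_accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Mean of sensitivity and specificity; labels are 1 (patient) and 0 (control)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    pos = sum(1 for t in y_true if t == 1)
    neg = len(y_true) - pos
    return 0.5 * (tp / pos + tn / neg)


def loo_nearest_centroid(x: Sequence[float], y: Sequence[int]) -> List[int]:
    """Leave-one-out predictions of a nearest-centroid classifier.

    Stand-in for the bSVM; requires at least two subjects per class.
    """
    preds = []
    for i in range(len(x)):
        train = [(xv, yv) for j, (xv, yv) in enumerate(zip(x, y)) if j != i]
        ones = [xv for xv, yv in train if yv == 1]
        zeros = [xv for xv, yv in train if yv == 0]
        mean_one = sum(ones) / len(ones)
        mean_zero = sum(zeros) / len(zeros)
        preds.append(1 if abs(x[i] - mean_one) < abs(x[i] - mean_zero) else 0)
    return preds


def permutation_p_value(x: Sequence[float], y: Sequence[int],
                        n_permutations: int = 1000, seed: int = 0) -> float:
    """Share of label permutations whose BA reaches the observed BA."""
    rng = random.Random(seed)
    observed = balanced_accuracy(y, loo_nearest_centroid(x, y))
    labels = list(y)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(labels)  # break the link between features and labels
        permuted = balanced_accuracy(labels, loo_nearest_centroid(x, labels))
        if permuted >= observed:
            hits += 1
    return (hits + 1) / (n_permutations + 1)


# Hypothetical, clearly separable feature values (five controls, five patients).
features = [0.10, 0.20, 0.30, 0.15, 0.25, 1.10, 1.20, 1.30, 1.15, 1.25]
labels = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
print(permutation_p_value(features, labels))
```

A classification is considered better than chance only if its BA is rarely matched when the labels are shuffled. In the present study, the observed performances fell within the permutation distribution.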

Discussion
In this study, we demonstrated a standardized and systematic approach to potentially attain the individual discrimination of PD patients from healthy controls using DTI datasets. This approach comprised the preprocessing of the data, the automated selection of appropriate features, and the subsequent classification. Atkinson-Clement, Pinto, Eusebio, and Coulon [2] already stated that "[…] they did not observe a PD induced reduction of nigral FA" but also that "this observation is in contrast with some recent publications claiming very high diagnostic accuracy, but [are] well in line with other reports showing small or no PD induced nigral FA decrease". A meta-analysis also did "not support nigral DTI metrics as a useful diagnostic marker of PD" [19]. Our results support the aforementioned lack of evidence and should put discussions about the diagnostic use of diffusion metrics in PD patients to rest.
The negative results of our study most likely reflect the lacking suitability of diffusion metrics to investigate SN-related microstructural alterations in PD. The interpretation of our findings within the scope of differing DTI acquisition schemes and MRI scanner hardware is challenging. However, a multicenter validation study by Fox et al. [7] reported high intersite concordance for applied DTI metrics on different scanner hardware (3 T magnetic field strength). ML algorithms are a more standardizable and sensitive method to increase diagnostic accuracy and to disentangle the overlap of diffusion metrics reported by other groups, which only used voxel-wise mass-univariate or manually extracted diffusion metrics for subsequent analysis. The multivariate, compared to mass-univariate, approach and the additional concatenation of modalities should enhance the discriminatory, and therefore diagnostic, accuracy substantially. The lack of significant findings despite a larger sample size and a more sensitive and sophisticated approach in this study further supports the view that traditional diffusion metrics lack diagnostic use. Whether DTI can be used to map individual disease progression remains, to this point, elusive. Further methodological improvements of diffusion-based imaging might improve diagnostic accuracy and might, therefore, cause a reconsideration of our current conclusion. However, the current MRI acquisition and analysis paradigms of DTI measures are not of any use for investigating grey matter alterations in PD. Further studies without substantial methodological improvements will most likely not result in potentially translatable advancements in improving diagnostic accuracy or patient care. Recent research studies revealed that the use of free-water corrected diffusion maps for the analysis of tissue alterations might provide the opportunity for improving the diagnostic accuracy based on this dataset [14].
However, ML analysis of neuroimaging data is a fruitful approach to supporting clinical decision making and will be applied more frequently in the future [8]. The objective of our study was to investigate the role of ML-based algorithms on diffusion metrics to identify PD patients correctly.
Our study did not provide any evidence to support the hypothesis that DTI-based analysis, in particular of the SN, could be used to resolve the issue of correctly classifying study participants independent of the phenotype. An advantage of our methodology is that by calculating weighting maps, we can additionally validate our findings: Previous literature stated that the dorsolateral parts of the SN are the ones that are particularly affected at the beginning of the disease [19]. Weighting maps should indicate the higher relevance of these specific areas for classification performance (which is in contrast to our findings, see Fig. 2). Here, this advantage is of even higher importance as further partitioning of the SN appears, within the scope of the already small region and the present image resolution, not to be feasible.

Conclusion
Our findings are well in line with previous publications using conventional analyses. Further studies without substantial methodological improvements (e. g., utilizing more complex diffusion models) will most likely not result in potentially translatable advancements in improving diagnostic accuracy or patient care.