Title: Noise Accumulation in High Dimensional Classification and Total Signal Index
Authors: Elman, Miriam; Minnier, Jessica; Chang, Xiaohui; Choi, Dongseok
Year: 2020
Volume: 21
Pages: 1-23
Keywords: Business Analytics
URL: http://jmlr.org/papers/volume21/19-117/19-117.pdf
Abstract: Great attention has been paid to Big Data in recent years. Such data hold promise for scientific discoveries but also pose challenges to analyses. One potential challenge is noise accumulation. In this paper, we explore noise accumulation in high dimensional two-group classification. First, we revisit a previous assessment of noise accumulation with principal component analyses, which yields a different threshold for discriminative ability than originally identified. Then we extend our scope to its impact on classifiers developed with three common machine learning approaches: random forest, support vector machine, and boosted classification trees. We simulate four scenarios with differing amounts of signal strength to evaluate each method. After determining that noise accumulation may affect the performance of these classifiers, we assess factors that impact it. We conduct simulations by varying sample size, signal strength, signal strength proportional to the number of predictors, and signal magnitude with random forest classifiers. These simulations suggest that noise accumulation affects the discriminative ability of high-dimensional classifiers developed using common machine learning methods, an effect that can be modified by sample size, signal strength, and signal magnitude. We developed a measure, the total signal index (TSI), to track the trends of total signal and noise accumulation.