Diffuse Reflectance Infrared Spectroscopy Estimates for Soil Properties Using Multiple Partitions: Effects of the Range of Contents, Sample Size, and Algorithms

Abstract

The RMSE of validation (RMSEV) and the ratio of the interquartile range to RMSEV (RPIQV) are key quality parameters in diffuse reflectance infrared (IR) spectroscopy studies, but the effects of different factors on these parameters are often not sufficiently considered. Our objectives were to reveal the effects of range of contents, sample size, data pretreatment, wavenumber region selection, and algorithms on estimations from IR spectra in the wavenumber range from 1,000 to 7,000 cm⁻¹ (mid- and long-wave near IR). Contents of soil organic C (SOC), N, clay, and sand and pH values were determined for surface soils of an arable field in India, and IR spectra were recorded for four samples consisting of 71 to 263 soils. For each of the four samples, five random partitions into calibration and validation datasets were carried out, and partial least squares regression (PLSR) or support vector machine regression was performed. A plot of the RMSEV values against the interquartile ranges of measured values for the validation samples (IQRV) indicated that the IQRV was a key parameter for all soil properties: a sufficiently high IQRV, which is affected by sample size and random partitioning, resulted in generally good estimation accuracies (RPIQV = 2.70). Optimized data pretreatment and wavenumber region selection improved estimation accuracy for SOC and pH. Support vector machine regression was superior to PLSR for the estimation of SOC, clay, and sand, but worse for pH. Overall, this study indicates that multiple partitioning of the data is essential in IR studies and suggests that RPIQV and RMSEV need to be interpreted in the context of the respective IQRV values.
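
The relationship between RMSEV, IQRV, and RPIQV described above can be illustrated with a minimal Python sketch. It is not the authors' code: the function names, the quartile method, the validation fraction of one third, and the toy values are all assumptions made for illustration. Only the definitions (RPIQV as IQRV divided by RMSEV) and the use of five random partitions come from the abstract.

```python
import math
import random
import statistics


def rmse(measured, predicted):
    """Root mean squared error between measured and predicted values."""
    squared = [(m - p) ** 2 for m, p in zip(measured, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def iqr(values):
    """Interquartile range (Q3 - Q1). The 'inclusive' method is an assumption."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q3 - q1


def rpiq(measured, predicted):
    """Ratio of the interquartile range of measured values to the RMSE."""
    return iqr(measured) / rmse(measured, predicted)


def random_partitions(n_samples, n_partitions=5, validation_fraction=1 / 3, seed=0):
    """Yield (calibration_indices, validation_indices) for each random partition.

    The validation fraction is hypothetical; the abstract does not state it.
    """
    rng = random.Random(seed)
    n_val = round(n_samples * validation_fraction)
    for _ in range(n_partitions):
        indices = list(range(n_samples))
        rng.shuffle(indices)
        yield indices[n_val:], indices[:n_val]


# Toy validation sets (invented values): identical prediction errors,
# but a narrow versus a wide range of measured contents.
errors = [0.1, -0.1, 0.1, -0.1, 0.1]
narrow = [1.0, 1.1, 1.2, 1.3, 1.4]
wide = [1.0, 1.5, 2.0, 2.5, 3.0]

for name, measured in (("narrow", narrow), ("wide", wide)):
    predicted = [m + e for m, e in zip(measured, errors)]
    print(name, round(rmse(measured, predicted), 3),
          round(iqr(measured), 3), round(rpiq(measured, predicted), 3))
```

In this toy case both validation sets have the same RMSEV, yet the set with the wider range of measured values yields a much higher RPIQV, which is why the two parameters are best read together with the IQRV. Because each random partition draws a different validation set, IQRV and therefore RPIQV can vary between partitions of the same sample.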

Publication
Soil Science Society of America Journal
