Using Variable Selection and Wavelets to Exploit the Full Potential of Visible-near Infrared Spectra for Predicting Soil Properties

Abstract

In soil spectroscopy, several strategies exist to optimise multivariate calibrations. We explore this issue with a set of topsoil samples for which we estimated soil organic carbon (OC) and total nitrogen (N) from visible-near infrared (vis-NIR) spectra (350-2500 nm). In total, 172 samples were collected to cover the soil heterogeneity of the study area in western Rhineland-Palatinate, Germany. There, soils with varying properties developed from very diverse parent materials, ranging, for example, from very acidic sandstone to dolomitic marl. We defined four sample sets, each of a different size and heterogeneity, and subdivided each into a calibration set and a validation set. The first strategy that we tested to improve prediction accuracy was spectral variable selection using competitive adaptive reweighted sampling (CARS) and iteratively retaining informative variables (IRIV), both in combination with partial least squares regression (PLSR). In addition, continuous wavelet transformation (CWT) with the Mexican Hat wavelet was applied to decompose the measured spectra into multiple scale components (dyadic scales 2^1 to 2^5) and thus to represent the high and low frequency features contained in the spectra. CARS was then applied to select wavelet coefficients from the different scales and to introduce them into the PLSR approach (CWT-CARS-PLSR). In terms of predictive power, CWT-CARS-PLSR outperformed the other approaches. For the smallest data set, with 30 validation samples, prediction accuracy for OC increased from approximately quantitative with full spectrum-PLSR (r² = 0.81, residual prediction deviation (RPD) = 2.27) to excellent with wavelet decomposition and CARS-PLSR (r² = 0.93, RPD = 3.60). For N, predictions improved from unsuccessful (r² = 0.63, RPD = 1.36) to approximately quantitative (r² = 0.84, RPD = 2.03). For OC, predictions were worst for the largest data set, with 57 validation samples: CWT-CARS-PLSR achieved approximately quantitative predictions (r² = 0.82, RPD = 2.31), whereas full spectrum-PLSR provided estimates that only allowed high values to be separated from low values (r² = 0.72, RPD = 1.88). N estimation for this data set using CWT-CARS-PLSR was also approximately quantitative. The two spectral variable selection techniques tested, CARS and IRIV, gave similar prediction results. The application of IRIV was limited by long processing times.
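The wavelet decomposition step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `mexican_hat` and `cwt_mexican_hat`, the 1/sqrt(scale) normalisation, the truncation of the wavelet at 8 scale widths, the zero padding at the spectrum ends, and the assumption of evenly spaced spectral bands (scales expressed in bands) are all choices made here for illustration. The synthetic single-peak spectrum is likewise invented and is not data from the study.

```python
import math


def mexican_hat(t):
    """Mexican Hat (Ricker) mother wavelet, normalised to unit energy."""
    norm = 2.0 / (math.sqrt(3.0) * math.pi ** 0.25)
    return norm * (1.0 - t * t) * math.exp(-0.5 * t * t)


def cwt_mexican_hat(signal, scales=(2, 4, 8, 16, 32), support=8):
    """Return {scale: coefficients}, one coefficient per spectral band.

    Assumptions of this sketch: bands are evenly spaced, scales are given
    in bands, and values outside the measured range count as zero, so
    coefficients near both ends of the spectrum are edge-affected.
    """
    n = len(signal)
    result = {}
    for s in scales:
        half = support * s  # truncate the wavelet at +/- support * s bands
        kernel = [mexican_hat(k / s) / math.sqrt(s)
                  for k in range(-half, half + 1)]
        coeffs = []
        for b in range(n):
            acc = 0.0
            for j, w in enumerate(kernel):
                i = b + j - half  # band index under kernel tap j
                if 0 <= i < n:
                    acc += signal[i] * w
            coeffs.append(acc)
        result[s] = coeffs
    return result


# Synthetic spectrum (hypothetical): one Gaussian feature centred on band 300.
spectrum = [math.exp(-0.5 * ((i - 300) / 6.0) ** 2) for i in range(600)]
coefficients = cwt_mexican_hat(spectrum)  # dyadic scales 2^1 to 2^5

# Stacking the five coefficient vectors gives one candidate variable per
# (scale, band) pair, the kind of pool a selection method could draw from.
stacked = [c for s in sorted(coefficients) for c in coefficients[s]]
```

Small scales respond to narrow features and large scales to broad ones, which is how the decomposition separates the high and low frequency content of a spectrum. A library routine such as a CWT from PyWavelets would normally replace the explicit loops; the pure-Python version is used here only to keep the arithmetic visible.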

Publication
JOURNAL OF NEAR INFRARED SPECTROSCOPY
