Institute of Technology, Faculty of Science and Technology, University of Tartu, 50090 Tartu, ESTONIA.
The key findings and conclusions of the present thesis may be summarized as follows:
• Amongst the predictor variables in the Db prediction models, the simple regression model overestimated the importance of SOC (%). The linear mixed model approach revealed that, after SOC (%), most of the systematic variation in the data can be attributed to sampling depth and Wc. Both of these predictors are rarely used in Db modelling studies, so the findings of the present thesis suggest that including them could significantly improve the predictions.
• In the SOC (%) prediction models, soil type had the controlling effect on predictions in linear regression, random forests and the linear mixed model. The other tested variables (Cf content, A-horizon thickness) showed a marginal but significant impact on SOC (%) prediction.
• A large part of the random variation associated with Db resulted from residual error related to the individual measurements within each survey plot, which points to the sampling methodology and the lack of information on land use history. In SOC (%) prediction, however, most of the random variation was found between sites, showing that SOC (%) varies more between sites than within them.
• Comparison of the different methods showed that the linear mixed models for SOC (%) and Db resulted in the highest prediction accuracy (RMSE 0.22% and 0.09 g cm–3, respectively). Random forests, representing black-box models, had about twice the prediction error (RMSE 0.42%). Moreover, random forests, along with the linear regression methods, underestimated higher and overestimated lower values of the soil properties. The linear mixed model managed to incorporate the different types of variation exposed by the monitoring design. Linear mixed model-based predictions are scarce in the literature; however, where the data sampling scheme is suitable for this method, its use is strongly advisable, as the predictions can otherwise be misleading.
• The linear mixed model approach makes it possible to use kriging to exploit the spatial information available in the data through the site coordinates. Mixed model-based kriging for SOC (%) gave only a very modest gain in prediction accuracy. This can be explained by the fact that the validation data were not designed for the purpose of prediction and, moreover, that their sampling scheme differed from that of the training data.
• The Db and SOC (%) values predicted using the median approach, linear regression and the linear mixed model were used to estimate the SOC stock in mineral soil of arable land in the national SMN database. The best prediction accuracy was achieved with the mixed model-predicted values, which gave an RMSE of 7 t C ha–1, about three times lower than that of the other methods. Thus, the linear mixed model-based SOC stock estimation was applied to the Estonian Soil Map data to observe the spatial pattern of SOC stock in mineral soil of arable land in Tartu County. The average estimated SOC stock of Tartu County is 54.8 t C ha–1, and the total topsoil SOC stock of mineral arable soils is 1.8 Mt.
• The methodology outlined in this thesis for improving prediction models is general and applicable to situations with more candidate models and similar data structures. It can be used to renew existing legacy soil data and to use them efficiently in addressing current scientific and practical soil use questions.
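As an illustration of the two quantities the conclusions above rely on, the following minimal Python sketch shows how an RMSE and a per-hectare SOC stock could be computed. It is not the code used in the thesis. The function names, the optional coarse-fragment correction and the stock formula (SOC (%) × Db × layer depth, with an optional reduction for the coarse fraction) are assumptions based on the commonly used unit conversion, and the input values are hypothetical.

```python
import math


def rmse(observed, predicted):
    """Root mean squared error between paired observations and predictions."""
    if len(observed) != len(predicted) or len(observed) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def soc_stock_t_per_ha(soc_percent, db_g_cm3, depth_cm, coarse_fraction=0.0):
    """SOC stock of one soil layer in t C ha^-1 (assumed formula).

    soc_percent / 100 * db_g_cm3 * depth_cm gives g C per cm^2;
    1 ha = 1e8 cm^2 and 1 t = 1e6 g, so the conversion factor is 100,
    which cancels the division by 100 for the percentage.
    """
    if not 0.0 <= coarse_fraction < 1.0:
        raise ValueError("coarse_fraction must be in [0, 1)")
    return soc_percent * db_g_cm3 * depth_cm * (1.0 - coarse_fraction)


# Hypothetical layer: 2 % SOC, Db 1.4 g cm^-3, 25 cm deep, no coarse fragments.
stock = soc_stock_t_per_ha(2.0, 1.4, 25.0)  # approximately 70 t C ha^-1

# Hypothetical validation pairs for SOC (%).
error = rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
```

Because Db and SOC (%) enter the stock as a product, prediction errors in either variable carry over directly to the stock estimate, which is why the accuracy of both models matters for the SOC stock comparison.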