Debris Flow Susceptibility and Its Reliability Based on Random Forest and GIS
Abstract: The models currently used for GIS-based debris-flow susceptibility (DFS) assessment have clear limitations. Models based on classical statistical theory (e.g. information value, weight of evidence, and certainty factors) require the debris-flow conditioning factors to be independent, and the factor weights depend on how each factor is divided into class intervals. Linear machine learning models may fail on nonlinear classification problems, whereas hyper-parameter tuning of the commonly used nonlinear models is inefficient. Random forest (RF) can overcome many of these shortcomings, but has rarely been applied to DFS assessment. This article investigates RF-based DFS assessment and evaluates its reliability, using 26 conditioning factors and 4 models whose hyper-parameters are tuned by Bayesian optimization: random forest (RF), linear support vector machine (LSVM), radial basis function support vector machine (RBF-SVM), and quadratic discriminant analysis (QDA). DFS assessment is first evaluated with a modified five-fold cross-validation method. The relative-weight ranking of RF and the Monte Carlo method are then used, respectively, to investigate the reliability of DFS assessment under different combinations of conditioning factors and under random sample splits. The results show that 21 of the 26 conditioning factors indicate differences in the debris-flow formation environment between the low and the relatively high susceptibility zones identified by RF. The relative-weight ranking of RF effectively determines the locally optimal combination of factors for each of the 4 models. The uncertainty in susceptibility assessment caused by the random sample split is largest in the medium susceptibility zone (0.4~0.6), and can be reduced by increasing the proportion of samples used for model building and by improving the susceptibility model. RF achieves AUC = 0.86, overall accuracy = 0.79, F1 score = 0.66, and Brier score = 0.14, and the reliability of these indices is the best among the 4 models. RF can therefore be the preferred model for quantitative DFS assessment.
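The abstract reports AUC and Brier score as the probabilistic performance indices. As a reminder of what these two indices measure, the following is a minimal sketch in plain Python using their standard definitions; the labels and susceptibility values are hypothetical and are not data from this study.

```python
def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and the 0/1 outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def auc(y_true, y_prob):
    """Probability that a random positive unit scores above a random negative unit.

    Ties between a positive and a negative unit count as 0.5.
    """
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical labels (1 = debris flow) and susceptibility values, for illustration only.
labels = [0, 0, 1, 1, 0, 1]
probs = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9]

bs = brier_score(labels, probs)
roc_auc = auc(labels, probs)
print(f"Brier score = {bs:.5f}, AUC = {roc_auc:.4f}")
```

A lower Brier score and a higher AUC both indicate better prediction, which is why the two indices move in opposite directions in the results reported here.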
Fig. 8. Distribution of the prediction performance indices
a. LSVM (a=1.43, c=91, s=0.825, μ=[0.82301, 0.82379]); RBF-SVM (a=6.08, c=90.62, s=0.82, μ=[0.83053, 0.83092]); QDA (a=2.41, c=490, s=0.79, μ=[0.81685, 0.81701]); RF (a=8.09, c=131, s=0.85, μ=[0.86010, 0.86036]).
b. LSVM (a=216.9, c=25, s=0.16, μ=[0.17017, 0.17029]); RBF-SVM (a=28.1, c=28, s=0.15, μ=[0.15527, 0.15541]); QDA (a=139.7, c=30, s=0.16, μ=[0.17417, 0.17428]); RF (a=18.7, c=38, s=0.14, μ=[0.14063, 0.14074]).
Note: the values in parentheses after each model are the parameters of the corresponding fitted distribution. a and c are shape parameters, s is the scale parameter, the location parameter is 0 in all cases, and μ is the 95% confidence interval of the mean.
Table 1. Summary of conditioning factors
Table 2. Confusion matrices of the 4 models (rows: actual class; columns: predicted class)

Model      Actual class       Predicted non-debris-flow   Predicted debris-flow
LSVM       Non-debris-flow    915                         135
LSVM       Debris-flow        202                         318
RBF-SVM    Non-debris-flow    979                         71
RBF-SVM    Debris-flow        285                         235
QDA        Non-debris-flow    835                         215
QDA        Debris-flow        162                         358
RF         Non-debris-flow    931                         119
RF         Debris-flow        204                         316
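Table 3. Classification performance of the models

Model               Overall accuracy (%)   Debris-flow precision (%)   Debris-flow recall (%)   F1 score (%)   AUC (%)
LSVM                78.54                  70.20                       61.15                    65.36          81.4
RBF-SVM             77.32                  76.80                       45.19                    56.90          82.8
QDA                 75.99                  62.48                       68.85                    65.51          81.7
RF                  79.43                  72.64                       60.77                    66.18          85.9
Completely random   50.00                  33.00                       50.00                    39.75          50.0

Note: overall accuracy = number of correctly classified units / total number of units; debris-flow precision = number of correctly predicted debris-flow units / number of units predicted as debris flow; debris-flow recall = number of correctly predicted debris-flow units / actual number of debris-flow units; F1 = 2 × precision × recall / (precision + recall).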
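The definitions in the note to Table 3 can be applied directly to the counts in Table 2. The sketch below does this in plain Python, taking "debris flow" as the positive class; the function and variable names are illustrative, and the counts are copied from Table 2.

```python
# Confusion-matrix counts copied from Table 2 as (TN, FP, FN, TP),
# with "debris flow" as the positive class.
CONFUSION = {
    "LSVM": (915, 135, 202, 318),
    "RBF-SVM": (979, 71, 285, 235),
    "QDA": (835, 215, 162, 358),
    "RF": (931, 119, 204, 316),
}


def classification_metrics(tn, fp, fn, tp):
    """Return accuracy, precision, recall and F1 as percentages (2 decimals)."""
    accuracy = (tn + tp) / (tn + fp + fn + tp)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    values = {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}
    return {name: round(100 * value, 2) for name, value in values.items()}


metrics = {model: classification_metrics(*counts) for model, counts in CONFUSION.items()}
for model, row in metrics.items():
    print(model, row)
```

Run on these counts, the sketch gives the same accuracy, precision, recall and F1 values as Table 3 for all four models. AUC cannot be recovered this way, since it depends on the predicted probabilities rather than on a single confusion matrix.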
Table 4. Locally optimal combination of conditioning factors for each model

Model      Factor combination                        AUC increase   Brier score decrease
LSVM       Factors ranked 1~11 by relative weight    1.8%           0.7%
RBF-SVM    Factors ranked 1~21 by relative weight    1.0%           1.4%
QDA        Factors ranked 1~12 by relative weight    0.4%           7.0%
RF         Factors ranked 1~12 by relative weight    0.4%           1.7%
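Each combination in Table 4 is a prefix of the factor list ordered by RF relative weight. One way such a prefix could be chosen is to score the top-k factors for every k and keep the best. The sketch below assumes that procedure, which is not confirmed as the one used in the study; the factor names, the gains, and the `toy_score` function are hypothetical stand-ins for retraining a model and measuring its AUC.

```python
def best_top_k(ranked_factors, score_fn):
    """Score the top-k prefix of the ranking for every k and return the best one."""
    best_subset, best_score = None, float("-inf")
    for k in range(1, len(ranked_factors) + 1):
        subset = ranked_factors[:k]
        score = score_fn(subset)
        if score > best_score:
            best_subset, best_score = subset, score
    return best_subset, best_score


# Hypothetical factors, already sorted by decreasing relative weight.
ranked = ["f1", "f2", "f3", "f4", "f5", "f6"]

# Hypothetical contribution of each factor to the score (stand-in for AUC).
GAIN = {"f1": 0.05, "f2": 0.03, "f3": 0.01, "f4": -0.005, "f5": -0.01, "f6": -0.02}


def toy_score(subset):
    """Stand-in for 'train the model on this subset and return its AUC'."""
    return 0.75 + sum(GAIN[name] for name in subset)


subset, score = best_top_k(ranked, toy_score)
print(subset, round(score, 3))
```

Because only prefixes of the ranking are examined, the result is a local rather than a global optimum over all factor subsets, which matches the "locally optimal" wording of the table.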
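Table 5. Mean evaluation indices over 2000 susceptibility assessments

Model      Overall accuracy (%)   Debris-flow precision (%)   Debris-flow recall (%)   F1 score (%)   AUC (%)   Brier score
LSVM       78.10                  69.63                       60.20                    64.57          82.3      0.176
RBF-SVM    76.80                  75.17                       44.77                    56.09          83.1      0.155
QDA        76.11                  62.65                       69.00                    65.67          81.7      0.174
RF         79.30                  72.86                       59.76                    65.66          86.0      0.140

Note: with 2000 samples, the 95% confidence interval of the mean of each index is precise to the fourth decimal place. This certainty is high enough for using the models and comparing them with one another, so the table gives the means directly rather than as confidence intervals.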
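The note to Table 5 relies on the confidence interval of a mean narrowing as the number of Monte Carlo repetitions grows. The sketch below illustrates this with a normal-approximation 95% interval in plain Python. The 2000 values are synthetic draws standing in for the AUC of repeated random sample splits, not results from the study, and the spread of 0.003 is an assumed value.

```python
import math
import random
import statistics


def mean_ci95(values):
    """Mean and normal-approximation 95% confidence interval of the mean."""
    n = len(values)
    mean = statistics.mean(values)
    half_width = 1.96 * statistics.stdev(values) / math.sqrt(n)
    return mean, mean - half_width, mean + half_width


rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Synthetic stand-in for 2000 AUC values from repeated random sample splits.
aucs = [rng.gauss(0.86, 0.003) for _ in range(2000)]

mean, low, high = mean_ci95(aucs)
print(f"mean = {mean:.5f}, 95% CI = [{low:.5f}, {high:.5f}]")
```

The half-width shrinks with the square root of the number of repetitions, so quadrupling the repetitions halves the interval.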