Abstract
Advances in information technology have steadily improved the means of collecting, storing, and managing data, and the discipline of data mining has emerged as a result. Calculating crop evapotranspiration (ET0) involves many parameters, complex noise, and nonlinearity. To address these characteristics, this paper studies a data mining method, random forest, and introduces the concept of knowledge flow into the data mining modeling process. Layering the knowledge flow model makes it possible to observe the distribution and movement of knowledge within the model from different angles and at different levels and to identify the key steps in modeling, which supports building models more accurately and efficiently. The approach is applied to the calculation of crop evapotranspiration: a model is built and validated using meteorological data from Xi'an as input parameters, with the results of the FAO56 Penman-Monteith formula as the reference values for comparison. The results show that the data mining model built with knowledge flow has high prediction accuracy, good stability, and strong generalization ability, and that it can effectively predict crop evapotranspiration. Compared with other models, it is more efficient and performs better. It is especially suitable for predicting crop evapotranspiration from large samples and offers some reference value for determining crop water requirements.
Key words
crop evapotranspiration /
data mining /
random forest /
knowledge flow
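Funding
National Science and Technology Program for Agriculture ("863" Program) (2011AA100509-01);
2014 Shaanxi Province Science and Technology Research and Development Program (2014K01-33-02);
Xinjiang Autonomous Region High-Tech Research and Development Project (201413102).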