Volume 46 Issue 8
Aug.  2021
Chu Deping, Wan Bo, Li Hong, Fang Fang, Wang Run, 2021. Geological Entity Recognition Based on ELMO-CNN-BiLSTM-CRF Model. Earth Science, 46(8): 3039-3048. doi: 10.3799/dqkx.2020.309

Geological Entity Recognition Based on ELMO-CNN-BiLSTM-CRF Model

doi: 10.3799/dqkx.2020.309
  • Received Date: 2020-09-17
    Available Online: 2021-09-14
  • Publish Date: 2021-08-15
  • Geological entities are the core information in geological texts, and recognizing them accurately is a prerequisite for geological information extraction and mining. This paper designs the ELMO-CNN-BiLSTM-CRF model: a deep BiLSTM-CRF neural network built on pre-trained word vectors and extended with dynamic word features and character-level word features. These additions compensate for the lack of specificity of the word vectors, improving both the recognition of polysemous words in geological text and the extraction of local features of geological entities. The model is evaluated on the geological survey report of the Xiongcun copper mine in Xietongmen County, Xizang Autonomous Region, where it achieves a precision of 95.15%, a recall of 95.26%, and an F1 value of 95.21%. Experiments show that, compared with the BiLSTM-CRF and CNN-BiLSTM-CRF models, it is more effective for geological entity recognition on a small-scale corpus and can effectively identify long geological entity words and geological polysemes.
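
The reported scores can be read with a small sketch of entity-level evaluation. The abstract does not describe the tagging scheme or the scoring script, so the BIO tags, the exact-match criterion, and the labels ROCK and MIN below are assumptions made only for illustration. The sketch extracts entity spans from tag sequences, scores them, and checks that the reported F1 is roughly the harmonic mean of the reported first two scores.

```python
from typing import List, Set, Tuple

Span = Tuple[int, int, str]  # (start, end_exclusive, entity_type)


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Collect entity spans from a BIO tag sequence (assumed tagging scheme)."""
    spans: Set[Span] = set()
    start, etype = None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel closes a trailing entity
        inside = tag.startswith("I-") and etype == tag[2:]
        if start is not None and not inside:
            spans.add((start, i, etype))
            start, etype = None, None
        if tag.startswith("B-"):
            start, etype = i, tag[2:]
    return spans


def harmonic_f1(p: float, r: float) -> float:
    """F1 as the harmonic mean of precision p and recall r."""
    return 2 * p * r / (p + r) if p + r else 0.0


def precision_recall_f1(gold: Set[Span], pred: Set[Span]) -> Tuple[float, float, float]:
    """Exact-match scoring: a predicted span counts only if it equals a gold span."""
    correct = len(gold & pred)
    p = correct / len(pred) if pred else 0.0
    r = correct / len(gold) if gold else 0.0
    return p, r, harmonic_f1(p, r)


# Hypothetical labels: ROCK and MIN are illustrative, not taken from the paper.
gold_tags = ["B-ROCK", "I-ROCK", "O", "B-MIN", "O"]
pred_tags = ["B-ROCK", "I-ROCK", "O", "O", "B-MIN"]
scores = precision_recall_f1(bio_to_spans(gold_tags), bio_to_spans(pred_tags))

# Consistency check on the figures reported in the abstract (percent values).
reported_f1 = harmonic_f1(95.15, 95.26)
```

In the toy example one of two predicted entities matches exactly, so all three scores are 0.5. For the reported figures, the harmonic mean of 95.15 and 95.26 is about 95.20, which agrees with the stated 95.21% to within the rounding of the inputs.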

     
