Remote Sensing Image Scene Classification Method Based on Multi-Scale Cyclic Attention Network
Abstract: Scene classification of high-resolution remote sensing images has long been a research hotspot in remote sensing. Because different remote sensing scenes call for different observation scales, this paper proposes a scene classification method based on a multi-scale cyclic attention network. First, ResNet50 extracts features of the remote sensing image at multiple scales; an attention mechanism then locates the region of interest at each scale, and each region is cropped, zoomed, and fed back into the network. Next, the multi-scale features of the original image are fused with the features of the cropped regions and passed to a fully connected layer for classification. The method is validated on the public UC Merced Land-Use and NWPU-RESISC45 datasets, where it improves the average classification accuracy over the ResNet50 baseline by 1.89 and 2.70 percentage points, respectively. The results show that the multi-scale cyclic attention network can further improve the accuracy of remote sensing image scene classification.
Key words:
- remote sensing
- scene classification
- multi-scale
- convolutional neural network
- attention mechanism
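The crop-and-zoom step described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: as a stand-in for the learned attention, it assumes the region centre is the attention-weighted centroid of a non-negative attention map and that the crop half-width `half` is given. The helpers `attention_centroid`, `square_box`, and `crop_and_zoom` are hypothetical, and nearest-neighbour resampling is used only for simplicity.

```python
from typing import List, Tuple

Grid = List[List[float]]  # 2-D array stored as a list of rows


def attention_centroid(att: Grid) -> Tuple[float, float]:
    """Attention-weighted centroid (x, y) of a non-negative attention map.

    Hypothetical stand-in for a learned attention proposal."""
    total = sum(sum(row) for row in att)
    if total <= 0:
        raise ValueError("attention map must have positive mass")
    x = sum(c * v for row in att for c, v in enumerate(row)) / total
    y = sum(r * v for r, row in enumerate(att) for v in row) / total
    return x, y


def square_box(tx: float, ty: float, half: float,
               height: int, width: int) -> Tuple[int, int, int]:
    """Top, left and side of a square crop centred near (tx, ty),
    shifted as needed so that it stays inside the image."""
    side = min(max(1, round(2 * half)), height, width)
    top = min(max(round(ty - side / 2), 0), height - side)
    left = min(max(round(tx - side / 2), 0), width - side)
    return top, left, side


def crop_and_zoom(img: Grid, att: Grid, half: float, out_size: int) -> Grid:
    """Crop the attended square and enlarge it to out_size x out_size
    with nearest-neighbour sampling."""
    tx, ty = attention_centroid(att)
    top, left, side = square_box(tx, ty, half, len(img), len(img[0]))
    return [
        [img[top + (r * side) // out_size][left + (c * side) // out_size]
         for c in range(out_size)]
        for r in range(out_size)
    ]


# Usage: a 4 x 4 image whose attention peaks at the bottom-right pixel.
image = [[float(4 * r + c) for c in range(4)] for r in range(4)]
attention = [[0.0] * 4 for _ in range(4)]
attention[3][3] = 1.0
zoomed = crop_and_zoom(image, attention, half=1.0, out_size=4)
```

The zoomed 2×2 bottom-right patch is returned at the original 4×4 size, which is what lets the cropped region be passed through the same backbone again.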
Table 1. ResNet50 network configuration

| Layer name | 50-layer |
|---|---|
| Conv1 | 7×7, 64, stride 2 |
| Conv2_x | 3×3 max pool, stride 2; [1×1, 64; 3×3, 64; 1×1, 256] × 3 |
| Conv3_x | [1×1, 128; 3×3, 128; 1×1, 512] × 4 |
| Conv4_x | [1×1, 256; 3×3, 256; 1×1, 1024] × 6 |
| Conv5_x | [1×1, 512; 3×3, 512; 1×1, 2048] × 3 |
| Output | GAP, k-d FC, softmax |
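Table 1 fixes the number of bottleneck blocks per stage, which is enough to recover both the "50" in the network's name and the feature-map size after each stage. The sketch below assumes the downsampling pattern of the original ResNet (stride 2 in Conv1, in the max pool, and in the first block of Conv3_x to Conv5_x) and the 256×256 input size listed in Table 2; it only tracks shapes and builds no network.

```python
from typing import Dict, List, Tuple

# (stage, bottleneck blocks, output channels, stride of the first block).
# Block and channel counts come from Table 1; the strides are assumed from
# the original ResNet design (He et al., 2016).
STAGES: List[Tuple[str, int, int, int]] = [
    ("Conv2_x", 3, 256, 1),
    ("Conv3_x", 4, 512, 2),
    ("Conv4_x", 6, 1024, 2),
    ("Conv5_x", 3, 2048, 2),
]


def conv_out(size: int, kernel: int, stride: int, padding: int) -> int:
    """Spatial size after a convolution or pooling window."""
    return (size + 2 * padding - kernel) // stride + 1


def feature_shapes(input_size: int) -> Dict[str, Tuple[int, int]]:
    """(spatial size, channels) at the output of each ResNet50 stage."""
    size = conv_out(input_size, kernel=7, stride=2, padding=3)  # Conv1
    shapes = {"Conv1": (size, 64)}
    size = conv_out(size, kernel=3, stride=2, padding=1)  # 3x3 max pool
    for name, _blocks, channels, stride in STAGES:
        # Only the first 3x3 convolution of a stage can change the size.
        size = conv_out(size, kernel=3, stride=stride, padding=1)
        shapes[name] = (size, channels)
    return shapes


# Weighted layers: Conv1 + 3 convolutions per bottleneck block + the FC layer.
depth = 1 + 3 * sum(blocks for _, blocks, _, _ in STAGES) + 1
shapes_256 = feature_shapes(256)
```

For a 256×256 scene this gives an 8×8×2048 map after Conv5_x, which GAP reduces to a 2048-d vector before the k-d FC layer.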
Table 2. Information about the two datasets

| Dataset | Scene classes | Images per class | Total images | Image size (pixels) | Training ratio |
|---|---|---|---|---|---|
| UC Merced Land-Use | 21 | 100 | 2100 | 256×256 | 80% |
| NWPU-RESISC45 | 45 | 700 | 31500 | 256×256 | 10% |
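With the training ratios in Table 2, each UC Merced Land-Use class contributes 80 training images (1680 for training, 420 for testing) and each NWPU-RESISC45 class contributes 70 (3150 for training, 28350 for testing). The sketch below assumes the split is drawn at random within each class (a stratified split), which the table does not state explicitly; `stratified_split` is a hypothetical helper.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def stratified_split(labels: Sequence[int], train_ratio: float,
                     seed: int = 0) -> Tuple[List[int], List[int]]:
    """Split sample indices so that every class contributes the same
    fraction of its images to the training set."""
    by_class: Dict[int, List[int]] = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    rng = random.Random(seed)  # fixed seed for a reproducible split
    train: List[int] = []
    test: List[int] = []
    for label in sorted(by_class):
        idxs = by_class[label]
        rng.shuffle(idxs)
        n_train = round(len(idxs) * train_ratio)
        train.extend(idxs[:n_train])
        test.extend(idxs[n_train:])
    return train, test


# Label lists matching Table 2: 21 x 100 images (UCM), 45 x 700 images (NWPU).
ucm_labels = [c for c in range(21) for _ in range(100)]
nwpu_labels = [c for c in range(45) for _ in range(700)]
ucm_train, ucm_test = stratified_split(ucm_labels, 0.8)
nwpu_train, nwpu_test = stratified_split(nwpu_labels, 0.1)
```

Repeating the split with different seeds gives the independent runs over which mean ± deviation figures such as those in Tables 3 to 6 are usually reported.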
Table 3. Classification accuracy of features at different scales on the UC Merced Land-Use dataset

| No. | Scale | A-OA (%) |
|---|---|---|
| 1 | S_128_256 | 97.85 ± 0.67 |
| 2 | S_160_256 | 98.10 ± 0.39 |
| 3 | S_192_256 | 98.51 ± 0.11 |
| 4 | S_224_256 | 98.33 ± 0.14 |
| 5 | S_256 | 98.18 ± 0.09 |
| 6 | S_288_256 | 98.10 ± 0.39 |
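The accuracy figures in Tables 3 to 6 are reported as a mean with a spread. A minimal sketch of how such a figure is typically computed, assuming that A-OA is the mean overall accuracy over repeated runs and that the ± term is the sample standard deviation (neither is defined in this excerpt). The run values below are toy inputs, not results from the paper.

```python
import statistics
from typing import Sequence, Tuple


def overall_accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Overall accuracy (%): correctly classified test images / all test images."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty sequences of equal length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return 100.0 * correct / len(y_true)


def mean_and_std(oas: Sequence[float]) -> Tuple[float, float]:
    """Mean OA across repeated runs and its sample standard deviation (assumed)."""
    return statistics.mean(oas), statistics.stdev(oas)


# Toy values for illustration only, not results from the paper.
run_oas = [
    overall_accuracy([0, 1, 2, 2], [0, 1, 2, 1]),  # 3 of 4 correct
    overall_accuracy([0, 1, 2, 2], [0, 1, 2, 2]),  # all correct
]
a_oa, spread = mean_and_std(run_oas)
```

Swapping `statistics.stdev` for `statistics.pstdev` gives the population standard deviation, which would yield a slightly smaller ± term for the same runs.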
Table 4. Classification accuracy of features at different scales on the NWPU-RESISC45 dataset

| No. | Scale | A-OA (%) |
|---|---|---|
| 1 | S_128_256 | 91.04 ± 0.03 |
| 2 | S_160_256 | 90.86 ± 0.19 |
| 3 | S_192_256 | 91.18 ± 0.02 |
| 4 | S_224_256 | 90.19 ± 0.31 |
| 5 | S_256 | 90.25 ± 0.20 |
| 6 | S_288_256 | 90.85 ± 0.27 |
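The scale labels in Tables 3 and 4 (for example S_192_256) suggest that features from a resized input are combined with those of the original 256×256 image; the notation is not defined in this excerpt, so that reading is an assumption. The sketch shows one common way to fuse such features, as the abstract describes: global average pooling (GAP) turns feature maps of different spatial sizes into vectors of equal length, which are concatenated and passed to a fully connected layer with softmax. Concatenation as the fusion operator and the toy weights are assumptions, not the paper's exact design.

```python
import math
from typing import List, Sequence

FeatureMap = List[List[List[float]]]  # channels x height x width


def global_average_pool(fmap: FeatureMap) -> List[float]:
    """One value per channel: the mean over all spatial positions."""
    return [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in fmap]


def fuse_scales(fmaps: Sequence[FeatureMap]) -> List[float]:
    """Concatenate the pooled vectors of feature maps from different input scales."""
    fused: List[float] = []
    for fmap in fmaps:
        fused.extend(global_average_pool(fmap))
    return fused


def linear(x: Sequence[float], weights: Sequence[Sequence[float]],
           bias: Sequence[float]) -> List[float]:
    """Fully connected layer: one score per class."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def softmax(scores: Sequence[float]) -> List[float]:
    """Class probabilities; the max is subtracted for numerical stability."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Toy maps: 2 channels at 2x2 (larger input) and 2 channels at 1x1 (smaller input).
f_large = [[[1.0, 3.0], [5.0, 7.0]], [[0.0, 0.0], [0.0, 4.0]]]
f_small = [[[2.0]], [[6.0]]]
fused = fuse_scales([f_large, f_small])
probs = softmax(linear(fused, weights=[[1, 0, 0, 0], [0, 0, 0, 1]], bias=[0.0, 0.0]))
```

Because GAP removes the spatial dimensions, the fused vector length depends only on the channel counts, so inputs at 128, 192, or 288 pixels can all be combined with the 256-pixel features.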
Table 5. Classification accuracy of different methods on the UC Merced Land-Use dataset

| Method | OA (%) |
|---|---|
| BoVW (Yang and Newsam, 2010) | 76.80 |
| GoogLeNet (Nogueira et al., 2017) | 92.80 |
| CaffeNet (Xia et al., 2017) | 95.02 ± 0.81 |
| ResNet50 (Zhang et al., 2019) | 96.62 ± 0.26 |
| GLM16 (Yuan et al., 2019) | 94.97 ± 1.16 |
| VGG-VD16+MSCP (He et al., 2018) | 98.36 ± 0.58 |
| AlexNet+MSCP (He et al., 2018) | 97.29 ± 0.63 |
| Proposed method | 98.51 ± 0.11 |
Table 6. Classification accuracy of different methods on the NWPU-RESISC45 dataset

| Method | OA (%) |
|---|---|
| BoVW (Cheng et al., 2017) | 41.72 ± 0.21 |
| Fine-tuned AlexNet (Cheng et al., 2017) | 81.22 ± 0.19 |
| Fine-tuned GoogLeNet (Cheng et al., 2017) | 82.57 ± 0.12 |
| Fine-tuned VGGNet-16 (Cheng et al., 2017) | 87.15 ± 0.45 |
| ResNet50 (Zhao et al., 2020) | 88.48 ± 0.21 |
| VGG-VD16+MSCP (He et al., 2018) | 85.33 ± 0.17 |
| AlexNet+MSCP (He et al., 2018) | 81.70 ± 0.23 |
| Proposed method | 91.18 ± 0.02 |
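The improvements quoted in the abstract follow directly from Tables 5 and 6: the best proposed results (98.51% and 91.18%) minus the ResNet50 baselines (96.62% and 88.48%) give 1.89 and 2.70 percentage points. A short check of that arithmetic, with the figures copied from the tables; `gain_in_points` is a hypothetical helper.

```python
from typing import Dict

# Overall accuracies (%) copied from Tables 5 and 6.
RESULTS: Dict[str, Dict[str, float]] = {
    "UC Merced Land-Use": {"ResNet50": 96.62, "proposed": 98.51},
    "NWPU-RESISC45": {"ResNet50": 88.48, "proposed": 91.18},
}


def gain_in_points(results: Dict[str, Dict[str, float]]) -> Dict[str, float]:
    """Absolute improvement over ResNet50 in percentage points, rounded to 2 decimals."""
    return {name: round(r["proposed"] - r["ResNet50"], 2) for name, r in results.items()}


gains = gain_in_points(RESULTS)
```

These are absolute differences; relative to the baselines they correspond to roughly 2.0% and 3.1% increases in OA.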
[1] Bahdanau, D., Cho, K., Bengio, Y., 2014. Neural Machine Translation by Jointly Learning to Align and Translate. Computer Science, arXiv: 1409.0473. https://arxiv.org/abs/1409.0473
[2] Castelluccio, M., Poggi, G., Sansone, C., et al., 2015. Land Use Classification in Remote Sensing Images by Convolutional Neural Networks. Acta Ecologica Sinica, 28(2): 627-635. http://pdfs.semanticscholar.org/4191/fe93bfd883740a881e6a60e54b371c2f241d.pdf
[3] Chen, Q.H., Liu, Z.M., Liu, X.G., et al., 2010. Element-Oriented Land-Use Classification of Mining Area by High Spatial Resolution Remote Sensing Image. Earth Science, 35(3): 453-458 (in Chinese with English abstract). doi: 10.3799/dqkx.2010.055. http://ieeexplore.ieee.org/xpls/icp.jsp?arnumber=5631116
[4] Chen, S.Z., Tian, Y.L., 2014. Pyramid of Spatial Relatons for Scene-Level Land Use Classification. IEEE Transactions on Geoscience and Remote Sensing, 53(4): 1947-1957. https://doi.org/10.1109/TGRS.2014.2351395
[5] Cheng, G., Han, J., Lu, X., 2017. Remote Sensing Image Scene Classification: Benchmark and State of the Art. Proceedings of the IEEE, 105(10): 1865-1883. https://doi.org/10.1109/JPROC.2017.2675998
[6] Cheng, G., Ma, C.C., Zhou, P.C., et al., 2016. Scene Classification of High Resolution Remote Sensing Images Using Convolutional Neural Networks. In Proceedings of the 2016 IEEE International Geoscience and Remote Sensing Symposium (IGARSS), Beijing, 767-770. https://doi.org/10.1109/IGARSS.2016.7729193
[7] Cheng, G.X., Niu, R.Q., Zhang, K.X., et al., 2018. Opencast Mining Area Recognition in High-Resolution Remote Sensing Images Using Convolutional Neural Networks. Earth Science, 43(Suppl. 2): 256-262 (in Chinese with English abstract). doi: 10.3799/dqkx.2018.987. http://en.cnki.com.cn/Article_en/CJFDTotal-DQKX2018S2021.htm
[8] Fu, J.L., Zheng, H.L., Mei, T., 2017. Look Closer to See Better: Recurrent Attention Convolutional Neural Network for Fine-Grained Image Recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, Hawaii, 4476-4484. https://doi.org/10.1109/CVPR.2017.476
[9] Gómez-Chova, L., Tuia, D., Moser, G., et al., 2015. Multimodal Classification of Remote Sensing Images: A Review and Future Directions. Proceedings of the IEEE, 103(9): 1560-1584. https://doi.org/10.1109/JPROC.2015.2449668
[10] Han, X.B., Zhong, Y.F., Cao, L.Q., et al., 2017. Pre-Trained AlexNet Architecture with Pyramid Pooling and Supervision for High Spatial Resolution Remote Sensing Image Scene Classification. Remote Sensing, 9(8): 848. https://doi.org/10.3390/rs9080848
[11] He, K.M., Zhang, X.Y., Ren, S.Q., et al., 2016. Deep Residual Learning for Image Recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, Nevada, 770-778. https://doi.org/10.1109/CVPR.2016.90
[12] He, N.J., Fang, L.Y., Li, S.T., et al., 2018. Remote Sensing Scene Classification Using Multilayer Stacked Covariance Pooling. IEEE Transactions on Geoscience and Remote Sensing, 56(12): 6899-6910. https://doi.org/10.1109/TGRS.2018.2845668
[13] Jia, Y.Q., Shelhamer, E., Donahue, J., et al., 2014. Caffe: Convolutional Architecture for Fast Feature Embedding. In Proceedings of the 22nd ACM International Conference on Multimedia, Orlando, Florida, USA, 675-678. https://doi.org/10.1145/2647868.2654889
[14] Ketkar, N., 2017. Introduction to PyTorch. In: Deep Learning with Python. Apress, Berkeley, CA, 195-208. https://doi.org/10.1007/978-1-4842-2766-4_12
[15] Li, G.D., Zhang, C.J., Wang, M.K., et al., 2019. Transfer Learning Using Convolutional Neural Network for Scene Classification within High Resolution Remote Sensing Image. Science of Surveying and Mapping, 44(4): 116-123, 174 (in Chinese with English abstract). http://en.cnki.com.cn/Article_en/CJFDTotal-CHKD201904021.htm
[16] Li, W.K., Zhang, W., Qin, J.H., et al., 2020. "Expansion-Fusion" Extraction of Surface Gully Area Based on DEM and High-Resolution Remote Sensing Images. Earth Science, 45(6): 1948-1955 (in Chinese with English abstract). doi: 10.3799/dqkx.2020.004
[17] Lienou, M., Maitre, H., Datcu, M., 2009. Semantic Annotation of Satellite Images Using Latent Dirichlet Allocation. IEEE Geoscience and Remote Sensing Letters, 7(1): 28-32. https://doi.org/10.1109/LGRS.2009.2023536
[18] Luo, W., Li, H.L., Liu, G.H., 2011. Automatic Annotation of Multispectral Satellite Images Using Author-Topic Model. IEEE Geoscience and Remote Sensing Letters, 9(4): 634-638. https://doi.org/10.1109/LGRS.2011.2177064
[19] Nogueira, K., Penatti, O.A.B., dos Santos, J.A., 2017. Towards Better Exploiting Convolutional Neural Networks for Remote Sensing Scene Classification. Pattern Recognition, 61: 539-556. https://doi.org/10.1016/j.patcog.2016.07.001
[20] Oliva, A., Torralba, A., 2001. Modeling the Shape of the Scene: A Holistic Representation of the Spatial Envelope. International Journal of Computer Vision, 42(3): 145-175. https://doi.org/10.1023/A:1011139631724
[21] Pan, S.J., Yang, Q., 2009. A Survey on Transfer Learning. IEEE Transactions on Knowledge and Data Engineering, 22(10): 1345-1359. https://doi.org/10.1109/TKDE.2009.191
[22] Simonyan, K., Zisserman, A., 2014. Very Deep Convolutional Networks for Large-Scale Image Recognition. Computer Science, arXiv: 1409.1556.
[23] Szegedy, C., Liu, W., Jia, Y.Q., et al., 2015. Going Deeper with Convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, 1-9. https://doi.org/10.1109/CVPR.2015.7298594
[24] Xia, G.S., Hu, J.W., Hu, F., et al., 2017. AID: A Benchmark Data Set for Performance Evaluation of Aerial Scene Classification. IEEE Transactions on Geoscience and Remote Sensing, 55(7): 3965-3981. https://doi.org/10.1109/TGRS.2017.2685945
[25] Yang, Y., Newsam, S., 2010. Bag-of-Visual-Words and Spatial Extensions for Land-Use Classification. In Proceedings of the ACM International Symposium on Advances in Geographic Information Systems, San Jose, California, 270-279. https://doi.org/10.1145/1869790.1869829
[26] Yang, Y., Newsam, S., 2013. Geographic Image Retrieval Using Local Invariant Features. IEEE Transactions on Geoscience and Remote Sensing, 51(2): 818-832. https://doi.org/10.1109/TGRS.2012.2205158
[27] Yu, D.H., Zhang, B.M., Zhao, C., et al., 2020. Scene Classification of Remote Sensing Image Using Ensemble Convolutional Neural Network. Journal of Remote Sensing, 24(6): 717-727 (in Chinese with English abstract). https://www.cnki.com.cn/Article/CJFDTOTAL-YGXB202006006.htm
[28] Yu, S.C., Yu, D.Q., Wang, L.C., et al., 2019. Remote Sensing Study of Dongting Lake Beach Changes before and after Operation of Three Gorges Reservoir. Earth Science, 44(12): 4275-4283 (in Chinese with English abstract). doi: 10.3799/dqkx.2019.182. http://en.cnki.com.cn/Article_en/CJFDTotal-DQKX201912037.htm
[29] Yuan, Y., Fang, J., Lu, X.Q., et al., 2019. Remote Sensing Image Scene Classification Using Rearranged Local Features. IEEE Transactions on Geoscience and Remote Sensing, 57(3): 1779-1792. https://doi.org/10.1109/TGRS.2018.2869101
[30] Zhang, D., Li, N., Ye, Q.L., 2019. Positional Context Aggregation Network for Remote Sensing Scene Classification. IEEE Geoscience and Remote Sensing Letters, 17(6): 943-947. https://doi.org/10.1109/LGRS.2019.2937811
[31] Zhao, Z.C., Li, J.Q., Luo, Z., et al., 2020. Remote Sensing Image Scene Classification Based on an Enhanced Attention Module. IEEE Geoscience and Remote Sensing Letters, (99): 1-5. https://doi.org/10.1109/LGRS.2020.3011405