Remote Sensing Image Caption Model with Fusion of Object and Multiscale Visual Feature

Cyber Security and Data Governance (网络安全与数据治理), Issue 6

Jia Yamin (贾亚敏), Chen Jiao (陈姣), Peng Yuqing (彭玉青)

(School of Artificial Intelligence and Data Science, Hebei University of Technology, Tianjin 300401)

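Abstract: Remote sensing images are multiscale, tiny objects in them cannot be extracted accurately, and object categories are easily confused. To address these problems, a remote sensing image caption model that fuses object and multiscale visual features (Fusion of Object and Multiscale Visual Feature, FO-MSV) is proposed. A purpose-built object extractor analyzes text information and extracts the object information it contains. A multiscale interaction module is designed to obtain multiscale visual features of remote sensing images and so accommodate their multiscale nature. To make full use of object information and fuse it with visual information, a new object-visual feature fusion mechanism is proposed that adjusts the balance between the visual context and the object context. Experimental results on three datasets in this field show that the model clearly improves captioning performance and is competitive with other advanced models.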

Keywords: image caption; remote sensing image; multiscale feature; object information; visual information

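CLC number: TP391
Document code: A
DOI: 10.19358/j.issn.2097-1788.2022.06.011
Citation: Jia Yamin, Chen Jiao, Peng Yuqing. Remote sensing image caption model with fusion of object and multiscale visual feature[J]. Cyber Security and Data Governance, 2022, 41(6): 78-83, 89.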

Remote sensing image caption model with fusion of object and multiscale visual feature

Jia Yamin, Chen Jiao, Peng Yuqing

(School of Artificial Intelligence, Hebei University of Technology, Tianjin 300401, China)

Abstract: Remote sensing images are multiscale, their tiny objects are hard to extract accurately, and their object categories are easily confused. To address these problems, a new remote sensing image caption model (FO-MSV) is proposed. It analyzes text information with a purpose-built object extractor to obtain object information. A multiscale interaction module is designed to obtain multiscale visual features of remote sensing images and so adapt to their multiscale nature. To make full use of object information and fuse it with visual information, a new object-visual feature fusion mechanism is proposed to adjust the balance between the visual context and the object context. Experimental results on three datasets show that the proposed model clearly improves captioning performance and is competitive with other advanced models.

Key words: image caption; remote sensing image; multiscale feature; object information; visual information; feature fusion

0 Introduction

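Image captioning aims to summarize an image at the semantic level. Remote sensing images are acquired from high altitude using remote sensing technology, and remote sensing image captioning (Remote Sensing Image Caption, RSIC) combines these two fields: it aims to generate a comprehensive textual description for a given remote sensing image. RSIC has broad application prospects in areas such as traffic command and geographic research [1] and has become an emerging research hotspot. It initially adopted the encoder-decoder model used in image captioning [2], and many models were subsequently proposed to address different problems. Most studies use convolutional neural networks (Convolutional Neural Networks, CNN) as the encoder to extract image features. However, the receptive fields corresponding to the output features of a CNN convolutional layer form a uniform grid of cells with the same size and shape, so image features extracted by a CNN alone have limited capacity and make it hard to recognize tiny objects in the image. In addition, because of the shooting angle, remote sensing images contain some ambiguous and easily confused objects that are difficult to tell apart.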
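The uniform-grid limitation described above can be illustrated with a small sketch. It is not taken from the paper: the 224-pixel input, the total stride of 32, and the helper names `uniform_grid` and `cell_coverage` are assumptions chosen for illustration, and each cell is treated as the nominal image region of one feature-map location rather than its full receptive field.

```python
def uniform_grid(image_size: int, stride: int) -> list[tuple[int, int, int, int]]:
    """Return the (x0, y0, x1, y1) image region nominally covered by each feature-map cell."""
    cells_per_side = image_size // stride
    return [
        (col * stride, row * stride, (col + 1) * stride, (row + 1) * stride)
        for row in range(cells_per_side)
        for col in range(cells_per_side)
    ]


def cell_coverage(object_side: int, stride: int) -> float:
    """Fraction of one grid cell's area occupied by a square object of the given side."""
    return min(1.0, (object_side * object_side) / (stride * stride))


# Assumed setup: 224x224 input and a backbone with total stride 32.
cells = uniform_grid(image_size=224, stride=32)
print(len(cells))            # 49 cells, every one 32x32 pixels
print(cell_coverage(8, 32))  # 0.0625: an 8x8-pixel object fills about 6% of a cell
```

Under these assumptions, every output feature summarizes an equally sized square region, so an object of only a few pixels contributes a small share of the cell it falls in.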

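To solve these problems and accommodate the multiscale nature of remote sensing scenes, this paper proposes a remote sensing image caption model that fuses object and multiscale visual features (Fusion of Object and Multiscale Visual Feature, FO-MSV). The model builds an object extractor (Object Extractor, OE) that extracts object information from the consolidated description produced by a pointer-generator network [3], so that tiny objects are not missed. It also introduces a new multiscale interaction module (Multiscale Interaction Module, MSCM) that obtains multiscale visual features of the image to suit its multiscale nature. In addition, a new object-visual fusion mechanism (Object-Visual Fusion Mechanism, ovFM) is designed to exploit object information and fuse it with multiscale visual information, so as to avoid misidentifying objects. The structure of the long short-term memory network (Long Short-Term Memory, LSTM) is also modified, and the resulting variant is called the multi-input LSTM (Multi-Input LSTM, I_LSTM).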
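This excerpt does not give the equations of the ovFM, so the following is only a generic sketch of what balancing a visual context against an object context can look like: a scalar sigmoid gate that forms a convex combination of the two vectors. The function `fuse_contexts`, its parameters, and the gate form are hypothetical and should not be read as the authors' mechanism.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def fuse_contexts(visual, obj, gate_weights, gate_bias=0.0):
    """Blend two equal-length context vectors with a learned scalar gate.

    Hypothetical form: g = sigmoid(w . [visual; obj] + b),
    fused = g * visual + (1 - g) * obj.
    """
    if len(visual) != len(obj):
        raise ValueError("visual and object contexts must have the same length")
    concat = list(visual) + list(obj)
    if len(gate_weights) != len(concat):
        raise ValueError("gate_weights must match the concatenated length")
    score = sum(w * x for w, x in zip(gate_weights, concat)) + gate_bias
    gate = sigmoid(score)
    fused = [gate * v + (1.0 - gate) * o for v, o in zip(visual, obj)]
    return fused, gate


# With all-zero gate parameters the gate is 0.5, so both contexts count equally.
fused, gate = fuse_contexts([1.0, 0.0], [0.0, 1.0], gate_weights=[0.0] * 4)
print(gate)   # 0.5
print(fused)  # [0.5, 0.5]
```

A gate near 1 would let the visual context dominate and a gate near 0 would favor the object context. In a trained model the gate parameters would be learned, whereas here they are fixed by hand for illustration.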

The full text of this article can be downloaded at: https://www.chinaaet.com/resource/share/2000005064


Original content statement: This content is original to the AET website; reproduction without authorization is prohibited.
