YAN Shuo-yue, WANG Qing, ZHONG Kang, ZHANG Chang-min, YE Mao-lin, FU An-qi, LIU Yuan-gang
Automated, precise identification of rivers in high-resolution remote sensing images is important for monitoring river and lake environments and for studying watershed change. However, rivers occupy a relatively small area of each image, which leads to an imbalance between positive and negative samples in the dataset. In addition, the morphological variability of rivers and their complex changes in scale make identification difficult, resulting in problems such as discontinuous boundaries and grid effects. To address these challenges, this paper proposes a cross-scale method for precise river identification that fuses global multilevel features. The method has three main parts. First, we construct a multi-feature river dataset by selecting distinctive meandering and braided rivers from around the world to increase data diversity. Second, we construct the R-Seg model, which uses the lightweight semantic segmentation model Segformer as its backbone network, and design the Global and Adaptive Scale Pyramid Pooling (GASPP) module. Coupled with Transformers, this module extracts multi-scale features, enabling the model to capture contextual information in river images, reduce information loss, and amplify global dimension-interaction features. Finally, we propose a cross-scale river image prediction method based on mask-weighted voting. Large-scale river images are cropped with a sliding window, each unit prediction block is multiplied by a specific mask weight to obtain a sub-prediction result, and the sub-predictions are then stitched together in sequence through overlapping voting, which achieves precise identification of river images at different scales.
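The sliding-window prediction step can be illustrated with a minimal sketch. The abstract does not specify the mask shape, the weight values, or the voting rule, so everything below is an assumption made for illustration: a hypothetical mask that down-weights a border of each tile (`make_weight_mask`), a weighted average of overlapping tile probabilities, and a 0.5 decision threshold. `predict_tile` stands in for the segmentation model and is not the R-Seg network.

```python
def make_weight_mask(size, border, edge_weight=0.5):
    """Hypothetical mask: weight 1.0 in the tile interior, edge_weight near the tile edge."""
    mask = []
    for r in range(size):
        row = []
        for c in range(size):
            # Distance (in pixels) from this position to the nearest tile edge.
            d = min(r, c, size - 1 - r, size - 1 - c)
            row.append(edge_weight if d < border else 1.0)
        mask.append(row)
    return mask


def tile_starts(length, tile, stride):
    """Window start offsets along one axis; a final window is added so the far edge is covered."""
    starts = list(range(0, length - tile + 1, stride))
    if starts[-1] + tile < length:
        starts.append(length - tile)
    return starts


def predict_large(image, predict_tile, tile, overlap, border):
    """Weighted voting over overlapping tiles (assumed scheme, not the paper's exact rule).

    image        : 2D list of pixel values, at least tile x tile in size
    predict_tile : callable mapping a tile x tile patch to per-pixel river probabilities
    """
    h, w = len(image), len(image[0])
    if h < tile or w < tile or not 0 <= overlap < tile:
        raise ValueError("image must be at least one tile in size and 0 <= overlap < tile")
    stride = tile - overlap
    mask = make_weight_mask(tile, border)
    score = [[0.0] * w for _ in range(h)]   # sum of mask weight * probability
    weight = [[0.0] * w for _ in range(h)]  # sum of mask weights
    for top in tile_starts(h, tile, stride):
        for left in tile_starts(w, tile, stride):
            patch = [row[left:left + tile] for row in image[top:top + tile]]
            prob = predict_tile(patch)
            for r in range(tile):
                for c in range(tile):
                    m = mask[r][c]
                    score[top + r][left + c] += m * prob[r][c]
                    weight[top + r][left + c] += m
    # A pixel is labelled river (1) when its weighted mean probability reaches 0.5.
    return [[1 if score[r][c] / weight[r][c] >= 0.5 else 0 for c in range(w)]
            for r in range(h)]
```

In this sketch, a pixel covered by several windows takes most of its vote from the windows in which it lies away from the tile border, which is one plausible way to reduce edge artifacts between unit blocks. The `overlap` argument is given in pixels; with the 500×500 tiles mentioned in the abstract, an overlap ratio would need to be converted to a pixel count first.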
Experiments on the constructed multi-feature dataset of meandering and braided rivers, compared against other methods, show the following. Qualitatively, the overall structure of the R-Seg network ensures high identification accuracy for main rivers, effectively mitigates interruptions in smaller river channels, and smooths river boundaries, showing good robustness on 500×500 small-scale river images. Moreover, the mask-weighted voting method significantly reduces the edge loss caused by grid effects in unit blocks and makes full use of the unit block predictions, which improves river prediction accuracy for larger scenes and achieves accurate identification of river images at different scales. Quantitatively, the method achieves an overall accuracy of 99.49% and the best performance across the accuracy evaluation metrics. The identification time for a single image is less than 1 second, meeting the efficiency requirements of most practical applications. Furthermore, the mask-weighted voting strategy yields river identification accuracy approximately 0.28% to 6.93% higher than a pure overlap prediction strategy. Adjusting the overlap parameter shows that accuracy and overlap are not positively correlated; an overlap of approximately 12.5% is relatively optimal. Through the design of the R-Seg network model and the introduction of the mask-weighted voting prediction method, this approach effectively alleviates discontinuous river boundaries and grid effects. It significantly improves the accuracy of river identification in remote sensing images across diverse scenarios and demonstrates strong robustness and visual quality. The identification results have important application value in geological exploration of rivers and in studies of watershed change.