Title: DCAI-CLUD: a data-centric framework for the construction of land-use datasets

DCAI-CLUD-fig

Abstract

A high-quality land-use dataset is crucial for constructing a high-performance land-use classification model. Because land use is complex and spatially heterogeneous, dataset construction is inefficient and costly, which degrades dataset quality and, in turn, model performance. The emerging field of Data-Centric Artificial Intelligence (DCAI) is expected to deliver techniques for dataset optimization, offering a promising solution to this problem. This study therefore proposes a data-centric framework, DCAI-CLUD, for the construction of land-use datasets. With this framework, the accuracy and rate of data labeling improved by 5.93% and 28.97%, respectively. The Gini index of the dataset and the proportion of samples with non-mixed land-use categories improved by 3.27% and 8.52%, respectively. The overall accuracy (OA) and Kappa of the land-use classification model improved by 27.87% and 58.08%, respectively. This study is the first to introduce DCAI into the field of geographic information and remote sensing and to verify its effectiveness. The proposed framework effectively improves the efficiency and quality of dataset construction and, at the same time, the model's performance. Based on the proposed framework, we constructed a multi-source land-use dataset of major cities in China named CN-MSLU-100K.
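
For readers unfamiliar with the two model metrics named in the abstract, the sketch below computes overall accuracy (OA) and Cohen's Kappa from a confusion matrix using their standard textbook definitions. It is not the authors' evaluation code: the function name `overall_accuracy_and_kappa` and the two-class example matrix are hypothetical and chosen only for illustration.

```python
from typing import Sequence, Tuple


def overall_accuracy_and_kappa(confusion: Sequence[Sequence[int]]) -> Tuple[float, float]:
    """Return (OA, Cohen's Kappa) for a square confusion matrix.

    Rows are assumed to be reference classes and columns predicted classes;
    both metrics are symmetric in that choice.
    """
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    if total == 0:
        raise ValueError("confusion matrix is empty")

    # OA: share of samples on the diagonal (correctly classified).
    observed = sum(confusion[i][i] for i in range(n)) / total

    # Expected agreement by chance, from the row and column marginals.
    row_totals = [sum(row) for row in confusion]
    col_totals = [sum(confusion[i][j] for i in range(n)) for j in range(n)]
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / (total ** 2)

    if expected == 1.0:
        raise ValueError("Kappa is undefined when chance agreement equals 1")

    kappa = (observed - expected) / (1.0 - expected)
    return observed, kappa


# Hypothetical two-class example (illustrative numbers, not from the paper).
example = [[40, 10],
           [5, 45]]
oa, kappa = overall_accuracy_and_kappa(example)
```

Kappa discounts the agreement expected by chance, so it can change by a different amount than OA for the same model.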

Keywords

Land-use classification;
data-centric artificial intelligence;
point-of-interest;
remote sensing image;
multi-source data fusion

Highlights

  1. A framework for optimizing the land-use dataset construction process is proposed.
  2. Filtering and pre-labeling improved the quality and efficiency of data labeling.
  3. The performance of the land-use classification model is enhanced by dataset optimization.
  4. Preconceived results have a subjective impact on the data labelers.
  5. This is the first study to introduce DCAI for land-use classification.

Full Text Download

International Journal of Geographical Information Science

Dataset Description

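(Chinese Version) CN-MSLU-100K: A Nationwide Block (Community)-Scale Land-Use Category Dataset Supporting Multi-source Spatio-temporal Big Data (original title: CN-MSLU-100K:可支持多源时空大数据的地块(社区)尺度全国土地利用类别数据集)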

(English Version) CN-MSLU-100K: Land Use Classification Dataset at Block Scale for Multi-source Spatio-temporal Data

Supplementary Materials
