Author(s)

Siddh Vyas, Dr. Neelam Jain

  • Manuscript ID: 120220
  • Volume 2, Issue 4, Apr 2026
  • Pages: 85–92

Subject Area: Data Science and Big Data

DOI: https://doi.org/10.5281/zenodo.19415020

Abstract

Images captured by unmanned aerial vehicles often contain dense spatial clusters of objects, such as pedestrians and vehicles. Individual objects are difficult to detect in these scenes because of variation in scale, occlusion, and complicated backgrounds. In most aerial activity monitoring projects, locating dense areas of activity is more useful than finding individual object instances. This paper studies a cluster-level detection formulation for aerial images. Rather than detecting individual objects, the model detects spatial clusters that mark dense groups of objects. Each cluster is represented by one bounding box, and all clusters belong to a single detection class. This formulation is used to examine the effect of Feature Pyramid Network structures on the performance of cluster detection models. Two EfficientDet detection architectures are compared: a single-pass pyramid design that approximates a classic Feature Pyramid Network (FPN), and a stacked Bidirectional Feature Pyramid Network (BiFPN) design. All other elements of the detection system are held fixed, so the comparison is a controlled architectural ablation. The hypothesis is tested in experiments on manually re-annotated imagery from the VisDrone aerial dataset. Model performance is reported with COCO-style metrics, alongside estimates of computational complexity expressed in floating-point operations. The findings quantify the improvement in cluster-level aerial detection attributable to iterative bidirectional multi-scale feature fusion.
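The contrast between a single top-down pyramid pass and stacked bidirectional fusion can be illustrated with a small NumPy sketch. This is not the paper's implementation: the function names, the three-level pyramid, the fixed fusion weights, and the omission of all convolutions are assumptions made for illustration. The normalized weighted sum follows the general idea of BiFPN-style fusion, where the weights would normally be learned.

```python
import numpy as np

def upsample2x(x):
    """Nearest-neighbour upsampling of a 2-D feature map by a factor of 2."""
    return x.repeat(2, axis=0).repeat(2, axis=1)

def downsample2x(x):
    """2x2 max pooling of a 2-D feature map (height and width must be even)."""
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

def fuse(inputs, weights, eps=1e-4):
    """Weighted sum with non-negative weights normalised by their total.

    The weights are fixed here; in a trained network they would be learned.
    """
    w = np.maximum(np.asarray(weights, dtype=float), 0.0)
    w = w / (w.sum() + eps)
    return sum(wi * x for wi, x in zip(w, inputs))

def top_down(feats):
    """Single top-down pass (FPN-like). feats is ordered fine -> coarse."""
    out = [None] * len(feats)
    out[-1] = feats[-1]  # the coarsest level is passed through unchanged
    for i in range(len(feats) - 2, -1, -1):
        out[i] = fuse([feats[i], upsample2x(out[i + 1])], [1.0, 1.0])
    return out

def bidirectional_layer(feats):
    """One bidirectional layer: a top-down pass followed by a bottom-up pass."""
    td = top_down(feats)
    out = [td[0]]  # the finest level has no bottom-up input
    for i in range(1, len(feats)):
        inputs = [feats[i], downsample2x(out[i - 1])]
        if i < len(feats) - 1:
            inputs.insert(1, td[i])  # intermediate levels also reuse the top-down node
        out.append(fuse(inputs, [1.0] * len(inputs)))
    return out

def stacked_bidirectional(feats, num_layers=3):
    """Apply the bidirectional layer repeatedly (iterative fusion)."""
    for _ in range(num_layers):
        feats = bidirectional_layer(feats)
    return feats

# Toy pyramid: only the finest level carries a signal.
fine = np.zeros((8, 8))
fine[3, 5] = 1.0
pyramid = [fine, np.zeros((4, 4)), np.zeros((2, 2))]

fpn_out = top_down(pyramid)
bifpn_out = stacked_bidirectional(pyramid, num_layers=3)
```

In this toy setting the coarsest output of the single top-down pass never sees the fine-level signal, whereas the bidirectional variant propagates it upward as well, which is the structural difference the ablation isolates.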

Keywords
Aerial Object Detection, Cluster Detection, Feature Pyramid Network (FPN), Bidirectional Feature Pyramid Network (BiFPN), EfficientDet, Multi-scale Feature Fusion, VisDrone Dataset