We introduce an approach for learning to downscale high-resolution images for segmentation tasks. The main motivation is to adapt the sampling budget to the difficulty of the pixels/regions being segmented. We show that learning a spatially varying downsampling strategy jointly with segmentation offers advantages when segmenting large images under a limited computational budget.
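For concreteness, the sketch below shows one way such a jointly learnable, spatially varying downsampler could be realized in PyTorch: a small offset-prediction network deforms a uniform sampling grid so that more samples land on difficult regions. All module names, sizes, and the offset parameterization are illustrative assumptions, not the paper's implementation.

```python
# A minimal sketch, assuming PyTorch, of spatially varying downsampling
# learned jointly with segmentation. A tiny CNN predicts per-location 2D
# offsets that deform a uniform sampling grid; F.grid_sample then resamples
# the high-resolution image onto the low-resolution grid.
import torch
import torch.nn as nn
import torch.nn.functional as F

class LearnedDownsampler(nn.Module):          # illustrative name
    def __init__(self, out_size=(64, 64)):
        super().__init__()
        self.out_size = out_size
        # Small network predicting 2D offsets for each low-res grid point.
        self.offset_net = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 2, 3, padding=1), nn.Tanh(),
        )

    def forward(self, x):
        b = x.shape[0]
        h, w = self.out_size
        # Coarse view of the input, used to decide where to sample densely.
        coarse = F.interpolate(x, size=self.out_size, mode='bilinear',
                               align_corners=False)
        offsets = 0.1 * self.offset_net(coarse)       # (B, 2, h, w) in [-0.1, 0.1]
        # Uniform base grid in normalized [-1, 1] coordinates.
        ys = torch.linspace(-1, 1, h, device=x.device)
        xs = torch.linspace(-1, 1, w, device=x.device)
        gy, gx = torch.meshgrid(ys, xs, indexing='ij')
        base = torch.stack((gx, gy), dim=-1).expand(b, h, w, 2)
        grid = base + offsets.permute(0, 2, 3, 1)     # deformed sampling grid
        # Non-uniform downsampling, differentiable w.r.t. the offsets.
        return F.grid_sample(x, grid, align_corners=False)

downsampler = LearnedDownsampler()
low_res = downsampler(torch.randn(1, 3, 1024, 1024))  # -> (1, 3, 64, 64)
```

Because `grid_sample` is differentiable with respect to the grid, the downsampler can be trained end-to-end with the segmentation loss, letting the sampling pattern adapt to region difficulty.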
Different human experts provide their estimates of the “true” segmentation labels under the influence of their own biases and competence levels. Treating these noisy labels blindly as the ground truth limits the performance that automatic segmentation algorithms can achieve. In this work, we present a method for jointly learning, purely from noisy observations, the reliability of individual annotators and the true segmentation label distributions, using two coupled CNNs.
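The sketch below illustrates the coupling under the common confusion-matrix formulation: the segmentation CNN's estimate of the true per-pixel label distribution is multiplied by an annotator-specific confusion matrix (produced by the annotator CNN) to model that annotator's noisy labels, with a trace regularizer to disambiguate the two factors. Function names, shapes, and the regularization weight are illustrative assumptions, not the paper's code.

```python
# A minimal sketch (PyTorch, illustrative names) of coupling the two CNNs:
# the observed label distribution for one annotator is modeled as
# q = A @ p, where p comes from the segmentation CNN and A (a per-pixel,
# row-stochastic confusion matrix) comes from the annotator CNN.
import torch
import torch.nn.functional as F

def annotator_loss(true_probs, confusion, noisy_labels, trace_weight=0.01):
    """true_probs:   (B, C, H, W)    softmax output of the segmentation CNN
       confusion:    (B, C, C, H, W) per-pixel confusion matrix for one
                                     annotator (output of the annotator CNN)
       noisy_labels: (B, H, W)       that annotator's observed labels
    """
    # Observed label distribution: q = A @ p at every pixel.
    noisy_probs = torch.einsum('bijhw,bjhw->bihw', confusion, true_probs)
    nll = F.nll_loss(torch.log(noisy_probs.clamp_min(1e-8)), noisy_labels)
    # Trace regularizer: many (A, p) pairs yield the same product, so an
    # extra term on the confusion matrices is needed to pick out the
    # solution where annotator noise is explained by A rather than by p.
    trace = torch.einsum('biihw->b', confusion).mean()
    return nll + trace_weight * trace
```

Summing this loss over annotators trains both networks from the noisy observations alone, recovering an estimate of each annotator's reliability as a byproduct.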
Building on our prior foveation work presented at MICCAI 2020, this extended version further introduces a more computationally efficient hard-gated categorical sampling of FoV-resolution patch configurations at each location, and provides two differentiable solutions. We demonstrate general applicability on three vision datasets: Cityscapes, the DeepGlobe aerial image dataset, and the Gleason2019 histopathology dataset.
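One standard differentiable route to such hard categorical gating is the straight-through Gumbel-Softmax estimator, sketched below; whether this matches either of the paper's two solutions is an assumption, and all names and shapes are illustrative.

```python
# A minimal sketch of hard categorical gating made differentiable via the
# straight-through Gumbel-Softmax estimator: the forward pass uses a hard
# one-hot sample over the K candidate FoV-resolution configurations at each
# location, while gradients flow through the soft relaxation.
import torch
import torch.nn.functional as F

K = 4                                                 # candidate configurations
logits = torch.randn(8, K, requires_grad=True)        # one row per location
gate = F.gumbel_softmax(logits, tau=1.0, hard=True)   # (8, K), hard one-hot

# Only the selected configuration's features contribute in the forward pass,
# yet the gate remains differentiable w.r.t. the logits.
patch_features = torch.randn(8, K, 32)   # features from each config's patch
selected = (gate.unsqueeze(-1) * patch_features).sum(dim=1)   # (8, 32)
selected.sum().backward()                # gradients reach `logits`
```

Hard gating is what makes this cheaper than soft mixing: only one patch configuration per location needs to be processed by the downstream model at inference time.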
Segmenting ultra-high resolution images often requires empirical decisions on the patch configuration that trades off field-of-view (FoV) (i.e., spatial coverage) against image resolution. We introduce the foveation module, a jointly learnable “dataloader” which, for a given ultra-high resolution image, adaptively chooses the appropriate input patch configuration (FoV/resolution trade-off) to feed to the downstream segmentation model at each spatial location of the image.
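As an illustration of the trade-off the module selects among, the sketch below extracts candidate patches of increasing spatial coverage around a single location and rescales each to a fixed pixel budget, so a larger FoV implies a coarser effective resolution. The function, FoV values, and patch size are hypothetical, not drawn from the paper.

```python
# A minimal sketch (illustrative, assuming PyTorch) of FoV/resolution
# candidate patches: each configuration crops a different spatial extent
# around the same location and rescales it to a common patch size, so every
# candidate costs the downstream segmentation model the same number of pixels.
import torch
import torch.nn.functional as F

def candidate_patches(image, center, fovs=(256, 512, 1024), out=256):
    """image: (C, H, W); center: (y, x); returns (len(fovs), C, out, out)."""
    c, h, w = image.shape
    y, x = center
    patches = []
    for fov in fovs:
        half = fov // 2
        # Clamp the crop window to the image bounds.
        top = max(0, min(y - half, h - fov))
        left = max(0, min(x - half, w - fov))
        crop = image[:, top:top + fov, left:left + fov].unsqueeze(0)
        # Larger FoV -> coarser resolution after rescaling to `out`.
        patches.append(F.interpolate(crop, size=(out, out), mode='bilinear',
                                     align_corners=False))
    return torch.cat(patches, dim=0)

img = torch.randn(3, 2048, 2048)
cands = candidate_patches(img, center=(1000, 1000))   # (3, 3, 256, 256)
```

In this framing, the foveation module's job is to score such candidates per location and route the chosen one to the segmentation model, replacing the usual fixed, hand-tuned patch extraction.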