Image Clutter and Proto-objects Segmentation

Figure 1. First row: original images; second row: parameter-free proto-object segmentations; third row: proto-objects filled with their mean member-pixel color. Our algorithm runs in under 20 seconds for an 800x600 image on a 3.0 GHz Intel Core i7.


Abstract (NIPS 2013):

Visual clutter, the perception of an image as being crowded and disordered, affects aspects of our lives ranging from object detection to aesthetics, yet relatively little effort has been made to model this important and ubiquitous percept. Our approach models clutter as the number of proto-objects segmented from an image, with proto-objects defined as groupings of superpixels that are similar in intensity, color, and gradient orientation features. We introduce a novel parametric graph partitioning method for clustering superpixels by modeling a mixture of Weibulls on Earth Mover's Distance (EMD) statistics, then taking the normalized number of proto-objects following partitioning as our estimate of clutter perception. We validated this model using a new 90-image dataset of real-world scenes rank ordered by human raters for clutter, and showed that our method not only predicted clutter extremely well (Spearman's rho = 0.8038), but also outperformed all existing clutter perception models and even a behavioral object segmentation ground truth. We conclude that the number of proto-objects in an image affects clutter perception more than the number of objects or features.



Superpixel Graph: An image is first pre-processed into superpixels (we used SLIC [1] in our experiments), then formulated as a graph whose nodes are the superpixels and whose edges are weighted by EMD.

Figure 2. Left: original image, Middle: applied SLIC with k = 1000, Right: superpixel graph.
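The graph construction above can be sketched in a few lines of numpy/scipy. This is a minimal illustration, not the paper's implementation: a toy label map stands in for SLIC output, and scipy's 1-D `wasserstein_distance` over raw intensity samples stands in for the histogram-based EMD used in the paper.

```python
import numpy as np
from scipy.stats import wasserstein_distance

def superpixel_graph(labels, intensity):
    """Build an EMD-weighted adjacency graph over a superpixel label map.

    labels:    2-D int array, one superpixel id per pixel
    intensity: 2-D float array of the same shape
    Returns {(i, j): emd} for each pair of neighboring superpixels, i < j.
    """
    # collect neighboring label pairs along horizontal and vertical adjacencies
    pairs = set()
    a, b = labels[:, :-1], labels[:, 1:]
    pairs |= {tuple(sorted(p)) for p in zip(a[a != b], b[a != b])}
    a, b = labels[:-1, :], labels[1:, :]
    pairs |= {tuple(sorted(p)) for p in zip(a[a != b], b[a != b])}
    # weight each edge by the 1-D EMD between the two superpixels' intensities
    return {(i, j): wasserstein_distance(intensity[labels == i],
                                         intensity[labels == j])
            for i, j in pairs}

# toy 4x4 "image" with three superpixels of constant intensity
labels = np.array([[0, 0, 1, 1],
                   [0, 0, 1, 1],
                   [2, 2, 2, 2],
                   [2, 2, 2, 2]])
img = labels.astype(float) / 2.0
g = superpixel_graph(labels, img)
```

For constant-intensity segments the EMD reduces to the absolute intensity difference, which makes the edge weights easy to verify by hand.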

Edge Labeling for Superpixel Clustering: Each edge is labeled as within-cluster (similar) or between-cluster (dissimilar) based on a similarity threshold gamma. The between-cluster edges are removed to form superpixel clusters, which are merged into proto-objects.

Figure 3. From left to right: EMD weighted graph; after removal of between-cluster edges; merged superpixel clusters; and the final proto-objects (each cluster is filled with the mean pixel color).
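Removing the between-cluster edges and counting the surviving connected components can be sketched with a small union-find; the edge weights and gamma below are illustrative values, not fitted ones.

```python
def proto_object_count(edges, n_superpixels, gamma):
    """Keep only within-cluster edges (weight <= gamma) and count the
    connected components that remain; these are the proto-objects."""
    parent = list(range(n_superpixels))

    def find(x):
        # path-halving union-find root lookup
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (i, j), w in edges.items():
        if w <= gamma:                 # within-cluster edge: merge clusters
            parent[find(i)] = find(j)
    return len({find(x) for x in range(n_superpixels)})

# three superpixels; only the 0-1 edge is "similar" under gamma = 0.6
edges = {(0, 1): 0.5, (1, 2): 0.9, (0, 2): 1.0}
n_proto = proto_object_count(edges, 3, gamma=0.6)
clutter = n_proto / 3    # normalized by the initial number of superpixels
```

Here superpixels 0 and 1 merge into one proto-object while 2 stays on its own, so two proto-objects remain out of three initial superpixels.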

Automatic gamma computation using a Weibull Mixture Model (WMM): EMD is identical to the Mallows distance when both histograms have equal mass [2], and Lp-based distance statistics follow a Weibull distribution [3]. A two-component WMM can therefore model the two latent groups of EMD statistics (similar vs. dissimilar). The between-cluster edges are those with weights above the crossover point of the two Weibull components.

Figure 4. The Mixture plots (individual components in red, posterior in blue).
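Given two fitted Weibull components, the crossover point that serves as gamma can be located numerically. The sketch below assumes the mixture has already been fitted (the fitting itself, e.g. by EM, is not shown), and the shape/scale/weight parameters are made up for illustration.

```python
from scipy.stats import weibull_min
from scipy.optimize import brentq

# Hypothetical fitted two-component Weibull mixture over EMD edge weights:
# (shape c, scale, mixing weight) for the "similar" and "dissimilar" groups.
similar    = (1.5, 0.3, 0.6)
dissimilar = (2.5, 1.0, 0.4)

def weighted_pdf(x, c, scale, w):
    """Mixing weight times the Weibull component density at x."""
    return w * weibull_min.pdf(x, c, scale=scale)

def crossover(comp1, comp2, lo, hi):
    """Edge weight gamma where the two weighted component densities meet,
    found by root-finding on their difference over the bracket [lo, hi]."""
    f = lambda x: weighted_pdf(x, *comp1) - weighted_pdf(x, *comp2)
    return brentq(f, lo, hi)

gamma = crossover(similar, dissimilar, 0.2, 1.0)
```

The bracket [0.2, 1.0] is chosen so that the "similar" component dominates at the lower end and the "dissimilar" component at the upper end, guaranteeing a sign change for the root finder.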

Normalized clutter measure: The count of the final proto-objects is divided by the initial number of superpixels to produce our final clutter measure for a given image.


Clutter Dataset:

   Stony Brook University Real-world Clutter Dataset (SBU-RwC90)
      - 90 images, 800x600 resolution, all sampled from the SUN09 Dataset [4].
      - 6 groups of 15 images each: group 1 has 1~10 objects, group 2 has 11~20 objects, up to group 6, which has
        51~60 objects.
      - Object segmentations by human subjects for all 90 images are provided as part of SUN09.
      - Clutter rankings from 15 human subjects are provided; experiments were conducted at SBU. The ground truth
        clutter rating of each image is its median ranked position across the human raters. Mean correlation between
        all pairs of human rankings = 0.6919 (Spearman's rho, p < 0.001).



   [1] R. Achanta, A. Shaji, L. Smith, A. Lucchi, P. Fua, and S. Susstrunk. SLIC superpixels compared to state-of-the-art
   superpixel methods. IEEE TPAMI, 2012.

   [2] E. Levina and P. Bickel. The earth mover's distance is the Mallows distance: some insights from statistics. In
   ICCV, 2001.

   [3] G. J. Burghouts, A. W. M. Smeulders, and J.-M. Geusebroek. The distribution family of similarity distances. In
   NIPS, 2007.

   [4] J. Xiao, J. Hays, K. Ehinger, A. Oliva, and A. Torralba. SUN database: Large-scale scene recognition from abbey to
   zoo. In CVPR, 2010.