The Unreasonable Effectiveness of Noisy Data for Fine-Grained Recognition

Paper: Link

Objective: Leverage free, noisy data from the web to train effective models for fine-grained recognition.


  • Interesting paper on using noisy data from the web.
  • They sample images directly from Google search and treat every returned image as an example of the queried category. For L-Bird and L-Butterfly, the query is the scientific name of the category; for L-Aircraft and L-Dog, the query is simply the category name (e.g. “Boeing 737-200” or “Pembroke Welsh Corgi”).
  • Active learning-based approach to collect the data.
  • Active learning begins by training a classifier on a seed set of input images and labels (i.e. the Stanford Dogs training set), then proceeds by iteratively picking a set of images to annotate, obtaining labels from human annotators, and re-training the classifier.
  • Inception V3 is the base classifier.
  • To avoid image overlap between the ground-truth (GT) and web images, an aggressive deduplication procedure is run between all ground-truth test sets and their corresponding web images, using a state-of-the-art method for learning a similarity metric between images.
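
The active learning loop described above can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: a toy nearest-centroid classifier stands in for Inception V3, the selection rule (pick the images the current model is least sure about) is an assumed, commonly used heuristic, and `oracle` is a hypothetical stand-in for the human annotators.

```python
def train_centroids(features, labels):
    """Fit a toy nearest-centroid classifier: one mean vector per class."""
    sums, counts = {}, {}
    for x, y in zip(features, labels):
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}


def margin(centroids, x):
    """Gap between the two nearest class centroids; a small gap means uncertain.

    Assumes the model has at least two classes.
    """
    dists = sorted(
        sum((a - b) ** 2 for a, b in zip(x, c)) for c in centroids.values()
    )
    return dists[1] - dists[0]


def active_learning(seed_x, seed_y, pool, oracle, rounds=3, batch=5):
    """Train on a seed set, then repeatedly pick, annotate, and re-train."""
    train_x, train_y = list(seed_x), list(seed_y)
    pool = list(pool)
    model = train_centroids(train_x, train_y)      # classifier from the seed set
    for _ in range(rounds):
        if not pool:
            break
        pool.sort(key=lambda x: margin(model, x))  # most uncertain first (assumed rule)
        picked, pool = pool[:batch], pool[batch:]
        train_x += picked
        train_y += [oracle(x) for x in picked]     # labels from (simulated) annotators
        model = train_centroids(train_x, train_y)  # re-train on the enlarged set
    return model, train_x, train_y
```

In this sketch only the selected batch is sent for annotation in each round, so the labeling cost is `rounds * batch` images rather than the whole pool.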

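The deduplication step can be pictured with a similarly hedged sketch. It assumes each image has already been mapped to an embedding vector by some similarity model (the paper uses a learned similarity metric; the cosine similarity and the `threshold` value here are illustrative assumptions), and it drops any web image that is too close to an image in the test set.

```python
import math


def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def drop_near_duplicates(web, test, threshold=0.95):
    """Keep only web images whose embedding is not a near-duplicate of a test image.

    `web` and `test` map image names to embeddings; `threshold` is an assumed cutoff.
    """
    return {
        name: emb
        for name, emb in web.items()
        if all(cosine(emb, t) < threshold for t in test.values())
    }
```

A lower threshold makes the filter more aggressive: it removes more web images, at the cost of discarding some that are merely similar rather than true duplicates.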

  • How reliable are the web search results for a single category? If they are unreliable, many false positives are introduced.
  • Is it limited to extracting category information, or do we need extra labels such as position, time, etc.?
  • How much do human annotators need to be involved, or are they involved at all?
  • Since the algorithm queries the user to label data from time to time, does this query apply to a subset of the images to verify existing annotations, or to create new ones?
Srishti Yadav
Research Assistant

My research interests include applying computationally intensive machine learning algorithms to computer vision problems.