The Unreasonable Effectiveness of Noisy Data for Fine-Grained Recognition
Paper: Link
Objective: Leverage free, noisy data from the web to train effective models for fine-grained recognition.
Summary:
- Interesting paper on using noisy data from the web.
- They sample images directly from Google image search, treating all returned images as examples of the queried category. For L-Bird and L-Butterfly, queries use the scientific name of the category; for L-Aircraft and L-Dog, queries simply use the category name (e.g. “Boeing 737-200” or “Pembroke Welsh Corgi”).
- Active learning-based approach to collect the data.
- Active learning begins by training a classifier on a seed set of input images and labels (i.e. the Stanford Dogs training set), then proceeds by iteratively picking a set of images to annotate, obtaining labels from human annotators, and re-training the classifier.
- Inception-v3 is the base classifier.
- To avoid image overlap between the ground-truth (GT) test sets and the web images, an aggressive deduplication procedure is performed between all GT test sets and their corresponding web images, using a state-of-the-art method for learning a similarity metric between images.
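Sketch:
The active learning loop described in the summary (train on a seed set, pick images to annotate, get human labels, re-train) can be sketched roughly as below. This is a toy illustration only, not the paper's implementation: the 1-D features, the nearest-centroid classifier standing in for Inception-v3, the margin-based (uncertainty) selection rule, and the names `train_centroids`, `margin`, `active_learning`, and `oracle` are all assumptions made for the example. The notes do not say which selection criterion the paper uses.

```python
import random

def train_centroids(examples):
    """Fit a toy nearest-centroid classifier: mean feature value per label."""
    sums, counts = {}, {}
    for x, y in examples:
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}

def margin(centroids, x):
    """Gap between the two nearest centroids; a small gap means an uncertain image."""
    distances = sorted(abs(x - c) for c in centroids.values())
    return distances[1] - distances[0]

def active_learning(seed, pool, oracle, rounds=3, batch_size=5):
    """Iteratively label the most uncertain pool items and re-train.

    seed:   list of (feature, label) pairs, e.g. an existing training set
    pool:   list of unlabeled features, e.g. web images
    oracle: callable standing in for the human annotators (hypothetical)
    """
    labeled = list(seed)
    unlabeled = list(pool)
    model = train_centroids(labeled)          # train on the seed set
    for _ in range(rounds):
        if not unlabeled:
            break
        # Assumed selection rule: annotate the images the model is least sure about.
        unlabeled.sort(key=lambda x: margin(model, x))
        batch, unlabeled = unlabeled[:batch_size], unlabeled[batch_size:]
        labeled += [(x, oracle(x)) for x in batch]   # obtain human labels
        model = train_centroids(labeled)             # re-train the classifier
    return model, labeled, unlabeled

# Usage with synthetic data: two classes split at feature value 5.
rng = random.Random(0)
pool = [rng.uniform(0, 10) for _ in range(50)]
oracle = lambda x: "corgi" if x < 5 else "husky"
seed = [(1.0, "corgi"), (9.0, "husky")]
model, labeled, remaining = active_learning(seed, pool, oracle)
print(len(labeled), len(remaining))
```

In this sketch the annotation budget is fixed by `rounds * batch_size`, which makes the trade-off against simply using all noisy web images explicit: every labeled image costs annotator time, whereas the web data is free but unverified.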
Questions:
- How reliable are web search results for a single category? If they are unreliable, many false positives will be introduced.
- Is the method limited to extracting category information, or does it need extra labels such as position, time, etc.?
- How many human annotators need to be involved, or are they involved at all?
- Since the algorithm queries the user to label data from time to time, does this query apply to a subset of the images to verify existing annotations, or to create new annotations?