Low/Few Shot Learning
Question: If a class has only two samples, can a computer make a correct prediction?
Note: The number of samples is too small for training.
Approach: Few Shot Learning
Few shot learning is a problem where we try to learn when the training data is very small. It differs from supervised learning, where we train on some data and then predict objects that belong to classes present in the training data. In few shot learning, the class of the sample we are predicting on never appears in the training data.
Let’s take an example. Look at Fig.1
We train a model on a big training set of cats, dogs, mugs and hats. The goal in few shot learning is not to recognize unseen cats, dogs, mugs or hats. Instead, the goal is to learn the similarities and differences between objects. After training, we show the model two pairs of images (Fig.2 and Fig.3) and ask, “Are the two images similar?”
Since the model has learned the similarities and differences between objects, it can tell that the images in Fig.2 are the same kind of object and that the images in Fig.3 are quite different. However, if we ask the model to recognize the objects, it cannot say what they are, because their class is not part of the training data. Hence, the model can tell that two images are similar but cannot tell what they are.
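The similarity judgement in this example can be sketched as a comparison between two embedding vectors. This is a minimal sketch under stated assumptions: it assumes the trained model maps each image to a fixed-length embedding, it uses cosine similarity with a hypothetical threshold of 0.8, and it replaces real network outputs with made-up vectors. It is not necessarily how the model in the figures measures similarity.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two embedding vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def are_similar(emb_a, emb_b, threshold=0.8):
    """Answer "Are the two images similar?" from their embeddings.

    The 0.8 threshold is a hypothetical value chosen for illustration.
    """
    return cosine_similarity(emb_a, emb_b) >= threshold


# Made-up embeddings standing in for the outputs of a trained network.
pair_similar = ([0.9, 0.1, 0.3], [0.8, 0.2, 0.35])
pair_different = ([0.9, 0.1, 0.3], [0.1, 0.95, 0.05])

print(are_similar(*pair_similar))    # True
print(are_similar(*pair_different))  # False
```

Note that the function only answers "same or different"; nothing in it names what the objects are, which matches the limitation described above.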
Now, let’s ask another question: what is the object in Fig. 4? We call this image the query.
The model is unable to answer this question because it has never seen this class during training. Now, we show the model another four labelled images, as shown in Fig. 5. This set of labelled images is called the “support set”.
The model compares the query image with each image in the support set and concludes that the query is a “Squirrel”.
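The comparison against the support set can be sketched as a nearest-neighbour lookup: score the query against every labelled support image and return the label of the best match. This is a minimal sketch, assuming each image has already been turned into an embedding vector by the trained model and that cosine similarity is the comparison used. The embeddings, and every label except “Squirrel”, are made up because Fig. 5 is not reproduced here, and `classify_query` is a hypothetical helper.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def classify_query(query, support_set):
    """Return the label of the support image most similar to the query.

    support_set is a list of (label, embedding) pairs.
    """
    best_label, best_score = None, -math.inf
    for label, embedding in support_set:
        score = cosine_similarity(query, embedding)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical support set of four labelled images (one per class).
support_set = [
    ("Squirrel", [0.9, 0.2, 0.1]),
    ("Rabbit", [0.2, 0.9, 0.1]),
    ("Hamster", [0.1, 0.2, 0.9]),
    ("Otter", [0.5, 0.5, 0.5]),
]
query = [0.85, 0.25, 0.15]  # hypothetical embedding of the query image

print(classify_query(query, support_set))  # Squirrel
```

Here the query is closest to the “Squirrel” support image, so that label is returned. With several samples per class, one alternative is to compare the query with each class's average embedding instead of with individual images; the single-nearest-neighbour rule is just the simplest choice.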
Terminology related to few-shot learning
- k-way: the number of classes, $k$ (e.g. 3-way means 3 classes. See Fig.1)
- n-shot: the number of samples per class, $n$ (e.g. 2-shot means 2 samples per class. See Fig.3)
- Support set ($S$): the labelled samples the model uses to identify the query. With $k$ classes and $n$ samples per class, the support set contains $k \times n$ elements.
- Query set ($Q$): the sample(s) which we are trying to identify.
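As a worked example, for a $k$-way $n$-shot task ($k$ classes, $n$ samples per class) the support-set size is the product of the two, so the 3-way, 2-shot case from the bullets gives:

$$
|S| = k \times n, \qquad \text{3-way, 2-shot:}\quad |S| = 3 \times 2 = 6
$$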
More formally, if we denote a task by $T$, then each task $T_i$ is sampled from a probability distribution over tasks, $T_i \sim p(T)$.
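Drawing one task $T_i$ can be sketched as follows, assuming the distribution over tasks is uniform over choices of $k$ classes from a labelled pool, with $n$ support samples and a few query samples drawn per chosen class. `sample_task`, `n_query`, and the toy dataset are hypothetical names used only to illustrate the idea.

```python
import random


def sample_task(dataset, k_way, n_shot, n_query, rng=random):
    """Sample one k-way n-shot task T_i from a labelled dataset.

    dataset maps each class label to a list of its samples.
    Returns (support_set, query_set) as lists of (label, sample) pairs.
    """
    classes = rng.sample(sorted(dataset), k_way)  # k distinct classes, chosen uniformly
    support_set, query_set = [], []
    for label in classes:
        # Draw n_shot + n_query distinct samples so no query also appears in the support set.
        picks = rng.sample(dataset[label], n_shot + n_query)
        support_set += [(label, s) for s in picks[:n_shot]]
        query_set += [(label, s) for s in picks[n_shot:]]
    return support_set, query_set


# Toy dataset: 5 classes with 5 placeholder samples each.
toy = {name: [f"{name}_{i}" for i in range(5)] for name in ["cat", "dog", "mug", "hat", "squirrel"]}

support, query = sample_task(toy, k_way=3, n_shot=2, n_query=1, rng=random.Random(0))
print(len(support), len(query))  # 6 3
```

Each call can return a different set of classes, so a model trained over many such tasks is pushed to compare samples rather than memorise specific classes.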