Deep papers: A survey on Image Data Augmentation for Deep Learning, part 1


Data augmentation comes in handy when we have only a small number of training samples or an unbalanced dataset. In this survey, Connor Shorten and Taghi M. Khoshgoftaar discuss data augmentation techniques and how they affect model performance. We are going to test those techniques on a small, slightly unbalanced dataset with two classes. The authors break augmentation techniques down into two main branches: basic image manipulations and deep learning approaches. In this first part, we only discuss basic image manipulations, which include geometric transformations such as flipping, cropping, rotation and translation, as well as colour space transformations.

Why augmentation?

You might ask: why augment at all, when we could simply gather more training samples? Sometimes that is the right answer, but not always. Gathering more samples can be expensive, some samples are inherently rare, and privacy or health concerns can get in the way (CT scans, for example). In such cases, data augmentation is really helpful.

Benefits

First things first: augmentation effectively increases the number of training samples. We all know deep neural networks need lots of data, and I mention them specifically because they are the go-to solution for computer vision problems these days. Without enough data, a model may overfit. Even if it doesn't overfit, it may perform poorly because it was trained on far fewer scenarios than it will meet in real situations. Basic data augmentation techniques are powerful enough to tackle this problem by increasing the number of training samples.

Sometimes the problem is not the sample size but the range of possible combinations. Say you have a dataset of people in which very few images show people from the side. At inference time, a model trained on that dataset has little chance of identifying a person photographed from a side angle. For our models to perform well, the training data should cover as many combinations as possible. Another common situation is a dataset where all images are taken from the same angle, which can also hurt model performance. Augmentation can solve these kinds of problems easily.

Classification and detection models also often perform poorly when objects are somewhere other than the centre of the frame. This happens because of positional bias: when a model is trained only on perfectly centred images, it struggles to classify or identify objects that are off-centre. Translation augmentation can reduce this kind of positional bias.

How augmentation works

An image is a three-dimensional matrix, in other words a 3D tensor, made up of width, height and colour channels (the depth). All basic augmentations work by manipulating this matrix. Let's see what happens to the matrix when we flip an image from left to right.

left - image matrix, right - image, top row - original and bottom row - flipped

After flipping, the values in the image matrix stay the same; only their positions change. The other manipulations, such as colour space transformations, cropping, rotation, translation and random noise injection, are likewise manipulations of the image matrix.
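Here is a minimal NumPy sketch of the idea above. It assumes an image stored as a `(height, width, channels)` array of `uint8` values, which is also the layout `tf.image` uses for a single image. A left-to-right flip is nothing more than reversing the order of the columns along the width axis.

```python
import numpy as np

# A tiny 2x3 "image" with 3 colour channels: shape (height, width, channels).
image = np.arange(2 * 3 * 3, dtype=np.uint8).reshape(2, 3, 3)
print(image.shape)  # (2, 3, 3)

# Flipping left to right reverses the width axis (axis 1):
# every pixel keeps its value, only its position changes.
flipped = image[:, ::-1, :]

# Column 0 of the flipped image is the last column of the original.
print(np.array_equal(flipped[:, 0, :], image[:, -1, :]))  # True
```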

Augmentations

We can group basic image augmentation methods into six categories: flipping, colour space transformations, cropping, rotation, translation and noise injection. Let's see how each augmentation works and how much it improves model performance compared with a baseline model. Here, the baseline model is a model trained on the original dataset without any augmentation applied.

Flipping

Flipping can be divided into two subgroups: vertical and horizontal flips. A vertical flip turns the image upside down, and a horizontal flip mirrors it from left to right. Flipping has a drawback: in a digit recognizer, for example, either kind of flip can produce an image that no longer matches its label. In other words, this method is not label-preserving on some datasets, although it can help on others. That's something to check before applying flip augmentations.

horizontal flip

vertical flip
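As a sketch of random flipping in NumPy, the function below flips each image with probability `p`. The name `random_flip` and its parameters are hypothetical, not a library API; in TensorFlow, `tf.image.flip_left_right`, `tf.image.flip_up_down` and their `random_` variants do the same job. Vertical flips are off by default here because of the label-preservation issue described above.

```python
import numpy as np

def random_flip(image, horizontal=True, vertical=False, p=0.5, rng=None):
    """Randomly flip an (H, W, C) image.

    horizontal: mirror left-right (reverse the width axis) with probability p.
    vertical:   turn upside down (reverse the height axis) with probability p.
    """
    rng = np.random.default_rng() if rng is None else rng
    out = image
    if horizontal and rng.random() < p:
        out = out[:, ::-1]
    if vertical and rng.random() < p:
        out = out[::-1, :]
    return out

rng = np.random.default_rng(0)
img = np.arange(12, dtype=np.uint8).reshape(2, 2, 3)
augmented = random_flip(img, rng=rng)  # flipped left-right about half of the time
```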

Colour space manipulation

There are a few augmentation methods in colour space manipulation. Broadly, they fall into two groups: colour space changes and image property adjustments. Colour space changes convert an image from RGB to another colour space, and conversions between other colour spaces are possible too. Image property adjustments include colour channel isolation and brightness, contrast, gamma, hue and saturation adjustments. In this comparison with the baseline model, we treated these image property adjustments as one group.

colour channel isolation




isolating each RGB channel by setting the other channels to 0
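Channel isolation in a minimal NumPy sketch, assuming an `(H, W, 3)` RGB array. `isolate_channel` is a hypothetical helper name. It keeps one channel and zeroes the other two, which is what the figure above shows.

```python
import numpy as np

def isolate_channel(image, channel):
    """Keep one colour channel of an (H, W, 3) RGB image and set the others to 0.

    channel: 0 = red, 1 = green, 2 = blue.
    """
    out = np.zeros_like(image)
    out[..., channel] = image[..., channel]
    return out

rgb = np.full((2, 2, 3), [200, 100, 50], dtype=np.uint8)
red_only = isolate_channel(rgb, 0)
print(red_only[0, 0].tolist())  # [200, 0, 0]
```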

brightness adjustments


brightness adjustments

Brightness adjustments produce different lighting effects. This is very useful if our dataset lacks images taken under different lighting conditions, meaning both brighter and darker conditions.
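A brightness adjustment can be sketched as adding a constant to every pixel. The version below assumes `uint8` images and clips the result back to [0, 255]. The helper names are hypothetical. `tf.image.adjust_brightness` applies the same kind of additive shift to float images.

```python
import numpy as np

def adjust_brightness(image, delta):
    """Add `delta` to every pixel of a uint8 image, clipping to [0, 255].

    A positive delta brightens the image; a negative delta darkens it.
    """
    out = image.astype(np.int32) + delta  # widen the type so the sum cannot overflow
    return np.clip(out, 0, 255).astype(np.uint8)

def random_brightness(image, max_delta, rng=None):
    """Pick delta uniformly from [-max_delta, max_delta], so brighter or darker."""
    rng = np.random.default_rng() if rng is None else rng
    return adjust_brightness(image, int(rng.integers(-max_delta, max_delta + 1)))
```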

contrast adjustment


contrast adjustment

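In colour theory, contrast describes how the appearance of one colour changes when it is surrounded by other colours. As an augmentation, contrast adjustment scales each pixel's distance from the image's mean value: factors above 1 increase contrast and factors between 0 and 1 reduce it. As the image above shows, negative contrast factors produce a negative (inverted) effect. This method can help the model identify certain colours in images.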
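The formula behind this, as documented for `tf.image.adjust_contrast`, is `out = (x - mean) * factor + mean`, with the mean computed per channel. Below is a NumPy sketch of that formula for `uint8` images. The helper name is hypothetical, and clipping to [0, 255] is an assumption made here so the result stays a valid image.

```python
import numpy as np

def adjust_contrast(image, factor):
    """Per-channel contrast adjustment: out = (x - mean) * factor + mean.

    factor > 1 increases contrast, 0 < factor < 1 reduces it,
    and a negative factor mirrors pixels around the mean (the "negative" look).
    """
    x = image.astype(np.float32)
    mean = x.mean(axis=(0, 1), keepdims=True)  # one mean per colour channel
    out = (x - mean) * factor + mean
    return np.clip(np.rint(out), 0, 255).astype(np.uint8)
```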

hue adjustments


hue adjustments

By adjusting hue, we can generate different colour casts, which helps simulate different lighting effects. Many low-quality cameras, including low-quality smartphone cameras, give images a bluish tone. If our model will run on such devices, training on images with different colour casts helps.
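Hue shifts are easiest to express with TensorFlow's `tf.image.adjust_hue`, which works on the hue channel in HSV space and accepts a delta in [-1, 1]. The sketch below draws that delta at random. The wrapper name and the default `max_delta=0.1` are illustrative assumptions. `tf.image.random_hue` is the built-in equivalent.

```python
import tensorflow as tf

def random_hue(image, max_delta=0.1):
    """Shift the hue of an RGB image by a random delta in [-max_delta, max_delta].

    Small deltas give mild colour casts; larger ones change colours more strongly.
    """
    image = tf.image.convert_image_dtype(image, tf.float32)  # uint8 -> floats in [0, 1]
    delta = tf.random.uniform([], -max_delta, max_delta)
    return tf.image.adjust_hue(image, delta)
```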

saturation adjustments


saturation adjustment

Increasing saturation makes colours more intense, and decreasing it makes them paler. This is helpful for adding different colour intensities to our dataset.
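One simple way to sketch saturation changes is to blend each pixel with its greyscale value. This is the same idea as Pillow's `ImageEnhance.Color`. Note that `tf.image.adjust_saturation` instead scales the S channel in HSV space, so the two give slightly different results. The luma weights below are the ITU-R BT.601 ones, and the helper name is hypothetical.

```python
import numpy as np

# BT.601 luma weights used to compute the greyscale value of each pixel.
LUMA = np.array([0.299, 0.587, 0.114], dtype=np.float32)

def adjust_saturation(image, factor):
    """Blend an RGB uint8 image with its greyscale version.

    factor = 0 gives greyscale, 1 returns the original,
    > 1 makes colours more intense, and 0..1 makes them paler.
    """
    x = image.astype(np.float32)
    grey = (x @ LUMA)[..., None]  # shape (H, W, 1), broadcast over the channels
    out = grey + factor * (x - grey)
    return np.clip(np.rint(out), 0, 255).astype(np.uint8)
```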

gamma transformation


gamma correction transformation

Gamma correction is a colour correction method used to reproduce colours accurately in images that have not been properly corrected.
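The transform itself is a power law on pixel values scaled to [0, 1]. `tf.image.adjust_gamma(image, gamma, gain)` documents it as `out = gain * in ** gamma`: a gamma above 1 darkens mid-tones and a gamma below 1 brightens them. Here is a NumPy sketch of the same formula for `uint8` images. The function name is a hypothetical helper.

```python
import numpy as np

def adjust_gamma(image, gamma=1.0, gain=1.0):
    """Gamma transform of a uint8 image: out = gain * in ** gamma on [0, 1] values."""
    x = image.astype(np.float32) / 255.0  # scale to [0, 1]
    out = gain * np.power(x, gamma)
    return np.clip(np.rint(out * 255.0), 0, 255).astype(np.uint8)
```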

change colour space


Changing the colour space can make sense when colour is not an important factor at inference time. As a rule of thumb, if the naked human eye can classify or identify objects without colour, a CNN probably can too. Converting RGB to greyscale also reduces computational needs, because it turns three colour channels into one. There may also be cases where a different colour space is needed to bring out certain characteristics of an image.

RGB to GrayScale

RGB to BGR

RGB to YCbCr
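The three conversions above can be sketched in NumPy as follows, assuming `uint8` RGB input. Greyscale uses the BT.601 luma weights, BGR just reverses the channel order (the order OpenCV uses), and YCbCr uses the full-range JPEG (JFIF) equations. `tf.image.rgb_to_grayscale` covers the first case in TensorFlow. The function names here are hypothetical helpers.

```python
import numpy as np

def rgb_to_grayscale(image):
    """Weighted sum of R, G and B (BT.601 luma): (H, W, 3) -> (H, W, 1)."""
    weights = np.array([0.299, 0.587, 0.114])
    grey = image.astype(np.float64) @ weights
    return np.rint(grey)[..., None].astype(np.uint8)

def rgb_to_bgr(image):
    """Reverse the channel order: RGB -> BGR."""
    return image[..., ::-1]

def rgb_to_ycbcr(image):
    """Full-range YCbCr as defined for JPEG (JFIF), on uint8 RGB input."""
    x = image.astype(np.float64)
    r, g, b = x[..., 0], x[..., 1], x[..., 2]
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    out = np.stack([y, cb, cr], axis=-1)
    return np.clip(np.rint(out), 0, 255).astype(np.uint8)
```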

Cropping

Cropping is a very handy augmentation method, but also a slightly dangerous one, because it can sometimes destroy label information. There are two main ways to crop: central cropping and random cropping. A central crop keeps a specified fraction of the image, anchored at its centre. A random crop instead cuts out a region at a random position. Cropping has a few benefits: it reduces positional bias, and when combined with resizing it lets us train on multiple scales.

central crop

random crop
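Both kinds of crop are just array slices. In the minimal NumPy sketch below, the helper names and parameters are hypothetical. `tf.image.central_crop(image, central_fraction)` and `tf.image.random_crop(value, size)` are the TensorFlow counterparts. For multi-scale training, the crop would then be resized back to the model's input size, for example with `tf.image.resize`.

```python
import numpy as np

def central_crop(image, fraction):
    """Keep the central `fraction` (0 < fraction <= 1) of the height and width."""
    h, w = image.shape[:2]
    ch, cw = max(1, int(h * fraction)), max(1, int(w * fraction))
    top, left = (h - ch) // 2, (w - cw) // 2
    return image[top:top + ch, left:left + cw]

def random_crop(image, crop_h, crop_w, rng=None):
    """Cut a crop_h x crop_w window at a random position inside the image."""
    rng = np.random.default_rng() if rng is None else rng
    h, w = image.shape[:2]
    top = rng.integers(0, h - crop_h + 1)
    left = rng.integers(0, w - crop_w + 1)
    return image[top:top + crop_h, left:left + crop_w]
```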

Rotation

By rotating images, we can generate new images at random angles, which adds more possible scenarios to the dataset and helps the model generalize. As the authors mention in the original paper, for recognition tasks such as handwritten digits, slight rotations of about 1 to 20 degrees (in either direction) are effective. Beyond that range, the label may no longer be preserved after the transformation.

rotation
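Below is a minimal NumPy sketch of rotation about the image centre. It uses inverse mapping with nearest-neighbour sampling, and the uncovered corners are filled with a constant. The helper names are hypothetical, and positive angles rotate counter-clockwise, matching `np.rot90`. The `random_rotation` default of ±20 degrees follows the range quoted from the paper above.

```python
import numpy as np

def rotate(image, degrees, fill=0):
    """Rotate an (H, W) or (H, W, C) image counter-clockwise about its centre."""
    h, w = image.shape[:2]
    theta = np.deg2rad(degrees)
    cos, sin = np.cos(theta), np.sin(theta)
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    ys, xs = np.indices((h, w))
    dx, dy = xs - cx, ys - cy
    # Inverse mapping: for every output pixel, find the source pixel it comes from.
    src_x = np.rint(cos * dx - sin * dy + cx).astype(int)
    src_y = np.rint(sin * dx + cos * dy + cy).astype(int)
    valid = (src_x >= 0) & (src_x < w) & (src_y >= 0) & (src_y < h)
    out = np.full_like(image, fill)  # pixels with no source keep the fill value
    out[valid] = image[src_y[valid], src_x[valid]]
    return out

def random_rotation(image, max_degrees=20, rng=None):
    """Rotate by a random angle in [-max_degrees, max_degrees]."""
    rng = np.random.default_rng() if rng is None else rng
    return rotate(image, rng.uniform(-max_degrees, max_degrees))
```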


Translation

Translation moves the image up, down, left or right inside the frame, so the model gets to train on objects in different positions. The gap created by the shift can be filled with a constant (0 or 255) or with the nearest pixel values. TensorFlow also offers a few other fill options.

translations
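Translation can be sketched in NumPy by padding the image and slicing a shifted window back out. Here `np.pad`'s `"constant"` mode gives the constant fill and its `"edge"` mode gives the nearest-value fill described above. In TensorFlow, the Keras `RandomTranslation` layer exposes its own choices through `fill_mode` (`"constant"`, `"reflect"`, `"wrap"`, `"nearest"`). The helper name is hypothetical.

```python
import numpy as np

def translate(image, dy, dx, mode="constant", fill=0):
    """Shift an image by dy rows (down if positive) and dx columns (right if positive).

    mode: any np.pad mode; "constant" fills the gap with `fill`,
    "edge" repeats the nearest pixel values.
    """
    h, w = image.shape[:2]
    pad = [(abs(dy), abs(dy)), (abs(dx), abs(dx))] + [(0, 0)] * (image.ndim - 2)
    if mode == "constant":
        padded = np.pad(image, pad, mode="constant", constant_values=fill)
    else:
        padded = np.pad(image, pad, mode=mode)
    # Choose the window so that output pixel (r, c) comes from input (r - dy, c - dx).
    top, left = abs(dy) - dy, abs(dx) - dx
    return padded[top:top + h, left:left + w]
```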

How these techniques affected model training 

how each augmentation technique affected model training (VGG16)
left - validation accuracy, right - validation loss

Results of inference on the hold-out data:

How to use augmentation in training

There are two ways to do this. One is to apply augmentation on the fly during training; the other is to create a new dataset by applying augmentations in advance. The first approach is better if you have plenty of computational power, because it takes a lot of computational resources when your images are high resolution. If you don't, you can create a new dataset from the original one by randomly applying augmentations.

apply augmentation while training

There are two ways to do this. The first is the ImageDataGenerator class in the Keras preprocessing module, which is the most popular and easiest option. If you want more control over your augmentations, you can write your own generator by subclassing a Keras class such as `tf.keras.utils.Sequence`. The second way is to put Keras preprocessing layers directly into the model. One benefit of this approach is that the augmentation layers are saved and loaded together with the model, so you don't need to recreate the same logic again.

image data generator class
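Here is a sketch of the ImageDataGenerator approach. It assumes a TensorFlow 2.x install where `tf.keras.preprocessing.image.ImageDataGenerator` is still available (newer versions deprecate it in favour of preprocessing layers) and where SciPy is installed, since the generator needs it for rotations and shifts. The random arrays stand in for a real dataset, and the parameter values are illustrative, not the settings used in the experiments above.

```python
import numpy as np
import tensorflow as tf

# Random transforms are applied on the fly each time a batch is drawn.
datagen = tf.keras.preprocessing.image.ImageDataGenerator(
    rotation_range=20,        # rotate by up to +/-20 degrees
    width_shift_range=0.1,    # shift horizontally by up to 10% of the width
    height_shift_range=0.1,   # shift vertically by up to 10% of the height
    horizontal_flip=True,     # random left-right flips
    fill_mode="nearest",      # fill the gaps created by shifts and rotations
)

# Placeholder data: 8 random 64x64 RGB images with binary labels.
x_train = np.random.randint(0, 256, size=(8, 64, 64, 3)).astype("float32")
y_train = np.random.randint(0, 2, size=(8,))

batches = datagen.flow(x_train, y_train, batch_size=4)
x_batch, y_batch = next(batches)
# model.fit(batches, epochs=10)  # `model` would be any compiled Keras model
```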

using preprocessing layers
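With preprocessing layers, the augmentation becomes part of the model itself. In the sketch below, the layer choices, factors and the tiny classifier are illustrative assumptions, not the VGG16 setup used above. Keras' random augmentation layers only transform images when called with `training=True` (as inside `model.fit`) and pass them through unchanged at inference.

```python
import tensorflow as tf

# Augmentation block, active only during training.
augmentation = tf.keras.Sequential([
    tf.keras.layers.RandomFlip("horizontal"),
    tf.keras.layers.RandomRotation(0.05),         # fraction of a full turn: about +/-18 degrees
    tf.keras.layers.RandomTranslation(0.1, 0.1),  # up to 10% of the height and width
    tf.keras.layers.RandomContrast(0.2),
])

# A small placeholder classifier with the augmentation block built in.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(64, 64, 3)),
    augmentation,
    tf.keras.layers.Rescaling(1.0 / 255),
    tf.keras.layers.Conv2D(16, 3, activation="relu"),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
```

Because the augmentation layers belong to the model, saving and reloading the model brings them along, which is the benefit described above.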

apply augmentation and create a dataset

Using the TensorFlow image module (`tf.image`) and the TensorFlow Addons module, we can write a program that generates a new dataset by randomly applying augmentations to the original dataset.
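A sketch of that offline approach, using only `tf.image`'s built-in random functions. TensorFlow Addons is left out here; it is no longer maintained, and operations such as rotation are available as Keras layers instead. The helper names, parameter values and the "originals plus N augmented copies" layout are illustrative assumptions.

```python
import numpy as np
import tensorflow as tf

def random_augment(image):
    """Apply a random mix of tf.image operations to one float image in [0, 1]."""
    image = tf.image.random_flip_left_right(image)
    image = tf.image.random_brightness(image, max_delta=0.2)
    image = tf.image.random_contrast(image, lower=0.8, upper=1.2)
    image = tf.image.random_saturation(image, lower=0.8, upper=1.2)
    image = tf.image.random_hue(image, max_delta=0.05)
    return tf.clip_by_value(image, 0.0, 1.0)  # keep values in the valid range

def build_augmented_dataset(images, labels, copies=3):
    """Return the originals plus `copies` randomly augmented versions of each image.

    images: uint8 array of shape (N, H, W, 3); labels: array of shape (N,).
    """
    x = tf.image.convert_image_dtype(images, tf.float32)  # uint8 -> [0, 1]
    out_x, out_y = [x.numpy()], [labels]
    for _ in range(copies):
        out_x.append(tf.map_fn(random_augment, x).numpy())
        out_y.append(labels)
    return np.concatenate(out_x), np.concatenate(out_y)
```

The resulting arrays can then be saved to disk (for example with `np.save`) and used as an ordinary training set.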


