fastai.transforms

Introduction and Overview

The fastai transforms pipeline for images is designed to convert your images into a form ready to be batched by your DataLoader and passed to your model. It provides convenient ways to create transformations such as normalization/denormalization and various augmentations. The module contains several predefined pipelines that you can pass almost directly to your data object. This is especially convenient when working with pretrained models, because the images should be normalized according to the statistics of the (pre)training data set.

Normalization

One of the most common image transformations is normalizing the images. The transforms module provides predefined pipelines to which you can pass the statistics (the mean and standard deviation of each channel) of your training data.

Important

Make sure you use only the statistics of the training data to normalize both training and validation data.

Normalizing for Pretrained Models

The statistics for all pretrained models available via the fastai.conv_learner module are predefined and accessible via the tfms_from_model function. Here’s an example of how to use this function to create a pipeline for a pretrained resnet34 model:

from fastai.transforms import *
from fastai.dataset import *
from fastai.conv_learner import *  # provides the pretrained architectures, e.g. resnet34

arch = resnet34  # the statistics of its pretraining data are used for normalization
sz = 224         # target image size

# PATH points at your data directory (with train/ and valid/ subfolders)
data = ImageClassifierData.from_paths(PATH, tfms=tfms_from_model(arch, sz))

The tfms_from_model function returns separate image transformers for the training and the validation set. Internally it calls the tfms_from_stats function (the variant for custom image statistics) and passes it the mean and standard deviation of the ImageNet dataset.
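
For illustration, the returned pair can be unpacked into the two transformers (the variable names here are illustrative):

trn_tfms, val_tfms = tfms_from_model(arch, sz)  # training transforms (with augmentation), validation transforms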

fastai.transforms.tfms_from_model(f_model, sz, aug_tfms=None, max_zoom=None, pad=0, crop_type=<CropType.RANDOM: 1>, tfm_y=None, sz_y=None, pad_mode=2, norm_y=True, scale=None)

Returns separate image transformers for training and validation. The transformers are constructed according to the image statistics associated with the model (see tfms_from_stats).

Arguments:
f_model : model
the model, pretrained or not; determines which normalization statistics are used

Normalizing with Custom Statistics

If you train a model from scratch, you can use the tfms_from_stats function and pass in the statistics of your training set. For each channel you have to provide the mean and standard deviation:

from fastai.transforms import *
from fastai.dataset import *

stats = A([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])  # per-channel means and stds (here: the ImageNet statistics, as an example)
sz = 224

data = ImageClassifierData.from_paths(PATH, tfms=tfms_from_stats(stats, sz))

fastai.transforms.tfms_from_stats(stats, sz, aug_tfms=None, max_zoom=None, pad=0, crop_type=<CropType.RANDOM: 1>, tfm_y=None, sz_y=None, pad_mode=2, norm_y=True, scale=None)

Given the statistics of the training image set, returns separate training and validation transform functions.

Adding Augmentation

Besides normalization, this module also contains transformations that can be used for image augmentation. Each of these transformations is defined as a class (see Available Transformations below). The functions tfms_from_model and tfms_from_stats accept a list of such transforms via their aug_tfms argument to define which augmentations to perform.

Predefined Augmentations

The module contains predefined sets of transformations that can be used for common image augmentations:

  • transforms_basic contains a random rotation up to 10° and random lighting change up to 5%
  • transforms_side_on contains transforms_basic and a random flip (suitable for pictures of “everyday objects”)
  • transforms_top_down contains transforms_basic and a random rotation by multiples of 90° (for example suitable for satellite images)
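
In the library these sets are defined roughly as follows (a sketch based on the descriptions above; see the fastai source for the exact definitions):

transforms_basic    = [RandomRotate(10), RandomLighting(0.05, 0.05)]
transforms_side_on  = transforms_basic + [RandomFlip()]
transforms_top_down = transforms_basic + [RandomDihedral()]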

To use one of those predefined sets of augmentations in tfms_from_model or tfms_from_stats, you have to pass it to aug_tfms:

from fastai.transforms import *
from fastai.dataset import *

stats = A([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]) 
sz = 224

data = ImageClassifierData.from_paths(PATH, tfms=tfms_from_stats(stats, sz, aug_tfms=transforms_basic))

Custom Augmentations

To define custom augmentations, you can simply pass your own combination of transformations to aug_tfms as a list. All available transformations, and how to define custom ones, are described in the next section.
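
For example, a custom list might combine a rotation, a lighting change, and a horizontal flip (a sketch using the transform classes documented below; the parameter values are illustrative):

from fastai.transforms import *
from fastai.dataset import *

stats = A([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
sz = 224
my_tfms = [RandomRotate(10), RandomLighting(0.05, 0.05), RandomFlip()]

data = ImageClassifierData.from_paths(PATH, tfms=tfms_from_stats(stats, sz, aug_tfms=my_tfms))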

Available Transformations

Each transformation is defined as a class that inherits from the abstract parent class Transform. To define a transformation, override the abstract method do_transform. A transform may include a random component. If it does, it will often need to transform y using the same random values as x (e.g. a horizontal flip in segmentation must be applied to the mask as well). Therefore, the method set_state is used to ensure all random state is calculated in one place.

class fastai.transforms.Transform(tfm_y=<TfmType.NO: 1>)

A class that represents a transform.

All other transforms should subclass it. All subclasses should override do_transform.

tfm_y : TfmType
type of transform
do_transform(x, is_y)
set_state()
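
As a sketch, a minimal custom subclass might look like this (the AddNoise class and its parameters are illustrative, not part of fastai; it uses the thread-local store attribute described in the Note further below):

import numpy as np
from fastai.transforms import Transform, TfmType

class AddNoise(Transform):
    def __init__(self, sigma=0.01, tfm_y=TfmType.NO):
        super().__init__(tfm_y)
        self.sigma = sigma
    def set_state(self):
        # calculate all random state in one place, kept thread-locally
        self.store.seed = np.random.randint(0, 2**31 - 1)
    def do_transform(self, x, is_y):
        if is_y: return x  # leave y (labels/masks) untouched
        rng = np.random.RandomState(self.store.seed)
        return x + rng.normal(0, self.sigma, x.shape).astype(x.dtype)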

There’s also a class CoordTransform, inherited from Transform, which is used to represent all coordinate-based transformations such as cropping, rotating and scaling.

class fastai.transforms.CoordTransform(tfm_y=<TfmType.NO: 1>)

A coordinate transform.

static make_square(y, x)
map_y(y0, x)
transform_coord(x, ys)

The currently implemented transformations are:

class fastai.transforms.Normalize(m, s, tfm_y=<TfmType.NO: 1>)

Normalizes an image to zero mean and unit standard deviation, given the mean m and std s of the original image

class fastai.transforms.Denormalize(m, s)

De-normalizes an image, returning it to its original format.

class fastai.transforms.ChannelOrder(tfm_y=<TfmType.NO: 1>)

Changes the image array shape from (h, w, 3) to (3, h, w). tfm_y decides the transformation applied to the y element.

class fastai.transforms.AddPadding(pad, mode=2, tfm_y=<TfmType.NO: 1>)

A class that represents adding padding to an image.

The default padding mode is reflect.

Arguments:
pad : int
size of padding on top, bottom, left and right
mode :
type of cv2 border mode (e.g. constant, reflect, wrap, replicate)

class fastai.transforms.CenterCrop(sz, tfm_y=<TfmType.NO: 1>, sz_y=None)

A class that represents a center crop.

This transform (optionally) transforms x and y with the same parameters.

Arguments:
sz : int
size of the crop
tfm_y : TfmType
type of y transformation

class fastai.transforms.RandomCrop(targ_sz, tfm_y=<TfmType.NO: 1>, sz_y=None)

A class that represents a random crop transformation.

This transform (optionally) transforms x and y with the same parameters.

Arguments:
targ_sz : int
target size of the crop
tfm_y : TfmType
type of y transformation

class fastai.transforms.NoCrop(sz, tfm_y=<TfmType.NO: 1>, sz_y=None)

A transformation that resizes to a square image without cropping.

This transform (optionally) resizes x and y with the same parameters.

Arguments:
sz : int
target size
tfm_y : TfmType
type of y transformation
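
These crop classes are usually not instantiated directly; they are selected through the crop_type argument of tfms_from_model and tfms_from_stats. A sketch (assuming CropType.NO selects NoCrop, i.e. resizing without cropping):

tfms = tfms_from_stats(stats, sz, crop_type=CropType.NO)  # resize to a square without cropping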

class fastai.transforms.Scale(sz, tfm_y=<TfmType.NO: 1>, sz_y=None)

A transformation that scales the smaller image dimension to sz.

Arguments:
sz : int
target size for the smaller dimension
tfm_y : TfmType
type of y transformation

class fastai.transforms.RandomScale(sz, max_zoom, p=0.75, tfm_y=<TfmType.NO: 1>, sz_y=None)

Scales an image so that the smaller dimension becomes a random size between sz and sz*max_zoom.

This transform (optionally) scales x and y with the same parameters.

Arguments:
sz : int
target size
max_zoom : float
must be >= 1.0
p : float
probability of applying the random scaling
tfm_y : TfmType
type of y transform

class fastai.transforms.RandomRotate(deg, p=0.75, mode=2, tfm_y=<TfmType.NO: 1>)

Rotates images and (optionally) target y.

Rotating coordinates is treated differently for x and y on this transform.

Arguments:
deg : float
maximum degrees to rotate
p : float
probability of rotation
mode :
type of border handling
tfm_y : TfmType
type of y transform

class fastai.transforms.RandomDihedral(tfm_y=<TfmType.NO: 1>)

Rotates images by random multiples of 90 degrees and/or reflects them. See D8 (the dihedral group of order eight), the group of all symmetries of the square.

class fastai.transforms.RandomFlip(tfm_y=<TfmType.NO: 1>, p=0.5)

Flips the image horizontally at random, with probability p.

class fastai.transforms.RandomLighting(b, c, tfm_y=<TfmType.NO: 1>)

Randomly changes the lighting of the image: b controls the brightness change and c the contrast change.
class fastai.transforms.RandomRotateZoom(deg, zoom, stretch, ps=None, mode=2, tfm_y=<TfmType.NO: 1>)

Randomly selects between a rotate, zoom, stretch, or no transform.

Arguments:
deg : float
maximum degrees of rotation
zoom : float
maximum fraction of zoom
stretch : float
maximum fraction of stretch
ps :
probabilities for the four transforms, a list of length 4, in the order listed above (the fourth probability is for “no transform”)

class fastai.transforms.RandomStretch(max_stretch, tfm_y=<TfmType.NO: 1>)

Stretches the image by a random amount (up to max_stretch) along one randomly chosen axis.

class fastai.transforms.PassThru(tfm_y=<TfmType.NO: 1>)

Passes the image through unchanged.

class fastai.transforms.RandomBlur(blur_strengths=5, probability=0.5, tfm_y=<TfmType.NO: 1>)

Adds a Gaussian blur to the image with some probability. Multiple blur strengths can be configured; one of them is chosen at random.

class fastai.transforms.Cutout(n_holes, length, tfm_y=<TfmType.NO: 1>)

Cuts out n_holes square patches of side length from the image (the “cutout” augmentation).
class fastai.transforms.GoogleNetResize(targ_sz, min_area_frac=0.08, min_aspect_ratio=0.75, max_aspect_ratio=1.333, flip_hw_p=0.5, tfm_y=<TfmType.NO: 1>, sz_y=None)

Randomly crops an image with a random aspect ratio and returns a square image resized to targ_sz.

Arguments:
targ_sz : int
target size
min_area_frac : float < 1.0
minimum fraction of the original image area to crop
min_aspect_ratio : float
minimum aspect ratio
max_aspect_ratio : float
maximum aspect ratio
flip_hw_p : float
probability of swapping height and width
tfm_y : TfmType
type of y transform

Note

Transformations are often run in multiple threads, so any random state must be kept in thread-local storage. The Transform class provides a thread-local store attribute for you to use; see the RandomFlip class for an example of how to use random state safely in Transform subclasses.

The class TfmType defines the type of transformation:

class fastai.transforms.TfmType

Type of transformation, defined as an IntEnum with the following predefined values:

NO: the default; y is not transformed when x is transformed.
PIXEL: x and y are images and should be transformed in the same way (example: image segmentation).
COORD: y consists of coordinates (e.g. bounding boxes).
CLASS: y consists of class labels (same behaviour as PIXEL, except no normalization).
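
For example, in a segmentation task the same spatial transforms can be applied to the mask by setting tfm_y (a sketch; the parameter values and the use of CropType.NO to avoid cropping the mask are illustrative):

aug_tfms = [RandomRotate(4, tfm_y=TfmType.PIXEL), RandomFlip(tfm_y=TfmType.PIXEL)]
tfms = tfms_from_model(resnet34, sz, crop_type=CropType.NO, tfm_y=TfmType.PIXEL, aug_tfms=aug_tfms)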