Data

Generic Interfaces

Dataset

class Dataset(data, transform=None)

Generic dataset for data in dictionary format; transforms can operate on specific fields. For example, typical input data can be a list of dictionaries:

[{                            {                            {
     'img': 'image1.nii.gz',      'img': 'image2.nii.gz',      'img': 'image3.nii.gz',
     'seg': 'label1.nii.gz',      'seg': 'label2.nii.gz',      'seg': 'label3.nii.gz',
     'extra': 123                 'extra': 456                 'extra': 789
 },                           },                           }]
Parameters
  • data (Iterable) – input data to load and transform to generate dataset for model.

  • transform (Callable, optional) – transforms to execute on the input data.
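
The behaviour described above can be sketched as follows. This is a minimal stand-in, not the library's implementation: the class name SketchDataset, the transform add_flag, and the 'has_seg' key are all hypothetical and exist only for illustration.

```python
from typing import Any, Callable, Optional, Sequence


class SketchDataset:
    """Hypothetical stand-in: wraps dict items and applies a transform on access."""

    def __init__(self, data: Sequence[dict], transform: Optional[Callable[[dict], Any]] = None):
        self.data = data
        self.transform = transform

    def __len__(self) -> int:
        return len(self.data)

    def __getitem__(self, index: int) -> Any:
        item = self.data[index]
        # Apply the transform only if one was given
        return self.transform(item) if self.transform is not None else item


def add_flag(item: dict) -> dict:
    # Illustrative transform: copy the item and record whether it has a 'seg' field
    out = dict(item)
    out["has_seg"] = "seg" in item
    return out


items = [
    {"img": "image1.nii.gz", "seg": "label1.nii.gz", "extra": 123},
    {"img": "image2.nii.gz", "seg": "label2.nii.gz", "extra": 456},
]
ds = SketchDataset(items, transform=add_flag)
first = ds[0]  # the first dictionary, with the extra 'has_seg' key added
```

Because the transform receives the whole dictionary, it can read or modify only the fields it cares about and pass the rest through unchanged.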

Patch-based dataset

GridPatchDataset

class GridPatchDataset(dataset, patch_size, start_pos=(), pad_mode='wrap', **pad_opts)

Yields patches from arrays read from an input dataset. The patches are chosen in a contiguous grid sampling scheme.

Initializes this dataset in terms of the input dataset and patch size. The patch_size is the size of the patch to sample from the input arrays. It is assumed that the first dimension of each array is the channel dimension, which is yielded in its entirety, so it should not be specified in patch_size. For example, for an input 3D array with 1 channel of size (1, 20, 20, 20), a regular grid sampling of eight patches of size (1, 10, 10, 10) would be specified by a patch_size of (10, 10, 10).

Parameters
  • dataset (Dataset) – the dataset to read array data from

  • patch_size (tuple of int or None) – size of patches to generate slices for, 0/None selects whole dimension

  • start_pos (tuple of int, optional) – starting position in the array, default is 0 for each dimension

  • pad_mode (str, optional) – padding mode, see numpy.pad

  • pad_opts (dict, optional) – padding options, see numpy.pad
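
The eight-patch example above can be checked with a short sketch. It only enumerates the grid of slices for the (1, 20, 20, 20) case, where the patch size divides the array evenly; it does not reproduce the class itself, its padding, or its iteration order, and all variable names are illustrative.

```python
from itertools import product

array_shape = (1, 20, 20, 20)  # channel-first array from the example
patch_size = (10, 10, 10)      # spatial patch size only; the channel is not listed

spatial_shape = array_shape[1:]
# Start offsets of the grid along each spatial dimension
starts = [range(0, dim, size) for dim, size in zip(spatial_shape, patch_size)]

patch_slices = []
for corner in product(*starts):
    # Keep the whole channel dimension, then slice each spatial dimension
    spatial = tuple(slice(c, c + s) for c, s in zip(corner, patch_size))
    patch_slices.append((slice(None),) + spatial)

# Shape of each patch once the full channel dimension is included
patch_shapes = {
    (array_shape[0],) + tuple(s.stop - s.start for s in sl[1:]) for sl in patch_slices
}
```

Here len(patch_slices) is 8 and every patch has shape (1, 10, 10, 10), matching the description.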

Nifti format handling

Reading

class NiftiDataset(image_files, seg_files=None, labels=None, as_closest_canonical=False, transform=None, seg_transform=None, image_only=True, dtype=None)

Loads image/segmentation pairs of Nifti files from the given filename lists. Transformations can be specified for the image and segmentation arrays separately.

Initializes the dataset with the image and segmentation filename lists. The transform argument is applied to the images and seg_transform to the segmentations.

Parameters
  • image_files (list of str) – list of image filenames

  • seg_files (list of str) – for segmentation tasks, list of segmentation filenames

  • labels (list or array) – for classification tasks, list of classification labels

  • as_closest_canonical (bool) – if True, load the image as closest to canonical orientation

  • transform (Callable, optional) – transform to apply to image arrays

  • seg_transform (Callable, optional) – transform to apply to segmentation arrays

  • image_only (bool) – if True return only the image volume, otherwise return the image volume and header dict

  • dtype (np.dtype, optional) – if not None convert the loaded image to this data type

load_nifti(filename_or_obj, as_closest_canonical=False, image_only=True, dtype=None)

Loads a Nifti file from the given path or file-like object.

Parameters
  • filename_or_obj (str or file) – path to file or file-like object

  • as_closest_canonical (bool) – if True, load the image as closest to canonical axis format

  • image_only (bool) – if True return only the image volume, otherwise return the image volume and header dict

  • dtype (np.dtype, optional) – if not None convert the loaded image to this data type

Returns

The loaded image volume if image_only is True, or a tuple containing the volume and the Nifti header in dict format otherwise

Note

header['original_affine'] stores the original affine loaded from filename_or_obj. header['affine'] stores the affine after the optional as_closest_canonical transform.

Writing

write_nifti(data, affine, file_name, target_affine=None, dtype='float32')

Write numpy data to a Nifti file on disk.

Parameters
  • data (numpy.ndarray) – input data to write to file.

  • affine (numpy.ndarray) – affine information for the data.

  • file_name (string) – name of the file to save on disk.

  • target_affine (numpy.ndarray, optional) – before saving the (data, affine), transform the data into the orientation defined by target_affine.

  • dtype (np.dtype, optional) – data type to convert the image to before saving.

Synthetic

create_test_image_2d(width, height, num_objs=12, rad_max=30, noise_max=0.0, num_seg_classes=5, channel_dim=None)

Return a noisy 2D image with num_objs circles and a 2D mask image. The maximum radius of the circles is given as rad_max. The mask will have num_seg_classes classes for segmentation, labeled sequentially from 1, plus a background class represented as 0. If noise_max is greater than 0, noise taken from the uniform distribution on the range [0, noise_max) is added to the image. If channel_dim is None, the image is created without a channel dimension; otherwise it is created with the channel dimension as the first or last dimension.
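
As a rough illustration of the description above, the following sketch draws labelled circles into a mask and derives a noisy image from it. It is not the library's code: the function name sketch_test_image_2d, the seed parameter, the minimum radius of 5, the way intensities are derived from labels, and the omission of channel_dim are all assumptions made for this example, and NumPy is assumed to be installed.

```python
import numpy as np


def sketch_test_image_2d(width, height, num_objs=12, rad_max=30,
                         noise_max=0.0, num_seg_classes=5, seed=0):
    """Illustrative only; assumes width and height exceed 2 * rad_max and rad_max > 5."""
    rs = np.random.RandomState(seed)
    mask = np.zeros((width, height), dtype=np.int32)
    for _ in range(num_objs):
        # Random centre kept at least rad_max away from the border
        x = rs.randint(rad_max, width - rad_max)
        y = rs.randint(rad_max, height - rad_max)
        rad = rs.randint(5, rad_max)  # radius strictly below rad_max
        spx, spy = np.ogrid[-x:width - x, -y:height - y]
        circle = (spx * spx + spy * spy) <= rad * rad
        # Label each circle with a class in 1..num_seg_classes (0 stays background)
        mask[circle] = rs.randint(1, num_seg_classes + 1)
    # Derive intensities from the labels, then add uniform noise in [0, noise_max)
    image = mask.astype(np.float64) / num_seg_classes
    if noise_max > 0:
        image = image + rs.uniform(0, noise_max, size=image.shape)
    return image, mask


image, mask = sketch_test_image_2d(128, 128, noise_max=0.1)
```

Fixing the seed makes the synthetic data reproducible, which is convenient when the image is used in unit tests.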

create_test_image_3d(height, width, depth, num_objs=12, rad_max=30, noise_max=0.0, num_seg_classes=5, channel_dim=None)

Return a noisy 3D image and segmentation.

Utilities

correct_nifti_header_if_necessary(img_nii)

Check the format of the Nifti object's header and update the header if needed. In the updated image, pixdim matches the affine.

Parameters
  • img_nii (nifti image object) – the Nifti image whose header is checked and, if needed, updated

dense_patch_slices(image_size, patch_size, scan_interval)

Enumerate all slices defining 2D/3D patches of size patch_size from an image_size input image.

Parameters
  • image_size (tuple of int) – dimensions of image to iterate over

  • patch_size (tuple of int) – size of patches to generate slices

  • scan_interval (tuple of int) – dense patch sampling interval

Returns

a list of slice objects defining each patch
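
One plausible reading of dense patch sampling can be sketched as below. The source does not state how patches at the image border are handled or in what order slices are returned, so the clamping of the last patch and the ordering here are assumptions, and sketch_dense_patch_slices is a hypothetical name.

```python
from itertools import product


def sketch_dense_patch_slices(image_size, patch_size, scan_interval):
    """Illustrative only; assumes patch <= image size and interval > 0 in every dimension."""
    per_dim_starts = []
    for size, patch, step in zip(image_size, patch_size, scan_interval):
        # Step through the dimension at the scan interval
        starts = list(range(0, size - patch + 1, step))
        # Add a final patch flush with the far edge if the steps did not reach it
        if starts[-1] != size - patch:
            starts.append(size - patch)
        per_dim_starts.append(starts)
    return [
        tuple(slice(s, s + p) for s, p in zip(corner, patch_size))
        for corner in product(*per_dim_starts)
    ]


slices_2d = sketch_dense_patch_slices((7, 7), (4, 4), (2, 2))
```

With a 7 x 7 image, 4 x 4 patches and an interval of 2, each dimension has start offsets 0, 2 and 3, giving nine overlapping patches that all stay inside the image.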

get_random_patch(dims, patch_size, rand_state=None)

Returns a tuple of slices to define a random patch in an array of shape dims with size patch_size, or as close to it as possible within the given dimension. It is expected that patch_size is a valid patch for a source of shape dims as returned by get_valid_patch_size.

Parameters
  • dims (tuple of int) – shape of source array

  • patch_size (tuple of int) – shape of patch size to generate

  • rand_state (np.random.RandomState) – a random state object to generate random numbers from

Returns

a tuple of slice objects defining the patch

Return type

(tuple of slice)
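
The idea can be sketched in a few lines. This version uses Python's random.Random in place of np.random.RandomState to stay dependency-free, and sketch_random_patch is a hypothetical name; it is an illustration of the technique, not the library's code.

```python
import random


def sketch_random_patch(dims, patch_size, rng=None):
    """Illustrative only; assumes patch_size is already valid for dims."""
    rng = rng or random.Random()
    # Pick a minimum corner so the patch fits; start at 0 if the patch fills the dimension
    corner = tuple(
        rng.randint(0, d - p) if d > p else 0 for d, p in zip(dims, patch_size)
    )
    return tuple(slice(c, c + p) for c, p in zip(corner, patch_size))


patch = sketch_random_patch((20, 30), (8, 8), rng=random.Random(0))
```

Passing a seeded generator makes the selected patch reproducible across runs.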

get_valid_patch_size(dims, patch_size)

Given an image of dimensions dims, return a patch size tuple taking the dimension from patch_size if this is not 0/None. Otherwise, or if patch_size is shorter than dims, the dimension from dims is taken. This ensures the returned patch size is within the bounds of dims. If patch_size is a single number this is interpreted as a patch of the same dimensionality of dims with that size in each dimension.
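
The rules above can be written out as a small sketch. The function name sketch_valid_patch_size is hypothetical, and the code follows only the behaviour stated in the description.

```python
def sketch_valid_patch_size(dims, patch_size):
    """Illustrative only: clamp a requested patch size to the bounds of dims."""
    ndim = len(dims)
    if isinstance(patch_size, int):
        # A single number means the same size in every dimension
        patch_size = (patch_size,) * ndim
    else:
        # Pad a short patch_size with 0 so missing entries fall back to dims
        patch_size = tuple(patch_size or ())[:ndim]
        patch_size = patch_size + (0,) * (ndim - len(patch_size))
    # 0/None selects the whole dimension; larger requests are clamped to dims
    return tuple(min(d, p or d) for d, p in zip(dims, patch_size))


full_or_partial = sketch_valid_patch_size((20, 20, 20), (10, 0, None))
short_tuple = sketch_valid_patch_size((20, 20, 20), (10,))
too_large = sketch_valid_patch_size((20, 20), 32)
single_number = sketch_valid_patch_size((20, 20), 8)
```

In these four calls the results are (10, 20, 20), (10, 20, 20), (20, 20) and (8, 8) respectively.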

iter_patch(arr, patch_size, start_pos=(), copy_back=True, pad_mode='wrap', **pad_opts)

Yield successive patches from arr of size patch_size. The iteration can start from position start_pos in arr but drawing from a padded array extended by the patch_size in each dimension (so these coordinates can be negative to start in the padded region). If copy_back is True the values from each patch are written back to arr.

Parameters
  • arr (np.ndarray) – array to iterate over

  • patch_size (tuple of int or None) – size of patches to generate slices for, 0 or None selects whole dimension

  • start_pos (tuple of int, optional) – starting position in the array, default is 0 for each dimension

  • copy_back (bool) – if True data from the yielded patches is copied back to arr once the generator completes

  • pad_mode (str, optional) – padding mode, see numpy.pad

  • pad_opts (dict, optional) – padding options, see numpy.pad

Yields

Patches of array data from arr. The patches are views into a padded array and can be modified; if copy_back is True, these changes are reflected in arr once the iteration completes.
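
The view-and-copy-back behaviour can be illustrated with a simplified sketch. It assumes NumPy is installed, ignores start_pos and pad_opts, and uses the hypothetical name sketch_iter_patch; the padding by one patch size on each side follows the description, but the rest is an assumption rather than the library's code.

```python
from itertools import product

import numpy as np


def sketch_iter_patch(arr, patch_size, copy_back=True, pad_mode="wrap"):
    """Illustrative only: yield patch views of a padded copy, then copy back."""
    # Pad by the patch size on both sides of every dimension
    padded = np.pad(arr, tuple((p, p) for p in patch_size), mode=pad_mode)
    # Position 0 in arr sits at offset p in the padded array
    starts = [range(p, p + d, p) for d, p in zip(arr.shape, patch_size)]
    for corner in product(*starts):
        window = tuple(slice(c, c + p) for c, p in zip(corner, patch_size))
        yield padded[window]  # a view: in-place edits land in `padded`
    if copy_back:
        # Write the (possibly modified) unpadded region back into arr
        inner = tuple(slice(p, p + d) for d, p in zip(arr.shape, patch_size))
        arr[...] = padded[inner]


arr = np.arange(16, dtype=float).reshape(4, 4)
original = arr.copy()
for patch in sketch_iter_patch(arr, (2, 2)):
    patch *= 2  # modify the view in place
```

After the loop finishes, arr holds the doubled values, because the modified region of the padded array is copied back only when the generator completes.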

iter_patch_slices(dims, patch_size, start_pos=())

Yield successive tuples of slices defining patches of size patch_size from an array of dimensions dims. The iteration starts from position start_pos in the array, or at the origin if this isn’t provided. Each patch is chosen in a contiguous grid, ordered with the first dimension as the least significant.

Parameters
  • dims (tuple of int) – dimensions of array to iterate over

  • patch_size (tuple of int or None) – size of patches to generate slices for, 0 or None selects whole dimension

  • start_pos (tuple of int, optional) – starting position in the array, default is 0 for each dimension

Yields

Tuples of slice objects defining each patch
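
A minimal sketch of this grid iteration follows. It reads "first dimension as least significant" as meaning the first dimension changes fastest, which is an interpretation; sketch_iter_patch_slices is a hypothetical name, and patch_size is assumed to contain only positive integers.

```python
from itertools import product


def sketch_iter_patch_slices(dims, patch_size, start_pos=()):
    """Illustrative only: yield slice tuples over a contiguous grid."""
    # Missing start positions default to 0
    start_pos = tuple(start_pos) + (0,) * (len(dims) - len(start_pos))
    ranges = [range(s, d, p) for s, d, p in zip(start_pos, dims, patch_size)]
    # product() varies its last argument fastest, so reverse the ranges
    # to make the first dimension change fastest
    for position in product(*ranges[::-1]):
        yield tuple(slice(s, s + p) for s, p in zip(position[::-1], patch_size))


grid = list(sketch_iter_patch_slices((4, 6), (2, 3)))
```

For a 4 x 6 array and 2 x 3 patches this yields four slice tuples, stepping along the first dimension before advancing the second.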

list_data_collate(batch)

Enhancement for the PyTorch DataLoader default collate. If the dataset already returns a list of batch data generated in transforms, all the data must be merged into one list; after that, the behavior is the same as the default collate. Note: use this collate function when applying transforms that can generate batch data.
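
The merging step can be sketched without PyTorch. The function below only shows the flattening described above and leaves out the call to the default collate; sketch_flatten_batch and the sample dictionaries are hypothetical.

```python
def sketch_flatten_batch(batch):
    """Illustrative only: merge per-item lists of samples into one flat list."""
    # If each dataset item is itself a list of samples, flatten one level
    if batch and isinstance(batch[0], list):
        return [sample for item in batch for sample in item]
    # Otherwise the batch is already a flat list of samples
    return batch


# Two dataset items, each expanded by a transform into two samples
nested = [[{"id": 0}, {"id": 1}], [{"id": 2}, {"id": 3}]]
flat = sketch_flatten_batch(nested)
```

The flat list of four samples could then be handed to a default collate function in the usual way.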

rectify_header_sform_qform(img_nii)

Look at the sform and qform of the Nifti object and correct them if they are incompatible with the pixel dimensions.

Adapted from https://github.com/NifTK/NiftyNet/blob/v0.6.0/niftynet/io/misc_io.py