Fast-ImageNet-Dataloader
A fast data loader for ImageNet on PyTorch.
Install
Requirements:
- Tensorpack: clone the repository and run pip install -e . from its root
- LMDB: pip install lmdb
- OpenCV: pip install opencv-python
- Protobuf: conda install protobuf
- Prctl: clone the repository, run sudo apt-get install build-essential libcap-dev, then python setup.py build
Tensorpack version > 0.9 is currently NOT supported.
Note that some prebuilt OpenCV packages are much slower than others.
Check your installation with this script and make sure it prints less than 1 s.
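Because Tensorpack versions newer than 0.9 are not supported, it can help to fail early if the wrong version is installed. The following is a minimal sketch, not part of this repository: the helper names (major_minor, is_supported, check_installed_tensorpack) are hypothetical, and it assumes the "> 0.9" note refers to the major.minor version, so every 0.9.x release is treated as 0.9. It relies only on the standard library's importlib.metadata and the PyPI distribution name "tensorpack".

```python
import re
from importlib import metadata
from typing import Tuple

# Assumption: "> 0.9" refers to major.minor, so every 0.9.x release counts as 0.9.
MAX_SUPPORTED = (0, 9)


def major_minor(version: str) -> Tuple[int, int]:
    """Extract (major, minor) from a version string such as '0.9.8'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def is_supported(version: str) -> bool:
    """Return True if the version is not newer than MAX_SUPPORTED."""
    return major_minor(version) <= MAX_SUPPORTED


def check_installed_tensorpack() -> str:
    """Look up the installed tensorpack distribution and fail loudly if unsupported."""
    try:
        installed = metadata.version("tensorpack")
    except metadata.PackageNotFoundError:
        raise RuntimeError("tensorpack is not installed") from None
    if not is_supported(installed):
        raise RuntimeError(
            f"tensorpack {installed} found; versions > 0.9 are not supported"
        )
    return installed
```

Calling check_installed_tensorpack() at the top of a training script turns a confusing import or serialization error later on into a single clear message.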
Preprocessing
To start, set the environment variable IMAGENET to the path of the ILSVRC2012
dataset. TENSORPACK_DATASET should also be set, since tensorpack uses it.
export IMAGENET='/mnt/work/data/raw-data/'
python preprocess_sequential.py
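Preprocessing depends on both environment variables being set. A minimal sketch of a pre-flight check is shown below; check_env is a hypothetical helper, not part of this repository, and it only verifies that IMAGENET and TENSORPACK_DATASET are non-empty. It does not check that the directories exist or contain ILSVRC2012 data.

```python
import os
from pathlib import Path
from typing import Dict, Mapping, Optional

# The two variables the preprocessing step expects to be set.
REQUIRED_VARS = ("IMAGENET", "TENSORPACK_DATASET")


def check_env(env: Optional[Mapping[str, str]] = None) -> Dict[str, Path]:
    """Return the required variables as Paths, or raise listing the missing ones.

    `env` defaults to os.environ; passing a plain dict makes the check testable.
    """
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("please set: " + ", ".join(missing))
    return {name: Path(env[name]) for name in REQUIRED_VARS}
```

Running check_env() before python preprocess_sequential.py reports every missing variable at once instead of failing on the first one.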
Usage
train_loader = LMDBLoader('train', batch_size=args.batch_size, num_workers=32, shuffle=True, cuda=True)
valid_loader = LMDBLoader('val', batch_size=args.batch_size, num_workers=32, shuffle=False, cuda=True)
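The two calls above differ only in the split name and shuffle flag, so the shared settings can come from one place. This is a minimal sketch under stated assumptions: the README does not show where LMDBLoader is imported from, so the loader class is passed in as a parameter, and the --batch-size and --workers flags (with defaults of 256 and 32) are hypothetical command-line options chosen for illustration. It assumes LMDBLoader accepts exactly the keyword arguments used above.

```python
import argparse
from typing import Any, Callable, Optional, Sequence, Tuple


def parse_args(argv: Optional[Sequence[str]] = None) -> argparse.Namespace:
    """Parse hypothetical training flags; argv=None reads sys.argv[1:]."""
    parser = argparse.ArgumentParser(description="ImageNet training (sketch)")
    # argparse stores --batch-size as args.batch_size, matching the usage above.
    parser.add_argument("--batch-size", type=int, default=256)
    parser.add_argument("--workers", type=int, default=32)
    return parser.parse_args(None if argv is None else list(argv))


def make_loaders(loader_cls: Callable[..., Any], args: argparse.Namespace) -> Tuple[Any, Any]:
    """Build train/val loaders that share every setting except shuffling."""
    common = dict(batch_size=args.batch_size, num_workers=args.workers, cuda=True)
    train_loader = loader_cls("train", shuffle=True, **common)
    valid_loader = loader_cls("val", shuffle=False, **common)
    return train_loader, valid_loader
```

With LMDBLoader imported from wherever it lives in your checkout, this reduces to train_loader, valid_loader = make_loaders(LMDBLoader, parse_args()). Shuffling only the training split keeps validation order stable across epochs.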
TODO
- [ ] Image Normalization
- [ ] Support HDF5 format
- [ ] Support Tensorpack version > 0.9
Disclaimer
The code is mainly adapted from sequential-imagenet-dataloader and the Tensorpack examples.