[Bug] AttributeError: 'NoneType' object has no attribute 'shape' for mjsynth lmdb data
Prerequisite
- [X] I have searched Issues and Discussions but cannot get the expected help.
- [X] The bug has not been fixed in the latest version (0.x) or latest version (1.x).
Task
I'm using the official example scripts/configs for the officially supported tasks/models/datasets.
Branch
1.x branch https://github.com/open-mmlab/mmocr/tree/dev-1.x
Environment
sys.platform: linux
Python: 3.10.11 (main, Apr 20 2023, 19:02:41) [GCC 11.2.0]
CUDA available: False
numpy_random_seed: 2147483648
GCC: gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-44)
PyTorch: 2.0.0+cu117
PyTorch compiling details: PyTorch built with:
- GCC 9.3
- C++ Version: 201703
- Intel(R) oneAPI Math Kernel Library Version 2022.2-Product Build 20220804 for Intel(R) 64 architecture applications
- Intel(R) MKL-DNN v2.7.3 (Git Hash 6dbeffbae1f23cbbeae17adb7b5b13f1f37c080e)
- OpenMP 201511 (a.k.a. OpenMP 4.5)
- LAPACK is enabled (usually provided by MKL)
- NNPACK is enabled
- CPU capability usage: AVX2
- Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.7, CUDNN_VERSION=8.5.0, CXX_COMPILER=/opt/rh/devtoolset-9/root/usr/bin/c++, CXX_FLAGS= -D_GLIBCXX_USE_CXX11_ABI=0 -fabi-version=11 -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOROCTRACER -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -O2 -fPIC -Wall -Wextra -Werror=return-type -Werror=non-virtual-dtor -Werror=bool-operation -Wnarrowing -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wunused-local-typedefs -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Werror=cast-function-type -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_DISABLE_GPU_ASSERTS=ON, TORCH_VERSION=2.0.0, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=1, USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF,
TorchVision: 0.15.1+cu117
OpenCV: 4.7.0
MMEngine: 0.7.3
MMOCR: 1.0.0+unknown
Reproduces the problem - code sample
The mjsynth dataset was prepared by running python tools/dataset_converters/prepare_dataset.py mjsynth --task textrecog --lmdb.
The config file for the dataset was changed to:
train_pipeline = [
    dict(type='LoadImageFromNDArray', ignore_empty=True, min_size=2),
    dict(type='LoadOCRAnnotations', with_text=True),
    dict(
        type='PackTextRecogInputs',
        meta_keys=('img_path', 'ori_shape', 'img_shape', 'valid_ratio'))
]
Reproduces the problem - command or script
python tools/train.py configs/textrecog/abinet/abinet-vision_20e_st-an_mj.py
Reproduces the problem - error message
05/25 19:17:13 - mmengine - INFO - Epoch(train) [1][ 1800/278728] lr: 4.2239e-07 eta: 17 days, 8:17:57 time: 0.2353 data_time: 0.0608 memory: 817 loss: 11.4207 loss_ctc: 11.4207
05/25 19:17:35 - mmengine - INFO - Epoch(train) [1][ 1900/278728] lr: 4.4031e-07 eta: 17 days, 4:43:53 time: 0.2671 data_time: 0.0933 memory: 817 loss: 11.5051 loss_ctc: 11.5051
Traceback (most recent call last):
File "/gpfs/projects/ZhuGroup/dev1/tools/train.py", line 114, in <module>
main()
File "/gpfs/projects/ZhuGroup/dev1/tools/train.py", line 110, in main
runner.train()
File "/gpfs/projects/ZhuGroup/miniconda3/envs/dev1/lib/python3.10/site-packages/mmengine/runner/runner.py", line 1721, in train
model = self.train_loop.run() # type: ignore
File "/gpfs/projects/ZhuGroup/miniconda3/envs/dev1/lib/python3.10/site-packages/mmengine/runner/loops.py", line 96, in run
self.run_epoch()
File "/gpfs/projects/ZhuGroup/miniconda3/envs/dev1/lib/python3.10/site-packages/mmengine/runner/loops.py", line 111, in run_epoch
for idx, data_batch in enumerate(self.dataloader):
File "/gpfs/projects/ZhuGroup/miniconda3/envs/dev1/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 634, in __next__
data = self._next_data()
File "/gpfs/projects/ZhuGroup/miniconda3/envs/dev1/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 1326, in _next_data
return self._process_data(data)
File "/gpfs/projects/ZhuGroup/miniconda3/envs/dev1/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 1372, in _process_data
data.reraise()
File "/gpfs/projects/ZhuGroup/miniconda3/envs/dev1/lib/python3.10/site-packages/torch/_utils.py", line 644, in reraise
raise exception
AttributeError: Caught AttributeError in DataLoader worker process 2.
Original Traceback (most recent call last):
File "/gpfs/projects/ZhuGroup/miniconda3/envs/dev1/lib/python3.10/site-packages/torch/utils/data/_utils/worker.py", line 308, in _worker_loop
data = fetcher.fetch(index)
File "/gpfs/projects/ZhuGroup/miniconda3/envs/dev1/lib/python3.10/site-packages/torch/utils/data/_utils/fetch.py", line 51, in fetch
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/gpfs/projects/ZhuGroup/miniconda3/envs/dev1/lib/python3.10/site-packages/torch/utils/data/_utils/fetch.py", line 51, in <listcomp>
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/gpfs/projects/ZhuGroup/miniconda3/envs/dev1/lib/python3.10/site-packages/mmengine/dataset/dataset_wrapper.py", line 159, in __getitem__
return self.datasets[dataset_idx][sample_idx]
File "/gpfs/projects/ZhuGroup/miniconda3/envs/dev1/lib/python3.10/site-packages/mmengine/dataset/base_dataset.py", line 413, in __getitem__
data = self.prepare_data(idx)
File "/gpfs/projects/ZhuGroup/dev1/mmocr/datasets/recog_lmdb_dataset.py", line 149, in prepare_data
return self.pipeline(data_info)
File "/gpfs/projects/ZhuGroup/miniconda3/envs/dev1/lib/python3.10/site-packages/mmengine/dataset/base_dataset.py", line 59, in __call__
data = t(data)
File "/gpfs/projects/ZhuGroup/miniconda3/envs/dev1/lib/python3.10/site-packages/mmcv/transforms/base.py", line 12, in __call__
return self.transform(results)
File "/gpfs/projects/ZhuGroup/dev1/mmocr/datasets/transforms/loading.py", line 190, in transform
results['img_shape'] = img.shape[:2]
AttributeError: 'NoneType' object has no attribute 'shape'
Additional information
I'm not sure if #1896 is the cause of this issue. It looks like it is due to an empty image. But given that ignore_empty=True is set, why did this still happen?
The prepared lmdb dataset can be downloaded directly from Google Drive.
It looks like the error is caused by some broken images in the mjsynth dataset. A temporary workaround is to download pre-generated lmdb files shared by others, for example from https://aistudio.baidu.com/aistudio/datasetdetail/114635.
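To check whether a locally generated lmdb contains such entries, one could scan it before training. The following is only a minimal sketch, not MMOCR code. It assumes image entries are stored under keys of the form image-000000001 counted from 1, and the helper names looks_decodable and find_suspect_keys are hypothetical. The JPEG signature test is a cheap heuristic; a stricter check would actually decode each buffer (for example with cv2.imdecode) and flag entries that come back as None.

```python
from typing import List, Optional

# JPEG files begin with the two-byte Start Of Image marker.
JPEG_SOI = b"\xff\xd8"


def looks_decodable(buf: Optional[bytes]) -> bool:
    """Heuristic only: the buffer is non-empty and starts like a JPEG."""
    return bool(buf) and buf[:2] == JPEG_SOI


def find_suspect_keys(store, num_samples: int) -> List[str]:
    """Return the image keys whose value is missing, empty, or not JPEG-like.

    `store` is anything with a dict-style get(key) that returns None for a
    missing key. The key layout (image-%09d, 1-based) is an assumption.
    """
    suspects = []
    for idx in range(1, num_samples + 1):
        key = f"image-{idx:09d}"
        if not looks_decodable(store.get(key.encode("utf-8"))):
            suspects.append(key)
    return suspects


if __name__ == "__main__":
    # Toy in-memory stand-in for an lmdb read transaction.
    fake_store = {
        b"image-000000001": b"\xff\xd8\xff\xe0 fake jpeg payload",
        b"image-000000002": b"",  # empty entry, the kind that breaks loading
    }
    print(find_suspect_keys(fake_store, num_samples=2))
```

With a real database, store would be an lmdb read transaction, whose get method returns None for a missing key, and num_samples would come from the database's sample-count entry (assumed here to be stored under num-samples).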
Hi @Jiayou-Chao, one possible solution is to modify the loading function at https://github.com/open-mmlab/mmocr/blob/d7c59f3325aaf4cbf6ddd3ec69f03230bc582d19/mmocr/datasets/transforms/loading.py#L182 so that it returns None when the image is empty:
img = results['img']
if img is None:
    return None
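A self-contained sketch of how this guard behaves is shown below. It assumes that returning None from a transform makes the surrounding pipeline stop and hand None back to the dataset, which can then fetch a different sample during training. The functions load_from_ndarray and run_pipeline are simplified, hypothetical stand-ins written for illustration, not the actual MMOCR or MMEngine implementations, and SimpleNamespace is used in place of a real image array.

```python
from types import SimpleNamespace
from typing import Callable, List, Optional


def load_from_ndarray(results: dict) -> Optional[dict]:
    """Stand-in for the loading transform, with the suggested guard added."""
    img = results.get("img")
    if img is None:
        # Broken or empty image: signal "skip this sample" instead of
        # crashing on img.shape below.
        return None
    results["img_shape"] = img.shape[:2]
    results["ori_shape"] = img.shape[:2]
    return results


def run_pipeline(transforms: List[Callable], results: dict) -> Optional[dict]:
    """Apply transforms in order, stopping as soon as one returns None."""
    for transform in transforms:
        results = transform(results)
        if results is None:
            return None
    return results


if __name__ == "__main__":
    good = {"img": SimpleNamespace(shape=(32, 100, 3))}
    broken = {"img": None}
    print(run_pipeline([load_from_ndarray], good)["img_shape"])
    print(run_pipeline([load_from_ndarray], broken))
```

Without the guard, the broken sample reaches img.shape and raises the AttributeError shown in the traceback above.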