mmdetection3d
Fix Waymo converter to save images as .jpg (the official format in Waymo Open), reducing the dataset from 3.3T to 1.1T
Thanks for your contribution and we appreciate it a lot. The following instructions would make your pull request more healthy and more easily get feedback. If you do not understand some items, don't worry, just make the pull request and seek help from maintainers.
Motivation
While inspecting the Waymo tfrecord files, I found that the images in the Waymo Open Dataset are actually stored as .jpg, not .png.
This can be confirmed from the binary stream in the tfrecord files: each image's byte stream starts with FF D8 FF E0 00 10 …, which is exactly the standard JPEG header.
Goal: to save storage space, I want to change the image saving format from .png to .jpg.
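The header check described above can be sketched in a few lines. This is only an illustration: `detect_image_format` is a hypothetical helper, not part of the converter, and in practice the bytes would come from the encoded image payload read out of a tfrecord frame rather than from a hard-coded hex string.

```python
# File signatures ("magic numbers") for the two formats discussed in this PR.
JPEG_PREFIX = bytes.fromhex("FFD8FF")             # JPEG start-of-image, then a marker byte
PNG_SIGNATURE = bytes.fromhex("89504E470D0A1A0A")  # fixed 8-byte PNG signature


def detect_image_format(data: bytes) -> str:
    """Return 'jpg', 'png', or 'unknown' based on the leading bytes (hypothetical helper)."""
    if data.startswith(JPEG_PREFIX):
        return "jpg"
    if data.startswith(PNG_SIGNATURE):
        return "png"
    return "unknown"


# The leading bytes quoted in the Motivation section.
header_from_pr = bytes.fromhex("FFD8FFE00010")
print(detect_image_format(header_from_pr))  # jpg
```

Only the first few bytes are inspected, so the check is cheap enough to run over every image in a segment before deciding on a file extension.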
Modification
I modified the save_image function in class Waymo2KITTI in tools/data_converter/waymo_converter.py.
So that the generated ***.pkl records .jpg as the image file_tail, I also modified the get_image_path function in tools/data_converter/kitti_data_utils.py, adding a file_tail parameter (default '.png') so that '.jpg' can be passed in. This does not affect KITTI processing.
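A minimal sketch of the idea behind these two changes follows. It is not the code from this PR: `image_path` and `save_encoded_image` are hypothetical stand-ins for get_image_path and save_image, the 7-digit file name is illustrative, and writing the bytes through unchanged assumes they are already JPEG-encoded, as the Motivation section argues.

```python
import tempfile
from pathlib import Path


def image_path(idx: int, image_dir: str, file_tail: str = ".png") -> str:
    """Build an image path; the '.png' default preserves the existing KITTI behaviour."""
    return (Path(image_dir) / f"{idx:07d}{file_tail}").as_posix()


def save_encoded_image(encoded: bytes, out_path: str) -> None:
    """Write already-encoded image bytes to disk without decoding or re-encoding them."""
    target = Path(out_path)
    target.parent.mkdir(parents=True, exist_ok=True)
    with open(target, "wb") as fp:
        fp.write(encoded)


# KITTI callers omit file_tail and still get .png; a Waymo caller passes '.jpg'.
kitti_path = image_path(7, "training/image_2")
waymo_path = image_path(7, "training/image_0", file_tail=".jpg")

# Placeholder bytes standing in for one camera image taken from a tfrecord frame.
fake_jpeg = bytes.fromhex("FFD8FFE00010") + b"placeholder payload"
with tempfile.TemporaryDirectory() as root:
    out_file = Path(root) / waymo_path
    save_encoded_image(fake_jpeg, str(out_file))
    written = out_file.read_bytes()
```

Keeping '.png' as the default is what leaves KITTI processing untouched: only call sites that explicitly pass a different file_tail see different paths.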
BC-breaking (Optional)
I have tested the generation of KITTI-format Waymo data. It works well.
I have tested that these modifications do not affect KITTI dataset generation (it continues to save .png).
Use cases (Optional)
Nothing new. The change only takes effect when you convert the Waymo dataset.
Checklist
- Pre-commit or other linting tools are used to fix the potential lint issues.(Yes)
- The modification is covered by complete unit tests. If not, please add more unit test to ensure the correctness.(Yes)
- If the modification has potential influence on downstream projects, this PR should be tested with downstream projects.(No influence)
- The documentation has been modified accordingly, like docstring or example tutorials.(Nothing needs to change.)
The storage saving is substantial.
For example, in the training directory:
Before:
% du -h -d 1
1.1G ./label_all
781M ./calib
767G ./velodyne
781M ./pose
781M ./timestamp
855M ./label_0
574M ./label_1
485M ./label_2
504M ./label_3
410M ./label_4
568G ./image_0
579G ./image_1
592G ./image_2
392G ./image_3
400G ./image_4
3.3T .
After:
% du -h -d 1
1.1G ./label_all
781M ./calib
767G ./velodyne
781M ./pose
781M ./timestamp
855M ./label_0
574M ./label_1
485M ./label_2
504M ./label_3
410M ./label_4
76G ./image_0
78G ./image_1
79G ./image_2
53G ./image_3
54G ./image_4
1.1T .
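The image-directory numbers in the two listings above can be summed to see where the saving comes from. The sketch below copies the sizes from the du output (in the G units that du -h reports) and does nothing more than arithmetic on them; the variable names are illustrative.

```python
# Sizes of the five camera image directories, copied from the du listings above (G).
before = {"image_0": 568, "image_1": 579, "image_2": 592, "image_3": 392, "image_4": 400}
after = {"image_0": 76, "image_1": 78, "image_2": 79, "image_3": 53, "image_4": 54}

total_before = sum(before.values())  # 2531
total_after = sum(after.values())    # 340
saved = total_before - total_after   # 2191
ratio = total_before / total_after   # about 7.44

print(f"images before: {total_before}G, after: {total_after}G")
print(f"saved: {saved}G, size ratio: {ratio:.2f}x")
```

The remaining directories, dominated by the 767G velodyne folder, are unchanged, which is consistent with the overall drop from 3.3T to 1.1T.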
I have also tested the images saved as .jpg. There is no difference from the .png-saved images.
Tip: read the .jpg files with cv2, NOT with PIL.Image!
A JPEG read by PIL.Image is slightly different from the lossless PNG image.
This is caused by something like dct_method='INTEGER_ACCURATE'.
Hi @zzj403, thanks for your contribution! Could you please use pre-commit hook following this guide to fix the linting errors?
Thanks! Fixed. What should I do next?
Please rebase your changes from master onto dev, because we usually merge PRs into the dev branch first to keep the master branch stable. You may also need to resolve conflicts if there are any.
Done. But I commented out id: check-algo-readme in .pre-commit-config.yaml, because it reports a strange error asking me to fix the readme in mmdetection.
Hi @zzj403, thanks for your kind PR. I find that the git commit history is a bit messy. It seems that the PR was checked out from master and then an attempt was made to merge it into dev, which makes the code a bit hard to review. Would you mind checking out a new branch from dev, adding your modification in that branch, and creating a new PR? Sorry for the inconvenience.
@ZwwWayne Thanks for your kind advice! Sorry for the extra work. I'm still learning how to make PRs. I will follow your advice ASAP.
Hi @ZwwWayne,
I have opened a new PR, https://github.com/open-mmlab/mmdetection3d/pull/1759, targeting the dev branch.
PR https://github.com/open-mmlab/mmdetection3d/pull/1759 has been merged. Closing this PR.