
Errors for processing waymo infos:

Z-Lee-corder opened this issue 2 years ago • 14 comments

Hello, I would like to run the code on the waymo dataset. However, when I run the following two commands:

  1. python -m al3d_det.datasets.waymo.waymo_preprocess --cfg_file tools/cfgs/det_dataset_cfgs/waymo_xxx_sweeps_mm.yaml --func create_waymo_infos
  2. python -m al3d_det.datasets.waymo.waymo_preprocess --cfg_file tools/cfgs/det_dataset_cfgs/waymo_xxxx_sweeps_mm.yaml --func create_waymo_database

the following error occurs:

    2023-09-07 21:03:29.162028: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
    Traceback (most recent call last):
      File "/home/lizheng/anaconda3/envs/open-mmlab/lib/python3.8/runpy.py", line 194, in _run_module_as_main
        return _run_code(code, main_globals, None,
      File "/home/lizheng/anaconda3/envs/open-mmlab/lib/python3.8/runpy.py", line 87, in _run_code
        exec(code, run_globals)
      File "/media/lizheng/Samsung/codes/LoGoNet/detection/al3d_det/datasets/waymo/waymo_preprocess.py", line 355, in <module>
        create_waymo_database(
      File "/media/lizheng/Samsung/codes/LoGoNet/detection/al3d_det/datasets/waymo/waymo_preprocess.py", line 304, in create_waymo_database
        dataset = WaymoTrainingDataset(
      File "/media/lizheng/Samsung/codes/LoGoNet/detection/al3d_det/datasets/waymo/waymo_dataset.py", line 51, in __init__
        from petrel_client.client import Client
    ModuleNotFoundError: No module named 'petrel_client'

When I remove "OSS_PATH: 'cluster2:s3://dataset/waymo'" in "waymo_one_sweep_mm.yaml", a new error occurs:

    Traceback (most recent call last):
      File "/media/lizheng/Samsung/codes/LoGoNet/detection/al3d_det/datasets/waymo/waymo_preprocess.py", line 38, in get_infos_worker
        sequence_infos = list(tqdm(executor.map(process_single_sequence, sample_sequence_file_list),
      File "/home/lizheng/anaconda3/envs/open-mmlab/lib/python3.8/site-packages/tqdm/std.py", line 1182, in __iter__
        for obj in iterable:
      File "/home/lizheng/anaconda3/envs/open-mmlab/lib/python3.8/concurrent/futures/_base.py", line 619, in result_iterator
        yield fs.pop().result()
      File "/home/lizheng/anaconda3/envs/open-mmlab/lib/python3.8/concurrent/futures/_base.py", line 437, in result
        return self.__get_result()
      File "/home/lizheng/anaconda3/envs/open-mmlab/lib/python3.8/concurrent/futures/_base.py", line 389, in __get_result
        raise self._exception
      File "/home/lizheng/anaconda3/envs/open-mmlab/lib/python3.8/concurrent/futures/thread.py", line 57, in run
        result = self.fn(*self.args, **self.kwargs)
      File "/media/lizheng/Samsung/codes/LoGoNet/detection/al3d_det/datasets/waymo/waymo_utils.py", line 218, in process_single_sequence_and_save
        if pkl_file.exists():
    AttributeError: 'str' object has no attribute 'exists'

May I ask what I should do?

Z-Lee-corder avatar Sep 07 '23 13:09 Z-Lee-corder

Replace

    if pkl_file.exists():

by

    if os.path.exists(pkl_file):

Later on you might also have to comment out the line

    info_path = self.check_sequence_name_with_all_version(info_path)

and replace

    sequence_file_tfrecord = sequence_file[:-9] + '_with_camera_labels.tfrecord'

by

    sequence_file_tfrecord = sequence_file[:-9] + '.tfrecord'

These changes may only apply if you are not using Ceph.
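The root cause is that pkl_file arrives as a plain str, which has no .exists() method; os.path.exists accepts both str and pathlib.Path, which is why it is the safe replacement. A minimal illustration (the path below is hypothetical):

    import os
    from pathlib import Path

    pkl_file = "/tmp/does_not_need_to_exist.pkl"  # a plain str, as in waymo_utils.py

    # A str has no .exists() method; that is exactly the AttributeError above.
    assert not hasattr(pkl_file, "exists")

    # os.path.exists works for both str and Path objects.
    print(os.path.exists(pkl_file))    # same result as the pathlib spelling below
    print(Path(pkl_file).exists())

Alternatively, wrapping the string once with Path(pkl_file) at its creation site would make the original .exists() call valid.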

CSautier avatar Sep 12 '23 12:09 CSautier


Thank you for your reply. After applying your modifications, the error is gone. But when I run the command "python -m al3d_det.datasets.waymo.waymo_preprocess --cfg_file tools/cfgs/det_dataset_cfgs/waymo_one_sweep_mm.yaml --func create_waymo_infos", my CPU memory (64 GB) is not enough.

May I know how to operate the codes properly?

Z-Lee-corder avatar Sep 13 '23 04:09 Z-Lee-corder


Previously, I processed the waymo data with the official OpenPCDet project. However, that processed data does not contain image information. Is it not possible for me to use those generated data files in this project (LoGoNet)?

Z-Lee-corder avatar Sep 13 '23 04:09 Z-Lee-corder

Yes, I've found too that the waymo preprocessing costs a lot of memory. A partial solution is to remove the multiprocessing by replacing

    with futures.ThreadPoolExecutor(num_workers) as executor:
        sequence_infos = list(tqdm(executor.map(process_single_sequence, sample_sequence_file_list),
                                   total=len(sample_sequence_file_list)))

with

    sequence_infos = [process_single_sequence(sample_sequence_file)
                      for sample_sequence_file in tqdm(sample_sequence_file_list)]

However, be aware that this makes the process even slower.
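A middle ground, assuming the memory pressure grows with the number of sequences processed concurrently, is to keep the executor but make the worker count configurable and fall back to the sequential loop for a single worker. This is a sketch, not the repository's actual code; map_sequences is a made-up name and process_single_sequence stands in for the real worker function:

    from concurrent.futures import ThreadPoolExecutor

    try:
        from tqdm import tqdm
    except ImportError:  # keep the sketch runnable even without tqdm installed
        def tqdm(iterable, total=None):
            return iterable

    def map_sequences(process_single_sequence, sample_sequence_file_list, num_workers=1):
        # num_workers=1 behaves like the sequential replacement above and keeps
        # peak memory low; raise it gradually if your machine has RAM to spare.
        if num_workers <= 1:
            return [process_single_sequence(f) for f in tqdm(sample_sequence_file_list)]
        with ThreadPoolExecutor(num_workers) as executor:
            return list(tqdm(executor.map(process_single_sequence, sample_sequence_file_list),
                             total=len(sample_sequence_file_list)))

This lets you tune the speed/memory trade-off from the call site instead of editing the code twice.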

Also, if at some point you get it running, make absolutely sure it actually saves the png files, as for me it didn't at first. You can for instance replace, in waymo_utils, the line

    cv2.imwrite(image_path, all_images[cam_i])

by

if not cv2.imwrite(image_path, all_images[cam_i]):
    os.makedirs(os.path.join(cur_save_dir, 'image_{}'.format(cam_i)), exist_ok=True)
    cv2.imwrite(image_path, all_images[cam_i])
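The retry above works because cv2.imwrite signals failure by returning False (typically when the target directory does not exist) rather than raising. An alternative sketch is to create the directory up front; it is shown here with a plain file write so it stays self-contained, and save_bytes is a made-up helper name:

    import os

    def save_bytes(path, data):
        # Create the parent directory first; exist_ok=True makes this a no-op
        # when the directory already exists, so it is safe to call every time.
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "wb") as f:
            f.write(data)

Calling os.makedirs(..., exist_ok=True) once before the cv2.imwrite call would have the same effect without the retry.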

CSautier avatar Sep 13 '23 08:09 CSautier

As for using the OpenPCDet preprocessing, I have no idea. I'm not affiliated with the authors of the code; I'm just trying to get it running as well.

CSautier avatar Sep 13 '23 08:09 CSautier


Thank you very much for your patient answer. With your help, I can now process the data normally. But I found that the processed data is very large. Is it necessary to have at least 5 TB of storage space, given that each frame now has six additional camera images saved alongside it? At present my storage capacity is only 3 TB, which is probably not enough.

Z-Lee-corder avatar Sep 14 '23 11:09 Z-Lee-corder

I can't tell for sure, but waymo_one_sweep_mm.yaml seems to use a bit less than 3 TB. Maybe start with KITTI, as it seems to be much lighter and probably easier to set up.

CSautier avatar Sep 14 '23 12:09 CSautier


Hello, could you please tell me how long the processing of the waymo infos takes? The program's log output has stayed on this screen for a long time:

[screenshot: program log output]

and the GPU memory I used is shown below:

[screenshot: GPU memory usage]

SISTMrL avatar Sep 15 '23 08:09 SISTMrL

The pre-processing lasted about 150 hours on my hardware, with no multiprocessing. I'm not sure why it uses any GPU memory; as far as I can tell, the pre-processing is CPU-only. It seems to open each sequence, parse it, convert the range view into point clouds, and save the point cloud, images and annotations individually.
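As a toy illustration of that range-view conversion step (pure spherical-coordinate math, not the Waymo SDK's actual API): each cell of a range image holds a depth for one (inclination, azimuth) beam direction, and maps to an xyz point as follows.

    import math

    def range_image_to_points(ranges, inclinations, azimuths):
        """Convert a toy range image to 3D points via spherical coordinates.

        ranges[i][j] is the measured depth for beam inclination i and azimuth j;
        non-positive ranges are treated as no-return cells and skipped.
        """
        points = []
        for i, incl in enumerate(inclinations):
            for j, az in enumerate(azimuths):
                r = ranges[i][j]
                if r <= 0:  # no LiDAR return in this cell
                    continue
                x = r * math.cos(incl) * math.cos(az)
                y = r * math.cos(incl) * math.sin(az)
                z = r * math.sin(incl)
                points.append((x, y, z))
        return points

The real conversion additionally applies per-beam calibration and the vehicle pose, but the geometry is the same idea.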

CSautier avatar Sep 18 '23 08:09 CSautier

@CSautier thanks, I've been stuck on this for a whole week. I appreciate your contribution!

reynerliu avatar Mar 12 '24 08:03 reynerliu

When I run 'python -m al3d_det.datasets.waymo.waymo_preprocess --cfg_file tools/cfgs/det_dataset_cfgs/waymo_xxx_sweeps_mm.yaml --func create_waymo_infos' and 'python -m al3d_det.datasets.waymo.waymo_preprocess --cfg_file tools/cfgs/det_dataset_cfgs/waymo_xxxx_sweeps_mm.yaml --func create_waymo_database', I get:

    assert img_file.exists()
    AttributeError: 'NoneType' object has no attribute 'exists'

The paths '../data/waymo/waymo_processed_data_v4/segment-9509506420470671704_4049_100_4069_100_with_camera_labels/image_0/0034.png' and '../data/waymo/waymo_processed_data_v4/segment-9509506420470671704_4049_100_4069_100_with_camera_labels/image*' do not exist; the image files of the waymo dataset are missing. What should I do? Thank you for your answer. @CSautier

SiHengHeHSH avatar Mar 15 '24 07:03 SiHengHeHSH


Hello, I want to ask why my kitti dataset training reports the following error:

    Traceback (most recent call last):
      File "detection/tools/train.py", line 204, in <module>
        main()
      File "detection/tools/train.py", line 153, in main
        last_epoch=last_epoch, optim_cfg=cfg.OPTIMIZATION
      File "/home/linux/guorong/qinhao/LoGoNet/utils/al3d_utils/optimize_utils/__init__.py", line 52, in build_scheduler
        optimizer, total_steps, last_step, optim_cfg.LR, list(optim_cfg.MOMS), optim_cfg.DIV_FACTOR, optim_cfg.PCT_START
      File "/home/linux/guorong/qinhao/LoGoNet/utils/al3d_utils/optimize_utils/learning_schedules_fastai.py", line 85, in __init__
        super().__init__(fai_optimizer, total_step, last_step, lr_phases, mom_phases)
      File "/home/linux/guorong/qinhao/LoGoNet/utils/al3d_utils/optimize_utils/learning_schedules_fastai.py", line 45, in __init__
        self.step()
      File "/home/linux/guorong/qinhao/LoGoNet/utils/al3d_utils/optimize_utils/learning_schedules_fastai.py", line 58, in step
        self.update_lr()
      File "/home/linux/guorong/qinhao/LoGoNet/utils/al3d_utils/optimize_utils/learning_schedules_fastai.py", line 51, in update_lr
        self.optimizer.lr = func((step - start) / (end - start))
    ZeroDivisionError: division by zero
    ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 3213650) of binary: /home/linux/anaconda3/envs/logonet/bin/python
    Traceback (most recent call last):
      File "/home/linux/anaconda3/envs/logonet/lib/python3.7/runpy.py", line 193, in _run_module_as_main
        "__main__", mod_spec)
      File "/home/linux/anaconda3/envs/logonet/lib/python3.7/runpy.py", line 85, in _run_code
        exec(code, run_globals)
      File "/home/linux/anaconda3/envs/logonet/lib/python3.7/site-packages/torch/distributed/launch.py", line 193, in <module>
        main()
      File "/home/linux/anaconda3/envs/logonet/lib/python3.7/site-packages/torch/distributed/launch.py", line 189, in main
        launch(args)
      File "/home/linux/anaconda3/envs/logonet/lib/python3.7/site-packages/torch/distributed/launch.py", line 174, in launch
        run(args)
      File "/home/linux/anaconda3/envs/logonet/lib/python3.7/site-packages/torch/distributed/run.py", line 713, in run
        )(*cmd_args)
      File "/home/linux/anaconda3/envs/logonet/lib/python3.7/site-packages/torch/distributed/launcher/api.py", line 131, in __call__
        return launch_agent(self._config, self._entrypoint, list(args))
      File "/home/linux/anaconda3/envs/logonet/lib/python3.7/site-packages/torch/distributed/launcher/api.py", line 261, in launch_agent
        failures=result.failures,
    torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
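The failing expression can be reproduced in isolation: the scheduler interpolates within each LR phase via (step - start) / (end - start), so the division by zero means a phase has zero length. That typically happens when total_steps (epochs × batches per epoch) is so small that two phase boundaries land on the same step. A sketch of the arithmetic (phase_progress is a made-up name, not the repository's function):

    def phase_progress(step, start, end):
        # Fraction of the way through an LR phase, mirroring
        # func((step - start) / (end - start)) in learning_schedules_fastai.py.
        if end == start:
            # Degenerate zero-length phase: increasing total_steps (more data,
            # a smaller batch size, or more epochs) avoids this case entirely.
            return 1.0
        return (step - start) / (end - start)

    print(phase_progress(5, 0, 10))   # 0.5: halfway through the phase
    print(phase_progress(3, 3, 3))    # 1.0 instead of ZeroDivisionError

So a first thing to check is whether the dataset was found at all; an empty dataloader gives total_steps = 0 and collapses every phase.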

kikiki-cloud avatar May 24 '24 08:05 kikiki-cloud


Hello, have you solved this problem? I've met the same issue.

fangweicheng6 avatar Jul 04 '24 03:07 fangweicheng6