KPConv-PyTorch

Similar dataset but without calib.txt and times.txt files.

Open JohanBergius opened this issue 2 years ago • 18 comments

Hi THOMAS, I'm using a similar dataset, but I don't have the calib.txt and times.txt files. What exactly do these provide to your implementation?

JohanBergius Apr 01 '22 08:04

Also, I have an error:

class_potentials torch.Size([0])
gen_indices []
gen_classes []
Traceback (most recent call last):
  File "train_SemanticKitti.py", line 271, in <module>
    training_sampler.calib_max_in(config, training_loader, verbose=True)
  File "/home/stud2/j/johanb17/KPConv-PyTorch-master/datasets/SemanticKitti.py", line 927, in calib_max_in
    for batch_i, batch in enumerate(dataloader):
  File "/home/stud/j/johanb17/.conda/envs/kpconv/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 279, in __iter__
    return _MultiProcessingDataLoaderIter(self)
  File "/home/stud/j/johanb17/.conda/envs/kpconv/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 746, in __init__
    self._try_put_index()
  File "/home/stud/j/johanb17/.conda/envs/kpconv/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 861, in _try_put_index
    index = self._next_index()
  File "/home/stud/j/johanb17/.conda/envs/kpconv/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 339, in _next_index
    return next(self._sampler_iter) # may raise StopIteration
  File "/home/stud/j/johanb17/.conda/envs/kpconv/lib/python3.7/site-packages/torch/utils/data/sampler.py", line 200, in __iter__
    for idx in self.sampler:
  File "/home/stud2/j/johanb17/KPConv-PyTorch-master/datasets/SemanticKitti.py", line 805, in __iter__
    gen_indices = torch.cat(gen_indices, dim=0)
RuntimeError: There were no tensor arguments to this function (e.g., you passed an empty list of Tensors), but no fallback function is registered for schema aten::_cat. This usually means that this function requires a non-empty list of Tensors. Available functions are [CUDATensorId, CPUTensorId, VariableTensorId]

JohanBergius Apr 02 '22 23:04

Hi Johan,

Thanks for your interest in the code. Here are some answers to your questions.

I don't have the calib.txt

For each sequence, the calibration file provides the transformation matrix between the lidar frame and the base frame. We use it to get the lidar poses in world coordinates (= transformations from lidar coordinates to world coordinates) here: https://github.com/HuguesTHOMAS/KPConv-PyTorch/blob/e600c1667d085aeb5cf89d8dbe5a97aad4270d88/datasets/SemanticKitti.py#L700-L714

You don't especially need calibration, but it is common to have it in such datasets. The only thing you need ultimately is the lidar poses in world coordinates.
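As a rough illustration of what the calibration is used for, the sketch below turns a pose given in another sensor's frame into a lidar pose with inv(Tr) · pose · Tr. It assumes the KITTI odometry convention, in which Tr maps lidar coordinates into the frame the poses are expressed in. The helper names are hypothetical, and plain nested lists stand in for NumPy arrays so that the snippet is self-contained.

```python
def matmul4(a, b):
    """Multiply two 4x4 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def rigid_inverse(t):
    """Invert a rigid transform [R | p] as [R^T | -R^T p]."""
    rot_t = [[t[j][i] for j in range(3)] for i in range(3)]
    trans = [-sum(rot_t[i][k] * t[k][3] for k in range(3)) for i in range(3)]
    return [rot_t[i] + [trans[i]] for i in range(3)] + [[0.0, 0.0, 0.0, 1.0]]


def lidar_pose(pose, tr):
    """Express `pose` in the lidar frame: inv(Tr) @ pose @ Tr (assumed convention)."""
    return matmul4(rigid_inverse(tr), matmul4(pose, tr))


# Hypothetical calibration: the lidar is shifted 1 m along x.
tr = [[1.0, 0.0, 0.0, 1.0],
      [0.0, 1.0, 0.0, 0.0],
      [0.0, 0.0, 1.0, 0.0],
      [0.0, 0.0, 0.0, 1.0]]
# Hypothetical pose: a pure translation of 2 m along y.
pose = [[1.0, 0.0, 0.0, 0.0],
        [0.0, 1.0, 0.0, 2.0],
        [0.0, 0.0, 1.0, 0.0],
        [0.0, 0.0, 0.0, 1.0]]
world_from_lidar = lidar_pose(pose, tr)
```

If your dataset already stores lidar poses, this conversion is unnecessary, which is why the calibration itself is optional.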

times.txt

These files provide the timestamps for each lidar frame. You need this, but it does not have to be in the form of a times.txt file. Any format will do as long as you write the code that loads the timestamps into the list called self.times, like here: https://github.com/HuguesTHOMAS/KPConv-PyTorch/blob/e600c1667d085aeb5cf89d8dbe5a97aad4270d88/datasets/SemanticKitti.py#L563
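If a dataset ships no timestamps at all, one option is to synthesize them. The sketch below assumes a constant frame rate (10 Hz here, a made-up value) and produces one list of timestamps per sequence. Both the helper name `synthetic_times` and the one-entry-per-sequence layout are assumptions about how self.times is organized.

```python
def synthetic_times(frames_per_sequence, rate_hz=10.0):
    """Return one list of evenly spaced timestamps (in seconds) per sequence."""
    if rate_hz <= 0:
        raise ValueError("rate_hz must be positive")
    return [[i / rate_hz for i in range(n)] for n in frames_per_sequence]


# Two hypothetical sequences with 3 and 2 frames.
times = synthetic_times([3, 2])
```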

Also, I have an error: class_potentials torch.Size([0]) gen_indices [] gen_classes [] Traceback (most recent call last):

It seems that the potential array is empty. You can investigate why here:

https://github.com/HuguesTHOMAS/KPConv-PyTorch/blob/e600c1667d085aeb5cf89d8dbe5a97aad4270d88/datasets/SemanticKitti.py#L159-L161

I would suggest looking into the creation of the array called self.all_inds in the function

https://github.com/HuguesTHOMAS/KPConv-PyTorch/blob/e600c1667d085aeb5cf89d8dbe5a97aad4270d88/datasets/SemanticKitti.py#L542

This array of shape [2, N] contains the indices of all the dataset frames (sequence index and frame index in the sequence), and is crucial in many parts of the code.
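To make the role of self.all_inds concrete, here is a minimal sketch that builds one (sequence index, frame index) pair per frame and fails early when no frames were found, which is a plausible place for an empty potentials array to originate. The function name, the example frame names and the list-of-pairs layout are assumptions; the repository keeps the same information in a NumPy array.

```python
def build_all_inds(frames):
    """Build one (sequence index, frame index) pair per frame.

    `frames` holds one list of frame names per sequence.
    """
    all_inds = [(s_ind, f_ind)
                for s_ind, seq_frames in enumerate(frames)
                for f_ind in range(len(seq_frames))]
    if not all_inds:
        # With no frames, everything derived from these indices is empty too.
        raise ValueError("No frames found: check the dataset path and folder layout")
    return all_inds


frames = [["000000", "000001"], ["000000"]]  # hypothetical frame names
all_inds = build_all_inds(frames)
```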

HuguesTHOMAS Apr 04 '22 21:04

Thanks. However, I still don't fully understand how to skip "times.txt" and "calib.txt" without getting an error! Should I replace them with dummy files (like the SemanticKITTI data)?

JohanBergius Apr 12 '22 07:04

Well, you will have to understand and modify the code.

If your poses are already the ones of the lidar, then you can just replace this line: https://github.com/HuguesTHOMAS/KPConv-PyTorch/blob/e600c1667d085aeb5cf89d8dbe5a97aad4270d88/datasets/SemanticKitti.py#L559-L560

with something like

self.calibrations.append({'Tr': np.eye(4)})

so that the calibrations are just identity matrices.
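A loader that falls back to the identity when no calibration file exists could look like the sketch below. It is not the repository's parser: the function names are hypothetical, nested lists replace NumPy arrays, and the "key: 12 floats" line layout is assumed from the KITTI calibration format.

```python
import os


def identity4():
    """Return a fresh 4x4 identity matrix as nested lists."""
    return [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]


def load_calibration(calib_path):
    """Read 'key: 12 floats' lines into 4x4 matrices, or fall back to identity."""
    if not os.path.isfile(calib_path):
        # No calibration available: poses are assumed to be lidar poses already.
        return {"Tr": identity4()}
    calib = {}
    with open(calib_path) as calib_file:
        for line in calib_file:
            if ":" not in line:
                continue
            key, content = line.split(":", 1)
            values = [float(v) for v in content.split()]
            # The 12 values fill the top three rows; the last row is fixed.
            calib[key.strip()] = [values[0:4], values[4:8], values[8:12],
                                  [0.0, 0.0, 0.0, 1.0]]
    return calib
```

With an identity Tr, the pose conversion inv(Tr) · pose · Tr leaves every pose unchanged, so the rest of the pipeline can stay as it is.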

Concerning times.txt, the timestamps are actually not needed, so you can just comment out these lines: https://github.com/HuguesTHOMAS/KPConv-PyTorch/blob/e600c1667d085aeb5cf89d8dbe5a97aad4270d88/datasets/SemanticKitti.py#L562-L563

HuguesTHOMAS Apr 12 '22 13:04

Thanks! Sadly I have a new error:

Traceback (most recent call last):
  File "train_SemanticKitti.py", line 311, in <module>
    trainer.train(net, training_loader, test_loader, config)
  File "/home/stud2/j/johanb17/KPConv-PyTorch-master/utils/trainer.py", line 167, in train
    for batch in training_loader:
  File "/home/stud/j/johanb17/.conda/envs/KPmlp/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 345, in __next__
    data = self._next_data()
  File "/home/stud/j/johanb17/.conda/envs/KPmlp/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 384, in _next_data
    index = self._next_index() # may raise StopIteration
  File "/home/stud/j/johanb17/.conda/envs/KPmlp/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 339, in _next_index
    return next(self._sampler_iter) # may raise StopIteration
  File "/home/stud/j/johanb17/.conda/envs/KPmlp/lib/python3.6/site-packages/torch/utils/data/sampler.py", line 200, in __iter__
    for idx in self.sampler:
  File "/home/stud2/j/johanb17/KPConv-PyTorch-master/datasets/SemanticKitti.py", line 796, in __iter__
    class_indices = self.dataset.class_frames[i][class_indices]
IndexError: tensors used as indices must be long, byte or bool tensors

JohanBergius Apr 20 '22 11:04

By turning it into a float() I instead get:

Traceback (most recent call last):
  File "train_SemanticKitti.py", line 311, in <module>
    trainer.train(net, training_loader, test_loader, config)
  File "/home/stud2/j/johanb17/KPConv-PyTorch-master/utils/trainer.py", line 167, in train
    for batch in training_loader:
  File "/home/stud/j/johanb17/.conda/envs/KPmlp/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 345, in __next__
    data = self._next_data()
  File "/home/stud/j/johanb17/.conda/envs/KPmlp/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 384, in _next_data
    index = self._next_index() # may raise StopIteration
  File "/home/stud/j/johanb17/.conda/envs/KPmlp/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 339, in _next_index
    return next(self._sampler_iter) # may raise StopIteration
  File "/home/stud/j/johanb17/.conda/envs/KPmlp/lib/python3.6/site-packages/torch/utils/data/sampler.py", line 200, in __iter__
    for idx in self.sampler:
  File "/home/stud2/j/johanb17/KPConv-PyTorch-master/datasets/SemanticKitti.py", line 824, in __iter__
    self.dataset.epoch_inds += gen_indices
RuntimeError: The size of tensor a (4400) must match the size of tensor b (2940) at non-singleton dimension 0

JohanBergius Apr 20 '22 11:04

For the first error:

  File "/home/stud2/j/johanb17/KPConv-PyTorch-master/datasets/SemanticKitti.py", line 796, in __iter__
    class_indices = self.dataset.class_frames[i][class_indices]
IndexError: tensors used as indices must be long, byte or bool tensors

Can you try adding

class_indices = class_indices.type(torch.int64)

just before

https://github.com/HuguesTHOMAS/KPConv-PyTorch/blob/3a774ff8d54a4d080fe65093b2299ede35d9735d/datasets/SemanticKitti.py#L778

HuguesTHOMAS Apr 20 '22 13:04

For the second error, it seems you do not have enough frames in your whole dataset to populate one epoch. Can you try to change the configuration parameter: https://github.com/HuguesTHOMAS/KPConv-PyTorch/blob/3a774ff8d54a4d080fe65093b2299ede35d9735d/train_SemanticKitti.py#L162-L163

to a lower value like 250 for example?
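The size mismatch in the traceback arises because the sampler produced fewer indices (2940) than the epoch buffer holds (4400). Lowering the parameter presumably shrinks that buffer. An alternative, which is not what is suggested above and is shown only to illustrate the failure mode, is to repeat the generated indices until the buffer is full. `fill_epoch` is a hypothetical helper that works on plain lists rather than tensors.

```python
from itertools import cycle, islice


def fill_epoch(gen_indices, epoch_n):
    """Repeat `gen_indices` cyclically until exactly `epoch_n` indices are available."""
    if not gen_indices:
        raise ValueError("The sampler generated no indices")
    return list(islice(cycle(gen_indices), epoch_n))


# Three generated indices stretched to fill a buffer of eight slots.
epoch_inds = fill_epoch([4, 7, 9], 8)
```

Repeating frames within an epoch changes the sampling statistics, so reducing the epoch length remains the less intrusive option for a small dataset.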

HuguesTHOMAS Apr 20 '22 13:04

Thanks. Before testing your suggestion I tried adding .long() to class_indices, and this line ran: "class_indices = self.dataset.class_frames[i][class_indices.long()]". I also tried removing unused labels from the dataset-kitti.yaml file and was able to start training ... but a new error occurred! (The larger the value of epoch_steps, the longer it trains ... but it still ends with the same error.)

[CONSOLE]
e000-i0124 => L=0.000 acc= 94% / t(ms): 18.8 35.4 36.8)
e000-i0136 => L=0.000 acc= 92% / t(ms): 15.3 34.7 39.9)
e000-i0148 => L=0.000 acc= 84% / t(ms): 19.6 36.2 38.2)
e000-i0160 => L=0.000 acc= 89% / t(ms): 25.6 33.8 38.8)
e000-i0172 => L=0.000 acc= 83% / t(ms): 12.2 33.1 41.1)
e000-i0183 => L=0.000 acc= 87% / t(ms): 20.0 35.3 41.7)
e000-i0195 => L=0.000 acc= 97% / t(ms): 11.7 33.5 42.2)
e000-i0207 => L=0.000 acc= 61% / t(ms): 12.3 34.3 40.7)
e000-i0218 => L=0.000 acc= 99% / t(ms): 17.7 35.4 39.4)
e000-i0232 => L=0.000 acc= 34% / t(ms): 8.6 31.2 35.7)
e000-i0243 => L=0.000 acc= 99% / t(ms): 23.9 30.6 36.4)
e000-i0255 => L=0.000 acc= 88% / t(ms): 18.2 36.5 37.2)
e000-i0266 => L=0.000 acc= 87% / t(ms): 16.9 38.2 38.5)
e000-i0279 => L=0.000 acc= 75% / t(ms): 8.2 35.6 36.0)
e000-i0292 => L=0.000 acc= 98% / t(ms): 4.9 36.7 38.6)
e000-i0306 => L=0.000 acc= 62% / t(ms): 5.2 33.7 36.8)
e000-i0320 => L=0.000 acc=100% / t(ms): 3.8 32.6 37.3)
e000-i0332 => L=0.000 acc= 79% / t(ms): 5.2 38.6 38.1)
e000-i0346 => L=0.000 acc= 15% / t(ms): 4.1 38.9 35.1)
e000-i0359 => L=0.000 acc= 96% / t(ms): 12.7 34.7 36.0)
e000-i0372 => L=0.000 acc= 93% / t(ms): 5.9 38.6 36.7)
e000-i0385 => L=0.000 acc= 21% / t(ms): 5.6 38.2 34.3)
e000-i0398 => L=0.000 acc= 80% / t(ms): 4.3 38.6 36.7)
e000-i0411 => L=0.000 acc= 79% / t(ms): 4.1 37.6 36.9)
e000-i0425 => L=0.000 acc= 76% / t(ms): 3.5 34.3 36.8)
e000-i0439 => L=0.000 acc= 75% / t(ms): 4.9 31.9 34.7)
e000-i0452 => L=0.000 acc= 44% / t(ms): 4.1 33.6 38.4)
e000-i0466 => L=0.000 acc= 97% / t(ms): 3.6 36.4 34.3)
e000-i0480 => L=0.000 acc= 77% / t(ms): 3.7 34.3 36.3)
e000-i0493 => L=0.000 acc= 99% / t(ms): 3.2 35.1 39.1)
Traceback (most recent call last):
  File "train_SemanticKitti.py", line 311, in <module>
    trainer.train(net, training_loader, test_loader, config)
  File "/home/stud2/j/johanb17/KPConv-PyTorch-master/utils/trainer.py", line 274, in train
    self.validation(net, val_loader, config)
  File "/home/stud2/j/johanb17/KPConv-PyTorch-master/utils/trainer.py", line 292, in validation
    self.slam_segmentation_validation(net, val_loader, config)
  File "/home/stud2/j/johanb17/KPConv-PyTorch-master/utils/trainer.py", line 703, in slam_segmentation_validation
    for i, batch in enumerate(val_loader):
  File "/home/stud/j/johanb17/.conda/envs/KPmlp/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 279, in __iter__
    return _MultiProcessingDataLoaderIter(self)
  File "/home/stud/j/johanb17/.conda/envs/KPmlp/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 746, in __init__
    self._try_put_index()
  File "/home/stud/j/johanb17/.conda/envs/KPmlp/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 861, in _try_put_index
    index = self._next_index()
  File "/home/stud/j/johanb17/.conda/envs/KPmlp/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 339, in _next_index
    return next(self._sampler_iter) # may raise StopIteration
  File "/home/stud/j/johanb17/.conda/envs/KPmlp/lib/python3.6/site-packages/torch/utils/data/sampler.py", line 200, in __iter__
    for idx in self.sampler:
  File "/home/stud2/j/johanb17/KPConv-PyTorch-master/datasets/SemanticKitti.py", line 848, in __iter__
    self.dataset.epoch_inds += gen_indices
RuntimeError: The size of tensor a (1761) must match the size of tensor b (102) at non-singleton dimension 0

JohanBergius Apr 20 '22 15:04

This other error is the same as the previous one but on the validation set instead of the training set. It seems you do not have many validation frames in your dataset.

The simplest solution is to reduce the validation size: https://github.com/HuguesTHOMAS/KPConv-PyTorch/blob/3a774ff8d54a4d080fe65093b2299ede35d9735d/train_SemanticKitti.py#L165-L166

Try

validation_size = 10

HuguesTHOMAS Apr 20 '22 16:04

Thank you very much, Thomas! It seems to be running now :) Would you by any chance know why the loss is always zero? Is it something you have seen before or could it simply be the dataset I'm using?!

e001-i0607 => L=0.000 acc= 82% / t(ms): 38.8 35.4 38.7)
e001-i0617 => L=0.000 acc= 89% / t(ms): 41.1 32.9 34.9)
e001-i0627 => L=0.000 acc= 95% / t(ms): 28.7 30.7 38.0)
e001-i0638 => L=0.000 acc= 95% / t(ms): 36.4 27.8 35.2)
e001-i0646 => L=0.000 acc= 76% / t(ms): 39.0 31.8 41.7)
e001-i0654 => L=0.000 acc= 83% / t(ms): 36.2 36.1 43.4)
e001-i0667 => L=0.000 acc= 83% / t(ms): 19.9 32.7 38.6)
e001-i0676 => L=0.000 acc= 90% / t(ms): 35.4 32.9 40.7)
e001-i0686 => L=0.000 acc= 85% / t(ms): 42.0 34.8 40.2)
e001-i0699 => L=0.000 acc= 96% / t(ms): 14.9 34.4 40.9)
e001-i0711 => L=0.000 acc= 52% / t(ms): 14.3 33.9 35.7)
e001-i0722 => L=0.000 acc= 91% / t(ms): 18.4 35.4 38.7)
e001-i0732 => L=0.000 acc= 90% / t(ms): 20.1 33.3 44.1)
e001-i0744 => L=0.000 acc= 89% / t(ms): 18.2 33.9 39.9)
e001-i0756 => L=0.000 acc= 87% / t(ms): 21.8 31.9 37.8)
e001-i0765 => L=0.000 acc= 46% / t(ms): 37.9 32.9 38.3)
e001-i0774 => L=0.000 acc= 67% / t(ms): 44.0 30.3 37.2)
e001-i0784 => L=0.000 acc= 85% / t(ms): 34.0 34.1 41.2)
e001-i0796 => L=0.000 acc= 60% / t(ms): 15.4 34.0 38.9)
e001-i0804 => L=0.000 acc= 93% / t(ms): 43.2 34.2 37.8)
e001-i0816 => L=0.000 acc= 57% / t(ms): 21.1 34.6 38.6)
e001-i0828 => L=0.000 acc= 92% / t(ms): 18.7 33.9 39.7)

JohanBergius Apr 20 '22 16:04

After training it for 6 epochs I encountered another error: https://github.com/HuguesTHOMAS/KPConv-PyTorch/issues/129#issuecomment-963215105

Traceback (most recent call last):
  File "train_SemanticKitti.py", line 311, in <module>
    trainer.train(net, training_loader, test_loader, config)
  File "/home/stud2/j/johanb17/KPConv-PyTorch-master/utils/trainer.py", line 188, in train
    outputs = net(batch, config)
  File "/home/stud/j/johanb17/.conda/envs/KPmlp/lib/python3.6/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/stud2/j/johanb17/KPConv-PyTorch-master/models/architectures.py", line 336, in forward
    x = block_op(x, batch)
  File "/home/stud/j/johanb17/.conda/envs/KPmlp/lib/python3.6/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/stud2/j/johanb17/KPConv-PyTorch-master/models/blocks.py", line 636, in forward
    x = self.leaky_relu(self.batch_norm_conv(x))
  File "/home/stud/j/johanb17/.conda/envs/KPmlp/lib/python3.6/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/stud2/j/johanb17/KPConv-PyTorch-master/models/blocks.py", line 457, in forward
    x = self.batch_norm(x)
  File "/home/stud/j/johanb17/.conda/envs/KPmlp/lib/python3.6/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/stud/j/johanb17/.conda/envs/KPmlp/lib/python3.6/site-packages/torch/nn/modules/batchnorm.py", line 107, in forward
    exponential_average_factor, self.eps)
  File "/home/stud/j/johanb17/.conda/envs/KPmlp/lib/python3.6/site-packages/torch/nn/functional.py", line 1666, in batch_norm
    raise ValueError('Expected more than 1 value per channel when training, got input size {}'.format(size))
ValueError: Expected more than 1 value per channel when training, got input size torch.Size([1, 128, 1])

JohanBergius Apr 20 '22 17:04

There is a problem with your loss, it should not be zero. Are you sure you are loading the frame labels well?

For the error, it is probably because you have a point cloud with only 1 point in it. You should verify the points and labels you are loading.
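Both symptoms point at the loaded data, so a few basic sanity checks per frame may help narrow things down. The sketch below is only an illustration: `check_frame` is a hypothetical helper, points and labels are plain Python sequences, and treating label 0 as the ignored class is an assumption to adapt to your own label mapping.

```python
def check_frame(points, labels, ignored_labels=(0,), min_points=2):
    """Return a list of problems found in one frame (an empty list means OK)."""
    problems = []
    if len(points) != len(labels):
        problems.append("points and labels differ in length")
    if len(points) < min_points:
        # A cloud with a single point is what batch norm complains about.
        problems.append("fewer than %d points" % min_points)
    if all(label in ignored_labels for label in labels):
        problems.append("every label is ignored")
    return problems


# Hypothetical frame with a single labeled point.
report = check_frame([(0.0, 0.0, 0.0)], [1])
```

Running such a check over every frame before training makes it easier to find the offending files than waiting for the network to fail several epochs in.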

HuguesTHOMAS Apr 20 '22 19:04

Thanks a lot! Two follow-up questions:

  1. What do "checkpoints" from the logs include?
  2. Where is the model saved?

JohanBergius Apr 20 '22 20:04

Hi, again. I still have a problem with your testing script:

Traceback (most recent call last):
  File "test_models.py", line 227, in <module>
    tester.slam_segmentation_test(net, test_loader, config)
  File "/home/stud2/j/johanb17/KPConv-PyTorch-master/utils/tester.py", line 527, in slam_segmentation_test
    for i, batch in enumerate(test_loader):
  File "/home/stud/j/johanb17/.conda/envs/KPmlp/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 345, in __next__
    data = self._next_data()
  File "/home/stud/j/johanb17/.conda/envs/KPmlp/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 385, in _next_data
    data = self._dataset_fetcher.fetch(index) # may raise StopIteration
  File "/home/stud/j/johanb17/.conda/envs/KPmlp/lib/python3.6/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/stud/j/johanb17/.conda/envs/KPmlp/lib/python3.6/site-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/stud2/j/johanb17/KPConv-PyTorch-master/datasets/SemanticKitti.py", line 245, in __getitem__
    ind = int(self.epoch_inds[self.epoch_i])
IndexError: index 397 is out of bounds for dimension 0 with size 397

JohanBergius Apr 26 '22 08:04

What do "checkpoints" from the logs include?

The checkpoints are saved models at different epochs that you can use to compare performances at different points in the training or to restore previous states of the model.

Where is the model saved?

In checkpoints :). You can use the latest one for your test as it will be the one that has fully converged.
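A small helper for picking the latest checkpoint might look like the sketch below. The `.tar` extension and the idea of ranking files by modification time are assumptions made for illustration, and `latest_checkpoint` is a hypothetical name; adapt the pattern to whatever files your checkpoints folder actually contains.

```python
from pathlib import Path


def latest_checkpoint(checkpoint_dir, pattern="*.tar"):
    """Return the most recently modified checkpoint file, or None if there is none."""
    folder = Path(checkpoint_dir)
    if not folder.is_dir():
        return None
    # Sort by modification time so the newest file ends up last.
    files = sorted(folder.glob(pattern), key=lambda p: p.stat().st_mtime)
    return files[-1] if files else None
```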

IndexError: index 397 is out of bounds for dimension 0 with size 397

It is probably a dumb mistake where the tensor self.epoch_inds is not long enough. I did not anticipate every possibility in my code. The simplest fix is to go where self.epoch_inds is created and make it longer. Or ensure the indices generated by the sampler are not higher than self.epoch_inds.shape[0].

HuguesTHOMAS May 05 '22 20:05

Hi THOMAS, Hi Johan!

Thank you both for this wonderful discussion! This issue helped me a lot to understand this nice work. :)

I am currently trying to train KPConv on my own Dataset which is in "kitti-format".

Following THOMAS's suggestion ("Or ensure the indices generated by the sampler are not higher than self.epoch_inds.shape[0]."), I tried it like this:

image

but I still get an error:

Traceback (most recent call last):
  File "test_models.py", line 217, in <module>
    tester.slam_segmentation_test(net, test_loader, config)
  File "/home/jovyan/project/zhoujianbin/KPConv-PyTorch/utils/tester.py", line 527, in slam_segmentation_test
    for i, batch in enumerate(test_loader):
  File "/home/jovyan/.local/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 435, in __next__
    data = self._next_data()
  File "/home/jovyan/.local/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1065, in _next_data
    return self._process_data(data)
  File "/home/jovyan/.local/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1111, in _process_data
    data.reraise()
  File "/home/jovyan/.local/lib/python3.8/site-packages/torch/_utils.py", line 428, in reraise
    raise self.exc_type(msg)
IndexError: Caught IndexError in DataLoader worker process 4.
Original Traceback (most recent call last):
  File "/home/jovyan/.local/lib/python3.8/site-packages/torch/utils/data/_utils/worker.py", line 198, in _worker_loop
    data = fetcher.fetch(index)
  File "/home/jovyan/.local/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/jovyan/.local/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/jovyan/project/zhoujianbin/KPConv-PyTorch/datasets/SemanticKitti.py", line 270, in __getitem__
    ind = int(self.epoch_inds[self.epoch_i])
IndexError: index 71 is out of bounds for dimension 0 with size 71

It seems that the problem has not been solved. I've made the following changes to the original code:

  1. "config.validation_size = 8" in test_models.py;
  2. "gen_indices = torch.from_numpy(np.arange(num_centers))" INSTEAD OF " _, gen_indices = torch.topk(self.dataset.potentials, num_centers, largest=False, sorted=True)" in SemanticKitti.py;

Is this a correct way to ensure that the indices generated by the sampler are not higher than self.epoch_inds.shape[0]? If not, could you please show me the correct way to achieve this? Thanks again and have a good day :)

Trexzhou Jul 05 '22 07:07

Hi @Trexzhou,

I just pushed a correction like the one I had to make for S3DIS before, where I loop back the epoch indices if they end up higher than the shape of epoch_inds:

https://github.com/HuguesTHOMAS/KPConv-PyTorch/blob/7d4c03d1996d31bf3330d6652ce3604afbe5345b/datasets/SemanticKitti.py#L231-L234

This should solve this bug for good and you can remove your own modifications.
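The idea of looping back can be sketched as follows. This is not the code from the linked commit, only an illustration with hypothetical names (`EpochCursor`, `next_index`): the read position wraps to zero when it reaches the length of epoch_inds, so an index such as 397 can never be requested from a buffer of size 397.

```python
class EpochCursor:
    """Hand out entries of `epoch_inds` in order, wrapping around at the end."""

    def __init__(self, epoch_inds):
        if not epoch_inds:
            raise ValueError("epoch_inds must not be empty")
        self.epoch_inds = list(epoch_inds)
        self.epoch_i = 0

    def next_index(self):
        ind = self.epoch_inds[self.epoch_i]
        # Loop back instead of running past the end of the buffer.
        self.epoch_i = (self.epoch_i + 1) % len(self.epoch_inds)
        return ind


cursor = EpochCursor([10, 11, 12])
drawn = [cursor.next_index() for _ in range(5)]
```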

HuguesTHOMAS Jul 11 '22 14:07