
Error "default" training

Open stefp opened this issue 2 years ago • 5 comments

Hi Sean, first of all kudos for publishing this repo!

I cloned the latest version and am trying to run train.py for the "default" training, using the test_data.las file located in the training, test, and validation folders.

I get the following error:

```
 initialization failed, you might not have a CUDA gpu. (Triggered internally at  ..\c10\cuda\CUDAFunctions.cpp:115.)
  return torch._C._cuda_getDeviceCount() > 0
Preprocessing train_dataset point clouds...
Preprocessing test_dataset point clouds...
Preprocessing validation_dataset point clouds...
Traceback (most recent call last):
  File "FSCT/scripts/train.py", line 334, in <module>
    run_training = TrainModel(parameters)
  File "FSCT/scripts/train.py", line 34, in __init__
    self.train_loader = DataLoader(
  File "C:\Users\stpu\Anaconda3\envs\fsct\lib\site-packages\torch_geometric\data\dataloader.py", line 65, in __init__
    super(DataLoader,
  File "C:\Users\stpu\Anaconda3\envs\fsct\lib\site-packages\torch\utils\data\dataloader.py", line 270, in __init__
    sampler = RandomSampler(dataset, generator=generator)  # type: ignore[arg-type]
  File "C:\Users\stpu\Anaconda3\envs\fsct\lib\site-packages\torch\utils\data\sampler.py", line 102, in __init__
    raise ValueError("num_samples should be a positive integer "
ValueError: num_samples should be a positive integer value, but got num_samples=0
```

Any idea what might be going wrong?

Thanks

stefp, May 07 '22 20:05

Hi Stef,

Thanks for that, and I'm always happy to see it getting used. I've just fixed up the training code, so it should hopefully work for you now. Let me know how you go. You can also try using CPU mode if you continue to have CUDA issues; CUDA error messages can be rather cryptic.
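A minimal sketch of the CUDA-or-CPU fallback suggested above, assuming only that PyTorch provides `torch.cuda.is_available()`. The helper names `cuda_is_usable` and `pick_device` are hypothetical and not part of FSCT, which takes its device setting from the parameters section of train.py.

```python
def cuda_is_usable() -> bool:
    """Return True only if PyTorch is importable and reports a usable CUDA device."""
    try:
        import torch
    except ImportError:
        return False
    return torch.cuda.is_available()


def pick_device(preferred: str, cuda_available: bool) -> str:
    """Hypothetical helper: keep "cuda" only when it was requested and is usable."""
    if preferred == "cuda" and cuda_available:
        return "cuda"
    return "cpu"


if __name__ == "__main__":
    # With a driver failure like the warning in the traceback above,
    # torch.cuda.is_available() returns False, so this prints "cpu".
    print(pick_device("cuda", cuda_is_usable()))
```

Passing the availability flag in separately keeps the selection logic testable on machines without a GPU.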

Cheers, Sean

SKrisanski, May 08 '22 02:05

Hi Sean,

thanks for the fast answer. I recloned the repo and tried again, but unfortunately it gives this error:

```
(fsct) C:\Users\stpu\FSCT\scripts>python train.py
C:\Users\stpu\Anaconda3\envs\fsct\lib\site-packages\torch\cuda\__init__.py:52: UserWarning: CUDA initialization: CUDA driver initialization failed, you might not have a CUDA gpu. (Triggered internally at ..\c10\cuda\CUDAFunctions.cpp:115.)
  return torch._C._cuda_getDeviceCount() > 0
Using default number of CPU cores (all of them).
Processing using 16 / 16 CPU cores.
C:\Users\stpu\FSCT\scripts\data directory found.
C:\Users\stpu\FSCT\scripts\data\train directory found.
C:\Users\stpu\FSCT\scripts\data\train\sample_dir directory created.
C:\Users\stpu\FSCT\scripts\data directory found.
C:\Users\stpu\FSCT\scripts\data\validation directory found.
C:\Users\stpu\FSCT\scripts\data\validation\sample_dir directory created.
Running deep learning using 1 / 16 CPU cores.
Traceback (most recent call last):
  File "train.py", line 392, in <module>
    run_training.run_training()
  File "train.py", line 218, in run_training
    raise NoDataFound("No training samples found.")
fsct_exceptions.NoDataFound:

###########################################################################
NO DATA FOUND ERROR: No training samples found.
###########################################################################
```

Is there something I am doing wrong?

stefp, May 08 '22 07:05

Hi Stef,

The first error looks like you either don't have an Nvidia GPU, or PyTorch can't find it. Do you have an Nvidia GPU? If so, what model is it? The default is currently set to use "cuda" in training, so would you be able to try setting device="cpu" in train.py (parameters section) and let me know if that works? It is considerably slower without CUDA, but it should be easier to get running. If you get a CUDA error, sometimes you need to actually kill that terminal/interpreter entirely and restart it, otherwise the error persists in a cryptic fashion, even if the problem is fixed in the code.
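To answer the GPU questions above from inside the `fsct` environment, a short diagnostic like the following could be run. It is a sketch built on standard PyTorch calls (`torch.version.cuda`, `torch.cuda.device_count()`, `torch.cuda.get_device_name()`); `describe_cuda` is a hypothetical name, not part of FSCT.

```python
def describe_cuda() -> str:
    """Summarise what PyTorch can see; hypothetical helper, not part of FSCT."""
    try:
        import torch
    except ImportError:
        return "PyTorch is not installed in this environment."

    # torch.version.cuda is None for CPU-only builds of PyTorch.
    lines = [f"PyTorch {torch.__version__}, built for CUDA {torch.version.cuda}"]
    if not torch.cuda.is_available():
        lines.append('No usable CUDA device; try device="cpu" in train.py.')
    else:
        # List every GPU PyTorch can see, by index and model name.
        for index in range(torch.cuda.device_count()):
            lines.append(f"GPU {index}: {torch.cuda.get_device_name(index)}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(describe_cuda())
```

If the CUDA version prints as `None`, the installed PyTorch is a CPU-only build, so "cuda" cannot work until a CUDA-enabled build is installed, regardless of the driver.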

The second part looks like there wasn't a point cloud in the data/train/ directory. It should work out of the box with the example.las files that come with the repo. You will need at least 1 point cloud in the train directory and 1 in the validation directory (unless you turn the validation off in the parameters).
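Before launching train.py, it may help to confirm that both directories actually contain a point cloud. The sketch below assumes the `train` and `validation` subdirectory layout described above and counts only `.las` files (an assumption; other formats may also be accepted). The log in the previous comment reports its data directories under `C:\Users\stpu\FSCT\scripts\data`, so it is worth pointing the check at whichever data directory the script itself reports. The helper names are hypothetical.

```python
from __future__ import annotations

from pathlib import Path


def count_point_clouds(directory: Path) -> int:
    """Count .las files directly inside `directory` (0 if it does not exist)."""
    if not directory.is_dir():
        return 0
    return sum(
        1 for path in directory.iterdir()
        if path.is_file() and path.suffix.lower() == ".las"
    )


def check_training_data(data_root: str | Path, require_validation: bool = True) -> list[str]:
    """Return one message per required subset directory that has no point cloud."""
    root = Path(data_root)
    subsets = ["train", "validation"] if require_validation else ["train"]
    problems = []
    for subset in subsets:
        if count_point_clouds(root / subset) == 0:
            problems.append(f"No .las point cloud found in {root / subset}")
    return problems


if __name__ == "__main__":
    # Hypothetical path: adjust to the data directory that train.py reports.
    for message in check_training_data("data"):
        print(message)
```

Setting `require_validation=False` mirrors the case where validation is turned off in the parameters.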

I need to improve/update the instructions for training, but the gist of it is to make sure a training point cloud is exactly like the example.las file and placed in the FSCT/data/train directory. It needs to have a single scalar field called "label", and nothing else. The labels need to be [0, 1, 2, 3] if you don't change the code, and if it's to be compatible with the rest of FSCT, those should be [terrain, vegetation, CWD, stem] respectively.
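A sketch of the label check described above, written in plain Python so it does not depend on a particular LAS reader. It assumes the "nothing else" rule refers to extra (non-standard) dimensions, since standard LAS fields such as X, Y and Z are always present, and it uses the [terrain, vegetation, CWD, stem] = [0, 1, 2, 3] mapping stated above. The function names are hypothetical, and the commented laspy lines assume laspy 2.x.

```python
from collections import Counter

# Class mapping stated in the reply above.
CLASS_NAMES = {0: "terrain", 1: "vegetation", 2: "CWD", 3: "stem"}


def check_labels(extra_dims, labels) -> list:
    """Return a list of problems; an empty list means the cloud looks usable."""
    problems = []
    dims = list(extra_dims)  # materialise once, in case a generator is passed
    if dims != ["label"]:
        problems.append(f"expected only the extra field 'label', found {dims}")
    unexpected = sorted(value for value in Counter(labels) if value not in CLASS_NAMES)
    if unexpected:
        problems.append(f"unexpected label values: {unexpected}")
    return problems


def class_histogram(labels) -> dict:
    """Number of points per class name (classes with no points report 0)."""
    counts = Counter(labels)
    return {name: counts.get(code, 0) for code, name in CLASS_NAMES.items()}


# Possible usage with laspy 2.x and NumPy (assumed installed):
#   import laspy
#   import numpy as np
#   las = laspy.read("example.las")
#   labels = np.asarray(las["label"]).tolist()
#   print(check_labels(las.point_format.extra_dimension_names, labels))
#   print(class_histogram(labels))
```

The histogram is a quick way to spot a class that is missing entirely from a hand-labelled cloud.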

Hopefully that helps, but if not, would you mind describing the steps you're taking in as much detail as possible so I can see if anything jumps out at me?

Cheers, Sean

SKrisanski, May 08 '22 11:05

Hi,

Sorry, I was doing something wrong; it's now training correctly (hopefully), even though I was not able to turn on the "cuda" option.

One last remark: the previous version of train.py had a preprocess_validation_datasets option, but it seems it is not included in the current version. Was that intentional, or how should I process the test dataset?

Thanks again for the great support!

Stefano

stefp, May 08 '22 17:05

Hi Stefano,

No worries at all, and glad to hear you got it up and running. The previous "test" set handling was a placeholder until I clean up that section and add it in. It wasn't actually functional, but it won't be long before I add it back. I'll add 2 testing options when I do. The first will be the classic sample-based one, like the validation testing. The second will operate on the entire point cloud at a point-by-point level, which also tests the step immediately after the deep learning that merges the predictions back onto the original point cloud. The latter is more of a "real-world" test, as it's the actual output.
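As a rough illustration of the second, point-by-point test, the sketch below scores predicted labels against reference labels for every point in a cloud, reporting overall accuracy and per-class intersection over union (IoU). The choice of metrics is an assumption for illustration, not necessarily what FSCT will report, and `pointwise_scores` is a hypothetical name.

```python
from collections import Counter

CLASS_NAMES = ["terrain", "vegetation", "CWD", "stem"]


def pointwise_scores(true_labels, pred_labels):
    """Overall accuracy and per-class IoU over all points of one cloud."""
    if len(true_labels) != len(pred_labels):
        raise ValueError("true and predicted labels must cover the same points")
    if len(true_labels) == 0:
        raise ValueError("no points to score")

    # Count each (true, predicted) pair once: a sparse confusion matrix.
    pairs = Counter(zip(true_labels, pred_labels))
    accuracy = sum(n for (t, p), n in pairs.items() if t == p) / len(true_labels)

    iou = {}
    for code, name in enumerate(CLASS_NAMES):
        tp = pairs.get((code, code), 0)
        fp = sum(n for (t, p), n in pairs.items() if p == code and t != code)
        fn = sum(n for (t, p), n in pairs.items() if t == code and p != code)
        denom = tp + fp + fn
        # A class absent from both truth and prediction has undefined IoU.
        iou[name] = tp / denom if denom else float("nan")
    return accuracy, iou
```

Running it on the full-resolution cloud after the predictions are merged back, rather than on sampled blocks, is what would make this the more "real-world" of the two tests.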

We'll leave the ticket open for now so I remember to add the testing section in soon.

Cheers, Sean

SKrisanski, May 09 '22 11:05