
Custom dataset, data loader

Open MancaZerovnikMekuc opened this issue 5 years ago • 6 comments

Hi,

I have a custom 3D dataset. I have spent a lot of time trying to run the preprocessing script on the LIDC data, but I still have issues running it; it looks like my characteristic.csv file is not what it should be. Can somebody provide a description of how the output of the script is formatted?

I want to run Mask R-CNN on my custom 3D dataset, which consists of volumes with voxel-wise annotations.

Can somebody describe how the data should be formatted for the example data_loader? "Example Data Loader for the LIDC data set. This dataloader expects preprocessed data in .npy or .npz files per patient and a pandas dataframe in the same directory containing the meta-info e.g. file paths, labels, foreground slice-ids."

From this description alone I do not know how to format the data. Has anybody successfully run the model on their own volumetric data with voxel-wise annotations, and could you share the dataloader or a specification of the data formatting expected by the existing dataloader?
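For concreteness, the quoted docstring suggests a layout along the following lines. This is only a sketch: the exact column names, file names, and the dataframe filename ('info_df.pickle') are assumptions here, not the toolkit's documented spec.

```python
import numpy as np
import pandas as pd

# One .npy array per patient plus a pandas dataframe in the same
# directory holding the meta-info (paths, labels, foreground slice ids).
records = []
for pid in ['patient_0', 'patient_1']:
    img = np.random.rand(1, 64, 64, 64).astype(np.float32)  # dummy volume
    np.save(f'{pid}_img.npy', img)
    records.append({
        'pid': pid,                        # patient identifier
        'path': f'{pid}_img.npy',          # path to the image array
        'class_target': [1],               # one class label per RoI
        'fg_slices': [20, 21, 22],         # slice ids containing foreground
    })

info_df = pd.DataFrame(records)
info_df.to_pickle('info_df.pickle')        # meta-info next to the arrays
```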

MancaZerovnikMekuc avatar Nov 22 '19 19:11 MancaZerovnikMekuc

You could generate the toy data and run some trainings on it. That will show you how the data is structured and read by the data loader.

pfjaeger avatar Nov 22 '19 20:11 pfjaeger

Thank you, I have done that. However, I have multiple instances in one image; how should that kind of data be structured? I would also like to include patching. What data, and in what form, should go into the "class_target" variable in the meta_info_dict field created by preprocessing.py for your data_loader.py? Also, my data is 3D, not 2D as in the toy example.

MancaZerovnikMekuc avatar Nov 25 '19 17:11 MancaZerovnikMekuc

The toy data seems to only handle a segmentation example. Is there any documentation about how to generate bounding box labels?

lspinheiro avatar Jan 03 '20 01:01 lspinheiro

@MancaZerovnikMekuc could you show me how the images and labels are read when training begins? If possible, could you share an example of the data structure? Thank you

thanhpt55 avatar Mar 16 '20 09:03 thanhpt55

@MancaZerovnikMekuc

Thank you, I have done that. However, I have multiple instances in one image; how should that kind of data be structured? I would also like to include patching. What data, and in what form, should go into the "class_target" variable in the meta_info_dict field created by preprocessing.py for your data_loader.py? Also, my data is 3D, not 2D as in the toy example.

  • During batch generation (in the dataloader scripts), "class_target" holds the RoI-wise class labels, i.e., one class label per RoI, structured as a list of lists per batch. In the generate_train_batch function of your BatchGenerator, the final batch dictionary should hold an entry "class_target" that looks, e.g., like this: [ [0,1], [2,0], [1] ]. In that example there are 3 classes: batch element one has two RoIs, the first of class 0 and the second of class 1; batch element two also has two RoIs, the first of class 2 and the second of class 0; the third batch element has a single RoI of class 1. The id of a RoI is its position within these per-element lists, and it must correspond to the instance id in the segmentation ground truth shifted by 1, since 0 is reserved for background (all pixels belonging to the RoI with id 0 must be marked with value 1 in the segmentation).

  • To include patching, I'd encourage you to follow the PatientIterator example in lidc_exp->dataloader.py. We only offer inclusive patching (covering the whole image) for the PatientIterator, i.e., during validation and testing, not during training; during training we sample patches rather than tiling the whole image. Apart from data loading, you do not need to concern yourself with patching; it is already implemented in the framework (in predictor.py).

  • The differences between 2D and 3D are marginal; you can look into lidc_exp for guidance.
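The class_target convention described above can be sketched as follows. This is a hypothetical, minimal batch construction (array shapes and the sanity check are my own illustration, not toolkit code): the RoI at list index i carries instance id i + 1 in the segmentation, because 0 is background.

```python
import numpy as np

# Batch of 2 elements, single-channel 3D patches of size 8x8x8.
batch_size, ps = 2, 8
data = np.random.rand(batch_size, 1, ps, ps, ps).astype(np.float32)

# Instance segmentation: background is 0; the RoI with list index i
# is marked with value i + 1.
seg = np.zeros((batch_size, 1, ps, ps, ps), dtype=np.uint8)
seg[0, 0, 1:3, 1:3, 1:3] = 1   # element 0, RoI 0
seg[0, 0, 5:7, 5:7, 5:7] = 2   # element 0, RoI 1
seg[1, 0, 2:5, 2:5, 2:5] = 1   # element 1, RoI 0

# One class label per RoI, ordered by RoI id:
# element 0 has two RoIs (classes 0 and 1), element 1 has one RoI (class 2).
class_target = [[0, 1], [2]]

batch = {'data': data, 'seg': seg, 'class_target': class_target}

# Sanity check: labels per element must match the number of instance ids.
for b in range(batch_size):
    assert len(class_target[b]) == int(seg[b].max())
```

A check like the final loop is cheap insurance in a custom BatchGenerator, since a mismatch between segmentation ids and class_target lengths fails much later, inside the detection losses.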

Gregor1337 avatar Apr 18 '20 13:04 Gregor1337

@Gregor1337 - very helpful. It would be nice to have a documentation file with this information somewhere in the repo. It might also be useful to see how to structure this via a toy experiment that generates multiple RoIs for a single training example.

themantalope avatar May 31 '21 16:05 themantalope