data augmentation

Open Borda opened this issue 4 years ago • 14 comments

Add reasonable image augmentation

  • horizontal/vertical flip
  • rotation
  • zoom
  • etc

Borda avatar Mar 17 '20 10:03 Borda

cool, we have all this in kornia

edgarriba avatar Mar 17 '20 12:03 edgarriba

/cc @ducha-aiki @shijianjian @anguelos

edgarriba avatar Mar 17 '20 13:03 edgarriba

Horizontal flip looks like a suitable augmentation, I'm not completely sure if vertical flip/rotation introduces interesting priors as X-rays are usually similarly oriented

shpotes avatar Mar 17 '20 13:03 shpotes

Rotation limited to small angles, I guess, yes.

edgarriba avatar Mar 17 '20 13:03 edgarriba

Horizontal flip looks like a suitable augmentation, I'm not completely sure if vertical flip/rotation introduces interesting priors as X-rays are usually similarly oriented

Unless we assume that the object looks different when flipped vertically... but this could just be an extra training parameter, right?
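A minimal sketch of treating the flips as tunable parameters, assuming 2-D NumPy images. The function name `random_flip` and the probabilities `p_horizontal` and `p_vertical` are hypothetical and not part of this repository or of kornia:

```python
import numpy as np


def random_flip(image, rng, p_horizontal=0.5, p_vertical=0.0):
    """Randomly flip a 2-D (H, W) image.

    p_horizontal and p_vertical are hypothetical hyperparameters;
    p_vertical=0.0 disables vertical flips entirely.
    """
    if rng.random() < p_horizontal:
        image = image[:, ::-1]  # mirror left-right
    if rng.random() < p_vertical:
        image = image[::-1, :]  # mirror top-bottom
    # Return a contiguous copy so the caller never receives a strided view.
    return np.ascontiguousarray(image)


rng = np.random.default_rng(0)
xray = np.arange(12, dtype=np.float32).reshape(3, 4)  # stand-in for an X-ray
augmented = random_flip(xray, rng, p_horizontal=1.0, p_vertical=0.0)
unchanged = random_flip(xray, rng, p_horizontal=0.0, p_vertical=0.0)
```

With `p_vertical` exposed like this, whether vertical flips help could be checked by a hyperparameter sweep rather than decided up front.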

Borda avatar Mar 17 '20 13:03 Borda

Where is the data in the first place? There is no link in the README.

bluesky314 avatar Mar 17 '20 14:03 bluesky314

The data link is here: https://github.com/PyTorchLightning/lightning-Covid19/issues/2

ducha-aiki avatar Mar 17 '20 14:03 ducha-aiki

In figure 3 of the Chester paper, we can see that, for pneumonia specifically, augmentation might even hurt. If I am reading the plots right, the first column (undistorted test set) seems the most important. It seems that modest rotation, scale and translation is the best augmentation: 15 deg, 10%, and 10% respectively.
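A rough sketch of sampling such a modest affine transform, assuming the three numbers are meant as symmetric maximum ranges (±15 deg, ±10% scale, ±10% translation) and that sampling is uniform. Both are my assumptions, and `sample_affine_matrix` is a hypothetical helper:

```python
import math
import random


def sample_affine_matrix(width, height, rng,
                         max_deg=15.0, max_scale=0.10, max_shift=0.10):
    """Sample a 2x3 affine matrix mapping (x, y) to (x', y').

    Rotation and scaling are applied about the image centre,
    followed by a translation given as a fraction of the image size.
    """
    angle = math.radians(rng.uniform(-max_deg, max_deg))
    scale = rng.uniform(1.0 - max_scale, 1.0 + max_scale)
    tx = rng.uniform(-max_shift, max_shift) * width
    ty = rng.uniform(-max_shift, max_shift) * height

    cx, cy = width / 2.0, height / 2.0
    a = scale * math.cos(angle)
    b = scale * math.sin(angle)
    # x' = a*(x - cx) - b*(y - cy) + cx + tx
    # y' = b*(x - cx) + a*(y - cy) + cy + ty
    return [
        [a, -b, cx - a * cx + b * cy + tx],
        [b, a, cy - b * cx - a * cy + ty],
    ]


matrix = sample_affine_matrix(224, 224, random.Random(0))
```

The resulting matrix could then be passed to whichever warping routine the pipeline uses; that step is left out here because it depends on the library chosen.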

anguelos avatar Mar 17 '20 15:03 anguelos

Generally, I think it should be alright as long as the augmentation methods do not change the label. For instance, ElasticTransform is probably a dangerous move. It would be best if we could invite a chest CT expert for more guidance.

If I understand this right, this project aims to distinguish Covid-19 from other pneumonia pathologies like SARS, etc. Thus, we also need more support on pathology understanding to emphasize the most correlated features in the preprocessing phase and augmentations. From a clinical perspective, I think it would also help to know how CT experts make their decisions.

shijianjian avatar Mar 17 '20 18:03 shijianjian

Just noticed that the image values come in a range of roughly ±1000.

edgarriba avatar Mar 27 '20 16:03 edgarriba

It is quite common for medical images, as they can also be in TIFF with some offset :]

Borda avatar Mar 27 '20 16:03 Borda

Gotcha. And do we want that for training? https://github.com/mlmed/torchxrayvision/blob/master/torchxrayvision/datasets.py#L47-L51

I think the dataset generator can be improved somehow

edgarriba avatar Mar 27 '20 17:03 edgarriba

I think that we should scale them anyway with the mean and STD to about the (-1, 1) interval.
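A minimal sketch of that scaling, assuming NumPy arrays and using a synthetic image with values in roughly ±1000 as a stand-in for the real data. The `standardize` helper and the `eps` guard are hypothetical:

```python
import numpy as np


def standardize(image, mean, std, eps=1e-8):
    """Subtract the mean and divide by the standard deviation."""
    return (image.astype(np.float32) - mean) / (std + eps)


rng = np.random.default_rng(0)
# Synthetic stand-in for an image with values in roughly [-1000, 1000].
raw = rng.uniform(-1000.0, 1000.0, size=(64, 64)).astype(np.float32)

# In practice the statistics would be computed once over the training set;
# here they come from the single synthetic image.
mean, std = raw.mean(), raw.std()
scaled = standardize(raw, mean, std)
```

Note that this gives zero mean and unit standard deviation, so most values land near (-1, 1) but outliers can fall outside it. If a hard bound is needed, dividing by the known maximum magnitude is an alternative.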

Borda avatar Mar 27 '20 18:03 Borda

@Borda sure. Apparently the images in this dataset are in png, jpg and jpeg, so my guess is that there is no need to apply an initial conversion. Please also check my comment here: https://github.com/PyTorchLightning/lightning-Covid19/pull/18#discussion_r399652347

[three images attached]

Not sure what would be best. My guess is that it would be best to analyze the whole image and create some kind of attention so as not to miss any part.

edgarriba avatar Mar 28 '20 11:03 edgarriba