covid19.MIScnn
Dataset HU range
Hi @frankkramer
When I checked the dataset, I saw that it has two parts: coronacases and radiopaedia. The radiopaedia images are scaled to 0-255, but the coronacases images have an HU range of -1250 to 250. How did you handle this difference?
Hi @kirmans,
thanks for your interest in our study.
You are right, the Ma et al. dataset consists of coronacases and radiopaedia data. Whereas the radiopaedia data was already normalized to grayscale, the coronacases volumes still had their original HU values. However, these were not originally clipped to -1250 to 250; they had a normal CT range from about -1000 up to roughly 8000.
As you already mentioned, we first clip all samples to the range -1250 to 250. Ideally, only the coronacases should be clipped (which is what we did for the results in our publication), but we noticed that the high-intensity regions (250-255) in the radiopaedia volumes do not affect performance. That is why we published the simpler approach of clipping all samples (coronacases and radiopaedia) to -1250 to 250. On the grayscale-normalized radiopaedia data this only cuts off the top 5 intensity values, while it applies a reasonable clipping to the coronacases.
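To illustrate why clipping everything is mostly harmless for the grayscale volumes, here is a minimal sketch using NumPy on synthetic arrays. It is not the code from the repository; the names `HU_MIN`, `HU_MAX`, and `clip_volume` are made up for this example.

```python
import numpy as np

# Clipping window discussed in this thread (assumed constants for the sketch).
HU_MIN, HU_MAX = -1250, 250

def clip_volume(volume):
    """Clip all intensities of a volume to [HU_MIN, HU_MAX]."""
    return np.clip(volume, HU_MIN, HU_MAX)

# Synthetic grayscale-normalized data: every intensity from 0 to 255 once.
gray = np.arange(0, 256, dtype=np.float32)
gray_clipped = clip_volume(gray)
# Count how many distinct grayscale values are altered by the clipping.
changed_gray = int(np.count_nonzero(gray != gray_clipped))

# Synthetic raw HU values, similar in range to a coronacases volume.
hu = np.array([-1023.0, -800.0, 75.0, 2996.0], dtype=np.float32)
hu_clipped = clip_volume(hu)

print(changed_gray)   # only the values 251..255 are affected
print(hu_clipped)     # the large positive outlier is capped at 250
```

On the grayscale array only the values above 250 change, whereas on the HU array the high-intensity outliers are capped at the upper bound.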
Paper extract:
We exploited the Hounsfield units (HU) scale by clipping the pixel intensity values of the images to -1,250 as minimum and +250 as maximum, because we were interested in infected regions (+50 to +100 HU) and lung regions (-1,000 to -700 HU). It was only possible to apply the clipping approach on the Coronacases Initiative CTs, because the Radiopaedia volumes were already normalized to a grayscale range between 0 and 255.
Therefore, long story short: for the publication, we clipped only the coronacases and not the radiopaedia volumes, sadly. We then applied grayscale normalization to the coronacases so that they were processed in the same way as the radiopaedia volumes. Finally, we normalized both via Z-score as the last step to make the model fitting process more efficient.
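The order of steps described above could be sketched roughly as follows. This is only an illustration with NumPy, not the actual pipeline code: the function names are hypothetical, and mapping the fixed clipping window linearly to 0-255 is an assumption about how the grayscale normalization might be done.

```python
import numpy as np

def clip_hu(volume, lo=-1250.0, hi=250.0):
    """Clip raw HU intensities to the window [lo, hi]."""
    return np.clip(volume, lo, hi)

def to_grayscale(volume, lo=-1250.0, hi=250.0):
    """Linearly map the window [lo, hi] to [0, 255] (assumed scheme)."""
    return (volume - lo) / (hi - lo) * 255.0

def z_score(volume):
    """Standardize to zero mean and unit variance; guard against std == 0."""
    std = volume.std()
    return (volume - volume.mean()) / (std if std > 0 else 1.0)

def preprocess(volume, is_hu):
    """Clip and rescale only raw HU volumes, then Z-score every volume."""
    volume = volume.astype(np.float32)
    if is_hu:
        volume = to_grayscale(clip_hu(volume))
    return z_score(volume)

# Synthetic examples: one raw HU array and one already-grayscale array.
hu_volume = np.array([-2000.0, -1250.0, -500.0, 250.0, 3000.0])
gray_volume = np.array([0.0, 64.0, 128.0, 255.0])

hu_out = preprocess(hu_volume, is_hu=True)
gray_out = preprocess(gray_volume, is_hu=False)
```

After `preprocess`, both kinds of volumes end up on the same standardized scale, which is the point of the final Z-score step.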
Here is also a small output of our data exploration, which can be reproduced by running scripts/data_exploration.py:
                        vol_shape           vol_minimum  vol_maximum
coronacases_001         (512, 512, 301, 1)      -1021.0       2996.0
coronacases_002         (512, 512, 200, 1)      -1023.0       9567.0
coronacases_003         (512, 512, 200, 1)      -1023.0       8931.0
coronacases_004         (512, 512, 270, 1)      -1021.0       2020.0
coronacases_005         (512, 512, 290, 1)      -1021.0       5528.0
coronacases_006         (512, 512, 213, 1)      -1023.0       2217.0
coronacases_007         (512, 512, 249, 1)      -1023.0       2515.0
coronacases_008         (512, 512, 301, 1)      -1021.0       8575.0
coronacases_009         (512, 512, 256, 1)      -1021.0       1845.0
coronacases_010         (512, 512, 301, 1)      -1021.0       1920.0
radiopaedia_10_85902_1  (630, 630, 39, 1)           0.0        255.0
radiopaedia_10_85902_3  (630, 630, 418, 1)          0.0        255.0
radiopaedia_14_85914_0  (630, 401, 110, 1)          0.0        255.0
radiopaedia_27_86410_0  (630, 630, 66, 1)           4.0        255.0
radiopaedia_29_86490_1  (630, 630, 42, 1)           0.0        255.0
radiopaedia_29_86491_1  (630, 630, 42, 1)           0.0        255.0
radiopaedia_36_86526_0  (630, 630, 45, 1)           0.0        255.0
radiopaedia_40_86625_0  (630, 630, 93, 1)          12.0        255.0
radiopaedia_4_85506_1   (630, 630, 39, 1)           0.0        255.0
radiopaedia_7_85703_0   (630, 630, 45, 1)           0.0        255.0
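For a quick check of your own data, a per-volume summary like the one above can be computed along these lines. This is a simplified sketch on synthetic NumPy arrays and not the content of scripts/data_exploration.py; the `summarize` helper and the sample names are hypothetical.

```python
import numpy as np

def summarize(volumes):
    """Return {name: (shape, minimum, maximum)} for a dict of arrays."""
    return {
        name: (vol.shape, float(vol.min()), float(vol.max()))
        for name, vol in volumes.items()
    }

# Synthetic stand-ins for loaded CT volumes (real data would come from files).
rng = np.random.default_rng(0)
volumes = {
    "hu_like": rng.integers(-1021, 2997, size=(8, 8, 4, 1)).astype(np.float32),
    "gray_like": rng.integers(0, 256, size=(8, 8, 3, 1)).astype(np.float32),
}

stats = summarize(volumes)
for name, (shape, vmin, vmax) in stats.items():
    print(f"{name:12s} {shape} {vmin:8.1f} {vmax:8.1f}")
```

A maximum of at most 255 together with a non-negative minimum is a good hint that a volume is already grayscale-normalized rather than in raw HU.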
If you have more questions, feel free to ask.
Cheers, Dominik