deepcell-tf

Why does TissueNet 1.1 have weird color values in each channel?

Open AlexInJar opened this issue 2 years ago • 3 comments

Describe the bug

The newly released TissueNet 1.1 dataset has strange values in each image channel: they are scaled neither from 0 to 1 nor from 0 to 255. For example, for the training image at index 1198:

    np.min(img[...,0])  # 47.0
    np.max(img[...,0])  # 1027.0

May I ask how the images were obtained, and how they were converted into NumPy arrays? I think releasing the *.png files directly might be more convenient for people like me who need to train on the images, even if that leaves an empty channel in the PNG. Could we do that?

Desktop (please complete the following information):

  • OS: Ubuntu 20.04

AlexInJar avatar Aug 01 '22 07:08 AlexInJar

Hey @AlexInJar, this dataset contains microscopy images from a range of different microscopes. Each microscope has a different output range, depending on how the manufacturer configured it and how the user set it up. TissueNet 1.0 normalized this data by dividing each image by its max. However, for dim images, this causes the noise baseline to be artificially inflated. For TissueNet 1.1, we've left the data as the raw, untransformed values.
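To illustrate why dividing by the max inflates the noise baseline of dim images, here is a small synthetic sketch. The image sizes, noise level, and signal strengths are made-up assumptions for illustration; they are not taken from TissueNet.

```python
import numpy as np

def divide_by_max(img):
    """Scale an image so that its brightest pixel becomes 1.0."""
    img = img.astype(np.float32)
    peak = img.max()
    return img / peak if peak > 0 else img

# Two hypothetical single-channel images with the same noise floor (~50 counts).
rng = np.random.default_rng(0)
noise = np.clip(rng.normal(loc=50.0, scale=5.0, size=(64, 64)), 0, None)

bright = noise.copy()
bright[20:30, 20:30] += 4000.0  # strong signal in a small patch

dim = noise.copy()
dim[20:30, 20:30] += 100.0      # weak signal in the same patch

# The median is dominated by background pixels, so it tracks the noise floor.
bright_bg = np.median(divide_by_max(bright))
dim_bg = np.median(divide_by_max(dim))

print(f"background after max-normalization: bright={bright_bg:.3f}, dim={dim_bg:.3f}")
```

Both images start with the same background level, but after dividing by the max the dim image's background ends up many times higher, because its maximum is much closer to the noise floor.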

The normalization function that Mesmer uses automatically handles images with different dynamic ranges. However, you are welcome to use whatever normalization scheme you want when processing the data, including dividing by the image-wise or dataset-wide maximum. There is no original .png image, as these data were not generated using a standard camera.
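For anyone who wants the max-based schemes mentioned above, a minimal NumPy sketch might look like the following. It assumes the images are stacked as a `(N, H, W, C)` array and that the maximum is taken separately per channel; the function names are hypothetical and not part of deepcell-tf.

```python
import numpy as np

def _safe_divide(batch, peak):
    # Avoid division by zero for channels that are entirely empty.
    peak = np.where(peak > 0, peak, 1.0)
    return batch.astype(np.float32) / peak

def normalize_imagewise(batch):
    """Divide every image by its own per-channel maximum."""
    peak = batch.max(axis=(1, 2), keepdims=True)      # shape (N, 1, 1, C)
    return _safe_divide(batch, peak)

def normalize_datasetwide(batch):
    """Divide every image by the per-channel maximum over the whole stack."""
    peak = batch.max(axis=(0, 1, 2), keepdims=True)   # shape (1, 1, 1, C)
    return _safe_divide(batch, peak)

# Synthetic stand-in for raw data; the second image is made 10x dimmer.
rng = np.random.default_rng(0)
batch = rng.uniform(47.0, 1027.0, size=(2, 8, 8, 2)).astype(np.float32)
batch[1] *= 0.1

per_image = normalize_imagewise(batch)
per_dataset = normalize_datasetwide(batch)
```

With image-wise scaling every image peaks at 1.0, so relative brightness between images is lost. With dataset-wide scaling relative brightness is kept, but dim images occupy only a small part of the 0 to 1 range.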

Let me know if you have any other questions.

ngreenwald avatar Aug 03 '22 04:08 ngreenwald

Hi, thanks for the response. May I ask what the motivation for this histogram equalization step is? Are human annotators given the histogram-equalized version of the image or just the raw microscopy version?

AlexInJar avatar Aug 08 '22 04:08 AlexInJar

The annotators get the raw version, with the ability to rescale each channel to whatever brightness they find useful. The histogram normalization is an automated way to accomplish the same goal, which is ensuring that the data has approximately the same range of values across the different platforms.
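As a rough illustration of bringing data from different platforms into the same range, the sketch below rescales each channel between two percentiles. This is a simple stand-in chosen for illustration, not Mesmer's actual normalization function, and the percentile values and synthetic "microscopes" are assumptions.

```python
import numpy as np

def percentile_rescale(img, low=0.1, high=99.9):
    """Rescale each channel of an (H, W, C) image to [0, 1] between two percentiles."""
    img = img.astype(np.float32)
    out = np.zeros_like(img)
    for c in range(img.shape[-1]):
        lo, hi = np.percentile(img[..., c], [low, high])
        if hi > lo:  # leave constant (e.g. empty) channels at zero
            out[..., c] = np.clip((img[..., c] - lo) / (hi - lo), 0.0, 1.0)
    return out

# The same underlying pattern as recorded by two hypothetical microscopes
# with different offsets and output ranges.
rng = np.random.default_rng(0)
pattern = rng.random((32, 32, 2))
scope_a = 47.0 + 980.0 * pattern
scope_b = 3.0 + 60000.0 * pattern

norm_a = percentile_rescale(scope_a)
norm_b = percentile_rescale(scope_b)
```

After rescaling, the two versions are nearly identical even though their raw values differ by orders of magnitude, which is the effect the automated normalization is meant to achieve.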

ngreenwald avatar Aug 08 '22 20:08 ngreenwald