Wsi-Caption

Questions about the WSI + caption paired data

Open pxliang opened this issue 1 year ago • 7 comments

Thanks for the amazing work!

I have a question regarding the image slides. A patient can have more than one slide. For example, case TCGA-5T-A9QA has both the TCGA-5T-A9QA-01A-01-TSA and TCGA-5T-A9QA-01Z-00-DX1 slides. How do you pair these slides with the caption during model training?

pxliang avatar Sep 16 '24 17:09 pxliang

We use the "DX" slide.

cpystan avatar Sep 17 '24 00:09 cpystan

> We use the "DX" slide.

Thank you for your great work! I have a question. A case can also have more than one "DX" slide. For example, "TCGA-D8-A3Z5" has "TCGA-D8-A3Z5-01Z-00-DX1", "TCGA-D8-A3Z5-01Z-00-DX2" and "TCGA-D8-A3Z5-01Z-00-DX3", but there is only one report for "TCGA-D8-A3Z5". Is that report used for all three slides?

51265904017 avatar Oct 31 '24 13:10 51265904017

Yes. Some cases have several DX slides. In that situation, we choose 'DX1'.
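The selection rule described so far (keep only "DX" slides, and prefer 'DX1' when a case has several) can be sketched roughly as follows. This is an illustration only, not the code used for the paper: the function name `pick_diagnostic_slides` and the regular expression are hypothetical, the IDs are assumed to look exactly like the ones quoted in this thread, and falling back to the lowest-numbered DX slide when 'DX1' is absent is an assumption.

```python
import re

# Matches slide IDs as written in this thread, e.g. "TCGA-D8-A3Z5-01Z-00-DX1".
# group(1) = case ID, group(2) = slide type letters, group(3) = slide number.
SLIDE_RE = re.compile(
    r"^(TCGA-[A-Z0-9]{2}-[A-Z0-9]{4})-\w{3}-\w{2}-([A-Z]+)(\d*)$"
)


def pick_diagnostic_slides(slide_ids):
    """Return {case_id: slide_id}, keeping one DX slide per case.

    Non-DX slides (e.g. "...-TSA") are skipped. When a case has several
    DX slides, the lowest-numbered one is kept, which is DX1 if present.
    """
    best = {}
    for slide_id in slide_ids:
        match = SLIDE_RE.match(slide_id)
        if match is None or match.group(2) != "DX":
            continue
        case_id = match.group(1)
        index = int(match.group(3) or 0)
        if case_id not in best or index < best[case_id][0]:
            best[case_id] = (index, slide_id)
    return {case: slide for case, (_, slide) in best.items()}


slides = [
    "TCGA-5T-A9QA-01A-01-TSA",
    "TCGA-5T-A9QA-01Z-00-DX1",
    "TCGA-D8-A3Z5-01Z-00-DX3",
    "TCGA-D8-A3Z5-01Z-00-DX1",
    "TCGA-D8-A3Z5-01Z-00-DX2",
]
selected = pick_diagnostic_slides(slides)
```

Each case ID in `selected` then maps to the single slide that would be paired with that case's report.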

cpystan avatar Nov 01 '24 02:11 cpystan

> Yes. Some cases have several DX slides. For this situation, we choose 'DX1'.

Thank you for your reply. I have another question. Do you use "splits_0.csv" as the dataset split in your experiments? Its train and val sets share cases. For example, train has "TCGA-D8-A73X-01Z-00-DX1" and val has "TCGA-D8-A73X-01Z-00-DX2". If you only choose 'DX1', how do you deal with this? Do you delete "TCGA-D8-A73X-01Z-00-DX2" from val? If you delete all the "DX2", "DX3" and "DX4" slides, only 977 slides remain in the BRCA dataset.

51265904017 avatar Nov 03 '24 05:11 51265904017

We ignore the repeated case in val or test, so the total number of slides is slightly lower.
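One way to apply this kind of filtering is sketched below. It is a sketch under assumptions, not the code used for the paper: the helper names `case_id` and `drop_repeated_cases` are hypothetical, the splits are assumed to be plain lists of slide IDs (the layout of "splits_0.csv" is not shown in this thread), and additionally dropping test slides whose case already appears in val is an assumption.

```python
def case_id(slide_id):
    """First three hyphen-separated fields of a TCGA slide ID.

    "TCGA-D8-A73X-01Z-00-DX2" -> "TCGA-D8-A73X"
    """
    return "-".join(slide_id.split("-")[:3])


def drop_repeated_cases(train, val, test):
    """Remove val/test slides whose case already appears in an earlier split."""
    train_cases = {case_id(s) for s in train}
    val_clean = [s for s in val if case_id(s) not in train_cases]
    # Assumption: test is also checked against the cases that remain in val.
    seen = train_cases | {case_id(s) for s in val_clean}
    test_clean = [s for s in test if case_id(s) not in seen]
    return val_clean, test_clean


train = ["TCGA-D8-A73X-01Z-00-DX1", "TCGA-D8-A3Z5-01Z-00-DX1"]
val = ["TCGA-D8-A73X-01Z-00-DX2", "TCGA-5T-A9QA-01Z-00-DX1"]
test = ["TCGA-D8-A3Z5-01Z-00-DX2"]

val_clean, test_clean = drop_repeated_cases(train, val, test)
```

In this toy example, "TCGA-D8-A73X-01Z-00-DX2" is dropped from val because its case is already in train, which matches the example raised in the question above.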

cpystan avatar Nov 04 '24 03:11 cpystan

Why were only the DX-labeled WSIs kept while the other slides were discarded? This is a "multiple slides, single report" paired dataset, and the other slides also contribute to the diagnostic report.

xinfzhang avatar May 22 '25 05:05 xinfzhang

> Why were only the DX-labeled WSIs kept while the other slides were discarded? This is a "multiple slides, single report" paired dataset, and the other slides also contribute to the diagnostic report.

'DX' denotes diagnostic slides, which are scanned at high resolution (usually 40x magnification) and are therefore suitable for pathologists' diagnosis and analysis.

cpystan avatar May 22 '25 11:05 cpystan