Questions about the WSI + caption paired data
Thanks for the amazing work!
I have a question regarding the image slides. A single patient can have more than one slide. For example, the TCGA-5T-A9QA case has both the TCGA-5T-A9QA-01A-01-TSA and TCGA-5T-A9QA-01Z-00-DX1 slides. How do you pair these slides with the caption during model training?
We use the "DX" slide.
Thank you for your great work! I have a question. A case can also have more than one "DX" slide. For example, "TCGA-D8-A3Z5" has "TCGA-D8-A3Z5-01Z-00-DX1", "TCGA-D8-A3Z5-01Z-00-DX2" and "TCGA-D8-A3Z5-01Z-00-DX3", but there is only one report for "TCGA-D8-A3Z5". Is that report used for all three slides?
Yes. Some cases have several DX slides. In that situation, we choose 'DX1'.
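For readers who want to reproduce this selection, here is a minimal sketch of one way to do it. It is not the authors' code. It assumes slide IDs are bare TCGA barcodes like the ones quoted in this thread (no file extension or UUID suffix), that the case ID is the first three dash-separated fields, and that "choose DX1" can be approximated by taking the lowest-numbered DX slide. The function names `case_id` and `select_diagnostic_slide` are hypothetical.

```python
import re

# Matches a trailing "DX<n>" field in a slide ID, e.g. "...-01Z-00-DX1".
DX_PATTERN = re.compile(r"-DX(\d+)$")


def case_id(slide_id):
    """Return the case barcode, assumed to be the first three dash-separated fields."""
    return "-".join(slide_id.split("-")[:3])


def select_diagnostic_slide(slide_ids):
    """Map each case to its lowest-numbered DX slide.

    Non-DX slides (e.g. "...-TSA") are skipped, so a case with no DX slide
    does not appear in the result. Picking the lowest number is an assumption
    that coincides with "DX1" whenever a DX1 slide exists.
    """
    best = {}
    for sid in slide_ids:
        match = DX_PATTERN.search(sid)
        if match is None:
            continue  # not a diagnostic slide
        index = int(match.group(1))
        cid = case_id(sid)
        if cid not in best or index < best[cid][0]:
            best[cid] = (index, sid)
    return {cid: sid for cid, (_, sid) in best.items()}


slides = [
    "TCGA-5T-A9QA-01A-01-TSA",
    "TCGA-5T-A9QA-01Z-00-DX1",
    "TCGA-D8-A3Z5-01Z-00-DX1",
    "TCGA-D8-A3Z5-01Z-00-DX2",
    "TCGA-D8-A3Z5-01Z-00-DX3",
]
selected = select_diagnostic_slide(slides)
```

With the slide IDs mentioned above, `selected` pairs each of the two cases with its DX1 slide, and the report for the case can then be looked up by the dictionary key.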
Thank you for your reply. I have another question. Do you use "splits_0.csv" as the dataset split in your experiments? In that file, train and val share some cases. For example, train has "TCGA-D8-A73X-01Z-00-DX1" and val has "TCGA-D8-A73X-01Z-00-DX2". If you only choose 'DX1', how do you deal with this? Do you delete "TCGA-D8-A73X-01Z-00-DX2" from val? If you delete all the "DX2", "DX3" and "DX4" slides, only 977 slides remain in the BRCA dataset.
We ignore slides in val or test whose case is duplicated, so the total number of slides is a bit lower.
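One possible reading of this reply is that held-out slides are dropped when their case already appears in train. The sketch below illustrates that reading only; it is not the authors' code, and the exact filtering they applied is not stated in this thread. It assumes the case ID is the first three dash-separated fields of the slide ID. The function names are hypothetical, and every slide ID except the two "TCGA-D8-A73X" ones is a made-up placeholder.

```python
def case_id(slide_id):
    """Return the case barcode, assumed to be the first three dash-separated fields."""
    return "-".join(slide_id.split("-")[:3])


def drop_cases_seen_in_train(train_ids, held_out_ids):
    """Keep only held-out slides whose case does not occur in the training split."""
    train_cases = {case_id(sid) for sid in train_ids}
    return [sid for sid in held_out_ids if case_id(sid) not in train_cases]


train = [
    "TCGA-D8-A73X-01Z-00-DX1",
    "TCGA-XX-0001-01Z-00-DX1",  # placeholder ID for illustration
]
val = [
    "TCGA-D8-A73X-01Z-00-DX2",  # same case as a training slide
    "TCGA-XX-0002-01Z-00-DX1",  # placeholder ID for illustration
]
clean_val = drop_cases_seen_in_train(train, val)
```

Here `clean_val` keeps only the placeholder slide from the second case, because "TCGA-D8-A73X" already occurs in train. The same call can be applied to the test split.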
Why was the DX-labeled WSI kept while the other slides were discarded? This is a "multiple slides, single report" paired dataset, and the other slides also contribute to the diagnostic report.
'DX' denotes diagnostic slides, which are scanned at high resolution (usually 40x magnification) and are therefore suitable for pathologists' diagnosis and analysis.