
TCGA Dataset Training and Testing Distributions

Open: bryanwong17 opened this issue 1 year ago • 1 comment

Hi, could you please share with me the distribution of slides used for training and testing in the TCGA dataset, along with their respective labels?

It's mentioned here that "We randomly split the WSIs into 840 training slides and 210 testing slides (4 low-quality corrupted slides are discarded)". However, the TEST_ID.csv file from this link lists 214 testing slides, not 210. Could you clarify which slides were discarded, and which slides were used for training? Thank you!
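
The count mismatch can be checked with a short script. This is only a minimal sketch: it assumes TEST_ID.csv holds one slide ID in the first column of each row, and the helper names `read_slide_ids` and `split_discrepancy` are hypothetical, not part of dsmil-wsi. The IDs in the demo are placeholders, not real TCGA slide IDs.

```python
import csv
import io


def read_slide_ids(csv_text, has_header=False):
    """Return the unique, non-empty first-column values, sorted.

    Assumption: the slide ID is stored in the first column of each row.
    """
    rows = csv.reader(io.StringIO(csv_text))
    ids = [row[0].strip() for row in rows if row and row[0].strip()]
    if has_header:
        ids = ids[1:]  # drop the header row
    return sorted(set(ids))


def split_discrepancy(listed_ids, reported_count, reported_discarded):
    """Compare the number of listed IDs with the reported split size."""
    extra = len(listed_ids) - reported_count
    return {
        "listed": len(listed_ids),
        "reported": reported_count,
        "extra": extra,
        # True when the surplus equals the number of discarded slides
        "matches_discarded": extra == reported_discarded,
    }


if __name__ == "__main__":
    # Placeholder IDs standing in for the 214 rows of TEST_ID.csv.
    demo_csv = "\n".join(f"SLIDE-{i:04d}" for i in range(214))
    test_ids = read_slide_ids(demo_csv)
    print(split_discrepancy(test_ids, reported_count=210, reported_discarded=4))
```

With 214 listed IDs against the reported 210, the surplus is 4, which equals the number of discarded slides. That would be consistent with the discarded slides still being listed in TEST_ID.csv, but it does not show which slides they are.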

bryanwong17, Jan 22 '24 01:01

@bryanwong17, I went through this. See the results of my investigation in my README file for downloading TCGA.

GeorgeBatch, Feb 15 '24 10:02