HIPT
Unable to find *.pt files for region_4096_pretraining
Am I right in expecting the patch-level feature *.pt files (433,779 files, each containing a 256 x 384 tensor) used for pretraining the second stage of HIPT to be in the HIPT/3-Self-Supervised-Eval/embeddings_patch_lib/ directory?
Currently, I only see the following pickle files in that directory.
25M bcss_train_resnet50_trunc.pkl
9.3M bcss_train_vits_tcga_brca_dino.pkl
4.5M bcss_val_resnet50_tcga_brca_simclr.pkl
2.3M bcss_val_resnet50_trunc.pkl
868K bcss_val_vits_tcga_brca_dino.pkl
19M breastpathq_train_resnet50_tcga_brca_simclr.pkl
9.4M breastpathq_train_resnet50_trunc.pkl
3.6M breastpathq_train_vits_tcga_brca_dino.pkl
1.5M breastpathq_val_resnet50_tcga_brca_simclr.pkl
744K breastpathq_val_resnet50_trunc.pkl
280K breastpathq_val_vits_tcga_brca_dino.pkl
783M crc100knonorm_train_resnet50_tcga_brca_simclr.pkl
393M crc100knonorm_train_resnet50_trunc.pkl
149M crc100knonorm_train_vits_tcga_brca_dino.pkl
57M crc100knonorm_val_resnet50_tcga_brca_simclr.pkl
29M crc100knonorm_val_resnet50_trunc.pkl
11M crc100knonorm_val_vits_tcga_brca_dino.pkl
783M crc100k_train_resnet50_tcga_brca_simclr.pkl
393M crc100k_train_resnet50_trunc.pkl
149M crc100k_train_vits_tcga_brca_dino.pkl
57M crc100k_val_resnet50_tcga_brca_simclr.pkl
29M crc100k_val_resnet50_trunc.pkl
11M crc100k_val_vits_tcga_brca_dino.pkl
Thanks in advance.
Hi @ramprs21 - see https://github.com/mahmoodlab/HIPT/tree/master/3-Self-Supervised-Eval/embeddings_slide_lib.
Hi @Richarizardd, the *.pt files in 3-Self-Supervised-Eval/embeddings_slide_lib/embeddings_slide_lib/vit256mean_tcga_slide_embeddings do not seem to have the right dimensions (192 instead of 384), so I believe they were computed from the outputs of the 2nd stage. See below:
Python 3.9.10 | packaged by conda-forge | (main, Feb 1 2022, 21:24:11)
[GCC 9.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> data = torch.load('3-Self-Supervised-Eval/embeddings_slide_lib/embeddings_slide_lib/vit256mean_tcga_slide_embeddings/TCGA-BA-6869-01Z-00-DX1.6e58648e-3309-47bb-b2c7-b71bcd9dc69b.pt')
>>> data.shape
torch.Size([52, 192])
Whereas I am looking for the inputs to the 2nd-stage pretraining, which I believe are a list of *.pt files, each containing a tensor of dimension (256 x 384).
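The distinction in this exchange comes down to tensor shape: a 2nd-stage input should be one (256, 384) tensor per region, while the files in vit256mean_tcga_slide_embeddings are (M, 192). As a minimal sketch, the hypothetical helper `describe_embedding` below sorts a loaded file by its shape. The shape conventions are taken from this thread, not from HIPT's code. Because `torch.Size` is a subclass of `tuple`, `data.shape` from the session above can be passed in directly.

```python
def describe_embedding(shape):
    """Classify a loaded embedding tensor by its shape (hypothetical helper).

    Conventions assumed from this thread, not from the HIPT code:
      (256, 384) -> patch-level features for one region (2nd-stage input)
      (M, 192)   -> region-level slide embeddings, one row per region
    """
    shape = tuple(shape)  # accepts torch.Size, numpy shapes, or plain tuples
    if shape == (256, 384):
        return "patch-level: 2nd-stage pretraining input for one region"
    if len(shape) == 2 and shape[1] == 192:
        return f"region-level: {shape[0]} region embeddings of dim 192"
    return f"unrecognized shape {shape}"


# Shape reported in the session above, and the shape being looked for.
print(describe_embedding((52, 192)))
print(describe_embedding((256, 384)))
```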
Hi @ramprs21 - apologies for the confusion. The previous link refers to the already pre-extracted "region-level" feature embeddings for each slide in TCGA. Regarding the *.pt files for hierarchical pretraining, it is logistically difficult at the moment to make available all of the "patch-level" feature embeddings ([M x 256 x 384], where M is the number of regions). We are looking into ways to make them more accessible!
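To give a sense of the scale involved, here is a back-of-the-envelope estimate using the figures from the original question (433,779 regions, each a 256 x 384 tensor). It assumes uncompressed float32 storage and ignores per-file serialization overhead, so the actual footprint on disk may differ.

```python
NUM_REGIONS = 433_779    # figure quoted in the original question
TOKENS_PER_REGION = 256  # 16 x 16 grid of patch tokens per region
FEATURE_DIM = 384        # patch-level embedding dimension
BYTES_PER_VALUE = 4      # assumption: float32, no compression

# Size of one [256 x 384] region tensor and of all M regions together.
bytes_per_region = TOKENS_PER_REGION * FEATURE_DIM * BYTES_PER_VALUE
total_bytes = NUM_REGIONS * bytes_per_region

print(f"per region: {bytes_per_region / 1024:.0f} KiB")
print(f"total:      {total_bytes / 1e9:.1f} GB ({total_bytes / 2**30:.1f} GiB)")
```

Under these assumptions each region takes 384 KiB and the full set comes to roughly 170.6 GB (158.9 GiB).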
Thank you @Richarizardd. Could you please post an update here when you make the 1st-stage features available? Thank you :)
Hi @ramprs21 @Richarizardd, for the hierarchical pretraining (2nd stage), will training be much faster than in the 1st stage, since each region is now converted into a [256, 384] tensor that is reshaped back into [1, 384, 16, 16] during training?
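The reshape described in this question can be sketched as follows, with NumPy standing in for PyTorch (the `torch.Tensor` methods `reshape`, `permute`, and `unsqueeze` do the same job). It assumes the 256 patch tokens are stored in row-major order over the 16 x 16 grid; that ordering is an assumption and should be checked against the HIPT code before relying on it. This is only a layout change and by itself says nothing about training speed.

```python
import numpy as np

GRID = 16          # 16 x 16 grid of patch tokens per region
FEATURE_DIM = 384  # patch-level embedding dimension

# Hypothetical stored features for one region: 256 tokens x 384 dims.
tokens = np.arange(GRID * GRID * FEATURE_DIM, dtype=np.float32).reshape(
    GRID * GRID, FEATURE_DIM
)

# (256, 384) -> (16, 16, 384): assumes tokens are in row-major grid order.
grid = tokens.reshape(GRID, GRID, FEATURE_DIM)
# (16, 16, 384) -> (384, 16, 16): channels first, like a 384-channel image.
chw = grid.transpose(2, 0, 1)
# (384, 16, 16) -> (1, 384, 16, 16): add a batch dimension.
batch = chw[np.newaxis]

print(batch.shape)  # (1, 384, 16, 16)

# The token at grid position (row=2, col=5) is row 2 * 16 + 5 of the original tensor.
assert np.array_equal(batch[0, :, 2, 5], tokens[2 * GRID + 5])
```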