HIPT: Unable to find *.pt files for region_4096_pretraining

Am I right in expecting the patch-level feature *.pt files (433779 files, each containing a 256 x 384 tensor) used for pretraining the second stage of HIPT to be in the HIPT/3-Self-Supervised-Eval/embeddings_patch_lib/ directory?

Currently, I only see the following pickle files in that directory.

25M     bcss_train_resnet50_trunc.pkl
9.3M    bcss_train_vits_tcga_brca_dino.pkl
4.5M    bcss_val_resnet50_tcga_brca_simclr.pkl
2.3M    bcss_val_resnet50_trunc.pkl
868K    bcss_val_vits_tcga_brca_dino.pkl
19M     breastpathq_train_resnet50_tcga_brca_simclr.pkl
9.4M    breastpathq_train_resnet50_trunc.pkl
3.6M    breastpathq_train_vits_tcga_brca_dino.pkl
1.5M    breastpathq_val_resnet50_tcga_brca_simclr.pkl
744K    breastpathq_val_resnet50_trunc.pkl
280K    breastpathq_val_vits_tcga_brca_dino.pkl
783M    crc100knonorm_train_resnet50_tcga_brca_simclr.pkl
393M    crc100knonorm_train_resnet50_trunc.pkl
149M    crc100knonorm_train_vits_tcga_brca_dino.pkl
57M     crc100knonorm_val_resnet50_tcga_brca_simclr.pkl
29M     crc100knonorm_val_resnet50_trunc.pkl
11M     crc100knonorm_val_vits_tcga_brca_dino.pkl
783M    crc100k_train_resnet50_tcga_brca_simclr.pkl
393M    crc100k_train_resnet50_trunc.pkl
149M    crc100k_train_vits_tcga_brca_dino.pkl
57M     crc100k_val_resnet50_tcga_brca_simclr.pkl
29M     crc100k_val_resnet50_trunc.pkl
11M     crc100k_val_vits_tcga_brca_dino.pkl

Thanks in advance.

Aug 01 '22 23:08 ramprs21
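
One quick way to confirm what a directory actually contains is to count its files by extension and list any *.pt files recursively. This is a minimal stdlib sketch; the helpers `find_files` and `suffix_counts` are hypothetical, and the path is simply the one quoted in the question.

```python
from collections import Counter
from pathlib import Path


def find_files(root: str, pattern: str = "*.pt") -> list[Path]:
    """Recursively collect files under `root` matching `pattern`, sorted for stable output."""
    return sorted(Path(root).rglob(pattern))


def suffix_counts(root: str) -> Counter:
    """Count regular files under `root` by extension (e.g. '.pkl', '.pt')."""
    return Counter(p.suffix for p in Path(root).rglob("*") if p.is_file())


if __name__ == "__main__":
    # Hypothetical usage with the directory from the question.
    root = "HIPT/3-Self-Supervised-Eval/embeddings_patch_lib"
    print(suffix_counts(root))
    print(len(find_files(root)), "*.pt files found")
```

For the listing above, this would report only `.pkl` files and zero *.pt files.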

Hi @ramprs21 - see https://github.com/mahmoodlab/HIPT/tree/master/3-Self-Supervised-Eval/embeddings_slide_lib.

Aug 01 '22 23:08 Richarizardd

Hi @Richarizardd, the *.pt files in 3-Self-Supervised-Eval/embeddings_slide_lib/embeddings_slide_lib/vit256mean_tcga_slide_embeddings do not seem to have the right dimensions (192 instead of 384), so I believe they were computed from the outputs of the 2nd stage. See below:

Python 3.9.10 | packaged by conda-forge | (main, Feb  1 2022, 21:24:11) 
[GCC 9.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> data = torch.load('3-Self-Supervised-Eval/embeddings_slide_lib/embeddings_slide_lib/vit256mean_tcga_slide_embeddings/TCGA-BA-6869-01Z-00-DX1.6e58648e-3309-47bb-b2c7-b71bcd9dc69b.pt')
>>> data.shape
torch.Size([52, 192])

Whereas I am looking for the inputs to 2nd-stage pretraining, which I believe are a list of *.pt files, each containing a tensor of dimension (256 x 384).

Aug 01 '22 23:08 ramprs21
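
To tell the two kinds of files apart without inspecting each one by hand, one option is to classify them by tensor shape, using the dimensions mentioned in this thread (256 x 384 for the patch-level stage-2 inputs, M x 192 for the region-level embeddings). This is a minimal sketch: `classify_shape` and `scan_pt_files` are hypothetical helpers, and it assumes each *.pt file holds a single tensor.

```python
from pathlib import Path

# Dimensions as discussed in this thread (assumption): 256 patch tokens of 384 dims
# per region for stage-2 inputs; 192-dim vectors for the region-level embeddings.
PATCH_TOKENS, PATCH_DIM, REGION_DIM = 256, 384, 192


def classify_shape(shape: tuple) -> str:
    """Label an embedding tensor shape according to the conventions above."""
    if tuple(shape) == (PATCH_TOKENS, PATCH_DIM):
        return "patch-level (stage-2 pretraining input)"
    if len(shape) == 2 and shape[1] == REGION_DIM:
        return "region-level (M regions x 192)"
    return "unknown"


def scan_pt_files(root: str) -> dict:
    """Map each *.pt file under `root` to a label; requires PyTorch."""
    import torch  # imported lazily so classify_shape works without torch installed

    labels = {}
    for path in sorted(Path(root).rglob("*.pt")):
        # Assumes the file stores a single tensor (not a dict or list of tensors).
        tensor = torch.load(path, map_location="cpu")
        labels[path.name] = classify_shape(tuple(tensor.shape))
    return labels
```

For example, the torch.Size([52, 192]) tensor loaded above would be labelled region-level.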

Hi @ramprs21 - apologies for the confusion. The previous link refers to the already pre-extracted "region-level" feature embeddings for each slide in TCGA. Regarding the *.pt files for hierarchical pretraining, it is logistically difficult at the moment to make all [M x 256 x 384] "patch-level" feature embeddings available, where M is the number of regions. We are looking into ways to make them more accessible!

Aug 02 '22 00:08 Richarizardd
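
A back-of-the-envelope estimate of the raw size of those [M x 256 x 384] embeddings, using M = 433779 from the question. This is a sketch under stated assumptions: float32 storage (4 bytes per value), no compression, and no per-file serialization overhead; `embedding_bytes` is a hypothetical helper.

```python
# Assumptions: float32 values, no compression, no *.pt serialization overhead.
NUM_REGIONS = 433_779      # file count quoted in the question
TOKENS, DIM = 256, 384     # one [256 x 384] tensor per region
BYTES_PER_VALUE = 4        # float32


def embedding_bytes(num_regions: int, tokens: int = TOKENS, dim: int = DIM,
                    bytes_per_value: int = BYTES_PER_VALUE) -> int:
    """Raw tensor payload in bytes for `num_regions` tensors of shape [tokens x dim]."""
    return num_regions * tokens * dim * bytes_per_value


per_region = embedding_bytes(1)
total = embedding_bytes(NUM_REGIONS)
print(f"per region: {per_region / 2**10:.0f} KiB")             # per region: 384 KiB
print(f"total: {total / 10**9:.1f} GB ({total / 2**30:.1f} GiB)")  # total: 170.6 GB (158.9 GiB)
```

Under these assumptions the tensors alone come to roughly 170 GB, which may help explain why distributing them is difficult.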

Thank you @Richarizardd. Could you please post an update here when you make the 1st-stage features available? Thank you :)

Aug 04 '22 18:08 ramprs21

Hi @ramprs21 @Richarizardd, for the hierarchical pretraining (2nd stage), will training be much faster than in the 1st stage, since each region is now converted into a [256, 384] tensor, which is reshaped into [1, 384, 16, 16] during training?

Dec 15 '22 07:12 bryanwong17
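
The [256, 384] to [1, 384, 16, 16] rearrangement mentioned in the question can be made explicit as an index mapping. This sketch assumes the 256 patch tokens are stored in row-major (row, column) order over a 16 x 16 grid, which is an assumption rather than something confirmed in this thread; `tokens_to_grid` is a hypothetical helper written in pure Python so the mapping is visible. For a PyTorch tensor `x` of shape [256, 384], the equivalent operation is `x.transpose(0, 1).reshape(1, 384, 16, 16)`.

```python
GRID = 16  # 16 x 16 = 256 patch tokens per region (assumed row-major order)


def tokens_to_grid(tokens):
    """Rearrange a [GRID*GRID][C] token list into a [1][C][GRID][GRID] channel-first grid."""
    assert len(tokens) == GRID * GRID, "expected one token per grid cell"
    channels = len(tokens[0])
    # out[0][c][i][j] is channel c of the token at grid row i, column j
    return [[[[tokens[i * GRID + j][c] for j in range(GRID)]
              for i in range(GRID)]
             for c in range(channels)]]
```

Each channel of the output is a 16 x 16 spatial map, so the stage-2 model sees a small grid of 256 feature vectors per region rather than raw pixels.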
