TOP: Cannot reproduce the scores reported in the paper

I am trying to reproduce the results from the paper using the code provided in this repository. However, my scores are significantly lower than those reported in the paper for the TCGA-Lung dataset. In the TCGA-Lung 16-shot setting, I only get an AUC of around 0.68, rather than the 0.82 reported in the paper. Since the preprocessing details are not given, I used 10x magnification and extracted features with CLIP. Could you please give more detailed instructions for reproducing the scores reported in the paper?

Oct 21 '24 16:10 pxliang

Thank you very much for your interest in our work!

In our experiments, we used 20x magnification for the TCGA dataset, tiled the slides into 224x224 patches, and used CLIP for feature extraction. In the challenging few-shot WSI setting, we observed that the results are highly sensitive to the random seed. Specifically, different selections of few-shot WSIs can significantly affect the outcomes, especially if the chosen training samples are not representative or differ substantially from the corresponding language prompts. Unfortunately, the field currently lacks a standardized benchmark dataset for few-shot WSI classification. In our code, we explored performance across various random seeds for selecting the training samples, ensuring that all methods used the same seed.
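The seed-controlled selection of few-shot training slides described above can be sketched as follows. This is a minimal illustration under assumptions, not the repository's implementation: the function name `select_few_shot_slides`, sampling a fixed number of slides per class, and the use of NumPy's `default_rng` are all hypothetical choices.

```python
import numpy as np


def select_few_shot_slides(slide_labels, shots, seed):
    """Pick `shots` training slides per class, reproducibly for a given seed.

    slide_labels: 1-D sequence of integer class labels, one per slide.
    Returns a sorted array of the selected slide indices.
    """
    labels = np.asarray(slide_labels)
    # One generator per seed; passing the same seed to every method
    # gives every method the same few-shot training set.
    rng = np.random.default_rng(seed)
    selected = []
    for cls in np.unique(labels):
        candidates = np.flatnonzero(labels == cls)
        if len(candidates) < shots:
            raise ValueError(f"class {cls} has only {len(candidates)} slides")
        # Sample without replacement so no slide is picked twice.
        selected.extend(rng.choice(candidates, size=shots, replace=False))
    return np.sort(np.array(selected))
```

Calling it once per seed and feeding the resulting indices to every method keeps the comparison fair; evaluating several seeds exposes how much a single split can swing the score.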

Oct 23 '24 04:10 miccaiif

Thank you very much for your reply! I think the patch features from the CLIP model need to be normalized in the preprocessing step. I got better results using normalized patch features.
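If "normalized" here means L2 normalization (an assumption; CLIP itself L2-normalizes image and text embeddings before computing their cosine similarity), the preprocessing step could look like this minimal NumPy sketch. The helper name `l2_normalize` is hypothetical.

```python
import numpy as np


def l2_normalize(feats, eps=1e-12):
    """Scale each row (one patch embedding per row) to unit L2 norm."""
    feats = np.asarray(feats, dtype=np.float32)
    norms = np.linalg.norm(feats, axis=-1, keepdims=True)
    # Clamp tiny norms so all-zero rows stay zero instead of becoming NaN.
    return feats / np.maximum(norms, eps)
```

It would be applied to each slide's `(num_patches, dim)` feature array before the features are saved.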

I have another question regarding the CoOp baseline. What pooling strategy did you use for it?
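The thread does not say which pooling the CoOp baseline used. For illustration only, two common ways to turn patch-level CLIP features into a single slide-level feature are mean and max pooling. The sketch below (hypothetical name `pool_slide_feature`) shows both and re-normalizes the result so it can be compared with text embeddings by cosine similarity.

```python
import numpy as np


def pool_slide_feature(patch_feats, strategy="mean"):
    """Aggregate (num_patches, dim) patch features into one slide-level vector."""
    patch_feats = np.asarray(patch_feats, dtype=np.float32)
    if strategy == "mean":
        pooled = patch_feats.mean(axis=0)  # average over all patches
    elif strategy == "max":
        pooled = patch_feats.max(axis=0)  # per-dimension maximum over patches
    else:
        raise ValueError(f"unknown pooling strategy: {strategy}")
    # Re-normalize to unit length for cosine similarity with text embeddings.
    return pooled / max(float(np.linalg.norm(pooled)), 1e-12)
```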

Oct 25 '24 22:10 pxliang

By the way, what magnification did you use for the Camelyon16 dataset? I saw that the released code uses 5x for Camelyon16. Is that the setting used for the results reported in the paper?
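Magnification matters because it changes how much tissue each 224x224 patch covers and how many patches a slide yields. Below is a minimal sketch of the tile geometry. It assumes the slide's base (scan) magnification is known and ignores tissue detection. The base magnification should come from slide metadata, such as OpenSlide's `openslide.objective-power` property, rather than being assumed. `tile_grid` is a hypothetical helper.

```python
def tile_grid(width, height, base_mag, target_mag, patch_size=224):
    """Return level-0 (x, y) origins of non-overlapping patches read at target_mag.

    width, height: slide size in level-0 pixels (scanned at base_mag).
    Each patch covers patch_size * (base_mag / target_mag) level-0 pixels per side.
    """
    if target_mag > base_mag:
        raise ValueError("target magnification cannot exceed the scan magnification")
    downsample = base_mag / target_mag
    step = int(round(patch_size * downsample))  # patch footprint in level-0 pixels
    # Row-major order; partial patches at the right/bottom edges are dropped.
    return [
        (x, y)
        for y in range(0, height - step + 1, step)
        for x in range(0, width - step + 1, step)
    ]
```

For a 3584x1792 region scanned at 40x, this gives 32 patches at 20x, 8 at 10x, and 2 at 5x. Each halving of magnification cuts the patch count by about 4x. The pixels for each tile could then be read with OpenSlide's `read_region` and resized to 224x224.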

Oct 27 '24 00:10 pxliang

@pxliang Hi, could you please tell me how the files train_feats.npy, train_corresponding_slide_label.npy, train_corresponding_slide_index.npy, and train_corresponding_slide_name.npy were generated? Many thanks! https://github.com/miccaiif/TOP/blob/8356b0dd1786d35e6d6d617d2b253c949769ba39/Datasets_loader/dataset_TCGA_LungCancer.py#L198-L201
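The thread does not document how these files were produced. Judging only from the file names, one plausible layout is a set of row-aligned, patch-level arrays in which the i-th feature row, label, slide index, and slide name all describe the same patch. The sketch below builds such files from per-slide CLIP features. The function name `save_patch_level_arrays`, the input format, and the dtypes are assumptions, and the loader in `dataset_TCGA_LungCancer.py` may expect something different.

```python
import os

import numpy as np


def save_patch_level_arrays(slides, out_dir, split="train"):
    """Flatten per-slide patch features into row-aligned patch-level arrays.

    slides: list of (slide_name, slide_label, feats), feats of shape (num_patches, dim).
    Writes four .npy files whose i-th entries describe the same patch (assumed layout).
    """
    feats, labels, indices, names = [], [], [], []
    for slide_idx, (name, label, slide_feats) in enumerate(slides):
        n = len(slide_feats)
        feats.append(np.asarray(slide_feats, dtype=np.float32))
        # Repeat the slide-level metadata once per patch.
        labels.append(np.full(n, label, dtype=np.int64))
        indices.append(np.full(n, slide_idx, dtype=np.int64))
        names.append(np.array([name] * n))
    arrays = {
        "feats": np.concatenate(feats, axis=0),
        "corresponding_slide_label": np.concatenate(labels),
        "corresponding_slide_index": np.concatenate(indices),
        "corresponding_slide_name": np.concatenate(names),
    }
    os.makedirs(out_dir, exist_ok=True)
    for key, arr in arrays.items():
        # e.g. train_feats.npy, train_corresponding_slide_label.npy, ...
        np.save(os.path.join(out_dir, f"{split}_{key}.npy"), arr)
    return arrays
```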

Nov 25 '24 09:11 invisprints

Hi, after normalizing the patch features, were your results close to those reported in the paper? I am also getting low AUC values. Looking forward to your reply!

Dec 19 '24 03:12 du-67

No, I cannot reproduce the results reported in the paper.

Dec 19 '24 03:12 pxliang

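Thanks for the reply! How much did normalization improve your results? Right now I can only reach about 0.6 in the 16-shot setting on the Camelyon16 dataset.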

Dec 19 '24 03:12 du-67

Same here. For the TCGA-Lung dataset, though, the result is better: around 0.78.

Dec 19 '24 03:12 pxliang

Hi, I am a student and a beginner. How is the preprocessing for this code done, and how are the .npy files mentioned above made? If you know, could you share? Thank you very much!

Dec 25 '24 09:12 yangyuplus

I am a beginner: how were these .npy files made? Could you please give me some guidance? Thank you.

Dec 25 '24 09:12 yangyuplus