CLAM
Fix: Slide IDs turned into floats in split CSV when names consist only of numbers
Summary of the Issue
- Slide IDs consisting solely of numeric characters are inadvertently converted to floats in the split CSV files.
  - The unequal lengths of the `train`, `val`, and `test` splits introduce `NaN` values when these splits are concatenated into a dataframe by `save_splits()`.
  - Pandas automatically converts columns containing all-numeric slide names and `NaN` values to floats, because the default integer dtype in Pandas has no `NaN` representation.
- When loading via the following line, a `ValueError` (shown in the screenshot) occurs: https://github.com/mahmoodlab/CLAM/blob/3f875f77465b410d260f2afcfaea608a9d6ddbca/datasets/dataset_generic.py#L247
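The dtype change described above can be reproduced in isolation. The following is a minimal sketch, not the actual `save_splits()` code: the slide IDs are made up, and the column-wise `pd.concat` call is an assumption about how the splits are combined.

```python
import pandas as pd

# Hypothetical slide IDs made up only of digits; the splits have unequal lengths.
train = pd.Series([1001, 1002, 1003])
val = pd.Series([1004])
test = pd.Series([1005, 1006])

# Column-wise concatenation pads the shorter splits with NaN.
splits = pd.concat([train, val, test], ignore_index=True, axis=1)
splits.columns = ["train", "val", "test"]

# The default integer dtype cannot hold NaN, so the padded columns become float64.
print(splits.dtypes)

# The float representation (e.g. "1004.0") is what ends up in the split CSV.
csv_text = splits.to_csv(index=False)
print(csv_text)
```

Only the columns that were padded change dtype; in this sketch `train` is the longest split and stays integer, which is why the problem can look inconsistent across columns.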
Proposed fix
- Cast slide IDs to strings before they are saved to CSV in `save_splits` to prevent unintended type conversion.
  - Result: (screenshot)
- Continue to read the dataset CSV with `dtype=object` in `Generic_WSI_Classification_Dataset`.
  - In `get_split_from_df()`, cast the dtype of the corresponding split column to match that of `self.slide_data['slide_id']`.
  - This fix pertains to https://github.com/mahmoodlab/CLAM/pull/90
  - Result: (screenshot)
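The two steps of the fix can be sketched end to end as follows. This is an illustration under assumptions, not the code in this PR: `slide_ids` stands in for `self.slide_data['slide_id']`, the IDs are made up, and the masking step with `isin` is assumed to mirror what `get_split_from_df()` does.

```python
import io
import pandas as pd

# Hypothetical all-numeric slide IDs with unequal split lengths.
train = pd.Series([1001, 1002, 1003])
val = pd.Series([1004])
test = pd.Series([1005, 1006])

# Step 1: cast to str before concatenating, so NaN padding cannot force a float dtype.
splits = pd.concat(
    [s.astype(str) for s in (train, val, test)], ignore_index=True, axis=1
)
splits.columns = ["train", "val", "test"]
csv_text = splits.to_csv(index=False)

# Step 2: read the CSV back with dtype=object so the IDs stay strings.
loaded = pd.read_csv(io.StringIO(csv_text), dtype=object)

# Stand-in for self.slide_data['slide_id'] (assumed to hold string IDs).
slide_ids = pd.Series(["1001", "1002", "1003", "1004", "1005", "1006"])

# Cast the split column to the dtype of the slide ID column before matching.
val_ids = loaded["val"].dropna().astype(slide_ids.dtype)
mask = slide_ids.isin(val_ids.tolist())
print(slide_ids[mask].tolist())
```

Casting on both the write side and the read side is deliberate: the write-side cast keeps new split files clean, while the read-side cast keeps the comparison consistent regardless of how the dataset CSV was typed.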
This happened while I was working with the dataset CSV for my own task. I can provide the CSV file to reproduce this bug if need be.