Self-PU
ADNI exact data type used
Hi again,
Thanks again for looking into the issue I raised (#28).
I have already gained access to the ADNI dataset, but I can only find .nii and .dcm data. As an example, one subject that appears in your adni_dx_suvr_clean.csv is subject 21. When I look for this person's T1-weighted MRI data, I can only see *.dcm files, in a folder structure like:
./ADNI/011_S_0021/MPRAGE/2015-11-03_15_30_05.0/I543543/ADNI_011_S_0021_MR_MPRAGE_br_raw_20151105104743943_154_S299478_I543543.dcm
But in your adni_dataset.py, the folder structure is given as name: mri/grey/white/pet:
https://github.com/VITA-Group/Self-PU/blob/a0e332ae4f8110e2490d597876e36bf837e1060f/adni_dataset.py#L146
Also, I can't tell which image format adni_dataset.py uses. Is it DICOM (.dcm) or NIfTI (.nii)?
It seems to me that you did some preprocessing on the images and stored them as .npy, but I don't know where to find the code for that preprocessing. Would you mind sharing it, please?
Another thing that confuses me is why the default folder is given as name: mri/grey/white/pet. I thought MRI images were being used, not PET images?
Thank you!
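For lining up the downloaded folders with the RID column of the CSV, here is a minimal sketch. It assumes that the subject folder follows the pattern seen in the example path above (site, `_S_`, then a zero-padded number) and that the trailing number is the RID, which is consistent with folder 011_S_0021 and subject 21. The helper name `rid_from_path` is hypothetical and not part of the Self-PU code.

```python
import re
from pathlib import PurePosixPath

# Assumed subject-folder pattern: <site>_S_<zero-padded RID>, e.g. 011_S_0021
SUBJECT_ID = re.compile(r"^(\d+)_S_(\d+)$")


def rid_from_path(path):
    """Return the RID encoded in an ADNI-style path, or None if no subject folder is found."""
    for part in PurePosixPath(path).parts:
        match = SUBJECT_ID.match(part)
        if match:
            # Drop the zero padding: '0021' -> 21
            return int(match.group(2))
    return None


example = (
    "./ADNI/011_S_0021/MPRAGE/2015-11-03_15_30_05.0/I543543/"
    "ADNI_011_S_0021_MR_MPRAGE_br_raw_20151105104743943_154_S299478_I543543.dcm"
)
print(rid_from_path(example))
```

Only the folder name is matched, not the file name, so the sketch does not depend on how the individual .dcm files are named.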
Hi naji-s,
Thanks again for your interest. As we mentioned in Issue #5, our dataset was downloaded and pre-processed by a company, so many of the details are under NDA. I am willing to help with anything else, but I am sorry that we cannot reveal those details. Would you mind running experiments with a customized setting that you can set up on your side? Please let me know if you have further questions.
Hello @xxchenxx,
Thank you for getting back to me. I'm keen on replicating your results as accurately as possible, and I'm aiming to align my experiments closely with yours to enable a direct comparison. Specifically, when referring to the csv file you mentioned earlier in #5, I've noticed some discrepancies in the SUVR values when applying the code below:
import getpass
from datetime import datetime

import numpy as np
import pandas as pd

# Base names of the two input CSV files (without extension)
first_csv_name = 'adni_dx_suvr_clean'
second_csv_name = 'UCBERKELEYAV45_11_16_21_24Feb2024'
# Output file name (already includes the .csv extension)
new_file_name = f"{second_csv_name}_no_match_from_{first_csv_name}.csv"
ROOT_PATH = getpass.getuser()
df1 = pd.read_csv(f'/Users/{ROOT_PATH}/{first_csv_name}.csv')
df2 = pd.read_csv(f'/Users/{ROOT_PATH}/{second_csv_name}.csv')

# Function to convert date from MM/DD/YY to YYYY-MM-DD format
def convert_date(date_str):
    # Convert to datetime object
    date_obj = datetime.strptime(date_str, '%m/%d/%y')
    # Format to YYYY-MM-DD (same separator as the EXAMDATE conversion below)
    return date_obj.strftime('%Y-%m-%d')

# Convert 'AV45 Date' in df1 and 'EXAMDATE' in df2 to the same format
df1['AV45 Date Formatted'] = df1['AV45 Date'].apply(convert_date)
df2['EXAMDATE Formatted'] = pd.to_datetime(df2['EXAMDATE']).dt.strftime('%Y-%m-%d')

# Re-initialize match results and total matches counter
match_results = []
total_matches = 0

# Iterate through df1 and check for matches in df2 based on RID and the formatted dates
for index, row in df1.iterrows():
    rid = row['RID']
    av45_date = row['AV45 Date Formatted']
    suvr = row['SUVR']
    # Find matching row in df2: same RID, same date, SUVR within the threshold
    # matching_row = df2[(df2['RID'] == rid) & (df2['EXAMDATE Formatted'] == av45_date)]
    matching_row = df2[
        (df2['RID'] == rid)
        & (df2['EXAMDATE Formatted'] == av45_date)
        & (np.abs(df2['SUMMARYSUVR_WHOLECEREBNORM'] - suvr) < 0.1)
    ]
    # Check if there is at least one matching row
    if not matching_row.empty:
        match_results.append('Match')
        total_matches += 1
    else:
        match_results.append('No Match')

# Add the match results to df1
df1['Match in UCBERKELEYAV45'] = match_results

# Calculate the percentage of matches
percentage_matches = (total_matches / len(df1)) * 100

# Extract rows from df2 whose (RID, date) pair does not appear in df1
non_matching_df2 = df2[
    ~df2.set_index(['RID', 'EXAMDATE Formatted']).index.isin(
        df1.set_index(['RID', 'AV45 Date Formatted']).index
    )
]

# Save the non-matching rows from df2 to a new CSV file
non_matching_df2.to_csv(f'/Users/{ROOT_PATH}/{new_file_name}', index=False)

print(percentage_matches, df1.shape, non_matching_df2.shape)
Using a threshold of 0.1, the matching rate between our datasets is 100%. However, when I adjust the threshold to 0.01, the matching results vary significantly. Additionally, I have concerns about the get_hippo function in my data processing. From what I understand, you might not be using a standard template but rather an indexed cropping method, which could significantly affect the results depending on voxel location. Thus, I'd like to know the MRI image processing standard you used to ensure the get_hippo function correctly extracts hippocampus patches.
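To show how the matching rate depends on the SUVR threshold, here is a minimal sketch that computes the rate for several thresholds in one pass. It reuses the column names from the script above; the function name `match_rate_by_tolerance` and the toy values are made up for illustration and are not ADNI data.

```python
import pandas as pd


def match_rate_by_tolerance(df1, df2, tolerances):
    """Percent of df1 rows that have a df2 row with the same RID and date
    and an SUVR difference below each tolerance."""
    left = df1[["RID", "AV45 Date Formatted", "SUVR"]].reset_index(drop=True)
    left["row_id"] = left.index
    right = df2[["RID", "EXAMDATE Formatted", "SUMMARYSUVR_WHOLECEREBNORM"]].rename(
        columns={"EXAMDATE Formatted": "AV45 Date Formatted"}
    )
    # Left join keeps every df1 row; rows without a counterpart get NaN
    merged = left.merge(right, how="left", on=["RID", "AV45 Date Formatted"])
    merged["abs_diff"] = (merged["SUVR"] - merged["SUMMARYSUVR_WHOLECEREBNORM"]).abs()
    # Smallest difference per df1 row (NaN if there is no RID/date counterpart)
    best = merged.groupby("row_id")["abs_diff"].min()
    return {tol: float((best < tol).mean() * 100) for tol in tolerances}


# Toy data only, to show the effect of tightening the threshold
toy_df1 = pd.DataFrame({
    "RID": [21, 21, 31],
    "AV45 Date Formatted": ["2015-11-03", "2016-01-01", "2015-05-05"],
    "SUVR": [1.00, 1.20, 0.90],
})
toy_df2 = pd.DataFrame({
    "RID": [21, 21, 31],
    "EXAMDATE Formatted": ["2015-11-03", "2016-01-01", "2015-05-05"],
    "SUMMARYSUVR_WHOLECEREBNORM": [1.05, 1.205, 0.98],
})
rates = match_rate_by_tolerance(toy_df1, toy_df2, [0.1, 0.01])
print(rates)
```

On the toy data every row matches at 0.1, while only the row with a 0.005 difference still matches at 0.01, which is the kind of drop described above.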
My aim is to achieve reproducibility in our research. I hope you understand the importance of this. Could I consult with you after I prepare the final dataset to discuss the possibility of accessing a tailored version of the Self-PU code for this dataset?
Hi naji-s,
I will try to see what I can do from my side. I am afraid that I can no longer share any related data or scripts, but please feel free to ask questions. I will try my best to engage in the discussion too, if time permits.