
No such file or directory: radiologytraindata.csv

Open Tzx11 opened this issue 1 year ago • 1 comments

Hi, do you know how to generate the CSV file?

Tzx11, Sep 26 '23 12:09

Navigate to the roco-dataset repository's root directory. Then create a Python file by running the following command in a terminal:

touch scripts/create_csv.py

Open create_csv.py and paste the following code into it:

from pathlib import Path

import pandas as pd

# Resolve data/ relative to this file (scripts/create_csv.py) rather than the
# current working directory, so the script works when run from the repo root.
data_dir = Path(__file__).resolve().parent.parent / 'data'
colnames = ['id', 'caption']

# train: read the tab-separated captions and add the image file name
train_captions_dir = data_dir / 'train' / 'radiology' / 'captions.txt'
train_captions = pd.read_csv(train_captions_dir, sep='\t', names=colnames)
train_captions['name'] = train_captions['id'] + '.jpg'
train_captions = train_captions[['id', 'name', 'caption']]
train_captions.to_csv(data_dir / 'train' / 'radiologytraindata.csv', index=False)
print('train pd:\n', train_captions.head())

# validation
val_captions_dir = data_dir / 'validation' / 'radiology' / 'captions.txt'
val_captions = pd.read_csv(val_captions_dir, sep='\t', names=colnames)
val_captions['name'] = val_captions['id'] + '.jpg'
val_captions = val_captions[['id', 'name', 'caption']]
val_captions.to_csv(data_dir / 'validation' / 'radiologyvaldata.csv', index=False)
print('val pd:\n', val_captions.head())

# test
test_captions_dir = data_dir / 'test' / 'radiology' / 'captions.txt'
test_captions = pd.read_csv(test_captions_dir, sep='\t', names=colnames)
test_captions['name'] = test_captions['id'] + '.jpg'
test_captions = test_captions[['id', 'name', 'caption']]
test_captions.to_csv(data_dir / 'test' / 'radiologytestdata.csv', index=False)
print('test pd:\n', test_captions.head())

Now run the script from the repository root:

python scripts/create_csv.py

This creates the CSV files and saves them in the location where PubMedCLIP expects to find them.
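
If you want to sanity-check the result, here is a rough sketch that reads one of the generated CSV files and reports any listed image that is not on disk. The function name `find_missing_images` and the image folder `data/train/radiology/images` are my own assumptions, not something PubMedCLIP or roco-dataset defines, so adjust the paths to your layout.

```python
from pathlib import Path

import pandas as pd

# Column layout written by create_csv.py above.
EXPECTED_COLUMNS = ['id', 'name', 'caption']


def find_missing_images(csv_path, image_dir):
    """Return the image file names listed in csv_path that are absent from image_dir."""
    df = pd.read_csv(csv_path)
    if list(df.columns) != EXPECTED_COLUMNS:
        raise ValueError(f'unexpected columns in {csv_path}: {list(df.columns)}')
    image_dir = Path(image_dir)
    return [name for name in df['name'] if not (image_dir / name).is_file()]


if __name__ == '__main__':
    # Hypothetical paths (assumption): change them to match your checkout.
    csv_path = Path('data/train/radiologytraindata.csv')
    image_dir = Path('data/train/radiology/images')
    if csv_path.is_file():
        missing = find_missing_images(csv_path, image_dir)
        print(f'{len(missing)} image(s) listed in {csv_path} not found in {image_dir}')
    else:
        print(f'{csv_path} not found; run scripts/create_csv.py first')
```

An empty list means every row in the CSV has a matching image file. A non-empty list usually points to images that failed to download.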

yasamanparhizkar, Feb 13 '24 21:02