PubMedCLIP
No such file or directory: radiologytraindata.csv
Hi, do you know how to generate the CSV file?
Navigate to the root directory of the roco-dataset repository. Then create a Python file by running the following command in the terminal:
touch scripts/create_csv.py
Open create_csv.py and paste the following code into it:
import pandas as pd
# train
train_captions_dir = '../data/train/radiology/captions.txt'
colnames = ['id', 'caption']
train_captions = pd.read_csv(train_captions_dir, sep='\t', names=colnames)
train_captions['name'] = train_captions['id'].apply(lambda x: x+'.jpg')
train_captions = train_captions[['id', 'name', 'caption']]
train_captions.to_csv('../data/train/radiologytraindata.csv', index=False)
print('train pd:\n', train_captions.head())
# validation
val_captions_dir = '../data/validation/radiology/captions.txt'
colnames = ['id', 'caption']
val_captions = pd.read_csv(val_captions_dir, sep='\t', names=colnames)
val_captions['name'] = val_captions['id'].apply(lambda x: x+'.jpg')
val_captions = val_captions[['id', 'name', 'caption']]
val_captions.to_csv('../data/validation/radiologyvaldata.csv', index=False)
print('val pd:\n', val_captions.head())
# test
test_captions_dir = '../data/test/radiology/captions.txt'
colnames = ['id', 'caption']
test_captions = pd.read_csv(test_captions_dir, sep='\t', names=colnames)
test_captions['name'] = test_captions['id'].apply(lambda x: x+'.jpg')
test_captions = test_captions[['id', 'name', 'caption']]
test_captions.to_csv('../data/test/radiologytestdata.csv', index=False)
print('test pd:\n', test_captions.head())
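Now run the script from inside the scripts directory, because its '../data/...' paths are resolved relative to the current working directory:
cd scripts
python create_csv.py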
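This creates radiologytraindata.csv, radiologyvaldata.csv, and radiologytestdata.csv in the train, validation, and test folders of the data directory, which is where PubMedCLIP expects to find them.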
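The script above repeats the same steps for each split and depends on the directory it is launched from. Below is a minimal sketch of an equivalent version that takes the data directory as an argument. The function names captions_to_csv and convert_all are hypothetical, and the sketch assumes each captions.txt holds one tab-separated "id<TAB>caption" pair per line, as the original code does.

```python
from pathlib import Path

import pandas as pd

# Split name -> output CSV name, matching the file names used in the script above.
CSV_NAMES = {
    "train": "radiologytraindata.csv",
    "validation": "radiologyvaldata.csv",
    "test": "radiologytestdata.csv",
}


def captions_to_csv(captions_path, csv_path):
    """Convert a tab-separated captions file into a CSV with id, name, caption."""
    df = pd.read_csv(captions_path, sep="\t", names=["id", "caption"])
    # The image file name is the id plus the .jpg extension.
    df["name"] = df["id"] + ".jpg"
    df = df[["id", "name", "caption"]]
    df.to_csv(csv_path, index=False)
    return df


def convert_all(data_dir):
    """Write one CSV per split under data_dir and return the row counts."""
    data_dir = Path(data_dir)
    counts = {}
    for split, csv_name in CSV_NAMES.items():
        df = captions_to_csv(
            data_dir / split / "radiology" / "captions.txt",
            data_dir / split / csv_name,
        )
        counts[split] = len(df)
    return counts
```

If this is saved as scripts/create_csv.py, adding the line print(convert_all(Path(__file__).resolve().parent.parent / "data")) at the end should make it work from any working directory, assuming the data folder sits next to the scripts folder.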