M3AE

HTTPError: 401 Client Error: Unauthorized for url: https://huggingface.co/downloaded/roberta-base/resolve/main/vocab.json

Open 1112321sfdsaf opened this issue 2 years ago • 9 comments

Hey, thanks very much for the excellent work and repo. When I ran 'bash run_scripts/pretrain_m3ae.sh', I got the above exception. I tried opening the link and found that it is unavailable. So I replaced 'tokenizer=downloaded/roberta-base' in 'pretrain_m3ae.sh' with 'tokenizer=roberta-base', and the run succeeded. I would like to know whether this change is acceptable.

1112321sfdsaf avatar Sep 18 '22 06:09 1112321sfdsaf

Hi there,

Thanks for your interest. That change is fine, since the downloaded RoBERTa model is the same one as 'roberta-base'.

Best, Zhihong

zhjohnchan avatar Sep 18 '22 07:09 zhjohnchan
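
For readers hitting the same 401: the error most likely arises because 'downloaded/roberta-base' does not exist as a local directory, so it is treated as a model id on the Hugging Face Hub, where no such repository is available. A minimal sketch of a fallback, assuming that explanation is right, is shown below. The helper name `resolve_tokenizer_name` and the check for `vocab.json` are illustrative assumptions and are not part of the M3AE repo.

```python
from pathlib import Path


def resolve_tokenizer_name(configured: str, hub_fallback: str = "roberta-base") -> str:
    """Return a local tokenizer directory if it exists, otherwise a Hub model id.

    Hypothetical helper for illustration; not part of the M3AE code base.
    """
    local_dir = Path(configured)
    # Assumption: a usable local RoBERTa tokenizer directory holds a vocab.json,
    # which is the file the failing URL in the error message points at.
    if local_dir.is_dir() and (local_dir / "vocab.json").is_file():
        return str(local_dir)
    # No local copy found, so fall back to the public Hub model id.
    return hub_fallback


if __name__ == "__main__":
    name = resolve_tokenizer_name("downloaded/roberta-base")
    # The result could then be passed on the command line as tokenizer=<name>,
    # or to transformers.AutoTokenizer.from_pretrained(name).
    print(f"tokenizer={name}")
```

If the local directory is present, the sketch keeps using it; otherwise it returns 'roberta-base', which matches the change confirmed above.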

Thank you very much for your reply.

1112321sfdsaf avatar Sep 18 '22 08:09 1112321sfdsaf

Hi again. I found that the default batch size per GPU is 32, and my own hardware seems far from meeting that requirement. Could you tell me how much GPU memory is needed for a per-GPU batch size of 32?

1112321sfdsaf avatar Sep 18 '22 08:09 1112321sfdsaf

Hi,

I used A100-80G in my experiments.

Best, Zhihong

zhjohnchan avatar Sep 18 '22 08:09 zhjohnchan
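
For GPUs with less memory than an A100-80G, a common workaround is to lower the per-GPU batch size and accumulate gradients over several steps so that the effective batch size stays the same. The arithmetic can be sketched as follows. The function name and the example numbers are illustrative assumptions; whether and how the M3AE scripts expose gradient accumulation should be checked in the repo's own config.

```python
def accumulation_steps(effective_batch: int, per_gpu_batch: int, num_gpus: int = 1) -> int:
    """Number of gradient-accumulation steps needed to reach an effective batch size.

    Hypothetical helper for illustration; not part of the M3AE code base.
    """
    samples_per_step = per_gpu_batch * num_gpus
    if samples_per_step <= 0:
        raise ValueError("per_gpu_batch and num_gpus must be positive")
    if effective_batch % samples_per_step != 0:
        raise ValueError("effective_batch must be a multiple of per_gpu_batch * num_gpus")
    return effective_batch // samples_per_step


if __name__ == "__main__":
    # Example: emulate a batch of 32 on one GPU that only fits 8 samples at a time.
    print(accumulation_steps(effective_batch=32, per_gpu_batch=8, num_gpus=1))  # prints 4
```

Accumulation trades memory for time: each optimizer update then takes several forward and backward passes instead of one.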

Thanks for your reply again.

1112321sfdsaf avatar Sep 18 '22 09:09 1112321sfdsaf

Hi, when preparing the MELINDA dataset using the link given in the paper "MELINDA: A Multimodal Dataset for Biomedical Experiment Method Classification", the page returned a 404, so the link appears to be dead. How can I download this dataset to follow your excellent work?

1112321sfdsaf avatar Sep 19 '22 07:09 1112321sfdsaf

Hi,

I requested the dataset from the authors. I think you can e-mail them to ask for it.

Best, Zhihong

zhjohnchan avatar Sep 19 '22 07:09 zhjohnchan

Thanks for your suggestion.

1112321sfdsaf avatar Sep 19 '22 08:09 1112321sfdsaf

Hi, I'm back. 😅

When I downloaded VQA-RAD from the official page, I found that the files are different from those described in this repo. Moreover, according to the dataset paper, the dataset seems to be divided only into a training set and a test set. Could you provide more details on the data splits, including the training, validation, and test sets?

Thanks in advance.

1112321sfdsaf avatar Sep 19 '22 15:09 1112321sfdsaf