
Parameters of the retriever in fine-tuning

Open catalwaysright opened this issue 2 years ago • 17 comments

Hi! I am wondering why the retriever is frozen during fine-tuning. I think the retriever would learn more if it were updated during fine-tuning. I am not very familiar with TensorFlow. Is it possible to update the parameters of the retriever during fine-tuning with this repository? How?

catalwaysright avatar Mar 19 '22 01:03 catalwaysright

See #5 and #6, and see the papers.

qqaatw avatar Mar 19 '22 03:03 qqaatw

See #5 and #6, and see the papers.

Thanks for your reply! I have checked the issues and the paper. I just want to double-check that I have it right: the parameters of the query embedder are actually updated during fine-tuning, but we just don't refresh the pre-computed document embeddings accordingly. Thus the embedding of the same question will change as the query embedder is optimized during fine-tuning, and we may get different top-k relevant documents over the course of fine-tuning even if we input the same question.

catalwaysright avatar Mar 19 '22 04:03 catalwaysright

Indeed, that is how optimization works, isn’t it?

We could migrate the async index refresh here, but it requires a lot of work due to its complexity.
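
Conceptually it looks something like the toy sketch below (illustrative only, not this repository's actual code; the shapes and names are made up):

```python
import torch
import torch.nn as nn

hidden_size = 128
num_blocks = 1000  # toy size; the real corpus has millions of blocks

# Pre-computed document (block) embeddings: frozen during fine-tuning.
block_emb = torch.randn(num_blocks, hidden_size)

# Stand-in for the query embedder: its parameters ARE updated by the fine-tuning loss.
query_embedder = nn.Linear(hidden_size, hidden_size)

def retrieve_top_k(question_features, k=5):
    # The query embedding changes as query_embedder is optimized, so the same
    # question can surface a different top-k over the course of training,
    # even though block_emb itself is never refreshed.
    query_emb = query_embedder(question_features)  # (1, hidden_size)
    scores = query_emb @ block_emb.T               # (1, num_blocks)
    return torch.topk(scores, k, dim=-1).indices

question_features = torch.randn(1, hidden_size)
print(retrieve_top_k(question_features))
```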

qqaatw avatar Mar 19 '22 05:03 qqaatw

Another question: I downloaded the natural_questions dataset locally, but when I tried to load it with the load function provided in data.py, it reported "Dataset path currently not supported.", which is just because the path is a local OS path. How can I fix this and load the local natural_questions dataset?

catalwaysright avatar Mar 20 '22 01:03 catalwaysright

How did you download NQ?

qqaatw avatar Mar 20 '22 03:03 qqaatw

How did you download NQ?

By using gsutil -m cp -R gs://natural_questions/v1.0 <path to your data directory>; the resulting directory structure is shown in the attached screenshot.

catalwaysright avatar Mar 20 '22 03:03 catalwaysright

The preferred way to download it is through Hugging Face's datasets library, which provides many utilities such as caching, mapping, and filtering. The dataset source this library uses is also from Google.

However, if you want to handle the files yourself, you'll need to write a dataset loading function in data.py that returns the same format as load_nq().
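
As a rough illustration of the datasets route (a sketch only, not the exact code in data.py; the cache_dir value is just a placeholder):

```python
from datasets import load_dataset

# Let the datasets library download, preprocess, and cache Natural Questions.
# cache_dir is optional and only controls where the processed dataset is stored.
nq = load_dataset("natural_questions", cache_dir="/path/to/your/data/directory")

train_split = nq["train"]
validation_split = nq["validation"]
print(train_split.column_names)

# A custom loader for a manually downloaded (gsutil) copy would have to parse
# the files itself and return the same structure as load_nq() in data.py,
# so that the rest of the fine-tuning pipeline stays unchanged.
```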

qqaatw avatar Mar 20 '22 04:03 qqaatw

Thank you so much for answering my questions so patiently! I encountered another problem when running run_finetune.py with exactly the same args as your experiment: I got a CUDA out-of-memory error (see the attached screenshot). I am running it on one V100 GPU with 15 GB of memory and I set the batch size to 1. Is that still not enough to run this? How can I reduce the memory consumption and reproduce the experiment?

catalwaysright avatar Mar 24 '22 01:03 catalwaysright

Hi, fine-tuning with the default configuration can be run on a single RTX 2080 Ti, so a V100 with 15 GB of memory is totally sufficient. You may find the reasons/solutions by googling the error message.

@catalwaysright Hey, sorry, I forgot to mention this: if you installed transformers from master, you may need to add the line model.block_embedding_to("cpu") after sending the model to the GPU, because with the latest REALM patch the block_emb tensor, which occupies appreciable GPU memory, is by default moved to the GPU along with model.cuda().
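
For example, roughly like this (the checkpoint name here is just the public REALM NQ checkpoint, not necessarily what run_finetune.py uses):

```python
import torch
from transformers import RealmForOpenQA, RealmRetriever

checkpoint = "google/realm-orqa-nq-openqa"  # illustrative checkpoint

retriever = RealmRetriever.from_pretrained(checkpoint)
model = RealmForOpenQA.from_pretrained(checkpoint, retriever=retriever)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)

# Keep the large block_emb tensor on CPU so it does not eat into GPU memory;
# block_embedding_to() is available in transformers >= 4.18.0.
model.block_embedding_to("cpu")
```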

qqaatw avatar Mar 27 '22 07:03 qqaatw

Sorry for bothering you again. Could you show the specific place where I should add model.block_embedding_to("cpu")? When I add it after sending the model to the GPU in run_finetune.py, it raises AttributeError: 'RealmForOpenQA' object has no attribute 'block_embedding_to'. Thanks!

catalwaysright avatar Apr 11 '22 23:04 catalwaysright

Hi, which version of transformers are you using? You can install transformers==4.18.0, which includes the latest REALM patch.

https://huggingface.co/docs/transformers/model_doc/realm#transformers.RealmForOpenQA.block_embedding_to

qqaatw avatar Apr 12 '22 11:04 qqaatw

I tried your approach and it still shows CUDA out of memory, but I figured out that this may be normal because there are only 8 GB of memory left on the V100, which is not enough to load and optimize the whole model. How much memory did you allocate on your RTX 2080 Ti?

catalwaysright avatar Apr 16 '22 03:04 catalwaysright

Please reserve GPU memory at least equal to or greater than a 2080 Ti's (11 GB). This is the minimum requirement.

qqaatw avatar Apr 16 '22 05:04 qqaatw

Hi! I am now modifying this model to use multiple retrievers and trying to train it. However, during training I found that the retriever loss and reader loss are 0.0 most of the time, and the reader loss was also often 0.0 when I was training the original model. Why are there so many 0.0 values? Is this normal at the beginning, or are there other tricks to training this model?

catalwaysright avatar May 18 '22 06:05 catalwaysright

If the ground truth is not present in any retrieved context or in any predicted answer span, the corresponding loss is set to zero to prevent ineffective updates.

https://github.com/huggingface/transformers/blob/v4.19.2/src/transformers/models/realm/modeling_realm.py#L1662-L1663

This is likely to happen when you train the model from scratch without loading a pre-trained checkpoint such as cc_news or without proper warm-up.
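
In spirit, the masking works like the sketch below (a paraphrase of the logic behind the linked lines, not the exact transformers implementation; names are illustrative):

```python
import torch

def marginal_nll(logits, is_correct):
    # Negative marginal log-likelihood over the candidates flagged as containing
    # the gold answer (is_correct is a boolean mask of shape [batch, candidates]).
    very_negative = torch.finfo(logits.dtype).min
    log_numerator = torch.logsumexp(
        logits.masked_fill(~is_correct, very_negative), dim=-1
    )
    log_denominator = torch.logsumexp(logits, dim=-1)
    loss = log_denominator - log_numerator

    # If no candidate contains the gold answer, the loss is multiplied by 0 so
    # this term contributes no (misleading) gradient for the example.
    return loss * is_correct.any(dim=-1).float()

# Toy example: one question, four retrieved blocks, none containing the answer.
logits = torch.randn(1, 4)
no_hit = torch.zeros(1, 4, dtype=torch.bool)
print(marginal_nll(logits, no_hit))  # tensor([0.])
```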

qqaatw avatar May 18 '22 07:05 qqaatw

Oh, I see! So it will be fine after more steps, right?

catalwaysright avatar May 18 '22 07:05 catalwaysright

For training from scratch, you should follow the steps in the REALM/ORQA papers to pre-train/warm up your model; otherwise, the model is unlikely to improve further. If you are fine-tuning from cc_news or another properly pre-trained checkpoint, then you can keep training and monitor the improvement of the losses.

qqaatw avatar May 18 '22 07:05 qqaatw