RetroMAE
Codebase for RetroMAE and beyond.
I have logged the training parameters to wandb (you can see the [link](https://api.wandb.ai/links/nguyenducnhan-work/j586u538) below). This is my config:

```
pretrain.run --output_dir output_merge_data \
  --report_to wandb \
  --data_dir...
```
    --2024-03-07 12:47:02--  https://msmarco.blob.core.windows.net/msmarcoranking/qidpidtriples.train.full.2.tsv.gz
    Resolving msmarco.blob.core.windows.net (msmarco.blob.core.windows.net)... 20.150.34.4
    Connecting to msmarco.blob.core.windows.net (msmarco.blob.core.windows.net)|20.150.34.4|:443... connected.
    HTTP request sent, awaiting response... 404 The specified resource does not exist.
    2024-03-07 12:47:03 ERROR 404: The specified...
Can you provide some examples of the data formats used to train the pretraining, reranker, and retriever models? I'm not familiar with them. Thanks!
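Not speaking for the authors, but as a hypothetical illustration of the JSONL conventions common in dense-retrieval codebases (all field names and contents here are assumptions; the repo's examples/ directory is the authoritative reference):

```python
import json

# Pretraining: typically one plain-text document per record.
pretrain_example = {"text": "Deep learning has transformed information retrieval ..."}

# Retriever fine-tuning: a query paired with positive and hard-negative passages.
retriever_example = {
    "query": "what is dense retrieval",
    "pos": ["Dense retrieval encodes queries and passages into vectors ..."],
    "neg": ["An unrelated passage that serves as a hard negative ..."],
}

# Reranker fine-tuning: often the same triples, consumed as scored (query, passage) pairs.
reranker_example = {
    "query": "what is dense retrieval",
    "passage": "Dense retrieval encodes queries and passages into vectors ...",
    "label": 1,  # 1 = relevant, 0 = not relevant
}

# JSONL means one such JSON object per line of the training file.
for record in (pretrain_example, retriever_example, reranker_example):
    print(json.dumps(record))
```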
Great job! Hello, I wonder if you can tell me the training MLM accuracy of the encoder and decoder. I'm training my RetroMAE model now.
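For anyone wanting to track this themselves, a minimal sketch (not from the repo) that computes MLM accuracy from a head's logits, assuming the Hugging Face convention of -100 labels at unmasked positions:

```python
import torch

def mlm_accuracy(logits: torch.Tensor, labels: torch.Tensor) -> float:
    """Fraction of masked positions predicted correctly.

    logits: [batch, seq_len, vocab_size] from the encoder or decoder head.
    labels: [batch, seq_len], with -100 at positions that were not masked.
    """
    preds = logits.argmax(dim=-1)
    mask = labels.ne(-100)                  # only score masked tokens
    correct = (preds.eq(labels) & mask).sum()
    total = mask.sum().clamp(min=1)         # avoid division by zero
    return (correct.float() / total).item()
```

Calling this on the encoder's and decoder's logits separately gives the two accuracies per step.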
Hello, I tried to use your checkpoint to finetune the RetroMAE_MSMARCO model, but the result is lower than the number in your paper (e.g., the MRR@10 is 0.393 in the paper,...
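For context when comparing against the paper's 0.393: MRR@10 averages the reciprocal rank of the first relevant passage per query, counting only hits within the top 10. A minimal sketch (a hypothetical helper, not repo code):

```python
def mrr_at_10(first_relevant_ranks):
    """first_relevant_ranks: per query, the 1-based rank of the first
    relevant passage among retrieved results, or None if none was found."""
    total = 0.0
    for rank in first_relevant_ranks:
        if rank is not None and rank <= 10:
            total += 1.0 / rank
    return total / len(first_relevant_ranks)

# Example: ranks 1, 3, miss, 2 over four queries.
print(mrr_at_10([1, 3, None, 2]))  # (1 + 1/3 + 0 + 1/2) / 4 ≈ 0.458
```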
Hi staoxiao, I wanted to ask more about how the enhanced decoding works - it looks like it generates 256 random possible attention masks, and then picks randomly from that...
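To make the question concrete, here is a rough sketch of per-position random visibility in the spirit of the paper's enhanced decoding. The sampling scheme, function name, and the always-visible position 0 are my assumptions for illustration, not the repo's actual implementation:

```python
import torch

def sample_decoder_attention_mask(seq_len: int, n_visible: int) -> torch.Tensor:
    """Per-position random visibility (illustrative sketch only).

    Each query position i may attend to a random subset of the other
    positions, never to itself; position 0 (the sentence embedding) is
    always visible. Returns a [seq_len, seq_len] bool mask, True = attendable.
    """
    mask = torch.zeros(seq_len, seq_len, dtype=torch.bool)
    for i in range(seq_len):
        candidates = [j for j in range(seq_len) if j != i]
        for idx in torch.randperm(len(candidates))[:n_visible].tolist():
            mask[i, candidates[idx]] = True
        mask[i, 0] = True   # sentence embedding stays attendable
        mask[i, i] = False  # a token never sees itself
    return mask
```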
Code for RetroMAE v2: Duplex Masked Auto-Encoder For Pre-Training Retrieval-Oriented Language Models
Hello, thank you for your work and the provided code! When do you plan to release the code for RetroMAE v2?
    Traceback (most recent call last):
      File "E:\RetroMAE-master\RetroMAE-master\examples\pretrain\preprocess.py", line 158, in <module>
        wiki = create_wiki_data(args.tokenizer_name, args.max_seq_length, args.short_seq_prob)
      File "E:\RetroMAE-master\RetroMAE-master\examples\pretrain\preprocess.py", line 62, in create_wiki_data
        tokenizer = AutoTokenizer.from_pretrained("F:\bert-base-uncased")
      File "C:\Users\HZY\AppData\Local\Programs\Python\Python39\lib\site-packages\transformers\models\auto\tokenization_auto.py", line 463, in from_pretrained...
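The trace is truncated, so this is only a likely cause: in a normal Python string, `\b` is a backspace escape, so `"F:\bert-base-uncased"` silently becomes an invalid path before it ever reaches the tokenizer. A raw string or forward slashes avoids this:

```python
from transformers import AutoTokenizer

# "\b" in "F:\bert-base-uncased" is interpreted as a backspace escape,
# corrupting the path. Use a raw string or forward slashes instead
# (the path shown is the reporter's local model copy):
tokenizer = AutoTokenizer.from_pretrained(r"F:\bert-base-uncased")
# or equivalently:
tokenizer = AutoTokenizer.from_pretrained("F:/bert-base-uncased")
```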