ORQA/REALM - Invalid argument: indices[0] = 10311577 is not in

Open ioannist opened this issue 4 years ago • 7 comments

I am trying to run ORQA fine-tuning with REALM models on my own dataset. I created the blocks.tfr file and used --num_block_records=154800, which is the number of rows in my records file.
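Since the text above cites 154800 records while the launch command in this post passes 227246, it may help to count the records in blocks.tfr directly. The following is a minimal sketch, not part of the ORQA code base: `count_tfrecords` is a hypothetical helper, and it assumes the file is an uncompressed TFRecord file (each record stored as an 8-byte little-endian length, a 4-byte length CRC, the payload, and a 4-byte payload CRC). It does not verify the CRCs.

```python
import os
import struct


def count_tfrecords(path):
    """Count records in an uncompressed TFRecord file without importing TensorFlow.

    Hypothetical helper for illustration; CRC fields are skipped, not checked.
    """
    size = os.path.getsize(path)
    count = 0
    with open(path, "rb") as f:
        while True:
            # Header: 8-byte little-endian payload length + 4-byte CRC of the length.
            header = f.read(12)
            if not header:
                break  # clean end of file
            if len(header) < 12:
                raise ValueError("truncated record header")
            (length,) = struct.unpack("<Q", header[:8])
            # Skip the payload and its trailing 4-byte CRC.
            f.seek(length + 4, os.SEEK_CUR)
            if f.tell() > size:
                raise ValueError("truncated record payload")
            count += 1
    return count

# Example (path is an assumption): print(count_tfrecords("blocks.tfr"))
```

The printed count is the value that --num_block_records is meant to match, according to the description above.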

I launch fine-tuning with

python -m language.orqa.experiments.orqa_experiment \
  --retriever_module_path=$(pwd)/language/orqa/gs/realm-data/cc_news_pretrained/embedder \
  --reader_module_path=$(pwd)/language/orqa/gs/realm-data/cc_news_pretrained/bert \
  --block_records_path=$(pwd)/language/orqa/gs/orqa-data/trump-blocks/blocks.tfr \
  --num_block_records=227246 \
  --data_root=$(pwd)/language/orqa/gs/orqa-data/resplit \
  --model_dir=$(pwd)/language/orqa/models/orqa_trump_model_from_realm \
  --dataset_name=TrumpQuestions \
  --num_train_steps=$(( 3417 * 20 )) \
  --save_checkpoints_steps=1000

Everything seems to go well for the first ten minutes, but once the shuffle buffer has been filled, I get an error:

2020-10-13 13:06:30.130961: I tensorflow/core/kernels/data/shuffle_dataset_op.cc:199] Shuffle buffer filled.
Traceback (most recent call last):
  File "/home/ubuntu/anaconda3/envs/orqa/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 1367, in _do_call
    return fn(*args)
  File "/home/ubuntu/anaconda3/envs/orqa/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 1352, in _run_fn
    target_list, run_metadata)
  File "/home/ubuntu/anaconda3/envs/orqa/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 1445, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.InvalidArgumentError: 2 root error(s) found.
  (0) Invalid argument: indices[0] = 10311577 is not in [0, 227246)
         [[{{node GatherV2_2}}]]
  (1) Invalid argument: indices[0] = 10311577 is not in [0, 227246)
         [[{{node GatherV2_2}}]]
         [[reader/RaggedRange_4/_8285]]
0 successful operations.
0 derived errors ignored.
....
....
Errors may have originated from an input operation.
Input Source operations connected to node GatherV2_2:
 blocks/read (defined at /ioannis/language/language/orqa/models/orqa_model.py:147)
 Squeeze (defined at /ioannis/language/language/orqa/models/orqa_model.py:129)

Input Source operations connected to node GatherV2_2:
 blocks/read (defined at /ioannis/language/language/orqa/models/orqa_model.py:147)
 Squeeze (defined at /ioannis/language/language/orqa/models/orqa_model.py:129)
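The message reports that the gather op received block index 10311577 while the block table holds only 227246 entries. As a plain-Python illustration of that bounds check (not the TensorFlow implementation; `gather_blocks` is a hypothetical name), the failure can be reproduced like this. If the indices come from a retrieval index built over a larger corpus than blocks.tfr, such a mismatch would be expected, but that is an assumption and is not confirmed anywhere in this thread.

```python
def gather_blocks(blocks, indices):
    """Look up blocks by index, failing the way the log above does when an index is out of range.

    Hypothetical helper for illustration only.
    """
    num_blocks = len(blocks)
    for pos, idx in enumerate(indices):
        if not 0 <= idx < num_blocks:
            raise IndexError(f"indices[{pos}] = {idx} is not in [0, {num_blocks})")
    return [blocks[idx] for idx in indices]

# Example: a table of 3 blocks cannot serve an index produced for a much larger table.
# gather_blocks(["a", "b", "c"], [10311577]) raises IndexError.
```

Logging the largest retrieved index before the lookup is one way to see whether the indices and the block table were built from the same set of blocks.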

ioannist avatar Oct 13 '20 13:10 ioannist

@ioannist, were you able to resolve this issue? How did you create the tfr file? I am starting to look into this and wondered if there is a quick reference that would help me get going. Thanks!

mchari avatar Nov 05 '20 17:11 mchari

@mchari no, unfortunately not :(

ioannist avatar Nov 09 '20 07:11 ioannist

@ioannist, any guidance on how to create a working tfr file?

mchari avatar Nov 10 '20 00:11 mchari

I can't recall exactly, but I think I created a raw text file in the same format as the Wikipedia raw text file and then ran language.preprocessing.preprocess_wiki_extractor.

ioannist avatar Dec 04 '20 08:12 ioannist

@ioannist, hello, I'm sorry to bother you. Could you share how you convert a single document into a tfr file? (And how did you use language.preprocessing.preprocess_wiki_extractor?)

johnbager avatar Dec 16 '20 03:12 johnbager

@ioannist I have the same issue. Have you solved it?

Kamel773 avatar Apr 28 '22 12:04 Kamel773

cc @kentonl
I am working on fine-tuning ORQA on my own dataset, and I have the same issue. I would appreciate your help in solving it.

Kind regards,

Kamel773 avatar Apr 29 '22 09:04 Kamel773