data2text-plan-py data_utils.py List index out of range

While creating train-roto-ptrs.txt using ptrs mode, I am getting this index error:

Traceback (most recent call last):
  File "data_utils.py", line 859, in <module>
    make_pointerfi(args.output_fi, inp_file=args.input_path, content_plan_inp=args.train_content_plan)
  File "data_utils.py", line 593, in make_pointerfi
    content_plan_entry = [content_plan_record for content_plan_record in content_plan[i]]
IndexError: list index out of range

Any quick suggestions?

Feb 22 '19 18:02 shubhamagarwal92

The error will occur if the sizes of the content plan and training data do not match.
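Below is a simplified, hypothetical sketch (not the actual make_pointerfi code) of why a size mismatch produces exactly this IndexError. The loop walks over the training examples and indexes the content plans by the same position, so it breaks as soon as there are fewer plans than examples. An explicit length check fails earlier with a clearer message. The function names naive_pair and checked_pair are illustrative only.

```python
def naive_pair(train_data, content_plans):
    # Same shape as the failing line: index the plans by the example position i.
    # Raises IndexError once i runs past the end of a shorter content_plans list.
    return [(train_data[i], content_plans[i]) for i in range(len(train_data))]


def checked_pair(train_data, content_plans):
    """Pair examples with plans by position, refusing to run on a size mismatch."""
    if len(train_data) != len(content_plans):
        raise ValueError(
            f"size mismatch: {len(train_data)} training examples vs "
            f"{len(content_plans)} content plans"
        )
    return list(zip(train_data, content_plans))
```

With three games but only two plans, naive_pair raises IndexError: list index out of range, while checked_pair reports the two sizes so the mismatch is obvious.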

Feb 22 '19 19:02 ratishsp

I use the provided train.json and inter/train_content_plan.txt (I did not generate them myself), and I get this error:

Traceback (most recent call last):
  File "data_utils.py", line 887, in <module>
    make_pointerfi(args.output_fi, inp_file=args.input_path, content_plan_inp=args.train_content_plan)
  File "data_utils.py", line 614, in make_pointerfi
    content_plan_entry = [content_plan_record for content_plan_record in content_plan[i]]
IndexError: list index out of range

What should I do to fix this bug?

May 26 '19 05:05 tuyaao

Hi @ratishsp ,

Thank you very much for sharing the code and for answering so many questions from people trying to replicate your results; that has been very helpful to me as well.

I am sorry, but I would like to reopen this issue because I hit the same error as above and could not resolve it.

I run the following command: python data_utils.py -mode ptrs -input_path $BASE/rotowire/train.json -train_content_plan $BASE/rotowire/inter/train_content_plan.txt -output_fi $BASE/rotowire/train-roto-ptrs.txt

with:

train.json extracted from rotowire.tar.bz2 in https://github.com/harvardnlp/boxscore-data
train_content_plan.txt copied from /inter in https://drive.google.com/drive/folders/1R_82ifGiybHKuXnVnC8JhBTW8BAkdwek

The error occurred in the same place:

  File "data_utils.py", line 614, in make_pointerfi
    content_plan_entry = [content_plan_record for content_plan_record in content_plan[i]]
IndexError: list index out of range

Following your comment:

The error will occur if the sizes of the content plan and training data do not match.

I checked the data sizes and found a mismatch between train.json (3398 examples, as in https://github.com/harvardnlp/boxscore-data) and train_content_plan.txt (3371 lines). My problem is that both input files are the provided ones, so there is nothing I can change in them.
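A minimal sketch of how the two sizes can be compared before running data_utils.py, assuming train.json holds a single top-level JSON list of games and train_content_plan.txt holds one content plan per line (which is what the 3371-line count suggests). The helper names are hypothetical and not part of data2text-plan-py.

```python
import json


def count_json_examples(path):
    """Number of top-level entries in a JSON file assumed to hold a list of examples."""
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    if not isinstance(data, list):
        raise TypeError(f"{path}: expected a top-level JSON list, got {type(data).__name__}")
    return len(data)


def count_lines(path):
    """Number of lines in a text file (assumed: one content plan per line)."""
    with open(path, encoding="utf-8") as f:
        return sum(1 for _ in f)


def check_sizes(json_path, plan_path):
    """Print both sizes and return True only if they match."""
    n_examples = count_json_examples(json_path)
    n_plans = count_lines(plan_path)
    print(f"{json_path}: {n_examples} examples")
    print(f"{plan_path}: {n_plans} lines")
    return n_examples == n_plans
```

For example, check_sizes("rotowire/train.json", "rotowire/inter/train_content_plan.txt") should return False for the 3398-example train.json described above, since the plan file has 3371 lines.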

To be honest, I am not very familiar with OpenNMT and related tools, so I apologize if the answer is obvious. Could you please tell me if I am missing anything?

My environment is below (I could not reproduce the environment specified in requirement.txt, but I do not think that is related to the error):

future                       0.18.2
nltk                         3.4.5
six                          1.14.0             
torch                        1.8.1              
torchtext                    0.9.1              
tqdm                         4.42.1

Thank you very much for your help.

May 06 '21 00:05 ghtaro

Hi @ghtaro, nice to know that you found the code useful. I am not sure about the root cause of the issue you are facing. But as mentioned in https://github.com/ratishsp/data2text-plan-py/issues/26#issuecomment-769032836, I have realized that the pointer network supervision is not strictly required, so you can comment out any code that uses the pointer supervision.

May 09 '21 13:05 ratishsp

Hi @ratishsp ,

Thank you very much for your prompt reply.

I am not sure about the root cause of the issue you are facing.

I have now set up a very similar computational environment, but I still get the same error message.

Anyway, understood. I will follow the instructions in #26. Please leave the reopened issue open until I can run the Python script without any errors.

May 10 '21 05:05 ghtaro

I have the same problem. Have you solved it?

Jul 01 '21 15:07 happycjksh

Hi @ghtaro, I have the same problem and have been stuck on it for many days. Could you tell me how you solved it?

Jul 01 '21 15:07 happycjksh

Hi @happycjksh, I think I now understand what the root cause of the issue is. It is indeed related to https://github.com/ratishsp/data2text-plan-py/issues/34. The lengths of train.json and train_content_plan.txt are different because there were some training examples for which no content plans were found. Such examples were excluded during training. So I had worked with a subset of 3371 examples in training for which the content plans could be extracted. I have shared train.json at https://drive.google.com/file/d/1uuRckc6D2WIvrpoadNj-lbilw5XCjR12/view?usp=sharing which matches the length of train_content_plan.txt. Please try with this train.json file and let me know if it works.
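To illustrate the idea described above (this is not the repository's actual extraction code), the sketch below drops every example for which no content plan is found and keeps the surviving examples and plans in lockstep, so position i in both outputs refers to the same game. extract_plan is a hypothetical callable that returns a plan, or None when no plan can be extracted.

```python
def filter_aligned(examples, extract_plan):
    """Keep only examples with an extractable content plan, preserving alignment.

    extract_plan is a hypothetical callable: it returns a plan, or None
    when no content plan can be found for that example.
    """
    kept_examples, kept_plans = [], []
    for example in examples:
        plan = extract_plan(example)
        if plan is None:  # no content plan found: exclude the example entirely
            continue
        kept_examples.append(example)
        kept_plans.append(plan)
    return kept_examples, kept_plans
```

Filtering both lists in the same loop is what keeps them the same length; writing out the filtered examples as a new train.json is what makes it line up with train_content_plan.txt.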

Jul 01 '21 17:07 ratishsp

Thanks for your answer. I'll download the new train.json right away and report back on the result as soon as possible.

Jul 02 '21 07:07 happycjksh

Hi @ratishsp, I'm so sorry. Although data_utils.py now runs, the resulting train-roto-ptrs.txt is an empty file. I have tried to solve the problem, without success. I hope you can help me solve it. Thank you.

Jul 02 '21 13:07 happycjksh

Oh. I am not sure why the file is empty. You can use the train-roto-ptrs.txt file from the location https://drive.google.com/drive/folders/1R_82ifGiybHKuXnVnC8JhBTW8BAkdwek

Jul 02 '21 13:07 ratishsp

I'll keep trying to solve it. If I do, I will let you know the cause.

Jul 02 '21 14:07 happycjksh
