
Prepare and Run Own Models: extract_context.py fails with h5py broadcasting error

mohammedayub44 opened this issue · 2 comments

Hi,

Great work on the repository and the visualizations; this is very useful. I had to create version-specific PyTorch models for this (using the custom install procedure) and ran into issues while preparing the data. Running extract_context.py as shown below gives an h5py broadcasting error, which appears to be a known issue with the h5py library.

Command Used: python extract_context.py -src data/src-train.txt -tgt data/tgt-train.txt -model demo-final_acc_5.53_ppl_4304.81_e1.pt -batch_size 10

(screenshot: h5py broadcasting error traceback)
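For context, here is a minimal standalone sketch of the kind of mismatch that triggers this error (hypothetical example, not the actual extract_context.py code): the datasets are created with a last dimension of 100, but the model produces 500-dimensional vectors.

```python
# Hypothetical reproduction: writing 500-dim vectors into a dataset whose last
# dimension was created as 100 makes h5py refuse the assignment with a
# broadcasting/shape error (roughly "Can't broadcast (10, 20, 500) -> (10, 20, 100)").
import h5py
import numpy as np

with h5py.File("repro.h5", "w") as f:
    dset = f.create_dataset("cstar", (10, 20, 100), dtype="f")   # hard-coded 100
    states = np.random.rand(10, 20, 500).astype("float32")       # model outputs 500
    dset[:, :, :] = states                                       # raises the broadcasting error
```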

Fix used: changing the hard-coded dimensions to match the PyTorch model's src and tgt embedding sizes worked fine.

Change on line 169: size from 100 to 500: cstarset = f.create_dataset("cstar", (opt.batch_size, max_tgt_len, 500), ...

Change on line 178: size from 100 to 500: encoderset = f.create_dataset("encoder_out", (opt.batch_size, max_tgt_len, 500), ...

Change on line 185: size from 100 to 500: decoderset = f.create_dataset("decoder_out", (opt.batch_size, max_tgt_len, 500), ...
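An alternative sketch that avoids hard-coding the size is to read it from the checkpoint. This assumes the checkpoint stores its training options under the "opt" key with an rnn_size field (as recent OpenNMT-py versions do); f, opt, and max_tgt_len are the names already used in extract_context.py.

```python
# Sketch only: derive the hidden/embedding size from the checkpoint instead of
# hard-coding 100 or 500. Reading checkpoint["opt"].rnn_size is an assumption
# and may differ across OpenNMT-py versions.
import torch

checkpoint = torch.load(opt.model, map_location="cpu")
hidden_size = checkpoint["opt"].rnn_size   # 500 for the default training config

cstarset   = f.create_dataset("cstar",       (opt.batch_size, max_tgt_len, hidden_size))
encoderset = f.create_dataset("encoder_out", (opt.batch_size, max_tgt_len, hidden_size))
decoderset = f.create_dataset("decoder_out", (opt.batch_size, max_tgt_len, hidden_size))
```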

Hopefully the next release will add support for newer OpenNMT-py releases/models.

Thanks!

Mohammed Ayub

mohammedayub44 · Oct 01 '19 18:10

Hi Mohammed. Thank you for the fix suggestions.

HendrikStrobelt · Oct 03 '19 03:10

No problem :) @HendrikStrobelt
A better fix would be to make the embedding size a variable (I think), so it would work regardless of how the model was built. (It just happened that I trained my model with an embedding size of 500, which is the default in the training documentation.) See the sketch below.
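A minimal sketch of that idea, exposing the size as a command-line option (the flag name -embedding_size is hypothetical; in extract_context.py it would be added to the existing argument parser):

```python
# Sketch of the suggested change: make the size a CLI option instead of a constant.
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("-embedding_size", type=int, default=500,
                    help="hidden/embedding size of the trained model")
opt = parser.parse_args()

# The three create_dataset calls would then use opt.embedding_size instead of a
# hard-coded 100/500, e.g.:
# cstarset = f.create_dataset("cstar", (opt.batch_size, max_tgt_len, opt.embedding_size))
```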

mohammedayub44 · Oct 04 '19 01:10