MatchZoo-py
Error during training because of float length of sequence(?)
Describe the bug
Hi, I have the following issue: when I try to train my model using trainer.run(), I get the following error:
Traceback (most recent call last):
  File "/Users/xx/.conda/envs/QL_QA/lib/python3.7/site-packages/IPython/core/interactiveshell.py", line 3331, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-67-041e2033e90a>", line 1, in <module>
    trainer.run()
  File "/Users/xx/.conda/envs/QL_QA/lib/python3.7/site-packages/matchzoo/trainers/trainer.py", line 227, in run
    self._run_epoch()
  File "/Users/xx/.conda/envs/QL_QA/lib/python3.7/site-packages/matchzoo/trainers/trainer.py", line 251, in _run_epoch
    for step, (inputs, target) in pbar:
  File "/Users/xx/.conda/envs/QL_QA/lib/python3.7/site-packages/tqdm/std.py", line 1091, in __iter__
    for obj in iterable:
  File "/Users/xx/.conda/envs/QL_QA/lib/python3.7/site-packages/matchzoo/dataloader/dataloader.py", line 112, in __iter__
    self._handle_callbacks_on_batch_unpacked(x, y)
  File "/Users/xx/.conda/envs/QL_QA/lib/python3.7/site-packages/matchzoo/dataloader/dataloader.py", line 134, in _handle_callbacks_on_batch_unpacked
    self._callback.on_batch_unpacked(x, y)
  File "/Users/xx/.conda/envs/QL_QA/lib/python3.7/site-packages/matchzoo/dataloader/callbacks/padding.py", line 158, in on_batch_unpacked
    self._pad_word_value, dtype=dtype)
  File "/Users/xx/.conda/envs/QL_QA/lib/python3.7/site-packages/numpy/core/numeric.py", line 325, in full
    a = empty(shape, dtype, order)
TypeError: 'numpy.float64' object cannot be interpreted as an integer
I am not 100% sure, but the error seems to be caused by length_right being a float in my preprocessed datapack, whereas it is an int in the toy datasets:
>> toy_datapack.frame()[['length_right','length_left']]
Out[13]:
   length_right  length_left
0            58           29
1            41           29
2            41           29
3            61           29
4           128           29
5           126           85
6           128           85
whereas in my own datapack:
train_pack = mz.DataPack(
    relation=relation[relation.id_left.isin(qids['train'])].reset_index(drop=True),
    left=left[left.index.isin(qids['train'])],
    # right=right_train,
    right=right_dict['train'],
)
train_pack.frame().head().dtypes
Out[78]:
id_left          object
text_left        object
id_right         object
text_right       object
length_right    float64
label           float64
dtype: object
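The suspected failure can be reproduced outside MatchZoo. The following is a minimal sketch, not the library's padding code: `pad_batch` is a hypothetical stand-in that only mirrors the `np.full` call at the bottom of the traceback, and it assumes the sequence length reaches that call as a `numpy.float64`.

```python
import numpy as np


def pad_batch(batch_size, max_length, pad_value=0):
    """Hypothetical stand-in for a padding step: allocate a padded batch."""
    # np.full forwards the shape to np.empty, which requires integer dimensions.
    return np.full([batch_size, max_length], pad_value, dtype=np.int64)


# An integer length (as in the toy datasets) works.
padded = pad_batch(2, np.int64(5))

# A float length (as in the failing datapack) is rejected by NumPy.
try:
    pad_batch(2, np.float64(5.0))
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(padded.shape)
print(error_message)
```

If the second call fails with the same message as the traceback, the float dtype of length_right is enough to explain the crash, independently of the model used.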
It also seems weird to me that this is happening, since to my understanding the built-in Python len function should return an int.
right_train['length_right'] = right_train.text_right.apply(len)
Out[15]:
text_right length_right
id_right
clueweb09-en0007-21-42346 Welcome | Logout Log In | Sign Up The Huffingt... 4039
clueweb09-enwp03-01-16807 Ann Dunham From Wikipedia, the free encycloped... 32225
clueweb09-en0010-93-11767 Home Contact Us Bookmark Us Receive Family Tre... 5112
clueweb09-enwp01-36-17161 Maya Soetoro-Ng From Wikipedia, the free encyc... 8279
clueweb09-enwp00-34-05344 Barack Obama, Sr. From Wikipedia, the free enc... 14448
clueweb09-enwp00-34-05347 Barack Obama, Sr. From Wikipedia, the free enc... 14478
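One possible explanation, offered as an assumption rather than a confirmed diagnosis: pandas upcasts an integer column to float64 as soon as it has to hold a missing value, for example when a join looks up an id_right that is absent from the right table. The sketch below uses made-up ids and texts and a plain pandas merge; whether the datapack performs an equivalent join internally is not verified here.

```python
import pandas as pd

# Toy right table (hypothetical data); len() yields integer lengths.
right = pd.DataFrame(
    {"text_right": ["first document", "second one"]},
    index=pd.Index(["d1", "d2"], name="id_right"),
)
right["length_right"] = right["text_right"].apply(len)

# The relation refers to "d3", which is missing from the right table.
relation = pd.DataFrame(
    {"id_left": ["q1", "q1", "q2"], "id_right": ["d1", "d2", "d3"]}
)

# A left join fills the missing row with NaN, which forces float64.
joined = relation.merge(right, how="left", left_on="id_right", right_index=True)

print(right["length_right"].dtype)
print(joined["length_right"].dtype)
print(joined["length_right"].isna().sum())
```

Under this assumption, a float64 length_right would point to relation rows whose documents are missing from the right table, which would fit a change in the document selection step.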
I am preparing my data using mz.autoprepare, and the models I've tried to use are KNRM and DRMM, but the same issue still occurs.
My MatchZoo version is 1.1.1.
I confirmed what I mentioned earlier regarding the cause of this problem: I reverted my script to a previous version, which used a different document selection process (basically initial retrieval), and training ran normally, with length_right being an int instead of a float.
Any ideas on what might be going wrong here, or whether the framework should guard against this (e.g. by converting floats to int)?
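Until that is answered, a possible user-side workaround is sketched below. It is a sketch under stated assumptions, using plain pandas only: `find_missing_ids` and `coerce_length_to_int` are hypothetical helpers, not MatchZoo functions, and the example tables are made up. The idea is to first check for relation rows that point at missing documents, then cast the length column back to an integer dtype before building the datapack.

```python
import pandas as pd


def find_missing_ids(relation, right):
    """Return id_right values in `relation` that have no row in `right`."""
    known = set(right.index)
    return sorted(set(relation["id_right"]) - known)


def coerce_length_to_int(frame, column="length_right"):
    """Return a copy of `frame` with `column` cast to int64.

    Raises ValueError on missing values instead of silently filling them.
    """
    if frame[column].isna().any():
        raise ValueError(f"{column} contains missing values; fix the data first")
    out = frame.copy()
    out[column] = out[column].astype("int64")
    return out


# Hypothetical example: lengths stored as floats, one id missing from `right`.
right = pd.DataFrame(
    {"text_right": ["abc", "defgh"], "length_right": [3.0, 5.0]},
    index=pd.Index(["d1", "d2"], name="id_right"),
)
relation = pd.DataFrame({"id_left": ["q1", "q2"], "id_right": ["d1", "d9"]})

missing = find_missing_ids(relation, right)
fixed = coerce_length_to_int(right)

print(missing)
print(fixed["length_right"].dtype)
```

Raising on missing values rather than filling them is a deliberate choice here: a NaN length usually signals a document that was dropped upstream, and padding to a guessed length would hide that.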