g2pW

Can the input length not exceed 16?

Open yt605155624 opened this issue 3 years ago • 3 comments

I'm testing the ONNX version by @BarryKCL and found that once the input length exceeds 16, the onnxruntime session returns no output and raises no error. I don't know whether this is a bug in onnxruntime or a feature of the g2pW model.

yt605155624 avatar Aug 10 '22 12:08 yt605155624

It's a bug in his preprocessing.

yt605155624 avatar Aug 10 '22 13:08 yt605155624

Is window_size necessary for inference? window_size = 32 in _truncate_texts(window_size, texts, query_ids):

    start = max(0, query_id - window_size // 2)
    end = min(len(text), query_id + window_size // 2)
    truncated_text = text[start:end]

so input "這場抗議活動究竟是如何發展演變的。" will become:

truncated_texts: ['這場抗議活動究竟是如何發展演變的', '這場抗議活動究竟是如何發展演變的。', '這場抗議活動究竟是如何發展演變的。', '這場抗議活動究竟是如何發展演變的。', '這場抗議活動究竟是如何發展演變的。', '這場抗議活動究竟是如何發展演變的。', '這場抗議活動究竟是如何發展演變的。']
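To make the length mismatch concrete, here is a minimal sketch that applies the same slicing to the example sentence. The helper name truncate_text and the query positions 0, 5 and 16 are hypothetical and chosen only for illustration; they are not the actual polyphone positions g2pW would pick.

```python
def truncate_text(text, query_id, window_size=32):
    # Same slicing as the snippet quoted above, wrapped in a
    # hypothetical helper for a single (text, query_id) pair.
    start = max(0, query_id - window_size // 2)
    end = min(len(text), query_id + window_size // 2)
    return text[start:end]

text = "這場抗議活動究竟是如何發展演變的。"  # 17 characters

# Illustrative query positions (assumed, not taken from the model).
query_ids = [0, 5, 16]

truncated = [truncate_text(text, q) for q in query_ids]
lengths = [len(t) for t in truncated]

for q, t in zip(query_ids, truncated):
    print(q, len(t), t)
```

With window_size = 32 the right edge of the window for query_id = 0 is index 16, so a 17-character sentence loses its final character for that query only. Any sentence of 16 characters or fewer fits entirely inside every window, which would explain why the problem only appears above length 16.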

In PyTorch, tensors of different lengths can be aligned, but my code converts the lists to NumPy arrays directly.

So I set window_size=None to fix the misaligned input_ids.
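An alternative that keeps window_size = 32 would be to pad the token-id lists to a common length before converting them to NumPy. The following is only a sketch under assumptions: pad_to_array is a hypothetical helper, the token ids are made up, and pad_id = 0 is assumed to be the padding id of the tokenizer's vocabulary (check the actual vocabulary before relying on it).

```python
import numpy as np

def pad_to_array(id_lists, pad_id=0):
    # Right-pad every list to the length of the longest one so that
    # np.array receives a rectangular (batch, max_len) input.
    # pad_id=0 is an assumption about the tokenizer's padding id.
    max_len = max(len(ids) for ids in id_lists)
    padded = [ids + [pad_id] * (max_len - len(ids)) for ids in id_lists]
    return np.array(padded, dtype=np.int64)

# Made-up token ids with different lengths, as produced when the
# truncated texts differ in length.
ragged = [[101, 5, 6, 102], [101, 5, 6, 7, 102]]

batch = pad_to_array(ragged)
print(batch.shape)
```

If this approach is used, the attention mask and token type ids would need the same padding (with zeros in the mask at padded positions) so the model ignores the padded tokens.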

BarryKCL avatar Aug 11 '22 03:08 BarryKCL

@BarryKCL Our model was trained with the hyperparameter window_size = 32. Changing this hyperparameter might slightly affect performance.

GitYCC avatar Aug 11 '22 07:08 GitYCC