wikIR

Published 20 hours ago •


len_doc and encoding

Open TheAzouz opened this issue 3 years ago • 0 comments

Hello,

I would like to point out two issues I ran into while working with the wikIR tool:

There is a mistake in the documentation for the len_doc parameter. It says that the default is None (all tokens are collected), while in the code the default is 200. To get all tokens I used --len_doc -1.
It would be good to be able to specify the encoding of the input file and the output file.
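
To make both points concrete, here is a minimal sketch of how they could be handled with argparse. It is not wikIR's actual code: the flags --input_encoding and --output_encoding and the helpers build_parser, truncate_tokens and convert are hypothetical names, and treating a negative len_doc as "keep all tokens" is an assumption based on the workaround described above.

```python
import argparse


def build_parser():
    # Hypothetical CLI sketch; flag names other than --len_doc are assumptions.
    parser = argparse.ArgumentParser(description="Sketch only, not wikIR's real CLI")
    parser.add_argument(
        "--len_doc", type=int, default=200,
        help="max tokens kept per document (default: 200); -1 keeps all tokens",
    )
    parser.add_argument("--input_encoding", default="utf-8",
                        help="encoding used to read the input file")
    parser.add_argument("--output_encoding", default="utf-8",
                        help="encoding used to write the output file")
    return parser


def truncate_tokens(tokens, len_doc):
    # Assumption: None or a negative value means "no truncation".
    if len_doc is None or len_doc < 0:
        return list(tokens)
    return list(tokens[:len_doc])


def convert(src_path, dst_path, args):
    # Read and write with explicitly chosen encodings instead of the platform default.
    with open(src_path, encoding=args.input_encoding) as src, \
         open(dst_path, "w", encoding=args.output_encoding) as dst:
        for line in src:
            kept = truncate_tokens(line.split(), args.len_doc)
            dst.write(" ".join(kept) + "\n")
```

With this sketch, `build_parser().parse_args(["--len_doc", "-1"])` would disable truncation, and the help text states the same default that the code uses, so the two cannot drift apart unnoticed.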

Jun 24 '21 12:06 TheAzouz