wikIR
len_doc and encoding
Hello,
I would like to point out two issues I ran into while working with the wikIR tool:
- The documentation for the len_doc parameter is wrong. It says the default is None (all tokens are collected), but in the code the default is 200. To get all tokens I had to pass --len_doc -1.
- It would be good if we could specify the encoding of the input and output files.
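
To make both points concrete, here is a rough sketch of what I have in mind. It is not wikIR's actual code: the --input_encoding and --output_encoding flags and the helpers build_parser, truncate and reencode are hypothetical names I made up for illustration. The only things taken from the tool are the --len_doc flag and the documented default of None.

```python
import argparse


def build_parser():
    # Hypothetical CLI sketch; only --len_doc exists in the tool today.
    parser = argparse.ArgumentParser(description="Sketch of the proposed options")
    # Default matches what the documentation says: None means "keep all tokens".
    parser.add_argument("--len_doc", type=int, default=None,
                        help="Max tokens per document; None or a negative value keeps all")
    # Assumed new flags for the encoding request.
    parser.add_argument("--input_encoding", default="utf-8")
    parser.add_argument("--output_encoding", default="utf-8")
    return parser


def truncate(tokens, len_doc):
    """Keep the first len_doc tokens; None or a negative value keeps everything."""
    if len_doc is None or len_doc < 0:
        return tokens
    return tokens[:len_doc]


def reencode(src_path, dst_path, input_encoding, output_encoding):
    """Copy a text file line by line, decoding and encoding explicitly."""
    with open(src_path, encoding=input_encoding) as src, \
         open(dst_path, "w", encoding=output_encoding) as dst:
        for line in src:
            dst.write(line)
```

With something like this, calling the tool without --len_doc would behave as documented, and --len_doc -1 would keep working for anyone already using it.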