fastBPE
ReadCodes error
Loading codes from data/processed/XLM_en_zh/50k/codes ...
fast: fastBPE/fastBPE.hpp:458: void fastBPE::readCodes(const char*, std::unordered_map<std::pair<std::basic_string
kev123456 suggested (https://github.com/glample/fastBPE/issues/7#issue-404186789) deleting the extra “o o 0” line in the codes file.
I did this, but sorry, I still get the same error. What should I do?
If nCodes is chosen badly, 'applybpe' will fail with errors. I don't know how to choose this number, but lowering it solved the problem for me (try a low value to check).
I guess I am late to the party, but still... :laughing:
The problem is that your 50k/codes file has at least one repeated entry.
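To check for repeated entries, something like the following could work. It is only a rough sketch, and it assumes each line of the codes file has three space-separated fields (left token, right token, count). `find_duplicate_pairs` is a name I made up for illustration; it is not part of fastBPE.

```python
from collections import defaultdict


def find_duplicate_pairs(lines):
    """Return {(left, right): [line numbers]} for pairs listed more than once.

    Assumes each codes line looks like "left right count" (an assumption
    about the file format, not something checked against fastBPE itself).
    """
    seen = defaultdict(list)
    for line_no, line in enumerate(lines, start=1):
        parts = line.split()
        if len(parts) != 3:
            # Skip blank or malformed lines instead of failing.
            continue
        seen[(parts[0], parts[1])].append(line_no)
    return {pair: nums for pair, nums in seen.items() if len(nums) > 1}


if __name__ == "__main__":
    # Tiny in-memory sample; in practice pass an open file object instead.
    sample = ["t h 120", "th e</w> 95", "t h 120"]
    for pair, line_nos in find_duplicate_pairs(sample).items():
        print(pair, "appears on lines", line_nos)
```

With a real file you would call it as `find_duplicate_pairs(open(path, encoding="utf-8"))` and then delete all but one of the reported lines.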
A more common error is ending up with the same thing split in two different ways. For example, you make two BPE codes files, one English and one German, and in one "ant" is split as
a nt</w>
and in the other it is split as
an t</w>
Now when you join these two BPE files, there will be trouble. The following error comes up:
Assertion `reverse_codes.find(pair) == reverse_codes.end()' failed.
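One possible way to join two codes files without tripping this assertion is to keep only the first entry for each joined symbol and report the ones that disagree. This is a sketch under the same assumption about the file format (left token, right token, count per line); `merge_codes` is a hypothetical helper, not the linked script and not a fastBPE function. Dropping an entry changes how the affected words get segmented, so look at the reported conflicts before using the result.

```python
def merge_codes(*code_files):
    """Merge several lists of codes lines, keeping the first split of each symbol.

    Returns (merged_lines, conflicts), where conflicts is a list of
    (kept_pair, dropped_pair) tuples for symbols that were split differently.
    Exact duplicates are dropped silently.
    """
    merged = []
    seen = {}  # joined symbol -> (left, right) pair kept for it
    conflicts = []
    for lines in code_files:
        for line in lines:
            parts = line.split()
            if len(parts) != 3:
                # Skip blank or malformed lines instead of failing.
                continue
            left, right, _count = parts
            joined = left + right
            if joined in seen:
                if seen[joined] != (left, right):
                    conflicts.append((seen[joined], (left, right)))
                continue
            seen[joined] = (left, right)
            merged.append(line.strip())
    return merged, conflicts


if __name__ == "__main__":
    english = ["a nt</w> 10", "t h 5"]
    german = ["an t</w> 7", "t h 3", "c h 2"]
    merged, conflicts = merge_codes(english, german)
    print("\n".join(merged))
    print("conflicting splits:", conflicts)
```

Files listed first win, so put the codes file whose segmentation you care about most at the front.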
I have written a script to solve both of these, here. Feel free to check it out. @Tikquuss, @zhishui3