fastBPE

ReadCodes error

Open zhishui3 opened this issue 4 years ago • 2 comments

Loading codes from data/processed/XLM_en_zh/50k/codes ... fast: fastBPE/fastBPE.hpp:458: void fastBPE::readCodes(const char*, std::unordered_map<std::pair<std::basic_string, std::basic_string >, unsigned int, fastBPE::pair_hash>&, std::unordered_map<std::basic_string, std::pair<std::basic_string, std::basic_string > >&): Assertion `codes.find(pair) == codes.end()' failed.

kev123456 said (https://github.com/glample/fastBPE/issues/7#issue-404186789) to delete the extra “o o 0” line in the codes files.

I did this, but I still get the same error. What should I do?

zhishui3, Nov 17 '19 08:11

If nCodes is chosen badly, there will be errors when running 'applybpe'. I don't know how to choose this number, but lowering it solved the problem for me (try a low value to see).

Tikquuss, Apr 16 '20 14:04

I guess I'm late to the party, but still... :laughing: The problem is that your 50k/codes file has some repeated entries. A more common error happens when the same token ends up split in two different ways. For example, you build two BPE code files, one English and one German; in one, "ant" is split as a nt</w>, and in the other it is split as an t</w>. When you join these two BPE files, you get the following error: Assertion `reverse_codes.find(pair) == reverse_codes.end()' failed. I have written a script to solve both of these, here . Feel free to check it out. @Tikquuss , @zhishui3
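For illustration, here is a minimal sketch of how both problems could be detected before running fastBPE. It is not the script linked above. It assumes each codes line has the form "left right count" separated by single spaces; the function name `find_code_conflicts` and the sample counts are made up for the example.

```python
from collections import defaultdict


def find_code_conflicts(lines):
    """Scan BPE code lines (assumed format: 'left right count').

    Returns (duplicates, conflicts):
      duplicates: list of (line_number, pair) for pairs seen before
      conflicts:  dict mapping a merged string to the distinct pairs
                  that produce it (only entries with more than one pair)
    """
    seen_pairs = {}                 # (left, right) -> first line number
    by_concat = defaultdict(list)   # left + right -> distinct pairs
    duplicates = []
    for lineno, line in enumerate(lines, start=1):
        parts = line.rstrip("\n").split(" ")
        if len(parts) != 3:
            continue  # skipped here; a line like this would need a manual look
        pair = (parts[0], parts[1])
        if pair in seen_pairs:
            duplicates.append((lineno, pair))   # repeated entry
            continue
        seen_pairs[pair] = lineno
        by_concat[pair[0] + pair[1]].append(pair)
    # Same merged string reached through different splits
    conflicts = {c: ps for c, ps in by_concat.items() if len(ps) > 1}
    return duplicates, conflicts


# Hypothetical sample: "t h" is repeated, "ant</w>" is split two ways.
sample = ["a nt</w> 12", "t h 10", "an t</w> 7", "t h 3"]
duplicates, conflicts = find_code_conflicts(sample)
print(duplicates)
print(conflicts)
```

To check a real file, pass an open file handle instead of the sample list, e.g. `find_code_conflicts(open(path, encoding="utf-8"))`. Any reported duplicate or conflict is a line to remove or reconcile before loading the codes.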

Jeevesh8, Sep 19 '20 08:09