Training set sometimes required for parsing

Open tdozat opened this issue 8 years ago • 0 comments

The model saves a list of all the tokens in the vocabulary to save_dir/words.txt. If there's a case mismatch between the character model and the token model (that is, if you want the character model to be cased and the word vocabulary to be caseless), the saved word list can't be used to build the character vocabulary, so the code reads through the training set to build it instead. This is a problem when you only want to parse and the training set isn't available.

Solution: modify the code to save cased and caseless vocabularies in save_dir/words-cased.txt and save_dir/words-caseless.txt, and at parse time load whichever one is dictated by the cased configuration setting.
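
A rough sketch of what that could look like, in plain Python. The function names (save_vocabs, load_vocab, char_vocab), the tab-separated token/count file layout, and the boolean `cased` argument are assumptions made for illustration; they are not the parser's existing API or file format.

```python
import os
from collections import Counter


def save_vocabs(tokens, save_dir):
    """Write both vocabularies at training time (hypothetical helper)."""
    os.makedirs(save_dir, exist_ok=True)
    vocabs = {
        "words-cased.txt": Counter(tokens),
        "words-caseless.txt": Counter(tok.lower() for tok in tokens),
    }
    for filename, counts in vocabs.items():
        path = os.path.join(save_dir, filename)
        with open(path, "w", encoding="utf-8") as f:
            # Assumed layout: one "token<TAB>count" pair per line.
            for tok, count in counts.most_common():
                f.write(f"{tok}\t{count}\n")


def load_vocab(save_dir, cased):
    """Load whichever vocabulary the `cased` setting asks for."""
    filename = "words-cased.txt" if cased else "words-caseless.txt"
    vocab = {}
    with open(os.path.join(save_dir, filename), encoding="utf-8") as f:
        for line in f:
            tok, count = line.rstrip("\n").split("\t")
            vocab[tok] = int(count)
    return vocab


def char_vocab(word_vocab):
    """Derive the character vocabulary from a loaded word vocabulary."""
    return sorted({ch for tok in word_vocab for ch in tok})
```

With both files on disk, a cased character model could call char_vocab(load_vocab(save_dir, cased=True)) while the word model loads the caseless file, so nothing at parse time needs to touch the training set.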

Jun 18 '17 17:06 tdozat