Currently, the model raises an error if not all buckets can be filled. When training, this is hardly a problem, and likewise when parsing a sequence of suitably large files...
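A minimal sketch of one way to tolerate unfillable buckets, dropping empty ones instead of raising; the function and names here are illustrative, not the parser's actual bucketing API:

```python
# Hypothetical bucketing helper: group sentences by length boundary,
# silently dropping buckets that end up empty rather than erroring.
def bucket_sentences(sentences, boundaries):
    """Assign each sentence to the smallest boundary that fits it."""
    buckets = {b: [] for b in sorted(boundaries)}
    for sent in sentences:
        for b in buckets:
            if len(sent) <= b:
                buckets[b].append(sent)
                break
        # Sentences longer than every boundary are skipped here.
    return {b: sents for b, sents in buckets.items() if sents}

batches = bucket_sentences([["The", "dog"], ["It", "runs", "fast"]], [2, 5, 10])
```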
For some future ideas I have, the pretrained vocabulary will need approximate counts. One way to do this is to fit it to a Zipfian distribution, but people have noticed...
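As a sketch of the Zipfian-fit idea, one could assign each pretrained word a count proportional to 1/rank^s, assuming the pretrained vocabulary is ordered from most to least frequent; the parameters below are illustrative:

```python
# Approximate counts for a rank-ordered pretrained vocabulary by
# fitting a Zipfian distribution: count(rank) ∝ 1 / rank**s.
def zipfian_counts(pretrained_words, total=1_000_000, s=1.0):
    weights = [1.0 / (rank ** s) for rank in range(1, len(pretrained_words) + 1)]
    norm = sum(weights)
    return {w: total * wt / norm for w, wt in zip(pretrained_words, weights)}

counts = zipfian_counts(["the", "of", "and", "to"])
```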
Currently, you can change the configuration settings of a model to let it output different things--for example, if you want to train a joint tagger/parser model, you could set the...
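A hedged sketch of flipping output-related settings programmatically, assuming an INI-style config file; the section and option names (`OUTPUT`, `predict_tags`, `predict_arcs`) are hypothetical stand-ins, not the parser's actual schema:

```python
from configparser import ConfigParser

# Enable both tagging and parsing outputs for a joint model
# (option names here are illustrative assumptions).
config = ConfigParser()
config.read("config.cfg")
if not config.has_section("OUTPUT"):
    config.add_section("OUTPUT")
config.set("OUTPUT", "predict_tags", "True")
config.set("OUTPUT", "predict_arcs", "True")
with open("config.cfg", "w") as f:
    config.write(f)
```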
Currently, once you construct a `Parseset` object from a training file, you can't reuse it with a different file. This means that when parsing thousands of files, you'll create thousands...
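An illustrative sketch of the kind of reuse this asks for: one dataset object whose backing file can be swapped, instead of reconstructing the object per file. The `Parseset` below is a hypothetical stand-in for the real class:

```python
class Parseset:
    """Toy stand-in: holds buffers that can be reloaded from a new file."""

    def __init__(self, filename):
        self.load(filename)

    def load(self, filename):
        """(Re)populate the buffers from a different file."""
        self.filename = filename
        with open(filename) as f:
            self.lines = [line.rstrip("\n") for line in f if line.strip()]

# Reusing one object across many files instead of building thousands:
dataset = Parseset("file0.conllu")  # placeholder filenames
for fname in ["file1.conllu", "file2.conllu"]:
    dataset.load(fname)
```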
The config file dictates what special tokens are used by each vocabulary. This is because the parser needs to know which token in the training file(s) is the root. In...
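A minimal sketch of how special tokens might be declared and read from an INI-style config; the section and option names are hypothetical illustrations of declaring a root token per vocabulary:

```python
from configparser import ConfigParser

# Illustrative special-token settings; names are assumptions.
config = ConfigParser()
config.read_string("""
[Vocab]
root_token = <ROOT>
unk_token = <UNK>
pad_token = <PAD>
""")
ROOT = config.get("Vocab", "root_token")  # token marking the root in training files
```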
The current way of ensuring valid trees is pretty hacky. Ideally, there should be three ways of MST parsing the returned probabilities, which should be configurable (option 1 is sketched below):
1. Argmax (greedy)
2. ...
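A minimal sketch of option 1 over a head-probability matrix, where `probs[i, j]` is the probability that word `j` heads word `i` and index 0 is the root; note that picking heads independently like this can produce cycles, which is why a proper MST decoder is desirable as an alternative:

```python
import numpy as np

def greedy_heads(probs):
    """Pick the most probable head for each word independently."""
    heads = probs.argmax(axis=1)
    heads[0] = 0  # the root points to itself by convention
    return heads

probs = np.array([[1.0, 0.0, 0.0],
                  [0.7, 0.1, 0.2],
                  [0.2, 0.6, 0.2]])
print(greedy_heads(probs))  # [0 0 1]
```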
The model saves a list of all the tokens in the vocabulary in `save_dir/words.txt`. If there's a case mismatch between the character model and the token model--that is, if you...
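One illustrative way to detect the lowercase side of such a mismatch is to scan the saved vocabulary; the path comes from the note above, but the check itself is an assumption about what "mismatch" means here:

```python
import os

def vocab_is_lowercased(save_dir):
    """True if every saved token is already lowercase."""
    with open(os.path.join(save_dir, "words.txt")) as f:
        words = [line.split()[0] for line in f if line.strip()]
    return all(w == w.lower() for w in words)
```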