Currently, the model raises an error if not all buckets can be filled. When training, this is hardly a problem, and likewise when parsing a sequence of suitably large files...
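A minimal sketch of one way to tolerate unfillable buckets, dropping empty ones instead of raising; the function and names here are illustrative, not the parser's actual bucketing API:

```python
# Hypothetical bucketing helper: group sentences by length boundary,
# silently dropping buckets that end up empty rather than erroring.
def bucket_sentences(sentences, boundaries):
    """Assign each sentence to the smallest boundary that fits it."""
    buckets = {b: [] for b in sorted(boundaries)}
    for sent in sentences:
        for b in buckets:
            if len(sent) <= b:
                buckets[b].append(sent)
                break
        # Sentences longer than every boundary are skipped here.
    return {b: sents for b, sents in buckets.items() if sents}

batches = bucket_sentences([["The", "dog"], ["It", "runs", "fast"]], [2, 5, 10])
```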
For some future ideas I have, the pretrained vocabulary will need approximate counts. One way to do this is to fit it to a Zipfian distribution, but people have noticed...
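As a sketch of the Zipfian-fit idea, one could assign each pretrained word a count proportional to 1/rank^s, assuming the pretrained vocabulary is ordered from most to least frequent; the parameters below are illustrative:

```python
# Approximate counts for a rank-ordered pretrained vocabulary by
# fitting a Zipfian distribution: count(rank) ∝ 1 / rank**s.
def zipfian_counts(pretrained_words, total=1_000_000, s=1.0):
    weights = [1.0 / (rank ** s) for rank in range(1, len(pretrained_words) + 1)]
    norm = sum(weights)
    return {w: total * wt / norm for w, wt in zip(pretrained_words, weights)}

counts = zipfian_counts(["the", "of", "and", "to"])
```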
Currently, you can change the configuration settings of a model to let it output different things--for example, if you want to train a joint tagger/parser model, you could set the...
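A hedged sketch of flipping output-related settings programmatically, assuming an INI-style config file; the section and option names (`OUTPUT`, `predict_tags`, `predict_arcs`) are hypothetical stand-ins, not the parser's actual schema:

```python
from configparser import ConfigParser

# Enable both tagging and parsing outputs for a joint model
# (option names here are illustrative assumptions).
config = ConfigParser()
config.read("config.cfg")
if not config.has_section("OUTPUT"):
    config.add_section("OUTPUT")
config.set("OUTPUT", "predict_tags", "True")
config.set("OUTPUT", "predict_arcs", "True")
with open("config.cfg", "w") as f:
    config.write(f)
```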
Currently, once you construct a `Parseset` object from a training file, you can't reuse it with a different file. This means that when parsing thousands of files, you'll create thousands...
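An illustrative sketch of the kind of reuse this asks for: one dataset object whose backing file can be swapped, instead of reconstructing the object per file. The `Parseset` below is a hypothetical stand-in for the real class:

```python
class Parseset:
    """Toy stand-in: holds buffers that can be reloaded from a new file."""

    def __init__(self, filename):
        self.load(filename)

    def load(self, filename):
        """(Re)populate the buffers from a different file."""
        self.filename = filename
        with open(filename) as f:
            self.lines = [line.rstrip("\n") for line in f if line.strip()]

# Reusing one object across many files instead of building thousands:
dataset = Parseset("file0.conllu")  # placeholder filenames
for fname in ["file1.conllu", "file2.conllu"]:
    dataset.load(fname)
```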
The config file dictates what special tokens are used by each vocabulary. This is because the parser needs to know which token in the training file(s) is the root. In...
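A minimal sketch of how special tokens might be declared and read from an INI-style config; the section and option names are hypothetical illustrations of declaring a root token per vocabulary:

```python
from configparser import ConfigParser

# Illustrative special-token settings; names are assumptions.
config = ConfigParser()
config.read_string("""
[Vocab]
root_token = <ROOT>
unk_token = <UNK>
pad_token = <PAD>
""")
ROOT = config.get("Vocab", "root_token")  # token marking the root in training files
```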
The current way of ensuring valid trees is pretty hacky. Ideally, there should be three ways of MST parsing the returned probabilities, which should be configurable (option 1 is sketched below):
1. Argmax (greedy)
2. ...
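A minimal sketch of option 1 over a head-probability matrix, where `probs[i, j]` is the probability that word `j` heads word `i` and index 0 is the root; note that picking heads independently like this can produce cycles, which is why a proper MST decoder is desirable as an alternative:

```python
import numpy as np

def greedy_heads(probs):
    """Pick the most probable head for each word independently."""
    heads = probs.argmax(axis=1)
    heads[0] = 0  # the root points to itself by convention
    return heads

probs = np.array([[1.0, 0.0, 0.0],
                  [0.7, 0.1, 0.2],
                  [0.2, 0.6, 0.2]])
print(greedy_heads(probs))  # [0 0 1]
```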
The model saves a list of all the tokens in the vocabulary in `save_dir/words.txt`. If there's a case mismatch between the character model and the token model--that is, if you...
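One illustrative way to detect the lowercase side of such a mismatch is to scan the saved vocabulary; the path comes from the note above, but the check itself is an assumption about what "mismatch" means here:

```python
import os

def vocab_is_lowercased(save_dir):
    """True if every saved token is already lowercase."""
    with open(os.path.join(save_dir, "words.txt")) as f:
        words = [line.split()[0] for line in f if line.strip()]
    return all(w == w.lower() for w in words)
```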