disco-dop
Discontinuous Data-Oriented Parsing
I got a ValueError from applying `treetransforms.collapseunary` on `(B (E (C (a 0) (b 1)) (c 2)))`, expecting `(B+E (C (a 0) (b 1)) (c 2))`: ```ValueError: Cannot insert a...
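To make the expected result concrete, here is a minimal standalone sketch of collapsing unary chains. It does not use disco-dop and is not its implementation of `collapseunary`: the tuple-based tree encoding and the names `collapse_unary` and `to_bracket` are assumptions made purely for illustration.

```python
def collapse_unary(tree, joinchar="+"):
    """Merge chains of unary nonterminals into a single node.

    Hypothetical encoding: a nonterminal is (label, [children]);
    a preterminal is (tag, word_index).
    """
    label, children = tree
    if not isinstance(children, list):  # preterminal: leave untouched
        return tree
    # Absorb a sole nonterminal child into this node's label.
    while len(children) == 1 and isinstance(children[0][1], list):
        child_label, children = children[0]
        label = label + joinchar + child_label
    return (label, [collapse_unary(child, joinchar) for child in children])


def to_bracket(tree):
    """Render the tuple encoding in bracket notation."""
    label, children = tree
    if not isinstance(children, list):
        return "(%s %d)" % (label, children)
    return "(%s %s)" % (label, " ".join(to_bracket(c) for c in children))


tree = ("B", [("E", [("C", [("a", 0), ("b", 1)]), ("c", 2)])])
print(to_bracket(collapse_unary(tree)))  # (B+E (C (a 0) (b 1)) (c 2))
```

In this sketch the root is collapsed as well and preterminals are never merged; whether the library does the same by default is not something this example establishes.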
- [ ] hierarchical subcorpus selection; handle corpora with large number of sections
- [ ] query cancellation: pressing stop in browser should cancel the query.
- [ ] pagination:...
https://github.com/explosion/wheelwright
multiprocessing pools work fine unless any kind of error condition arises...
- [ ] properly detect segmentation faults, out of memory, &c. `concurrent.futures` does this, but doesn't take an `initializer`...
Tessil/ordered-map might be a better trade-off than spp::sparse_hash.
Hello, the Tiger head rule set described in [1] (table 1, page 3) requires matching both the label and the part of speech. They are the ones used in the...
When these are installed, the installed script is wrong:
```
$ cat `which discodop`
#!/usr/bin/python3
# EASY-INSTALL-SCRIPT: 'disco-dop==0.5rc1','discodop'
__requires__ = 'disco-dop==0.5rc1'
__import__('pkg_resources').run_script('disco-dop==0.5rc1', 'discodop')
```
The workaround is to remove these...
e.g., a pathological sentence with >1000 words will be too deep to recurse when binarized. - Any function that directly recurs on the children of a tree is affected, as...
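One common way around the recursion limit is to traverse with an explicit stack. The sketch below is only an illustration under assumed names: the tuple encoding and the helpers `build_right_branching` and `count_nodes` are hypothetical and not part of disco-dop. It builds a right-branching tree over 5000 words, deeper than CPython's default recursion limit of 1000, and walks it without recursing.

```python
def build_right_branching(n):
    """Build a right-branching tree over n words, iteratively.

    Hypothetical encoding: a nonterminal is (label, [children]);
    a preterminal is (tag, word_index).
    """
    tree = ("X", [("w", n - 1)])
    for i in range(n - 2, -1, -1):
        tree = ("X", [("w", i), tree])
    return tree


def count_nodes(tree):
    """Count all nodes using an explicit stack instead of recursion."""
    count = 0
    stack = [tree]
    while stack:
        label, children = stack.pop()
        count += 1
        if isinstance(children, list):  # nonterminal: visit its children
            stack.extend(children)
    return count


deep = build_right_branching(5000)
print(count_nodes(deep))  # 10000
```

The stack lives on the heap, so the depth of the tree is bounded by available memory and not by the interpreter's call stack.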
- tgrep2: generally fast, but loads corpus at every invocation, and always returns an exhaustive list of all matches; no support for discontinuous constituents.
- xpath / alpinocorpus: memory hungry,...
Would allow a potentially significant speedup for treebank transformations and grammar extraction.

Wishlist:
- represent all treebank information: functions, morphology, lemmas, &c.
- combine indices and words in one datastructure...