disco-dop
Re-implement NLTK tree
This would allow a potentially significant speedup of treebank transformations and grammar extraction.
Wishlist:
- represent all treebank information: functions, morphology, lemmas, &c.
- combine indices and words in one data structure
- parent pointers, sibling pointers
- store the yield of each node, i.e., tree.leaves(); modifying a tree triggers an update in all of its ancestors.
- automatic canonicalization
- mutable and immutable versions. immutable version could use C arrays/structs.
- perhaps a specifically optimized version for binary trees.
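
A rough sketch of how a few of these wishlist items could fit together (parent pointers, sibling lookup, indices and words stored together, and a cached yield that is refreshed in all ancestors on modification). This is only an illustration in pure Python: the names `Token`, `Node`, `append`, `right_sibling`, and `_refresh` are hypothetical and are not part of NLTK or disco-dop, and the other wishlist items (canonicalization, an immutable C-backed variant, a binary-tree specialization) are not covered.

```python
from typing import NamedTuple, Optional


class Token(NamedTuple):
    """A leaf that keeps the sentence index and the word together."""
    index: int
    word: str


class Node:
    """Hypothetical mutable tree node with a parent pointer and cached yield."""
    __slots__ = ('label', 'children', 'parent', '_leaves')

    def __init__(self, label, children=()):
        self.label = label
        self.parent = None      # set when this node is attached to a parent
        self.children = []
        self._leaves = []       # cached yield of this subtree
        for child in children:
            self.append(child)

    def append(self, child):
        """Attach a child (Node or Token) and refresh the cached yields."""
        if isinstance(child, Node):
            child.parent = self
        self.children.append(child)
        self._refresh()

    def _refresh(self):
        """Recompute the cached yield of this node and of every ancestor."""
        node = self
        while node is not None:
            leaves = []
            for child in node.children:
                if isinstance(child, Node):
                    leaves.extend(child._leaves)
                else:
                    leaves.append(child)
            node._leaves = leaves
            node = node.parent

    def leaves(self):
        """Return the yield without traversing the subtree."""
        return list(self._leaves)

    def right_sibling(self) -> Optional['Node']:
        """Return the next child of the parent, or None if there is none."""
        if self.parent is None:
            return None
        siblings = self.parent.children
        for i, sibling in enumerate(siblings):
            if sibling is self and i + 1 < len(siblings):
                return siblings[i + 1]
        return None


np_ = Node('NP', [Token(0, 'the'), Token(1, 'cat')])
vp = Node('VP', [Token(2, 'sleeps')])
s = Node('S', [np_, vp])
vp.append(Token(3, 'soundly'))  # the cached yield of S is updated as well
print([tok.word for tok in s.leaves()])
```

The trade-off in this sketch is that every modification costs time proportional to the yields of all ancestors, in exchange for making `leaves()` a plain copy of a cached list. Whether that pays off depends on how often trees are modified relative to how often their yields are read.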