phylanx icon indicating copy to clipboard operation
phylanx copied to clipboard

phylanx topology structure feature request(s)

Open ct-clmsn opened this issue 6 years ago • 3 comments

Tree transducers require an input tree stored in Chomsky normal form (a binary tree structure). Transducers additionally require a set of binary encodings, called paths, of a tree's topology. Each tree has multiple paths, each node has a path, and each path has a "label" (the literature uses the term 'label', think of this as a 'value').

Two training algorithms are being actively implemented (one is EM, the other is perception-based). On the other end of that work, having the following capabilities provided with the tree topology structures would make testing and merging the transducer work into trunk significantly faster.

  • Provide a visitor that can emulate a Chomsky traversal of the topology. the topology isn't in Chomsky form, which is okay; being able to "fake" a traversal will save having to do any type of tree transformations from the existing structure into a binary version/copy of the tree.

  • Provide the root node of a tree topology with an attribute that contains a set of paths and each path's corresponding label (value). This attribute can be a vector of tuples std::tuple< label, std::vector<uint8_t> >, a map of labels to a vector of paths std::map< label, std::vector< std::vector<uint8_t> >, or consider storing paths into a Boost "big int" in lieu of a std::vector< uint8_t > - feel free to provide additional suggestions. Suggestions will be reflected back into the training algorithm implementations.

ct-clmsn avatar Feb 04 '18 00:02 ct-clmsn

@ct-clmsn what kind of traversal do you need? in-fix, pre-fix, post-fix?

Also, could you elaborate on the concept of 'path', please? What information should be encoded by it?

hkaiser avatar Feb 04 '18 18:02 hkaiser

@hkaiser working on a thorough description!

ct-clmsn avatar Feb 06 '18 12:02 ct-clmsn

The traversal technique is a little complicated because of paths. A default prefix traversal is good. Paths are a binary string (sequence) that represents a traversal pattern through the tree. 0000 would mean walk the tree by successively visiting 4 left-child nodes. 0100 means visit the left child, the right child, the left child, and a left child. A much better example:

A( B, C )

0 -> B
1 -> C

A ( B ( D, E ), C )

00 -> D
01 -> E

ct-clmsn avatar Feb 06 '18 17:02 ct-clmsn