cdec
Make lattice format syntax a bit more flexible
The so-called Python Lattice Format (PLF) syntax supported by cdec is more constrained than I realized: unlike the equivalent Python data structure,
- commas are required after all nodes and edges in the lattice, and
- double-quoted strings like "'s" are not allowed.
Assuming a token with )) never occurs in the data, the first one is easily solved with a sed script. The second one took some Python hackery:
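    class SingleQuotedString(str):
        '''String whose __repr__() is always in single quotes'''
        def __repr__(self):
            # Prepending '"' forces repr() to pick single quotes; [2:] drops
            # the leading quote and the '"' again.
            return "'" + repr('"'+str(self))[2:]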
...though it would be nice if cdec didn't choke on these.
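For reference, here is a rough sketch of how the two workarounds might be combined in pure Python instead of sed. The helper name `to_plf`, the sample lattice, and the regex are my own illustration, not anything from cdec; the comma fix relies on the same assumption as above, that no token contains `))`. I have not checked whether cdec accepts the spaces that `repr()` leaves after commas.

```python
import re


class SingleQuotedString(str):
    '''String whose __repr__() is always in single quotes'''
    def __repr__(self):
        return "'" + repr('"' + str(self))[2:]


def to_plf(lattice):
    """Hypothetical helper: serialize nested tuples with a comma after
    every node and edge.

    Assumes no token contains '))', since the regex cannot tell a
    token's text from the lattice's own parentheses.
    """
    text = repr(lattice)
    # Insert a comma after every ')' that is directly followed by ')'.
    # The lookahead handles runs such as ')))' in a single pass.
    return re.sub(r'\)(?=\))', '),', text)


# Sample lattice (made up for illustration): two nodes, each a tuple of
# (token, weight, distance-to-next-node) edges.
lattice = (
    ((SingleQuotedString("'s"), 1.0, 1), (SingleQuotedString("is"), 0.5, 1)),
    ((SingleQuotedString("here"), 1.0, 1),),
)

print(to_plf(lattice))
```

Wrapping each token in `SingleQuotedString` before building the tuples keeps the quoting fix local to the tokens, so the rest of the lattice can still be serialized with plain `repr()`.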