Make lattice format syntax a bit more flexible

Open nschneid opened this issue 11 years ago • 0 comments

The so-called Python Lattice Format (PLF) syntax supported by cdec is more constrained than I realized: unlike the equivalent Python data structure,

  • commas are required after all nodes and edges in the lattice, and
  • double-quoted strings like "'s" are not allowed.

Assuming no token in the data contains )), the first constraint is easily worked around with a sed script that adds the missing commas. The second took some Python hackery:

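class SingleQuotedString(str):
    '''String whose __repr__() is always in single quotes'''
    def __repr__(self):
        # Prepending '"' makes repr() choose single quotes as the delimiter
        # (escaping any "'" inside); [2:] then drops the leading '" pair.
        return "'" + repr('"'+str(self))[2:]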
...though it would be nice if cdec didn't choke on these.
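
For reference, here is a minimal sketch that combines both workarounds by writing the PLF string directly instead of post-processing repr() output. It assumes the lattice is a tuple of nodes, each node a tuple of (word, cost, distance) edges. The names quote_single and to_plf are hypothetical and not part of cdec, and whether cdec accepts every backslash escape produced here is also an assumption.

```python
import ast


def quote_single(token):
    """Return token as a single-quoted literal, escaping backslashes and single quotes."""
    return "'" + token.replace("\\", "\\\\").replace("'", "\\'") + "'"


def to_plf(lattice):
    """Serialize a lattice with a comma after every edge and every node."""
    nodes = []
    for node in lattice:
        edges = "".join(
            "(%s,%r,%d)," % (quote_single(word), cost, dist)
            for word, cost, dist in node
        )
        nodes.append("(" + edges + "),")
    return "(" + "".join(nodes) + ")"


lattice = ((("it", 1.0, 1),), (("'s", 0.5, 1), ("is", 0.5, 1)))
plf = to_plf(lattice)
print(plf)  # ((('it',1.0,1),),(('\'s',0.5,1),('is',0.5,1),),)

# The result is still a valid Python literal that round-trips to the input.
assert ast.literal_eval(plf) == lattice
```

Because the commas are emitted while serializing, this avoids the assumption that no token contains )).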

nschneid, Dec 18 '13 03:12