lilac
STRING_SPAN should be implemented as a tuple, not a dictionary.
Currently, each STRING_SPAN is implemented as a dictionary, e.g. {'start': 0, 'end': 10}.
The data format will not encode the 'start' and 'end' keys, but building a dictionary for every span adds runtime overhead. Since we may emit a lot of spans for some datasets (e.g. sentence splits), we should optimize this by emitting a (start, end) tuple instead.
This should have no effect on the API.
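A minimal sketch of what the tuple form could look like. The names `Span`, `make_span`, and `sentence_spans` are hypothetical and only illustrate the idea; they are not taken from the existing codebase. It assumes callers build spans through a helper, so the underlying representation can change without touching call sites.

```python
from typing import List, NamedTuple


class Span(NamedTuple):
    """Hypothetical tuple-backed span: behaves like (start, end)."""
    start: int
    end: int


def make_span(start: int, end: int) -> Span:
    # Hypothetical constructor; callers never build the raw tuple directly,
    # so swapping dict -> tuple stays invisible to them.
    return Span(start, end)


def sentence_spans(text: str) -> List[Span]:
    """Toy sentence splitter: one span per '.'-terminated chunk."""
    spans = []
    start = 0
    for i, ch in enumerate(text):
        if ch == '.':
            spans.append(make_span(start, i + 1))
            start = i + 1
    # Emit any trailing text that has no closing period.
    if start < len(text):
        spans.append(make_span(start, len(text)))
    return spans
```

A named tuple keeps `span.start` / `span.end` readable while still comparing equal to a plain `(start, end)` tuple. Whether the real implementation should use a named tuple or a bare tuple is an open choice.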