STRING_SPAN should be implemented as a tuple, not a dictionary.

Open nsthorat opened this issue 1 year ago • 0 comments

Currently STRING_SPANs are implemented as {'start': 0, 'end': 10}.

The data format does not encode the 'start' and 'end' keys, but emitting a dictionary for every span adds runtime overhead. Since some datasets can produce a large number of spans (e.g. sentence splits), we should optimize this by using a tuple such as (0, 10).

This should have no API effect.
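To make the overhead concrete, here is a rough sketch comparing the two representations. The helpers `span_as_dict` and `span_as_tuple` are hypothetical names used only for illustration and are not part of the existing code; the printed numbers depend on the Python version and machine.

```python
import sys
import timeit


# Hypothetical helpers for illustration only; not the library's actual API.
def span_as_dict(start: int, end: int) -> dict:
    """Current representation: one dict per span."""
    return {'start': start, 'end': end}


def span_as_tuple(start: int, end: int) -> tuple:
    """Proposed representation: a (start, end) tuple per span."""
    return (start, end)


d = span_as_dict(0, 10)
t = span_as_tuple(0, 10)

# Shallow container size in bytes (the int objects are not counted).
print('dict bytes: ', sys.getsizeof(d))
print('tuple bytes:', sys.getsizeof(t))

# Rough construction cost; absolute timings vary between runs.
n = 100_000
print('dict secs: ', timeit.timeit(lambda: span_as_dict(0, 10), number=n))
print('tuple secs:', timeit.timeit(lambda: span_as_tuple(0, 10), number=n))
```

Both forms carry the same two integers, so a conversion at the API boundary could presumably keep callers unaffected.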

nsthorat • Apr 26 '23 14:04