swift-models
[WordSeg] Use `Character` instead of `String` in `Alphabet`
Currently, `Alphabet`'s dictionary maps from `String` rather than `Character` so that it can hold tokens longer than one character. Using `Character` as the key type would work if we replaced "</s>", "</w>", and "<pad>" with special Unicode characters or enums.
Since `Alphabet` is used in so many places in the WordSeg model, it is potentially worthwhile to make it more efficient.
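A minimal sketch of what the `Character`-keyed variant could look like, assuming the three marker tokens are replaced by code points from the Unicode Private Use Area. The type name `CharacterAlphabet`, its members, and the specific code points (`U+E000` to `U+E002`) are hypothetical and chosen for illustration only; they do not mirror the existing `Alphabet` API.

```swift
// Hypothetical sketch: not the existing WordSeg `Alphabet` type.
struct CharacterAlphabet {
    // Assumed stand-ins for "</s>", "</w>", and "<pad>".
    // Private Use Area code points are unlikely to occur in ordinary text.
    static let eos: Character = "\u{E000}"
    static let eow: Character = "\u{E001}"
    static let pad: Character = "\u{E002}"

    // Maps each symbol to its integer index.
    private(set) var dictionary: [Character: Int] = [:]

    // Indexes the given letters first, then the three marker characters.
    // Duplicate symbols keep the index of their first occurrence.
    init(_ letters: [Character]) {
        for symbol in letters + [Self.eos, Self.eow, Self.pad]
        where dictionary[symbol] == nil {
            dictionary[symbol] = dictionary.count
        }
    }

    // Total number of symbols, markers included.
    var count: Int { dictionary.count }

    // Returns the index of `symbol`, or nil if it is not in the alphabet.
    subscript(symbol: Character) -> Int? { dictionary[symbol] }
}
```

With this shape, a word can be indexed by iterating over its characters directly, for example `"cab".compactMap { alphabet[$0] }`, without building a one-character `String` for each lookup. An enum with cases for the markers plus a `Character` payload would be the other option mentioned above; the private-use approach is shown here only because it keeps the key type a plain `Character`.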