InMemoryDatasets.jl

Enhance `Characters` type

Open sl-solution opened this issue 2 years ago • 0 comments

This issue tracks the problems with string types in Julia.

`String` in Julia is not suitable for InMemoryDatasets, and `Characters` (UInt8 - UInt16) is currently good only for strings of up to 15 characters (a compile-time issue with NTuples). InlineStrings are restricted and wasteful. The idea is to have something similar to `Characters`, but backed by a vector of UInt8 instead of an NTuple, with a length attribute that fixes the width of each element. A vector of `Characters{8}` with 10 elements would then be a vector of 10*8 UInt8 plus an attribute of 8, which indicates that each string in the vector has length 8. Shorter strings should be padded with spaces, and longer ones should be truncated to 8 characters, or fewer if they contain multi-byte UTF-8 characters.
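The layout described above could look roughly like the following sketch. All names here (`FixedWidthStrings`, `encode_fixed`) are hypothetical and not part of InMemoryDatasets.jl. It also assumes that the fixed length counts bytes, so truncation stops at the last whole UTF-8 character that fits.

```julia
# Hypothetical sketch; these names do not exist in InMemoryDatasets.jl.
# One flat byte buffer holds every element; N is the fixed width in bytes.
struct FixedWidthStrings{N} <: AbstractVector{String}
    data::Vector{UInt8}   # length(data) == N * number of elements
end

# Encode `s` into exactly N bytes: copy whole characters while they fit,
# then leave the remaining bytes as space padding.
function encode_fixed(s::AbstractString, N::Int)
    buf = fill(UInt8(' '), N)
    pos = 0
    for ch in s
        pos + ncodeunits(ch) > N && break   # never split a multi-byte character
        for b in codeunits(string(ch))
            pos += 1
            buf[pos] = b
        end
    end
    return buf
end

# Pack a vector of strings into a single buffer of N * length(strs) bytes.
function FixedWidthStrings{N}(strs::AbstractVector{<:AbstractString}) where {N}
    data = Vector{UInt8}(undef, N * length(strs))
    for (i, s) in enumerate(strs)
        data[(i - 1) * N + 1 : i * N] = encode_fixed(s, N)
    end
    return FixedWidthStrings{N}(data)
end

# Minimal AbstractVector interface; elements come back with their padding.
Base.size(v::FixedWidthStrings{N}) where {N} = (length(v.data) ÷ N,)
Base.getindex(v::FixedWidthStrings{N}, i::Int) where {N} =
    String(v.data[(i - 1) * N + 1 : i * N])

v = FixedWidthStrings{8}(["abc", "abcdefghij", "abcdefgé"])
```

In this sketch `v[1]` is padded to 8 bytes, `v[2]` is cut to its first 8 characters, and `v[3]` keeps only 7 characters because the two-byte `é` would not fit in the remaining byte. Whether padding should be stripped on read is left open here.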

sl-solution, Oct 21 '21 22:10