[Feature Request] Use unsigned bytes for String's buffer

Open lsh opened this issue 1 year ago • 2 comments

What is your request?

Right now, String's data is stored as DynamicVector[Int8], but it should likely be DynamicVector[UInt8].

What is your motivation for this change?

Signed bytes tend to suggest to users that they are working with numbers rather than raw data. It has also become increasingly common to represent raw bytes as a vector of unsigned bytes (such as Uint8Array in JavaScript or []const u8 in Zig).
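
As a rough illustration of the concern (a sketch in Python rather than Mojo; the helper names `as_signed` and `looks_ascii` are hypothetical and not part of any Mojo API), here is how the same UTF-8 bytes look when read as signed versus unsigned values, and how a naive ASCII check can go wrong on signed bytes:

```python
def as_signed(byte: int) -> int:
    """Reinterpret an unsigned byte (0..255) as a two's-complement 8-bit value."""
    return byte - 256 if byte >= 128 else byte


def looks_ascii(values: list[int]) -> bool:
    """Naive check: treats every value below 0x80 as ASCII."""
    return all(v < 0x80 for v in values)


text = "\u00e9"  # "é", encoded in UTF-8 as the two bytes 0xC3 0xA9

unsigned = list(text.encode("utf-8"))        # bytes viewed as UInt8
signed = [as_signed(b) for b in unsigned]    # same bits viewed as Int8

print(unsigned)               # [195, 169]
print(signed)                 # [-61, -87]
print(looks_ascii(unsigned))  # False: correctly rejects non-ASCII bytes
print(looks_ascii(signed))    # True: negative values slip past the check
```

The bit patterns are identical in both views; only the interpretation changes, which is why a signed element type can invite arithmetic-style mistakes on what is really raw data.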

lsh, Jul 01 '23 04:07

Makes sense, although users should not be working directly with the bytes within a string :) Also, we try to match C semantics here, where strings are char *.

Btw, there is a plan to optimize strings (e.g. small string optimization), so you should never depend on String's layout.

abduld, Jul 01 '23 04:07

+1, the current implementation needs to be improved a bunch.

lattner, Jul 01 '23 17:07