edit: datum struct string type added utf8 check

Open dust1 opened this issue 1 year ago • 5 comments

Rationale

Close #1300

Detailed Changes

Check that string values are valid UTF-8 when inserting data.

Test Plan

pass

dust1 avatar Feb 26 '24 14:02 dust1

I forgot, I'll try adding a few more unit tests later

dust1 avatar Feb 27 '24 01:02 dust1

I checked the official Rust documentation. For the code paths in datum.rs that build Datum objects from a String, Rust already guarantees that a String is valid UTF-8, so the check probably needs to be added somewhere else. 😢
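For reference, a minimal sketch of the standard-library guarantee mentioned here. It uses only std; the helper name to_checked_string is made up for illustration and is not part of HoraeDB. The safe constructor String::from_utf8 validates its input, so invalid bytes can never end up inside a String through safe code.

```rust
use std::string::FromUtf8Error;

/// Hypothetical helper (not HoraeDB code): build a String from raw bytes
/// through the safe, validating constructor.
fn to_checked_string(bytes: Vec<u8>) -> Result<String, FromUtf8Error> {
    String::from_utf8(bytes)
}

fn main() {
    // The character "中" is the two bytes D6 D0 in GBK, which is not valid UTF-8.
    let gbk_bytes = vec![0xD6u8, 0xD0];
    assert!(to_checked_string(gbk_bytes).is_err());

    // The same character encoded as UTF-8 is accepted.
    let utf8_bytes = "中".as_bytes().to_vec();
    assert_eq!(to_checked_string(utf8_bytes).unwrap(), "中");

    println!("String::from_utf8 rejects GBK bytes and accepts UTF-8 bytes");
}
```

So any path that already holds a String does not need an extra check; only paths that start from raw bytes can carry invalid data.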

dust1 avatar Feb 27 '24 14:02 dust1

rust guarantees that String is a utf8 string, which I might need to modify elsewhere. 😢

Yes, I grepped the code and found several places that contain from_bytes_unchecked(bytes: Bytes).

To debug this issue, you can construct a GBK-encoded string using the SDK and trace why it does not produce an error.
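A rough sketch of what such a trace could look like on the Rust side, assuming the GBK payload is available as raw bytes. The function first_invalid_offset is a hypothetical helper, and the GBK bytes are hard-coded so that no encoding crate is needed. It reports where validation fails, which helps when checking each stage the bytes pass through.

```rust
use std::str;

/// Hypothetical tracing helper: Ok(()) for valid UTF-8, otherwise the byte
/// offset of the first invalid sequence.
fn first_invalid_offset(bytes: &[u8]) -> Result<(), usize> {
    match str::from_utf8(bytes) {
        Ok(_) => Ok(()),
        Err(e) => Err(e.valid_up_to()),
    }
}

fn main() {
    // "你好" encoded as GBK is C4 E3 BA C3.
    let gbk = [0xC4u8, 0xE3, 0xBA, 0xC3];

    // An ASCII prefix followed by the GBK bytes, like a value inside a larger payload.
    let mut mixed = b"name=".to_vec();
    mixed.extend_from_slice(&gbk);

    assert_eq!(first_invalid_offset("你好".as_bytes()), Ok(()));
    assert_eq!(first_invalid_offset(&gbk), Err(0));
    assert_eq!(first_invalid_offset(&mixed), Err(5));

    println!("GBK bytes fail UTF-8 validation at the expected offsets");
}
```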

jiacai2050 avatar Mar 04 '24 02:03 jiacai2050

Ok, I'll try

dust1 avatar Mar 04 '24 14:03 dust1

The from_bytes_unchecked function is only called when decoding. I think what I should be looking for is why non-UTF-8 bytes are saved when encoding. I'll find out later 😵
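To illustrate the idea, here is a sketch under stated assumptions rather than HoraeDB's real codec: encode_string, decode_string_unchecked, and the length-prefixed layout are all made up for this example. If the encoder validates before writing, an unchecked decoder can safely trust what it reads back.

```rust
use std::str::{self, Utf8Error};

/// Hypothetical encoder: validate first, then write a u32 little-endian
/// length prefix followed by the bytes.
fn encode_string(buf: &mut Vec<u8>, value: &[u8]) -> Result<(), Utf8Error> {
    str::from_utf8(value)?; // reject non-UTF-8 before anything is written
    buf.extend_from_slice(&(value.len() as u32).to_le_bytes());
    buf.extend_from_slice(value);
    Ok(())
}

/// Hypothetical decoder: performs no validation and relies on the encoder.
/// Panics if the buffer is shorter than the length prefix claims.
fn decode_string_unchecked(buf: &[u8]) -> &[u8] {
    let len = u32::from_le_bytes([buf[0], buf[1], buf[2], buf[3]]) as usize;
    &buf[4..4 + len]
}

fn main() {
    // Valid UTF-8 round-trips through the encoder and the unchecked decoder.
    let mut buf = Vec::new();
    encode_string(&mut buf, "héllo".as_bytes()).unwrap();
    assert_eq!(decode_string_unchecked(&buf), "héllo".as_bytes());

    // GBK bytes for "你好" are rejected, and nothing is written.
    let mut rejected = Vec::new();
    assert!(encode_string(&mut rejected, &[0xC4, 0xE3, 0xBA, 0xC3]).is_err());
    assert!(rejected.is_empty());

    println!("encoder rejects non-UTF-8 input before it is stored");
}
```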

dust1 avatar Mar 14 '24 14:03 dust1