fury
[Go] Implement meta string encoding algorithm for golang
Is your feature request related to a problem? Please describe.
We've implemented the meta string encoding algorithm described in https://fury.apache.org/docs/specification/fury_xlang_serialization_spec#meta-string for Java in #1514. It's time to implement it in Go.
Describe the solution you'd like
The Java implementation in #1514 can be taken as a reference. Note, however, that here the meta string encoding algorithm is used to encode field names only, so the special characters can't be . or $, and the implementation will therefore be simpler.
Additional context
#1413
Could you assign it to me? This is my first attempt at contributing to open source, and I'm very interested in this task. Thanks.
Great, thanks for the willingness to contribute to Fury
In the function public MetaString encode(String input, Encoding encoding) in the file MetaStringEncoder.java, there is this section of code:
default:
  byte[] bytes = input.getBytes(StandardCharsets.UTF_8);
  return new MetaString(
      input, Encoding.UTF_8, specialChar1, specialChar2, bytes, bytes.length * 8, 0);
Why is numBits 0 rather than bytes.length * 8?
Why is numChars bytes.length * 8 rather than bytes.length?
Hmm, this is a bug. UTF-8 is rarely used in meta strings, since most chars are actually ASCII, so this branch is not covered by the Fury serialization tests. We need to fix it and add some unit tests.
Thanks for pointing out this bug @qingoba
I have a new idea: we can add a bit to indicate whether to strip the last char in the encoded meta string when the encoding is not UTF-8. That way, we don't have to store the number of bits and the number of chars in MetaString.
Exactly.
Because 5 + 5 > 8, the padding in the last byte can hold at most one empty character.
Suppose we use empty (0 or 1) to mark whether the last char is empty. Then the actual number of characters is equal to len(bytes) * 8 / 5 - empty, using integer division.
In this way, the Decoder does not need to accept a numBits argument.
Hi @qingoba, I added the strip-last-char flag to the spec in #1565. I believe this will make the implementation simpler.