fury icon indicating copy to clipboard operation
fury copied to clipboard

refactor(Rust&Java): use varint32 as type id for int32

Open urlyy opened this issue 1 month ago • 6 comments

What does this PR do?

  1. Added a type_resolver parameter to fory_static_type_id() and fory_reserved_space() so that type_resolver.is_compress_int() can be used to decide whether to use read/write_varint32() or read/write_i32().
  2. Added a compress_int() method to Fory to allow users to configure whether integers should be compressed; default is true.
  3. Updated the logic in impl Serializer for i32.

Related issues

#2885 #2884

Does this PR introduce any user-facing change?

Users can configure whether to compress integers by calling fory.compress_int(true/false).

urlyy avatar Nov 05 '25 11:11 urlyy

@chaokunyang Is ForyJava hasn’t updated the writeClassInfo logic for Integer based on compressInt() yet? I may need to temporarily disable RustXlangTest.java.

urlyy avatar Nov 05 '25 11:11 urlyy

@chaokunyang Is ForyJava hasn’t updated the writeClassInfo logic for Integer based on compressInt() yet? I may need to temporarily disable RustXlangTest.java.

Could we update java implementation in this PR too?

chaokunyang avatar Nov 05 '25 11:11 chaokunyang

Could we update java implementation in this PR too?

I will try.

urlyy avatar Nov 05 '25 12:11 urlyy

  1. Need to refactor fory-derive/object/util.rs::compute_struct_version_hash(). The typeid for i32 is treated as INT32 at compile time, which causes a version hash mismatch error.
  2. CrossLanguageTest.java also occurs errors and the Python implementation needs to be modified as well.

But I don’t have time to refactor these. If it's not urgent, I will continue working on this PR in four or five days.

urlyy avatar Nov 05 '25 13:11 urlyy

But I don’t have time to refactor these. If it's not urgent, I will continue working on this PR in four or five days.

@urlyy It's not urgent, you can take your time

chaokunyang avatar Nov 06 '25 06:11 chaokunyang

@chaokunyang Can we modify the format of version_hash?

The reason is that Rust needs to compute the version_hash at compile time for different compression scenarios, which requires calculating $2^5$ hashes (because of not_compress_i32 / compress_i32 / not_compress_i64 / compress_to_var_i64/ compress_to_sli_i64). If we further add U32 / VAR_U32 / U64 / VAR_U64 / SLI_U64, it becomes even more explosive.

Separating the two compression flags can effectively mitigate this situation.

Which means, for int_32 and int_64, when computing the hash, we only treat them as Types::INT32 and Types::INT64. And use 1 bit of the hash value to represent compress_int_flag, and another 2 bits to represent compress_long_encoding. For u32 and u64, 3 more bits are needed. What do you think of this approach?

urlyy avatar Nov 15 '25 01:11 urlyy

  1. Currently, ForyRust requires users to use #[fory(compress_int)] or #[fory(compress_int=true/false)] on a type to set whether compress_int is enabled for that type. ForyRust will directly compute the version_hash at compile time based on the type’s corresponding compress_int.

  2. If the parameter macro is not used, compress_int defaults to true.

  3. When registering, ForyRust sets the TypeResolver’s compress_int value based on the is_compress_int() return value of the registered type. Therefore, if there is a conflict, it will panic immediately. This requires that all types registered under a single Fory object must have the same compress_int value.

  4. As mentioned above, I added a new method is_compress_int() to StructSerializer.

  5. If no type has been registered yet, users can manually set compress_int() by calling fory.compress_int(bool). However, if a type with compress_int=true/false has already been registered, setting fory.compress_int(false/true) will immediately panic.

  6. For EXT, there’s no need to worry. If some struct types are already registered, EXT’s compress_int will match the other registered types. If not, it can still be manually set via fory.compress_int(bool).

  7. Since the default value of compress_int is true, it can be cumbersome for users who want to set it to false for everything. Therefore, I submitted #2916 to help alleviate this situation.

urlyy avatar Nov 15 '25 20:11 urlyy

Still needs to address #2886 #2887, you can first review my approach and code.

urlyy avatar Nov 15 '25 20:11 urlyy

Currently, the behavior of int::fory_read_data() is determined by the local fory's is_compress_int(). Do we need to change it so that Serializer#fory_read_data(remote_type_id) performs different read operations based on the type_id that was read?

urlyy avatar Nov 17 '25 14:11 urlyy

Could we remove compress_int API. The xlang is different pure java. In pure java, we only write code once to create fory. Now in xlang mode, we crate fory in every language, adding too much options is error-prone to introduce inconsistency. And rust don't support generate code dynamically, such option also introduce runtime cost.

I prefer a macro attr:

struct Foo {
  [#fory(compress=False)]
  f1: i32
}

For java, we can create an annotation:

class Foo {
  @ForyField(compress=false)
  int f1:
}

For python, we can create a field method like dataclass decorator.

We can support #fory(compress=False) for pure rust in this pr, so we don't have to support java annotation or python field in this pr

chaokunyang avatar Nov 22 '25 02:11 chaokunyang

We can support #fory(compress=False) for pure rust in this pr, so we don't have to support java annotation or python field in this pr

OK, I will implement the compress_int annotation on member variable for each language one by one in separate PRs, and then open another PR at the end for cross-language testing.

urlyy avatar Nov 22 '25 02:11 urlyy