fury icon indicating copy to clipboard operation
fury copied to clipboard

fix(go): adjustments to the sliceSerializer

Open lzaeh opened this issue 2 months ago • 7 comments

Why?

Previously, my understanding of the type system was insufficient, which led to confusion between array and slice usage. As a result, the deserialization step of the slice serializer did not work properly in cross-language scenarios due to missing alignment rules (for example, when the types were structurally identical and should have taken the fast path, the previous version failed to correctly interpret the byte stream).

What does this PR do?

Serializer-related changes:

  • Adjusted the slice serializer to better align serialization and deserialization rules.
  • Standardized the type system to ensure that the slice serializer is consistently used where appropriate, eliminating mixed usage with array serializers.
  • Moved the serializer for primitive type arrays into primitive_array.go. If you think it’s not appropriate, I can revert the change.

Test-related changes:

  • Re-enabled previously commented-out test cases in commonSlice/Array.
  • Moved the [n]String test out of commonArray. Although it can be correctly deserialized into []String, the current test logic does not treat [n]String and []String as equivalent.
  • Updated several primitive array tests in TestXLangSerializer to use array types instead of slices.
  • Modified the check function in TestXLangSerializer to deserialize using the precise reflected type instead of interface{}. Rationale:
    • For example, []interface{}{"a", "b", "c"} and []string{"a", "b", "c"} are considered the same type and written via the fast path, resulting in identical byte sequences.
    • However, during deserialization in Go, the default behavior is to use []interface{} unless the target type is explicitly provided.(Just my opinion)
    • This means that if the test expects []string{}, it would fail. While we could force construction of a []T{} when the same type is detected, if the test’s starting point is []interface{}, it would still fail. Therefore, the deserialization target type is now explicitly matched via reflection.

Known Issues

  • After pulling the latest code (which includes modifications related to field order hashing in structs), the complex struct tests that passed 10 mins ago now fail due to Python-side deserialization errors.

    • A. Direct testing shows identical hashes, but deserialization in Python triggers an “invalid ID (id = 0)” error in the listSerializer—possibly related to changes in this PR.
    • B. Simplifying the struct on both the Python and Go sides (by removing fields) caused the hash values to differ. I will follow up on this issue to determine whether the cause lies in the list serializer or incomplete handling of struct field order consistency.
  • Cross-language list serialization tests (serializing in one language, deserializing in another—Python ↔ Go, Java ↔ Go) revealed that Go’s int and float64 types cannot be correctly deserialized in the other languages.

    • All other basic types work fine, and Go can correctly deserialize bytes from both Python and Java.
    • When encountering floating-point values, Go must use float32 to deserialize successfully.
  • The slice serializer currently mimics Python’s fast-path logic, but the matching rules for integer and floating-point types are somewhat ambiguous. My understanding here may be incomplete, and this part likely needs further refinement.

Related issues

  • #2633

Does this PR introduce any user-facing change?

  • [ ] Does this PR introduce any public API change?
  • [ ] Does this PR introduce any binary protocol compatibility change?

Benchmark

lzaeh avatar Oct 13 '25 10:10 lzaeh

@lzaeh could you merge main branch first? there are some conflict

chaokunyang avatar Oct 14 '25 15:10 chaokunyang

@lzaeh could you merge main branch first? there are some conflict

i will try asap tomorrow, sorry for the delay.

lzaeh avatar Oct 15 '25 16:10 lzaeh

Scope of this change

This change removes array support only along the usage path, reflected in:

  1. getTypeInfo behavior: except for []byte (which still uses its concrete type to enable zero-copy, out-of-band storage optimizations, etc.), all other []T are treated as lists.

  2. Reflect checks: code paths that handled reflect.Array have been removed. Since arrays aren’t supported, they’re considered unavailable to users.

  3. User-supplied arrays: if a user forces an array, per (1) the system reports an error indicating [N]T is not supported.

What I did not change

I’m unsure whether we should purge array-related constructs from the type system itself. Fully removing them would touch a lot of code. Keeping the definitions around but unused might serve as a transitional/contingency option, even if arrays are effectively deprecated right now. That said, if we do want to remove arrays from the type system entirely, let me know and I’ll follow up.

Cross-language test adjustments

For TestXLangSerializer: since other languages may interact with the same unit as Python, I removed the array-related code paths on both the Go and Python sides within this test. I then added a separate test_cross_language_serializer_go case in Python’s cross-language tests so that Go still covers this part. If this setup isn’t appropriate, please let me know.

Remaining work (Relatively higher-priority tasks, Just my opinion)

Update the struct field ordering logic accordingly. After updating the Python module, two struct tests failed due to hash mismatches. I haven’t changed the handling of complex structs, which means the code path for fields that are arrays within complex structs remains untouched—and this test includes arrays. If I tweak just one spot now, it’ll be hard to localize the issue. It’s better to make the corresponding changes after I spend time learning the new sorting/ordering optimization algorithm, so we can update everything in sync.

@chaokunyang

lzaeh avatar Oct 22 '25 12:10 lzaeh

Update the struct field ordering logic accordingly. After updating the Python module, two struct tests failed due to hash mismatches. I haven’t changed the handling of complex structs, which means the code path for fields that are arrays within complex structs remains untouched—and this test includes arrays. If I tweak just one spot now, it’ll be hard to localize the issue. It’s better to make the corresponding changes after I spend time learning the new sorting/ordering optimization algorithm, so we can update everything in sync.

Recently, we have updated the hash calculation algorithm. Maybe this PR(#2814) can solve this problem

pandalee99 avatar Oct 23 '25 17:10 pandalee99

Could you refer to both java and rust implementation? In fory rust, both array and vec are supported, they can all be serialized as a list, and deserialize a list into target type. We should do similiar thing in go too.

For fory rust, you can refer to list.rs and array.rs.

chaokunyang avatar Nov 11 '25 15:11 chaokunyang

Could you refer to both java and rust implementation? In fory rust, both array and vec are supported, they can all be serialized as a list, and deserialize a list into target type. We should do similiar thing in go too.

For fory rust, you can refer to list.rs and array.rs.

Sorry—I was tied up with some real-life errands and didn’t see this message in time.

I’ve implemented a version along the lines you suggested by removing the array serializers. Details:

(1) Primitive arrays: unless the user provides a specific destination type (only array or slice are considered valid), deserialization checks the type and length and deserializes accordingly; otherwise (e.g., when the target is interface{}) we always deserialize to a slice. This is reflected in some local tests (still supporting array as a receiver, and when the receiver is interface{} we convert and compare the produced slice against the expected array), as well as in the cross-language test(TestXLangSerializer) where arrays are deserialized as slices.

(2) I think the support in (1) is still necessary. While we don’t recommend arrays to users, if a struct field is defined as an array (rather than interface{}), deserialization will work fine because of (1).

(3) One confusing point: the complex-struct tests had been failing for a while. For various reasons I kept pulling the latest code and reinstalling pyfory. I did the same this time, and before I had a chance to dig into the complex-struct issue, the tests started passing again.

I only opened GitHub after finishing this part of the code—bad workflow on my side T_T . I’ll follow your suggestions going forward. Thanks!

lzaeh avatar Nov 14 '25 14:11 lzaeh

Some files only appeared as modified when I synced to the new version—I didn’t actually change them. They won't be treated as new code during the merge.

I’ll keep studying those two files and figure out suitable approach for Go. This version is only based on earlier discussion and hasn’t yet incorporated learnings from the other language implementations. :(

lzaeh avatar Nov 14 '25 14:11 lzaeh