[Java] Class registration by numeric id is not intuitive
Feature Request
When registering classes manually, the provided api method register(Class<?>, int id) implies to the user that any specified integer is valid. However:
- 0 means "Unregistered"
- Value must be unsigned
- There is a reserved range unknown to the user
- The actual valid user range is ~150 - 32766, and subject to change
- Precondition range check fails with no explanatory message
Is your feature request related to a problem? Please describe
The problem is usability, plus the risk that ids chosen by users today will collide with the reserved range if it changes in a future version, breaking compatibility.
Describe the solution you'd like
Possible solutions:
- Reserve the first, say 1000 ids for internal classes
- Allow the user range to be say 0-20000 (or up to 31765, assuming 0 and Short.MAX_VALUE can't be used, and 1000 ids are reserved)
- Go back to the register(Class<?>, short id) signature
- Internally, compute the class id as 1000 + user supplied id, for any calls to register(Class<?>, short id)
- If the precondition range check fails, report an intuitive error message: "Supplied class id 21234 is outside valid range of 0 to 20000"
- never expose internal range semantics (0=Unregistered) or potential id conflicts to user
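A minimal sketch of the offset scheme in this first option, assuming the 1000 reserved ids and the 0 to 20000 user range suggested above. `UserClassIds`, `RESERVED_IDS`, and `MAX_USER_ID` are hypothetical names for illustration and are not part of the existing API; the parameter is an `int` here only to keep the example easy to call.

```java
// Hypothetical helper illustrating the offset scheme; not part of the Fory API.
final class UserClassIds {
  // Number of ids reserved for internal classes (value suggested in the proposal).
  static final int RESERVED_IDS = 1000;
  // Largest id a user may supply (value suggested in the proposal).
  static final int MAX_USER_ID = 20000;

  private UserClassIds() {}

  // Maps a user-supplied id to the internal id, rejecting out-of-range values
  // with a message that states the valid range.
  static int toInternalId(int userId) {
    if (userId < 0 || userId > MAX_USER_ID) {
      throw new IllegalArgumentException(
          "Supplied class id " + userId + " is outside valid range of 0 to " + MAX_USER_ID);
    }
    return RESERVED_IDS + userId;
  }
}
```

With this shape the user never sees the reserved range: user id 0 maps to internal id 1000, and the internal offset can grow later without changing the ids users wrote in their code.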
Or
- Maintain the existing scheme
- Fix precondition message to be more intuitive and explanatory, and docs as well
- Expose methods to obtain minClassId and maxClassId and perhaps isValidClassId(int), idForClass(Class<?>), isReservedClassId(int) etc... to give the user visibility into id assignment internals as a basis for constructing their own ids and offsets
- auto id methods register(Class<?>) and register(Class<?>, boolean) should return int of the auto-assigned id, so user can increment from there
Or
- Allow user to override all class registrations, even internal ones
- Provide api methods to retrieve the existing mappings
- Go back to register(Class<?>, short) method signature
- Allow the use of any short value as id
Or
- Amend serialization protocol to distinguish between internal and user ids (such as with lsb of the id short, or a separate bit flag)
- Modify the api to register(Class<?>, short), accepting all possible short values, and assume any id set by this method is a user id
- Internally, modify class lookups to consider the two types of ranges
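One way the least-significant-bit variant of this option could look is sketched below. `TaggedClassId` and its bit layout are assumptions made for illustration; the existing protocol does not define such a flag.

```java
// Hypothetical encoding helper; the existing protocol does not define this flag.
final class TaggedClassId {
  private TaggedClassId() {}

  // Packs a 15-bit id and a user/internal flag into one 16-bit value.
  // The least significant bit is 1 for user ids and 0 for internal ids.
  static short encode(int id, boolean userId) {
    if (id < 0 || id > 0x7FFF) {
      throw new IllegalArgumentException("id " + id + " does not fit in 15 bits");
    }
    return (short) ((id << 1) | (userId ? 1 : 0));
  }

  // Reads the flag bit back.
  static boolean isUserId(short encoded) {
    return (encoded & 1) == 1;
  }

  // Recovers the 15-bit id; the mask undoes sign extension of the short.
  static int decodeId(short encoded) {
    return (encoded & 0xFFFF) >>> 1;
  }
}
```

Spending one bit of the 16-bit id on the flag leaves 15 bits for the id itself, so accepting every possible short value as a user id would need the separate bit flag mentioned above rather than the lsb.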
Or
- Rethink the namespace concept to allow a fixed number of namespaces, say 256 (to fit into a byte)
- Use a namespace for every registration, user and internal
- 0 = internal namespace, where jdk classes are registered, never exposed to the user
- Assume any user class registration belongs to a user namespace
- Use an enum such as Namespace.USER_1, with 255 possible values in the api methods, or a factory Namespace.of(String name) that hashes the user supplied namespace into one of the pre-defined slots
- Sample encoding: use 1 byte to represent namespace, followed by either a short class id or string id, or use lsb of namespace byte to indicate if what follows is a name or id, and restrict Namespace to 128 user slots
- Alternative encoding: Allow 64 user namespaces. 1st bit indicates if internal or user namespace. If internal, follow by an unsigned short. So internal id fits into existing 16 bits. If user, follow by a bit indicating if it's a type name or id, followed by the 6 bit namespace id, follow with either a compressed string type name or compressed int id
- modify the api methods to something like:
  - short register(Class<?> cls, boolean createSerializer); - Auto registers class in Namespace.USER_1 and return its numeric id (starting with 0), or perhaps return a tuple of Namespace and id instead of just id
  - void register(Class<?> cls, short id, boolean createSerializer); - Register in USER_1
  - void register(Namespace ns, Class<?> cls, short id, boolean createSerializer);
  - void register(Class<?> cls, String typeName, boolean createSerializer); - Register in USER_1
  - void register(Namespace ns, Class<?> cls, String typeName, boolean createSerializer);
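The user-namespace header byte from the alternative encoding above (one internal/user bit, one name/id bit, six namespace bits) could be packed roughly as follows. `NamespaceHeader` and the exact bit positions are assumptions for illustration only; the proposal does not fix which bit is which.

```java
// Hypothetical header layout for the 64-namespace encoding; bit positions are assumed.
final class NamespaceHeader {
  static final int USER_FLAG = 0x80;      // bit 7: 1 = user namespace, 0 = internal
  static final int NAME_FLAG = 0x40;      // bit 6: 1 = type name follows, 0 = numeric id follows
  static final int NAMESPACE_MASK = 0x3F; // bits 0-5: namespace id, 0..63

  private NamespaceHeader() {}

  // Builds the header byte for a registration in a user namespace.
  static byte userHeader(int namespaceId, boolean byName) {
    if (namespaceId < 0 || namespaceId > NAMESPACE_MASK) {
      throw new IllegalArgumentException(
          "Namespace id " + namespaceId + " is outside valid range of 0 to " + NAMESPACE_MASK);
    }
    return (byte) (USER_FLAG | (byName ? NAME_FLAG : 0) | namespaceId);
  }

  static boolean isUser(byte header) {
    return (header & USER_FLAG) != 0;
  }

  static boolean isByName(byte header) {
    return (header & NAME_FLAG) != 0;
  }

  static int namespaceId(byte header) {
    return header & NAMESPACE_MASK;
  }
}
```

A reader would inspect the header first and then decode either a compressed type name or a compressed int id, depending on `isByName`.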
Or
- Use an alternative encoding internally for out-of-range values with existing scheme
- Don't amend the protocol and keep register(Class<?>, int) signature, allow any int to be used as a class id
- Any value outside of accepted range results in the class being registered by name instead, in some "internal" namespace (or just ""), using say Base62/85 encoding
- Valid (in-range) ids are either represented as short, or as base-encoded strings, depending on which result is smaller. Signed int values are normalized to unsigned before encoding
- Make this functionality opt-in via fory builder, not by default
- This scheme will allow the user to use their application-specific ids fully. In the worst case, assuming namespace = "" and Base62 encoding is used, the payload would only bloat by 6 bytes between the maximum short value and max int value
- Docs would explain that using values say 200-32766 is more space-efficient, though not required, or perhaps api would emit a log.warn() message
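To make the Base62 idea in this last option concrete, here is a small sketch that normalizes a signed int to unsigned and encodes it as a Base62 string. `Base62ClassId` and the digit alphabet are assumptions for illustration; nothing like it exists in the current API.

```java
// Hypothetical Base62 codec for class ids; the alphabet order is an assumption.
final class Base62ClassId {
  private static final String ALPHABET =
      "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";

  private Base62ClassId() {}

  // Encodes the id as Base62 after normalizing the signed int to an unsigned value.
  static String encode(int classId) {
    long value = Integer.toUnsignedLong(classId);
    if (value == 0) {
      return "0";
    }
    StringBuilder sb = new StringBuilder();
    while (value > 0) {
      sb.append(ALPHABET.charAt((int) (value % 62)));
      value /= 62;
    }
    // Digits were produced least significant first.
    return sb.reverse().toString();
  }

  // Decodes a Base62 string back to the original int bit pattern.
  static int decode(String text) {
    long value = 0;
    for (int i = 0; i < text.length(); i++) {
      int digit = ALPHABET.indexOf(text.charAt(i));
      if (digit < 0) {
        throw new IllegalArgumentException("Not a Base62 digit: " + text.charAt(i));
      }
      value = value * 62 + digit;
    }
    return (int) value;
  }
}
```

The largest unsigned 32-bit value needs six Base62 digits, because 62^5 < 2^32 <= 62^6, so any int id encodes to at most six characters before any namespace or length prefix is added.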
Hi @drse, I have read your proposal carefully. In general, it is very comprehensive. The following is my design proposal.
Class Registration API Improvement Proposal
I. Current State Analysis
1.1 Current Constraints
In the current implementation:
- Validation rule: classId >= 0 && classId < Short.MAX_VALUE (i.e., 0-32766)
- Reserved ID 0: NO_CLASS_ID = 0 indicates unregistered
- Internal reserved range: IDs 1-36+ are used for internal types (primitives, wrappers, common collections, etc.)
- Internal end marker: The innerEndClassId field records the last ID of internal class registration (approximately 150)
1.2 Cross-Language Specification Constraints
The cross-language serialization specification explicitly states:
- Internal data types use range 0-64
- Users can use 0-4096 to represent their types
1.3 Root Cause
The current checkRegistration method's error messages are not clear enough, failing to explain:
- Which ID ranges are reserved
- Why certain IDs are unavailable
- The recommended ID range to use
II. Recommended Solution (Hybrid Approach 1 + 2)
After evaluating all proposed solutions, I recommend a hybrid approach combining the advantages of solutions 1 and 2:
2.1 Core Design Principles
- Backward Compatibility: Don't break existing APIs and serialization protocols
- Progressive Enhancement: Improve documentation and validation first, then consider API evolution
- Clear Boundaries: Clearly define user ID ranges
- Friendly Prompts: Provide detailed error messages and warnings
2.2 Implementation Steps
Phase One: Immediate Improvements (No Breaking Changes)
Step 1: Define Constants and Ranges
Add to ClassResolver:
public class ClassResolver implements TypeResolver {
  // User-visible constants
  public static final short MIN_USER_CLASS_ID = 200;
  public static final short MAX_USER_CLASS_ID = 32766; // Short.MAX_VALUE - 1
  public static final short RESERVED_ID_START = 0;
  public static final short RESERVED_ID_END = 199; // Reserve 200 IDs
  // Internal use
  private static final String RESERVED_RANGE_MSG =
      "Class ID %d is in the reserved range [%d, %d]. " +
      "Please use IDs in the range [%d, %d] for user classes.";
  private static final String OUT_OF_RANGE_MSG =
      "Class ID %d is out of valid range. " +
      "Valid range for user classes is [%d, %d].";
}
Step 2: Improve Validation Logic
Modify the register(Class<?> cls, int classId) method:
public void register(Class<?> cls, int classId) {
  // Improved validation
  Preconditions.checkArgument(classId >= 0 && classId < Short.MAX_VALUE,
      "Class ID must be in range [0, %d), got: %d", Short.MAX_VALUE, classId);
  short id = (short) classId;
  // Check if in reserved range
  if (id < MIN_USER_CLASS_ID) {
    LOG.warn(String.format(RESERVED_RANGE_MSG,
        id, RESERVED_ID_START, RESERVED_ID_END, MIN_USER_CLASS_ID, MAX_USER_CLASS_ID));
  }
  checkRegistration(cls, id, cls.getName());
  // ... rest of code remains unchanged
}
Step 3: Add Helper Methods
// New public API methods
public static short getMinUserClassId() {
  return MIN_USER_CLASS_ID;
}
public static short getMaxUserClassId() {
  return MAX_USER_CLASS_ID;
}
public static boolean isValidUserClassId(int classId) {
  return classId >= MIN_USER_CLASS_ID && classId <= MAX_USER_CLASS_ID;
}
public static boolean isReservedClassId(int classId) {
  return classId >= RESERVED_ID_START && classId < MIN_USER_CLASS_ID;
}
// Improved auto-registration that returns the assigned ID
public short registerAndGetId(Class<?> cls) {
  if (!extRegistry.registeredClassIdMap.containsKey(cls)) {
    while (extRegistry.classIdGenerator < registeredId2ClassInfo.length
        && registeredId2ClassInfo[extRegistry.classIdGenerator] != null) {
      extRegistry.classIdGenerator++;
    }
    register(cls, extRegistry.classIdGenerator);
    return extRegistry.classIdGenerator;
  }
  return extRegistry.registeredClassIdMap.get(cls);
}
Step 4: Improve Documentation
Update the BaseFory interface documentation:
/**
 * Register class with specified id.
 *
 * <p><b>Important:</b> Class IDs in the range [0, 199] are reserved for internal use.
 * User classes should use IDs in the range [200, 32766].
 *
 * <p>The method emits a warning if you register a class with a reserved ID,
 * but it does not throw, so that existing code keeps working.
 *
 * <p>Use {@link ClassResolver#getMinUserClassId()} and
 * {@link ClassResolver#getMaxUserClassId()} to get the recommended ID range.
 *
 * @param cls class to register
 * @param id class ID, recommended range: [200, 32766]
 * @throws IllegalArgumentException if id is negative or >= 32767
 */
void register(Class<?> cls, int id);
Phase Two: API Evolution (Next Major Version)
Option A: Add Type-Safe API
// New API using short type for explicit constraints
public void registerUser(Class<?> cls, short userId) {
  Preconditions.checkArgument(
      userId >= 0 && userId <= (MAX_USER_CLASS_ID - MIN_USER_CLASS_ID),
      "User ID must be in range [0, %d], got: %d",
      MAX_USER_CLASS_ID - MIN_USER_CLASS_ID, userId);
  int internalId = MIN_USER_CLASS_ID + userId;
  register(cls, internalId);
}
Option B: Support Namespaces (If Needed for Cross-Language Scenarios)
Fory already supports name-based registration, which can be enhanced rather than introducing a new namespace concept.
2.3 Test Plan
Test cases to add:
- Boundary Tests:
  - Test ID 0 (should warn)
  - Test ID 199 (should warn)
  - Test ID 200 (should succeed)
  - Test ID 32766 (should succeed)
  - Test ID 32767 (should fail)
- Error Message Tests:
  - Verify error messages contain recommended ID ranges
  - Verify warning logs are correctly recorded
- Compatibility Tests:
  - Ensure existing code using reserved IDs still works
  - Ensure serialization/deserialization is compatible with old versions
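The boundary cases in this plan can be pinned down with a small stand-in for the proposed validation. `ClassIdCheck` and its `Result` enum are hypothetical and only model the rules from Step 2 (reject outside [0, 32766], warn below 200); they are not the real ClassResolver.

```java
// Hypothetical model of the proposed Phase One validation; not the real ClassResolver.
final class ClassIdCheck {
  enum Result { OK, WARN_RESERVED, REJECTED }

  static final int MIN_USER_CLASS_ID = 200;
  static final int MAX_USER_CLASS_ID = 32766; // Short.MAX_VALUE - 1

  private ClassIdCheck() {}

  // Classifies an id the way the proposed register(Class<?>, int) would treat it.
  static Result check(int classId) {
    if (classId < 0 || classId > MAX_USER_CLASS_ID) {
      return Result.REJECTED; // would throw IllegalArgumentException
    }
    if (classId < MIN_USER_CLASS_ID) {
      return Result.WARN_RESERVED; // accepted, but a warning is logged
    }
    return Result.OK;
  }
}
```

Real tests would call `fory.register(...)` and capture the log output, but the expected outcome per boundary id is the same as in this model.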
2.4 Migration Guide
For existing users:
// Old code (still works, but with warnings)
fory.register(MyClass.class, 100);
// Recommended new code
fory.register(MyClass.class, 200); // or larger value
// Or use helper methods
if (ClassResolver.isValidUserClassId(myId)) {
  fory.register(MyClass.class, myId);
}
III. Evaluation of Other Solutions
3.1 Solution 3 (Full Openness) - Not Recommended
Allowing users to override all internal registrations would break system stability and could cause core serialization functionality to fail.
3.2 Solution 4 (Protocol Modification) - Not Recommended
Modifying the serialization protocol would break cross-language compatibility at too high a cost.
3.3 Solution 5 (Namespaces) - Not Recommended for Now
Introducing a complex namespace system would significantly increase API complexity. The current name-based registration mechanism is sufficient.
3.4 Solution 6 (Automatic Encoding) - Consider for Future Enhancement
Using Base62/85 encoding to handle out-of-range values is an interesting idea, but would increase serialization overhead. It could be considered as an opt-in feature in future versions.
IV. Implementation Priority
P0 (Immediate Implementation):
- Add constant definitions (MIN_USER_CLASS_ID, MAX_USER_CLASS_ID)
- Improve error messages
- Add warning logs
- Update documentation
P1 (Next Minor Version):
- Add helper methods (isValidUserClassId, isReservedClassId)
- Add registerAndGetId method
- Complete test cases
P2 (Next Major Version):
- Consider adding registerUser API
- Possibly deprecate direct use of reserved ranges
Notes
Key Decision Rationale:
- Why choose 200 as the starting point: Current estimates show approximately 150 internal registrations, so reserving up to 200 provides sufficient buffer.
- Why not modify the protocol: The cross-language serialization protocol is already stable. Modification costs are extremely high and would affect Java, Python, Go, Rust, C++ and other language implementations.
- Backward compatibility: Existing tests use IDs like 300-302, and this code must continue to work.
- Progressive improvement principle: First improve usability through documentation, warnings, and helper methods, then introduce API changes in major versions to avoid breaking changes.
Hi @chaokunyang, what are your thoughts on this?