microstream
Kotlin: create Binary Handler for Immutable Collection
There are two types of collections in Kotlin, mutable and immutable. The mutable ones can be handled by our existing Binary Handlers, but we also need to support the immutable ones.
Docs for Kotlin Collections: https://kotlinlang.org/docs/reference/collections-overview.html
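    import kotlin.system.exitProcess
    // The EmbeddedStorage import is omitted here; its package depends on the MicroStream version.

    fun main() {
        val buffer = mapOf("key1" to 1, "key2" to 2, "key3" to 3, "key4" to 4)
        val copy = mutableMapOf<Int, String>() // works
        // val copy = mapOf<Int, String>() // does not work
        var storage = EmbeddedStorage.start(buffer)
        storage.shutdown()
        // Restart the storage with the second map as root
        storage = EmbeddedStorage.start(copy)
        exitProcess(0)
    }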
I have prepared a sample Maven project to make implementing these handlers easier: kotlin-test.tar.gz
In addition to this, Kotlin `object`s (singletons) are duplicated upon loading, causing issues for comparisons with sentinel objects (I just ran into this with `by lazy {}` delegated properties that otherwise work fine).
I'm going to start a PR creating a new "persistence" submodule similar to the existing `binary-jdk8` and `binary-jdk17`, but with a Kotlin dependency, and see if I can get something working.
I've been investigating and I can't seem to cause any issues with Kotlin's immutable collections. Kotlin delegates to the standard Java collections in all cases except totally empty collections (which are cached). In fact, when using Java reflection operations on objects or fields using the `kotlin.collections.List` interface, both `List` and `MutableList` appear as `java.util.List` and everything works fine.
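A minimal sketch of how this can be checked with the standard library alone. `Holder` and `fieldType` are hypothetical names used only for illustration, and the exact JDK classes printed for the non-empty collections are an implementation detail of the Kotlin standard library that may change between versions.

```kotlin
// Hypothetical holder class with one read-only and one mutable list property.
class Holder(val readOnly: List<String>, val mutable: MutableList<String>)

// Returns the declared JVM type of a backing field of Holder, as Java reflection sees it.
fun fieldType(name: String): Class<*> = Holder::class.java.getDeclaredField(name).type

fun main() {
    // Runtime classes of non-empty read-only collections.
    println(listOf("a", "b").javaClass.name)
    println(mapOf("k" to 1, "l" to 2).javaClass.name)

    // Empty collections are shared instances provided by the Kotlin standard library.
    val first: List<Any> = emptyList()
    val second: List<Any> = listOf()
    println(first.javaClass.name)
    println(first === second)

    // Both property types are erased to the same Java interface.
    println(fieldType("readOnly").name)
    println(fieldType("mutable").name)
}
```

If the empty-collection case is the only one that misbehaves, that would be consistent with the singleton problem discussed in the rest of this thread, since the cached empty instances are themselves singletons.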
The issue I see now is with various singleton objects (in Kotlin as `object` declarations, in Java as `static final` fields excluding `enum`s). When "fresh" they compare identical (`==` in Java, `===` in Kotlin), but when reloaded they do not compare identical or even necessarily equal! In some cases the new objects will luckily still compare equal and don't cause issues in JDK classes because the JDK never compares them by identity, but it does end up being a major issue in Kotlin since `object`s are designed to be compared by identity (Kotlin's "switch statement", `when(...) { ... }`, uses identity to compare `object`s, IntelliJ IDEA warns against implementing `equals`, etc.).
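The effect can be reproduced without MicroStream. The sketch below uses plain Java serialization as a stand-in for a storage reload (an assumption made purely for illustration; MicroStream's loading code is not involved), and `Uninitialized`, `roundTrip` and `describe` are hypothetical names.

```kotlin
import java.io.ByteArrayInputStream
import java.io.ByteArrayOutputStream
import java.io.ObjectInputStream
import java.io.ObjectOutputStream
import java.io.Serializable

// A sentinel singleton; it deliberately has no readResolve(), so a reload creates a second instance.
object Uninitialized : Serializable

// Writes the value to a byte array and reads it back, standing in for "store, restart, load".
fun roundTrip(value: Any): Any {
    val out = ByteArrayOutputStream()
    ObjectOutputStream(out).use { it.writeObject(value) }
    return ObjectInputStream(ByteArrayInputStream(out.toByteArray())).use { it.readObject() }
}

// Typical sentinel check; the object branch relies on equals(), which defaults to identity.
fun describe(value: Any): String = when (value) {
    Uninitialized -> "sentinel"
    else -> "something else"
}

fun main() {
    val reloaded = roundTrip(Uninitialized)
    println(reloaded === Uninitialized)
    println(describe(Uninitialized))
    println(describe(reloaded))
}
```

The reloaded value has the same class as the sentinel but is a different instance, so the `when` branch no longer matches it.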
This issue of knowing when a class should be persisted in name only (as a singleton) isn't solvable in Java in general, as `static final` fields could be anywhere and have no guarantees, but it is solvable specifically for Java `enum`s and Kotlin `object`s since you can look up the canonical representation of the singleton from just its class and name using `Class.getEnumConstants()` and `KClass.getObjectInstance()`.
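A rough sketch of both lookups. To stay free of the kotlin-reflect dependency, the object lookup reads the static `INSTANCE` field that the Kotlin compiler emits for `object` declarations instead of calling `KClass.objectInstance`; that substitution, and the names `canonicalEnum`, `canonicalObject`, `Color` and `Registry`, are assumptions made for illustration.

```kotlin
import java.lang.reflect.Modifier

enum class Color { RED, GREEN }

object Registry

// Looks up the canonical enum constant by name; returns null for non-enum types or unknown names.
fun canonicalEnum(type: Class<*>, name: String): Any? =
    type.enumConstants?.firstOrNull { (it as Enum<*>).name == name }

// Looks up the canonical instance of a Kotlin object through its static INSTANCE field.
// Heuristic: a Java class following the same singleton convention would match as well.
fun canonicalObject(type: Class<*>): Any? {
    val field = try {
        type.getField("INSTANCE")
    } catch (e: NoSuchFieldException) {
        return null
    }
    val looksLikeSingleton = Modifier.isStatic(field.modifiers) &&
        Modifier.isFinal(field.modifiers) &&
        field.type == type
    return if (looksLikeSingleton) field.get(null) else null
}

fun main() {
    println(canonicalEnum(Color::class.java, "GREEN") === Color.GREEN)
    println(canonicalObject(Registry::class.java) === Registry)
    println(canonicalObject(String::class.java))
}
```

A stricter detector could additionally require the `kotlin.Metadata` annotation on the class before trusting the `INSTANCE` field.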
After looking into the existing `enum` handling, I'm not sure what the best course of action is for Kotlin `object`s. From reading `createTypeHandlerEnum`, `isHandleableEnumField`, and `createEnumHandler`, it seems like the current philosophy for persisting `enum`s is to save all fields and, when loading, take the current canonical enum instance, validate that its ordinal hasn't changed, and then set all its fields to the newly loaded values (modifying the existing enum instance). This is contrary to the Java Serialization specification (and other serializers I've used like Kryo and Jackson), which states:
> Enum constants are serialized differently than ordinary serializable or externalizable objects. The serialized form of an enum constant consists solely of its name; field values of the constant are not present in the form. ... The process by which enum constants are serialized cannot be customized: any class-specific writeObject, readObject, readObjectNoData, writeReplace, and readResolve methods defined by enum types are ignored during serialization and deserialization. Similarly, any serialPersistentFields or serialVersionUID field declarations are also ignored--all enum types have a fixed serialVersionUID of 0L. Documenting serializable fields and data for enum types is unnecessary, since there is no variation in the type of data sent.
I personally don't understand the use of the current `enum` persistence strategy. It means that loading an `enum` modifies the canonical `enum` instance, changing it for everybody. This could only happen if the `enum` instance was modifiable in the first place though, which, while legal, has no obvious uses to me.
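To make the concern concrete, here is a hand-written illustration of what "loading modifies the canonical instance" means. It is not MicroStream's handler code; `Mode` and `applyStoredFields` are hypothetical and only mimic the behavior described above.

```kotlin
// An enum with mutable state, which is legal but unusual.
enum class Mode(var label: String) {
    FAST("fast"),
    SAFE("safe")
}

// Hypothetical stand-in for a loader that copies stored field values onto the canonical constant.
fun applyStoredFields(target: Mode, storedLabel: String) {
    target.label = storedLabel
}

fun main() {
    // Some unrelated code that already holds a reference to the constant.
    val seenElsewhere = Mode.FAST

    // "Loading" overwrites the field on the one shared instance.
    applyStoredFields(Mode.FAST, "fast (from old storage)")

    // The unrelated holder now observes the loaded value too.
    println(seenElsewhere.label)
    println(Mode.SAFE.label)
}
```

With name-only storage there would be nothing to copy, so the running application's constants would stay exactly as its current code defines them.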
What was the original purpose of saving all enum fields, and is it still needed by default? Looking at the commit history, I can see `enum` serialization in its current form was implemented around https://github.com/microstream-one/microstream/commit/837e2529e6e24ea60e8c5fbf284d8af3eda97286, after being disabled in https://github.com/microstream-one/microstream/commit/466f3d9497a2a9dfb2a618aeda2f3486b8389248 by @tm-ms. Do the concerns voiced here: https://github.com/microstream-one/microstream/blob/15e26334bbcf8b23538a2d27f91c84becf85537f/persistence.binary/src/one/microstream/java/BinaryHandlerEnum.java#L33-L40 still apply with today's API?
A similar treatment of Kotlin `object`s would also be contrary to the Kotlin Serialization specification, which states:
> An `object` serializes as an empty class, also using its fully-qualified class name as type by default: ... Even if object has properties, they are not serialized.
so I don't think the current `enum` persistence strategy should be used for Kotlin `object`s.
For Kotlin's `enum class`es (compiled to normal Java `enum`s) this section applies:
> In JSON an enum gets encoded as a string.

    // The @Serializable annotation is not needed for enum classes
    enum class Status { SUPPORTED }

    {"name":"kotlinx.serialization","status":"SUPPORTED"}
Ultimately, I think having Java `enum`s and Kotlin `object`s treated differently would be extremely confusing to users. I also think the current treatment of `enum`s is confusing, so I would prefer both `enum`s and `object`s to adhere to the language specs regarding serialization, i.e. be strictly name-based, looking up and returning the "canonical" instance without modification when loaded. This section from the Kotlin Serialization specification sums up my thoughts:
> Conceptually, a singleton is a class with only one instance, meaning that state does not define the object, but the object defines its state.
It simply doesn't make sense to load an "old" singleton in a "new" application; its old data depended on its old type, which no longer exists, so we should substitute the next best thing: its new type along with its new data.
What are your thoughts on deprecating the current storage of `enum` fields (replacing it with class+name storage) and adding detection of Kotlin `object`s to be stored in that same way?
(Detection of Scala `object`s could also be done, using the same singleton technique.)
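As a starting point for discussion, the class+name storage proposed above might look roughly like this. Everything here is a sketch under assumptions: `SingletonRef`, `toRef` and `resolve` are hypothetical names rather than MicroStream APIs, and Kotlin `object`s are detected through the compiler-generated `INSTANCE` field so that no kotlin-reflect dependency is needed.

```kotlin
// Hypothetical stored form: only the class name and, for enums, the constant name.
data class SingletonRef(val className: String, val constantName: String?)

enum class Status { SUPPORTED, DEPRECATED }

object Defaults

// Store side: reduce a singleton to its name-only reference, dropping all field values.
fun toRef(instance: Any): SingletonRef {
    val type = instance.javaClass
    return if (instance is Enum<*>) {
        // Constants with a body are anonymous subclasses; step up to the declaring enum.
        val enumType = if (type.isEnum) type else type.superclass
        SingletonRef(enumType.name, instance.name)
    } else {
        SingletonRef(type.name, null)
    }
}

// Load side: return the canonical instance of the running application, unmodified.
fun resolve(ref: SingletonRef): Any {
    val type = Class.forName(ref.className)
    return if (ref.constantName != null) {
        type.enumConstants.first { (it as Enum<*>).name == ref.constantName }
    } else {
        type.getField("INSTANCE").get(null)
    }
}

fun main() {
    println(resolve(toRef(Status.DEPRECATED)) === Status.DEPRECATED)
    println(resolve(toRef(Defaults)) === Defaults)
}
```

A real handler would also need to decide what to do when a stored class or constant name no longer exists; this sketch simply lets the lookup throw.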