
Data objects

Open elizarov opened this issue 1 year ago • 16 comments

This issue is for discussion of the proposal to add data objects. The full text of the proposal is here.

elizarov, Aug 08 '22

Can't all objects be data objects by default? It would be nice to have that string representation even for normal objects.

xxfast, Oct 04 '22

Using the data keyword for object declarations seems inconsistent and misleading because the behavior differs from our mental model of data classes.

Inconsistencies:

  1. The copy method is not generated since it doesn't make sense for singletons
  2. The componentN methods are not generated
  3. The equals method is not generated
  4. The generated toString method doesn't provide the same type of utility that we get from data classes (see explanation below).
  5. The hashCode method is not generated

Inconsistencies 2, 4, & 5 seem obvious at first, but note that 99% of all objects have properties. Adding the data keyword would therefore make most developers at least consider whether it brings the data-class behavior of using these properties: when adding the object to a hashing collection (e.g. if the object extends some common base class), in the generated toString, or when destructuring. Otherwise, what's the point of adding a new keyword to objects if it lacks most of the behavior that developers associate with data classes?

I understand that data classes only reference constructor properties, but object declarations can never make this distinction because all of their properties must be defined in the body. So the data keyword hints at the kinds of capabilities that data classes have, which creates confusion and invalid expectations, since here it does not provide the vast majority of the behavior we have come to expect from data classes.
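To make the distinction concrete: in a data class, only primary-constructor properties take part in the generated equals, hashCode, toString, and componentN functions, and properties declared in the body are ignored. A data object has no constructor, so any properties it declares live in its body and are likewise left out of its toString. A minimal sketch, assuming Kotlin 1.9 or later (where data object is stable); `Config`, `Defaults`, `compareConfigs`, and their properties are hypothetical names.

```kotlin
// Primary-constructor properties drive the generated members of a data class.
data class Config(val host: String, val port: Int) {
    // Declared in the body: excluded from equals/hashCode/toString/componentN.
    var retries: Int = 0
}

// A data object can only declare properties in its body,
// so they never appear in its generated toString.
data object Defaults {
    val host = "localhost"
    val port = 8080
}

fun compareConfigs(): Boolean {
    val a = Config("localhost", 8080).apply { retries = 1 }
    val b = Config("localhost", 8080).apply { retries = 5 }
    // true: retries is not part of the generated equals
    return a == b
}
```

Here `Config("localhost", 8080).toString()` yields `Config(host=localhost, port=8080)` with no mention of `retries`, and `Defaults.toString()` yields just `Defaults`.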

daniel-rusu, Oct 06 '22

3 & 5 are not really relevant as the generated version would be functionally equivalent to the default implementation. So their omission is actually a code size optimization with no behavioral difference. The functions are defined on all types so the presence of overrides is an implementation detail that no one needs to care about.

You could also argue that 2 does occur. A data object generates the correct number of componentN() functions for the number of primary properties it wraps: zero.

That same argument applies to 4. The toString() implementation behaves the same way as a data class: it shows the type name and the values for all primary properties (whose count is zero).

So the only real difference is 1 and its omission is noted in the KEEP text.
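The "zero primary properties" reading can be checked directly: a data class with two primary properties gets component1() and component2() plus a toString listing both, while a data object gets no componentN functions and a toString that is just its name. A sketch assuming Kotlin 1.9+; `Shape`, `Point`, `Origin`, and `render` are illustrative names.

```kotlin
sealed interface Shape
data class Point(val x: Int, val y: Int) : Shape
data object Origin : Shape

fun render(s: Shape): String = when (s) {
    // Destructuring relies on the generated component1()/component2()
    is Point -> { val (x, y) = s; "point at $x,$y" }
    // Origin has no componentN functions: there is nothing to destructure
    Origin -> "origin"
}
```

`Point(1, 2).toString()` prints `Point(x=1, y=2)`, and `Origin.toString()` prints `Origin`, i.e. the same format with an empty property list and no parentheses.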

note that 99% of all objects have properties

[citation needed]

I use objects to implement interfaces with stateless behavior and for marker types in sealed hierarchies. I would say 99% of my own objects never have properties.
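Both usages mentioned here can be sketched in a few lines: an object that implements an interface with purely stateless behavior, and property-free marker objects in a sealed hierarchy. This assumes Kotlin 1.9+; `Descending`, `LoadState`, and `label` are hypothetical names.

```kotlin
// Stateless behavior: the object implements an interface and holds no state.
object Descending : Comparator<Int> {
    override fun compare(a: Int, b: Int): Int = b.compareTo(a)
}

// Marker types in a sealed hierarchy: the objects carry no properties.
sealed interface LoadState {
    data object Idle : LoadState
    data object Loading : LoadState
    data class Loaded(val items: List<Int>) : LoadState
}

fun label(state: LoadState): String = when (state) {
    LoadState.Idle -> "idle"
    LoadState.Loading -> "loading"
    is LoadState.Loaded -> "loaded ${state.items.size}"
}
```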

JakeWharton, Oct 06 '22

I'm sorry, but data object is very confusing to me. I had a very hard time getting my head around the fact that an object is a static singleton (which in itself runs contrary to other languages, where "objects" refer to dynamically created instances). And now data object is even more confusing.

Instead, how about this approach: we automatically determine whether it's a data object or not based on the return type?

data Message(String to, String from, String message)
object message = Message("[email protected]", "[email protected]", "Hello") // This now acts as "data object" as the return type is a data class.


User(String username, String password)
object user = User("KoltinUser", "S3CR3T")  // Now this acts as a regular object.

Disclaimer: the proposed solution is based on my understanding of objects. If I got it completely wrong, or if it does not address other scenarios, please ignore it.

dinbtechit, Oct 06 '22

3 & 5 are not really relevant as the generated version would be functionally equivalent to the default implementation...

You could also argue that 2 does occur. A data object generates the correct amount of componentN() functions...

That same argument applies to 4. The toString() implementation behaves the same way as a data class...

You are correct from an implementation perspective if you base your decision on the underlying implementation details of data classes. However, this perspective doesn't seem aligned with the higher-level concept of data classes.

Instead of evaluating this solution of data objects by implementation details of data classes, I want to take a step back and evaluate the higher-level concept to ensure consistency and then use that to guide implementation details rather than the reverse approach. I also want to evaluate the solution based on core engineering principles of abstraction so that names aren't misleading but instead they are consistent and provide an accurate description of what they're supposed to represent.

From a conceptual perspective, a data class represents a container of data. The implementation details of data classes serve that purpose rather than define it. For example, excluding non-constructor properties from the auto-generated methods lets us distinguish which fields are part of the wrapper's core data, so that only those are used when checking for equality etc. This detail therefore shouldn't be used as part of the definition of what it means to be a data class; it is just a handy utility that makes data classes even more useful.

As a test for consistency, let's take the outside perspective of a hypothetical experienced developer who isn't familiar with Kotlin. Suppose we explain the concept of data classes and that an object declaration creates a singleton. If we then ask this engineer what they would expect a data object to be, the most natural assumption would be that a data object is a singleton wrapper of data (e.g. perhaps a singleton holding related constants). Continuing with this interpretation, and given that properties can never be declared in an object's constructor, this engineer would expect the class properties to be the "data" of the data object. Having the data keyword generate a fairly trivial single line of code seems to add too little value, which would further make engineers wonder whether something more is going on with a data object.

The previous paragraph shows how points 2, 3, 4, and 5 are relevant under the interpretation expected from someone who is not intimately familiar with the underlying Kotlin implementation details and instead thinks in terms of the concepts that Kotlin provides.

For reference, the single line of code that I was referring to on the JVM is this: override fun toString() = this::class.java.simpleName
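To illustrate the comparison on the JVM: a plain object inherits the default toString, which prints the class name followed by `@` and a hexadecimal hash code, while both the hand-written override quoted above and the toString generated for a data object print just the simple name. A sketch assuming Kotlin 1.9+ on the JVM; the three object names are hypothetical.

```kotlin
// Plain object: inherits the default toString ("PlainObject@<hex hash>").
object PlainObject

// Plain object with the hand-written override quoted above (JVM only).
object ManualObject {
    override fun toString() = this::class.java.simpleName
}

// Data object: the compiler generates an equivalent toString.
data object GeneratedObject
```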

Regarding exactly what percentage of object declarations have class properties, citations probably don't exist, and the number depends more on programming style. However, I gotta admit that my previous estimate was too hasty and too high, so instead of trying to put a number on it, here are some common use cases where I have properties in object declarations:

  • I like to group related "constants" in an object declaration to avoid polluting the namespace and make locating related constants easier.
  • Sometimes a utility function requires lots of up-front setup that can then be reused. A recent example is a parse function that needs to set up the tokenization and lexical-analysis data structures. I like to extract these kinds of functions into an object, with the heavy but reusable initialization in a lazy delegate property, so that we only pay the price once and only if needed.
  • The majority of my object declarations that are part of a sealed-class hierarchy also contain properties.
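The first two patterns in the list above can be sketched as follows. The compiled `Regex` stands in for the "heavy" tokenization setup; it is an assumed example, not the parser described above, and `HttpStatus`, `Tokenizer`, and `tokenize` are hypothetical names.

```kotlin
// Grouping related constants in an object to avoid polluting the namespace.
object HttpStatus {
    const val OK = 200
    const val NOT_FOUND = 404
}

// A utility whose expensive, reusable setup is deferred with `by lazy`,
// so it runs at most once, and only if tokenize() is actually called.
object Tokenizer {
    private val tokenPattern: Regex by lazy {
        // identifiers, integer literals, or any other single non-space char
        Regex("""[A-Za-z_][A-Za-z0-9_]*|\d+|\S""")
    }

    fun tokenize(input: String): List<String> =
        tokenPattern.findAll(input).map { it.value }.toList()
}
```

For example, `Tokenizer.tokenize("x1 = 42;")` returns `[x1, =, 42, ;]`.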

I'm sure I could add more scenarios but I think you get the idea that having val properties in an object declaration is not a rare occurrence by any means.

Lastly, the amount of value added, compared to the complexity and confusion it introduces, seems to move Kotlin in the direction of Scala, especially since it breaks the clean meaning of the data keyword.

daniel-rusu, Oct 07 '22

can't all objects be data objects by default? It would be nice to have that string representation even for normal objects

Agree 😂

hoc081098, Oct 07 '22

There are only rare situations where one would want to see the default toString implementation on objects. So personally, I think making this the default is better than adding more complexity to the language itself.

Peanuuutz, Oct 09 '22

I think someone mentioned this is done so that it is backward compatible if someone was relying on the existing string representation for whatever reason

If you are relying on a string representation like that and your app breaks, that is on you to fix. No need to carry baggage like this; this is not JavaScript 😄

xxfast, Oct 15 '22

The reasons for introducing a separate data object (as opposed to changing the way a regular object behaves) go beyond backwards compatibility. First of all, we want a consistent way of declaring sealed class hierarchies, where all the subclasses and subobjects are consistently marked with the data modifier. For example:

sealed class UserResult {
    data class Found(val user: User) : UserResult()
    data object NotFound : UserResult()
}

Also, this is just a stepping stone in a progression of planned future features. We do plan to introduce a more compact syntax for sealed class hierarchies some day (KT-47868 Concise syntax for defining sealed class inheritors), akin to the enum class syntax, that eschews most of the boilerplate code, so that the above declaration would be simplified to something like this:

sealed class UserResult { Found(val user: User), NotFound }

Here, the fact that the data modifier applies to both objects and classes will make it easier to explain how this code desugars into the more verbose version above.
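For reference, the verbose form can be made self-contained and exercised with an exhaustive when expression. A sketch assuming Kotlin 1.9+; the `User` class and the `describe` function are hypothetical additions, since the snippet above assumes `User` exists elsewhere.

```kotlin
// Hypothetical payload type; the snippet above assumes a User class exists.
data class User(val name: String)

// The verbose form that a concise sealed-class syntax would desugar into.
sealed class UserResult {
    data class Found(val user: User) : UserResult()
    data object NotFound : UserResult()
}

// Exhaustive: the compiler knows Found and NotFound are the only subtypes.
fun describe(result: UserResult): String = when (result) {
    is UserResult.Found -> "found ${result.user.name}"
    UserResult.NotFound -> "not found"
}
```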

Moreover, we do plan to work on a better approach to objects that are used only to namespace several declarations together (like the kotlin.properties.Delegates object). In the future, we plan to turn them into some kind of "static objects", so turning all such plain objects into "data objects" for a completely different reason seems like the wrong move.

elizarov, Dec 21 '22

I'm eagerly waiting for this feature to become stable. Points 3 and 5 apply to my use case: KT-40218.

arkivanov, Dec 30 '22

The text of the KEEP has been massively improved and now includes much more detailed information. The decisions around serialization have also been finalized (TL;DR: no special support for Java serialization, but data objects will work fine with any kind of serialization thanks to the generated equals and hashCode). Please see the updated text here: https://github.com/Kotlin/KEEP/blob/data-objects/proposals/data-objects.md

elizarov, Jan 13 '23

@elizarov Thanks for the update! So it won't fix KT-40218?

Specifically the following case:

import java.io.Serializable

sealed class Option<in T> : Serializable
class Some<T>(val value: T) : Option<T>()
object None : Option<Any>()

fun handleOption(option: Option<String>) = when (option) {
    is Some<*> -> "some ${option.value}"
    None -> "none"
}

The code above crashes after deserialization, because the deserialized None is a different instance.

arkivanov, Jan 13 '23

@elizarov Thanks for the update! So it won't fix KT-40218?

It will be fixed if you use data object None, because of the auto-generated equals which is then used by when expression.
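A JVM sketch of the KT-40218 scenario, assuming Kotlin 1.9+: a plain object and a data object are each sent through a standard Java serialization round trip. The deserialized plain object is a second instance that is not equal to the original, so the when expression matches no branch and throws, whereas the data object's generated equals treats any instance of its type as equal. `Option` and `Some` follow the snippet above; `PlainNone`, `DataNone`, `roundTrip`, and `handle` are hypothetical names.

```kotlin
import java.io.ByteArrayInputStream
import java.io.ByteArrayOutputStream
import java.io.ObjectInputStream
import java.io.ObjectOutputStream
import java.io.Serializable

sealed class Option<in T> : Serializable
class Some<T>(val value: T) : Option<T>()
object PlainNone : Option<Any>()      // plain object: identity-based equals
data object DataNone : Option<Any>()  // data object: generated equals (type check)

// Serializes and deserializes a value with standard Java serialization.
@Suppress("UNCHECKED_CAST")
fun <T : Serializable> roundTrip(value: T): T {
    val bytes = ByteArrayOutputStream().also { out ->
        ObjectOutputStream(out).use { it.writeObject(value) }
    }.toByteArray()
    return ObjectInputStream(ByteArrayInputStream(bytes)).use { it.readObject() as T }
}

// Value branches in `when` compare with ==, i.e. equals(), not ===.
fun handle(option: Option<String>): String = when (option) {
    is Some<*> -> "some ${option.value}"
    DataNone -> "none"
    PlainNone -> "plain none"
}
```

Here `handle(roundTrip(DataNone))` returns `"none"`, while `handle(roundTrip(PlainNone))` throws `NoWhenBranchMatchedException`.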

elizarov, Jan 13 '23

Thanks, I thought when uses reference equality for objects.

arkivanov, Jan 13 '23

Can I ask for clarification about how equals (==) differs between data and non-data objects?

I am focusing on jvm for this question, but I'm curious about other platforms as well.

Is the expected behavior the following?

  • In typical cases, it will always return true when comparing two references to the same object type
  • In atypical cases, (separate classloaders, deserialization, reflection) we get different behavior for data and non-data objects:
    • For non-data objects, equals may return false for two references to the same object type because under the hood somehow two different instances were created or even two different types?
    • For data objects, there is stricter adherence to structural equality, so there is more of a guarantee that equals returns true, even in these atypical scenarios.

That is my interpretation of the KEEP, but I'd like to have a more precise understanding of the exact scenarios when I should expect equals to return true for data and non-data objects.
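The reflection case from the list above can be reproduced on the JVM by invoking an object's private constructor directly, which yields a second instance. A sketch assuming Kotlin 1.9+ and classes that are not module-encapsulated; `PlainSingleton`, `DataSingleton`, and `secondInstance` are hypothetical names. The second plain instance is neither identical nor equal to the singleton, while the second data-object instance is still equal to it and has the same hashCode, even though `===` still tells them apart.

```kotlin
object PlainSingleton
data object DataSingleton

// Creates a second instance by calling the object's private no-arg
// constructor through Java reflection (an "atypical" case; JVM only).
fun <T : Any> secondInstance(obj: T): T {
    val ctor = obj.javaClass.getDeclaredConstructor()
    ctor.isAccessible = true
    return ctor.newInstance()
}
```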

mgroth0, Sep 17 '23

but I'd like to have a more precise understanding of the exact scenarios when I should expect equals to return true for data and non-data objects

One of the use cases is Java serialization. After serialization you will have another instance of an object, which is not equal to the original one (equals returns false).

Another case I noticed: on JavaScript, the hashCode of a plain object returns a different value after each page refresh, whereas for a data object it's always the same.

arkivanov, Sep 17 '23