Problems with DataView
Lets jump right into some of the problems and things I dislike about Sponge's DataView system, then look at some solutions.
Major Problems with DataView
-
set() allows any object in, even the ones that can't be broken down into primitives. This may be one of the biggest problems that DataView is facing. This means a nearly unlimited number of types can come out of get() causing serializers (DataFormat's and DataTranslator's) to be unable to guarantee serialization of a given DataView! As an example of how hard this can be to debug, lets say we have 3 plugins: A, B, and C. Plugins A and B offer some cool DataSerializable objects. Plugin C is like "Wow! I can save some of plugin A and B's cool objects to my Hocon config file!". Plugin C serializes a mix of Sponge objects, A's objects, and B's objects into one large DataContainer. Plugin C then goes to serialize the DataContainer into Hocon and an error is thrown! The error is complaining that it can't serialize Foo. Foo is a non DataSerializable object from an obscure part of Sponge. The question is, who tried to set(Foo) somewhere in your DataContainer? There is stuff from Sponge, A, B, and C. If set() had error'ed the moment the/a bad object had been fed to it, it would have been much easer to debug.
-
DataView has many redundant primitive types. I don't think even one of the serializers supports them all. Here are just some of the redundant types: DataView, Map List<DataView>, List<Map<?, ?>> byte[], Byte[], List<Byte> String, List<Char>
-
Too often a contains() check is done to prove that a get*Type*() method WILL return present. This happens in around ~70 places in SpongeCommon! Just search for DataView#contain(DataQuerry)'s usage in your IDE or something, and a lot of the usages can cause an NPE! examples: https://github.com/SpongePowered/SpongeCommon/blob/9e6acff2f9c25f346151420d368a26a0e3578a5b/src/main/java/org/spongepowered/common/data/builder/block/tileentity/SpongeBeaconBuilder.java#L49-L54 https://github.com/SpongePowered/SpongeCommon/blob/9e6acff2f9c25f346151420d368a26a0e3578a5b/src/main/java/org/spongepowered/common/block/SpongeBlockSnapshotBuilder.java#L231-L237 https://github.com/SpongePowered/SpongeAPI/blob/871fcd1b9cddc82fefce9fdecc1ac690ae6762dd/src/main/java/org/spongepowered/api/data/persistence/AbstractDataBuilder.java#L73-L74 User: "DataView do you have 'foo'?" DataView: "yeah" User: "DataView give me the Integer 'foo'!" DataView: "ahhhh, Optional.empty()?" User dies form a NullPointerException when they call get() The only way to reeeeeeeeealy know if data exists is to ask for it with its type, aka get*Type*(), I would even go as far as to remove contains() all together! I should also note that DataUtil#checkDataExists() is causing the same issues.
-
Java Maps and Lists from Bukkit's configuration system have infiltrated their way into DataView and must be purged! DataView and Map are very much alike, but one has all the cool stuff for serialization and can have safety features that yell at the developer when they do something stupid. List also suffers from all the same problems that Map has, but there is no Data*List* type at all.
Minor Problems with DataView
-
getContainer() and getParent() allows deserializers to access and depend on parents and even directly the root of the data structure.
-
DataContainer does not need to exist. It would be better to keep things simple, maybe have a DataContainer in some weird implementation of DataView, but keep it out of the API?
-
DataView.SafetyMode as a solution to the 'array mutability problem'. SafetyMode adds state other then the 'data' being stored by DataView, state that needs to be checked every time a user stores a primitive array that is in use, or just clone 100% of the time, leading to redundant cloning when SafetyMode also dictates that the DataView also clone the array! The class that is set()'ing and or get*Type*()'ing knows more about the usage of its arrays then DataView ever will.
DataView javadoc issues
- It should be made VERY CLEAR that you should NEVER assume the type returned by get()!!! https://github.com/SpongePowered/SpongeCommon/blob/bleeding/src/main/java/org/spongepowered/common/data/persistence/LegacySchematicTranslator.java#L116-L118 https://github.com/SpongePowered/SpongeCommon/blob/bda59aa2c0fc3270d0f1b55a1bd49ec5ccb7be97/src/main/java/org/spongepowered/common/data/persistence/SchematicTranslator.java#L146 This just makes me want to rage! D:
Stuff that is not really needed
-
getKeys(/*deep =*/ true) and getValues(/*deep =*/ true) ... I think this was just copied from Bukkit, trust me you do not really need it.
-
DataContainer is just used to construct DataViews but also happens to extend DataView. Why? So one can tell if a DataView is the root? If one even cares about that, as I have addressed above, they can use getParent() to check that.
DataSerializer problems
-
toContainer() must create its own new DataView (DataContainer), I think it would be better if toContainer() was given a DataView to fill out, and returned void. This would remove the need for a DataView implementation to copy a DataView EVERY TIME a DataSerializable is used!!! D:< DataContainer toContainer(); // BAD void toContainer(DataView); // OK
-
A DataSerializable always serializes into a DataView, DataView is a 'map' type. What about a class that stores only one integer? That can make for one ugly config file! Speaking of config files, Configurate's serializers fill out a ConfigurationNode allowing for maps, lists, and primitives. If only we had something like that, lets call it DataValue. void toContainer(DataValue); // GOOD
Fixing DataView
How better to explain things than with some code? I have create a reference DataView library where I have fixed most of the problems described above, however, it could use some work before it is added to Sponge. https://github.com/xcube16/DataView
Some things to note before you get confused
- Two new interfaces have been added, DataMap and DataList, both extending DataView.
- DataView now has a generic index type and only has methods common to both DataMaps and DataLists.
How does this reference DataView library fix the problems above?
-
There are a very limited number of 'Allowed Types' that can be inside a given DataView (see DataView javadoc https://github.com/xcube16/DataView/blob/master/src/main/java/io/github/xcube16/data/DataView.java#L58-L80). DataView#set() will now throw an IllegalArgumentException when it can't break the object down into Allowed Types. This will let users know where and when (still at runtime though :/) that they tried to put a bad Object into a DataView.
-
A well defined set of Allowed Types allows for GUARANTEED SERIALIZATION of a DataView if a serializer supports serialization of all Allowed Types to the extent that they can be losslessly Coerce'd into the original Allowed Type. (That is not hard any more, even for booleans in NBT) FAQ: Q: How do we put booleans in NBT? A: Just use an int with 1 or 0 and let Coerce handle it! :P
-
No contains() methods? No problem! Can't use a method the wrong way if it does not exist in the first place! ;) All get*Type*() methods return an Optional; that is how you know if something exists or not.
-
DataList has replaced java's List. BONUS: DataLists are query-able! Just think, a_dataMap.getString(DataQuery.of("entities", "5", "id"))
-
getContainer() and getParent() have been removed, DataView's are now singly linked. This makes a DataView more self contained from an API standpoint. DataContainer has also been removed.
-
DataView.SafetyMode has been removed, DataViews will now act in a NO_DATA_CLONED way. It is up to the users of set() and getArrayType() to clone the data if they intend to mutate it. If you think about it, when you get a DataView from another DataView it is not a clone, and it is mutable. I don't really like it and am open to ideas. TODO: Clearly define this in the javadoc and hope people read it. :P Worst case scenario would be a bad DataSerializable and someone making DataView snapshots of it. Primitive arrays are an optimization after all, and that is one of the reasons ConfigurationNodes don't support them (instead they use lists, but that is a lot more objects and would not work well for Schematics).
-
get()'s javadoc now clearly defines that one should NEVER rely on the returned Object to be a specific Allowed Type. And that get*Type*() should be used instead. (get() is used only for iteration when serializing the DataView or something)
-
getKeys(boolean) and getValues(boolean) have been replaced with a forEachKey() method.
-
Added DataValue that just stores one value. This will allow deserializing an object from not just a DataMap, but any Allowed Type. DataValue is not part of a DataView structure, it is only used to wrap values going into a DataView (set()), or out of DataView (getValue()). You could almost think of it like DataContainer (it has no parents, however, don't get confused, the value may only have a DataView parent or no parent at all). DataValue also allows a binary format that's top.
side notes
One thing that still bothers me is multi type lists (yes, DataList supports that). The worst case I can think of would be serializing a multi type list to NBT. It can be done by nesting another binary format inside a byte array or something. It could also be solved by using a list of lists, for example, ["string1", "string2", 1, 2, 3] would be broken into [["string1", "string2"],[1, 3, 4]]. There are all kinds of crazy hacks to make this work, but I will understand if we want to make DataList enforce a single type per list. (just note that Json multi type arrays could not be deserialized into a DataList)
It would be nice if we could break DataSerializable into DataSerializable and VersionedSerializable, idk, just a random idea.
TODO: Empty DataValues are not well supported. Should they be? I have had use cases for them and have been using empty strings or empty lists.
Examples
// TODO: add more examples
public class SerializableFoo implements DataSerializable {
private double a_number;
private String bar;
private DataSerializable baz;
public void toContainer(DataValue data) {
data.createMap()
.set("a_number", a_number)
.set("bar", bar)
.set("baz", baz);
}
}
public class SpecialIntager implements DataSerializable {
private int theInteger;
public void toContainer(DataValue data) {
data.set(theInteger);
}
}
Refactoring Sponge
Most of the changes proposed here require changes to almost all parts of SpongeAPI and SpongeCommon that use DataView. The reference DataView library I have created could use more polishing. DataView, as I see it, is a mix of ideas form Configurate's ConfigurationNode, Minecraft's NBT, and avoidance of the mistakes in Bukkit.
Edit: Forgot to escape < and > in the redundant types