TMPE icon indicating copy to clipboard operation
TMPE copied to clipboard

New persistence strategy for better cross-version compatibility

Open Elesbaan70 opened this issue 2 years ago • 58 comments

Note - This issue originally called for the use of Json, which was determined be unfeasible due to the lack of a good full-featured Json serializer that is compatible with Unity.

Describe your idea

In TMPE's current persistence strategy, the introduction of a new type into the persistent data will break backward compatibility. Although old versions are not supported, players reverting from alpha to stable should not lose savegame data needlessly.

The same issue also affects forward compatibility in ways that restrict future development choices. Ideally, the save data would not be bound to specific types. A move away from strongly typed serialization would give everyone more flexibility moving forward.

Requirements

  1. This is an opt-in change. All existing serialization will remain untouched until there is a reason to do it the new way, as determined feature by feature.
  2. A manager-based persistence mechanism in which SerializableDataExtension enumerates a collection of managers and calls LoadData() and SaveData() methods basically like the ones we have today.
  3. SerializableDataExtension will save the data from each manager separately, to isolate them and minimize the risks associated with changes to persistent data.
  4. A manager will have a means of identifying other managers upon which its data depends. This will determine the enumeration order during load and save.
  5. Linq to XML will be used for persistence.

TrafficLightSimulationManager will be the first converted to the new persistence strategy, since displaced lane support was the breaking change that prompted this issue. Its conversion to the new persistence strategy will be delivered separately from and prior to displaced lane support.

Tasks

  • [x] Persistence framework
  • [x] Add a DOM to the serialized data and make a mechanism for opt-in transition away from the old serialization strategy (SerializableDataExtension)
  • [x] Update timed traffic lights to be saved into the DOM (half-baked persistable model)
  • [x] Update junction restrictions to be saved into the DOM (fully persistable model)
  • [x] Compress XML data, with fallback to uncompressed if it fails (especially on some Linux distros)
  • [x] Test versioning in a relevant feature branch such as https://github.com/Elesbaan70/TMPE/tree/lane-grouping
  • [x] Final code cleanup
  • [x] Wiki articles for future development
  • [x] Mark #1559 ready for review

Elesbaan70 avatar Mar 30 '22 02:03 Elesbaan70

is json better than xml?

kianzarrin avatar Apr 01 '22 22:04 kianzarrin

binary serialization is possible in theory but is clunky in practice specially with this old version of mono.

kianzarrin avatar Apr 01 '22 22:04 kianzarrin

is json better than xml?

Json's simplicity tends to mean less setup and tweaking to get the results you need, which I think makes it better for small applications like TMPE. For large and complex data models I tend to lean toward XML, but in smaller applications like this, XML done right could end up being ~one of the most complex parts of the code~ - perhaps an overstatement, but it could take on complexity that seems out of proportion to the amount of data we're working with.

I also prefer XML for files intended to be manually edited. Json can be manually edited, but in my opinion it is a slightly less friendly format for that.

(If you're wondering, I think XML is a better fit for AN, both because your long-term plans involve much more complex data, and because it's intended to be manually editable.)

binary serialization is possible in theory but is clunky in practice specially with this old version of mono.

There's actually a binary version of Json, unsurprisingly called Bson, and the version of mono is a non-issue as far as that goes. But text serialization is easier to troubleshoot, and I don't think we're talking about enough data for file size to be a concern.

Elesbaan70 avatar Apr 02 '22 00:04 Elesbaan70

is json better than xml?

Also, System.Xml.Serialization has a lot of weird limitations, like you can't use it to serialize dictionaries. So we'd have to either live with the limitations, or else find a third-party XML serializer. Newtonsoft.Json is technically third-party, but it's a very pervasive standard, especially on older .NET/Mono versions.

Elesbaan70 avatar Apr 02 '22 01:04 Elesbaan70

Json sounds great. We already use XML in TMPE to store global config. i hope using several different standards does not cause confusion.

kianzarrin avatar Apr 02 '22 10:04 kianzarrin

in AN ppl manually modify pops so I think XML is good

kianzarrin avatar Apr 02 '22 10:04 kianzarrin

XML allows saving more complex data structures which will be way more readable for the user if he wants to edit something. The main problem with the current implementation. I'll remind that we are talking about serialization of all TM:PE data in the savegame not just simple configs. The problem now is we store everything into one single binary which cannot be deserialized if you don't have fully compatible classes. Just like @Elesbaan70 mentioned, literally any change in the data model makes it unloadable when you try to use previous version of the mod. Example: when you add boolean field to one of serialized classes (you don't need to even use it anywhere) the new data after serialization will be 100% incompatible with previous version of the mod because of how binary serialization work - chunk of bytes, requires a class which work as a form of template describing how data should be read.

Having a JSON or even XML will make it at least editable -> you can dump the binary data into (JSON/XML) string and modify it at worst case or prepare small migration function which could automatically fix/transform things in case we make a mistake somewhere. With binary you can't even see what's inside. Also one bonus of JSON/XML or any string based serialization -> you can easily write data migration code and most importantly unit/integration tests for that (some time ago I tried loading TMPE binary data and deserialize in unit test - impossible).

We are limited with size of binary array which can be saved by SerializableDataExtension (~16MB), so splitting data by managers will solve that problem.

Anyways I asked @Elesbaan70 about other ways for serialization since I also want to introduce changes to TMPE savegame data. I need to store "airplane size restrictions" somehow, so just an addition to what we have without changing anything else. When I push that change to my draft PR it will break compatibility because non of previous version will be able to load data once saved with new version. Since it would be additional thing I don't see why it couldn't be completely skipped when loading while running previous version of the mod (data itself has no use if there is no code which could interpret it and do something later).

I hope migration out of binary serialization will solve that problem at least partially. Obviously the current (de)serialization code needs to stay where it is to allow loading data from older savegames.

krzychu124 avatar Apr 02 '22 16:04 krzychu124

We already use XML in TMPE to store global config.

Completely different because it's a standalone file and theoretically editable. Although in practice it's not a good implementation for manual editing--you can't even tell which switches are which, except by counting from the top.

XML should be considered for any future standalone TMPE files.

Elesbaan70 avatar Apr 02 '22 17:04 Elesbaan70

the new data after serialization will be 100% incompatible with previous version of the mod because of how binary serialization work - chunk of bytes, requires a class which work as a form of template describing how data should be read.

It's worth noting, in case it's relevant to the choices we're making, that binary serialization doesn't have to be this way. Bson doesn't behave this way. It's a question of how the binary serialization is accomplished.

System.Runtime.Serialization's basic serialization is strongly typed and wants to deserialize and figure out what to do with every element, and so any change breaks backward compatibility. This can probably be overcome with custom serialization, but there's no good reason to make it so complicated when alternatives are available.

Elesbaan70 avatar Apr 02 '22 17:04 Elesbaan70

@aubergine10 The "Settings" label is incorrect for this. It should be labeled "serialization" and "lifecycle". I would fix it, but I don't have any permissions here. :-)

Elesbaan70 avatar Apr 02 '22 17:04 Elesbaan70

It's worth noting, in case it's relevant to the choices we're making, that binary serialization doesn't have to be this way. Bson doesn't behave this way. It's a question of how the binary serialization is accomplished.

Yes I know, but I meant BInaryFormatter we currently use for (de)serialization. I think it won't make much of a difference if we convert JSON string as binary array or do similar with Bson. Whatever we select it need to end up as binary array.

krzychu124 avatar Apr 02 '22 19:04 krzychu124

you can't even tell which switches are which, except by counting from the top.

tell me about it! I do that all the time! I was thinking about overriding the Serializer but I think its easier to just add something in the maintenance tab.

kianzarrin avatar Apr 02 '22 19:04 kianzarrin

I think it won't make much of a difference if we convert JSON string as binary array or do similar with Bson. Whatever we select it need to end up as binary array.

As I'm currently implementing it (a bit different from the demonstration code I showed you before), manager data is saved in a Dictionary<string, string> where the key is the full name of the saved data type (not of the manager type), and the value is the Json-serialized data. (The key is not strictly type-bound, it's only a naming convention.) I chose text serialization for easier troubleshooting.

The dictionary itself is also serialized as Json, which is converted to a byte array using StreamWriter. It could be serialized as Bson instead, but there wouldn't be much savings since the data is text anyway, and staying with Json saves us from having to bring in a second external dependency for the Bson library.

Elesbaan70 avatar Apr 02 '22 21:04 Elesbaan70

After all this discussion, it turns out that we will have to go with XML anyway. The Unity port of Newtonsoft.Json doesn't really work, and it hasn't been maintained in several years. *sigh*

Elesbaan70 avatar Apr 03 '22 01:04 Elesbaan70

Lets use a port of XML that is less problematic. I suspect this old version of mono does not have very good XML. I see macsurgey's mods have provided their own System.Xml.Linq.dll

kianzarrin avatar Apr 03 '22 05:04 kianzarrin

Linq to XML is a DOM, not a serializer. I haven't look at it in a while, so I'll take a fresh look, but working with a DOM basically means you're assembling and interpreting the XML elements manually, so it ends up being a lot of work.

Maybe I'm just being naïve, but I'm not very worried about Mono's System.Xml.Serialization. It's just that it's a pain to work with, but still probably better than doing it all ourselves.

Elesbaan70 avatar Apr 03 '22 06:04 Elesbaan70

It's just that it's a pain to work with

If I recall properly lists are a pain to work with.

but still probably better than doing it all ourselves.

doing it ourselves? why not to just use 3rd party library?

kianzarrin avatar Apr 03 '22 06:04 kianzarrin

If I recall properly lists are a pain to work with.

I guess I don't have context for that, because I'm not sure what you're referring to.

doing it ourselves? why not to just use 3rd party library?

I meant "doing it ourselves" in the sense of processing all the elements manually, like MacSergey is apparently doing since he uses Linq to XML.

If you know of third-party serialization that will work under Unity and has a solid track record and active development, I'm all for it. But the Unity port of Newtonsoft.Json being nonfunctional and abandoned is an example of where this can eventually lead if we're not careful.

Elesbaan70 avatar Apr 03 '22 07:04 Elesbaan70

I don't know I have to do some tests. i think I had to convert list to array for it to work.

kianzarrin avatar Apr 03 '22 08:04 kianzarrin

I was struggling with xml here: https://github.com/Quboid/CS-MoveIt/blob/da6d67d770ff013162f931f01b3e88949ee7b3cc/MoveIt/Moveable/Instance.cs#L120

I don't know if there was an issue with dictionary or list or both but I had to use array.

kianzarrin avatar Apr 03 '22 08:04 kianzarrin

Yes, that's one of the annoying limitations I mentioned in System.Xml.Serialization is that dictionaries are not supported. Another is that it's hard to make enums backward-compatible.

After sleeping on it, I'm starting to warm up to the idea of using Linq to XML like MacSergey appears to be doing (I'm assuming this based on the presence of System.Xml.Linq.dll.) I'm going to try going down that path for a little bit, and see how it plays out. (Since TMPE already converts to/from separate objects for the serializable model, and I was continuing that practice, I'm not sure that processing the elements directly is really that much more work.)

Elesbaan70 avatar Apr 03 '22 13:04 Elesbaan70

If anyone wants to follow the progress in code, it is here: https://github.com/Elesbaan70/TMPE/tree/xml-persistence

Elesbaan70 avatar Apr 08 '22 02:04 Elesbaan70

As I've mentioned before (maybe elsewhere, since I don't see it here), the ideal persistable architecture involves having a set of semi-stable data objects that are maintained by the application, which can be serialized, then later deserialized and submitted to the application. These data objects then act as a kind of persistence API, integral to the implementation yet having a discreet definition that gives them some independence from the program logic.

Currently, TMPE builds an entirely separate object model for persistence, using a combination of public and private APIs to convert to and from this object model. This might be a good approach if the persistent model were independently defined, but as things stand now, it puts too much work on the persistence logic.

There's no clear path out of this, since moving to distinctly defined data objects at the implementation level would be a huge amount of work. But for now, I'm attacking this with a hybrid approach. Here's how it works:

I define a "model" of sorts with interfaces, which are in turn implemented internally. See https://github.com/Elesbaan70/TMPE/tree/xml-persistence/TLM/TLM/TrafficLight/Model

This model accomplishes two things:

  1. It gives the persistence code a clearly defined reference point for understanding what data it needs to save and how it should name things.
  2. It gives the program logic a way to clearly and consistently define persistable data.

What I will not attempt to do at this time is to make this object model work both ways. Loading will work as it always has, except that the new object model interfaces will be used as a reference point for knowing what the saved data looks like.

Elesbaan70 avatar Apr 29 '22 15:04 Elesbaan70

I am removing the compression task since this apparently causes issues on some Linux distros. But we really should come back to it later, because the size of the DOM will balloon on bigger cities. We'll need to pay attention to file size when testing this on larger cities, to make sure the hit is acceptable.

And I am adding a task to test the versioning in the displaced lanes project prior to release. We wouldn't want to deploy this and then find out that it doesn't work as advertised!

Elesbaan70 avatar May 08 '22 14:05 Elesbaan70

We will certainly get issues because even with the existing system some users have so much traffic config that it's too big. XML will vastly exacerbate that because now we'll be using, I dunno, 30 bytes where previously we'd use 1? We need the compression.

If it's a problem for Linux, then we should disable compression on Linux only rather than forcing the majority of users to live without it for no reason.

originalfoo avatar May 08 '22 15:05 originalfoo

A possible (ugly) solution: Attempt to save with compression, and if there is an error, fall back to no compression before finally giving up.

Elesbaan70 avatar May 08 '22 16:05 Elesbaan70

Yup, that would be OK imo, so long as it doesn't make code utterly heinous.

What's the issue with zipping stuff on linux btw? Is it just some distros don't include required libs or have non-standard libs?

originalfoo avatar May 08 '22 16:05 originalfoo

I'm not sure. @krzychu124 could probably provide more details. He referenced MacSergey/NodeMarkup#96, which I skimmed but didn't study closely enough to really understand what was going wrong.

Elesbaan70 avatar May 08 '22 16:05 Elesbaan70

This comment has workaround: https://github.com/MacSergey/NodeMarkup/issues/96#issuecomment-670970107

EDIT: If zipping fails, fallback to uncompressed but log error in game & tmpe logs stating that mono is required?

See also: https://github.com/MacSergey/NodeMarkup/issues/96#issuecomment-826846114

originalfoo avatar May 08 '22 16:05 originalfoo

Yup, but do we want to require a manual install of Mono? I think Linux users would generally be more comfortable with this kind of requirement than on other platforms, but it still seems less than ideal.

Elesbaan70 avatar May 08 '22 16:05 Elesbaan70