Serde
Serializing objects with iterable properties
Detailed description
I'd like to see serialization support for object properties of type "iterable". They should be treated the same way as arrays (or, more precisely, sequences).
Context
I am working on an export function to aggregate and dump data from a medical study website. Collecting the necessary data is highly complex and takes several minutes, even on the live servers. In the past, I've used the CSV file format, dumping every row directly to "php://output" with no output buffer. This let the browser update the download's file size every few seconds, giving the user some indication of progress.
However, the new data is too complex for a CSV dump, so I have to switch to a more flexible format (JSON, that is, using Serde's json-stream formatter).
Now, instead of collecting all the export data first (which, as I said, takes a few minutes) and then passing the resulting array to Serde, I'd like to lazy-load the data while serializing. This would let me start the file download instantly, with the file size growing over time as it used to.
So I tried changing my property's type from array to iterable and assigning it a generator, but Serde treats iterable as an unsupported type. Changing the type to \Traversable or \Iterable doesn't work either (PHP Warning: Cannot bind closure to scope of internal class Generator in vendor\crell\serde\src\PropertyHandler\ObjectExporter.php line 30).
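To make the goal concrete, here is a plain-PHP sketch of what lazily serializing an iterable property amounts to. It does not use Serde or reproduce its json-stream formatter; StudyExport, loadParticipants() and streamJsonArray() are hypothetical names. The generator produces one record at a time and the writer emits each element as soon as it exists, so output starts immediately and only the current element is held in memory.

```php
<?php
declare(strict_types=1);

// Hypothetical export object: the property is typed iterable and holds a generator.
final class StudyExport
{
    public function __construct(public iterable $participants) {}
}

// Hypothetical slow data source: yields one record at a time instead of
// building the whole result array up front.
function loadParticipants(int $count): Generator
{
    for ($id = 1; $id <= $count; $id++) {
        // In a real export, each record would require expensive queries.
        yield ['id' => $id, 'visits' => $id * 2];
    }
}

// Writes an iterable to a stream as a JSON array, one element at a time.
function streamJsonArray(iterable $items, $stream): void
{
    fwrite($stream, '[');
    $first = true;
    foreach ($items as $item) {
        if (!$first) {
            fwrite($stream, ',');
        }
        fwrite($stream, json_encode($item, JSON_THROW_ON_ERROR));
        $first = false;
    }
    fwrite($stream, ']');
}

$export = new StudyExport(loadParticipants(3));

// A download handler would use fopen('php://output', 'w') here;
// php://memory is used so the result can be inspected.
$out = fopen('php://memory', 'w+');
streamJsonArray($export->participants, $out);
rewind($out);
$json = stream_get_contents($out);
```

Nothing about the export is materialized ahead of time: the generator body only runs as the foreach in streamJsonArray() pulls the next element.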
Possible implementation
Well, all I had to do was add iterable to deriveTypeCategory() in Attributes\Field.php (mapping it to TypeCategory::Array) and to canExport() in PropertyHandler\SequenceExporter.php, and now Serde happily serializes my generator to a JSON array.
But of course, implementing it correctly requires a few more considerations:
- Serializing an iterable works well, but you can't deserialize to an iterable directly. So if this feature is eventually added to Serde, it would be one-way only. This may or may not be a bad thing.
- Is it sufficient to just add support for iterable? What if my property's type annotation is \Traversable, \Iterable or even \Generator? This could possibly be solved with some kind of check of whether the whole type inherits from iterable.
- What about classes implementing \Traversable but also having additional properties? Should they be serialized as usual, or be treated as an array? I'd say they should be treated as usual, unless they are attributed with #[SequenceField].
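The "does the whole type inherit from iterable" check from the second point could look roughly like this in plain PHP reflection. This is a sketch, not Serde's code: isIterableTypeName(), isIterableProperty() and the Example class are hypothetical. It accepts array, iterable, and any class or interface that is a Traversable (which covers Iterator, IteratorAggregate and Generator), and for union types it requires every member to qualify.

```php
<?php
declare(strict_types=1);

// Hypothetical helper (not part of Serde): is this single type name iterable?
function isIterableTypeName(string $name): bool
{
    if ($name === 'array' || $name === 'iterable') {
        return true;
    }
    // Covers Traversable itself, Iterator, IteratorAggregate, Generator and any
    // user class implementing them. With $allow_string = true, is_a() accepts a
    // class-name string and returns false for names that are not classes.
    return is_a($name, Traversable::class, true);
}

// Hypothetical helper: true only if every type the property may hold is iterable.
function isIterableProperty(ReflectionProperty $property): bool
{
    $type = $property->getType();
    if ($type === null) {
        return false; // Untyped property: nothing to go on.
    }
    // Since PHP 8.2, iterable inside a union is expanded to array|Traversable,
    // so union members are checked one by one and all of them must qualify.
    $members = $type instanceof ReflectionUnionType ? $type->getTypes() : [$type];
    foreach ($members as $member) {
        if (!$member instanceof ReflectionNamedType || !isIterableTypeName($member->getName())) {
            return false;
        }
    }
    return true;
}

// Usage: a hypothetical class covering the annotations mentioned above.
final class Example
{
    public iterable $plain;
    public Traversable $traversable;
    public Generator $generator;
    public array|Traversable $union;
    public string $name;
    public $untyped;
}
```

Note that this check alone would also match custom Traversable classes with their own state, which is exactly the third point's question; a real implementation would still need a policy for those.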
This might very well be an edge case, but what are your thoughts? I hope my intentions come across.
So... this is an interesting problem space. I've been pondering it, and need to think aloud a bit...
I've wanted to figure out how to do lazy stream generation, for example for large CSV files. I figured that would involve some way of serializing an array rather than an object, and thus probably a new frontend other than Serde. However, between this issue and the discussion in #5, I'm beginning to think that a single-property object is the way to go. That's not quite as flexible for the user (they have to define such an object), but perhaps that's OK if the overall process ends up being cleaner and has fewer moving parts.
Ideally, everything should be symmetric. However, serialization is way easier than deserialization, so there's already an asymmetry with json-stream anyway. So perhaps that's not a problem.
Also, an array is iterable, so deserializing into an array on an iterable property is still completely legal. If the data set is large, then you'd maybe want to deserialize into a generator or something, but that sounds really, really hard. :smile:
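The asymmetry described above can be shown with a single-property wrapper in plain PHP. The Export class is hypothetical and json_decode() merely stands in for whatever a deserializer would do: the same iterable property legally holds a lazy generator on the way out and a plain array on the way back in.

```php
<?php
declare(strict_types=1);

// Hypothetical single-property wrapper with an iterable property.
final class Export
{
    public function __construct(public iterable $rows) {}
}

// Serialization side: the property holds a lazy generator.
$lazy = new Export((function (): Generator {
    yield ['id' => 1];
    yield ['id' => 2];
})());

// Deserialization side: an array is also iterable, so this is a legal value too.
$decoded = new Export(json_decode('[{"id":1},{"id":2}]', true));
```

Both objects satisfy the type declaration; only the first one avoids holding all rows in memory at once.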
I... think the default behavior should probably be that custom \Traversable classes are treated as normal classes. You're serializing their state, not their output. If you want to serialize the output, you can always add a new Exporter that attaches to \Traversable objects and calls iterator_to_array() on them. That's best handled via a custom Exporter, I'm fairly sure. (Though that may be worth documenting somewhere.)
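The distinction between a Traversable object's state and its output is easy to see in plain PHP. This sketch does not implement a Serde Exporter; NumberRange is a hypothetical class. get_object_vars() captures what normal object serialization would see, while iterator_to_array() captures what an exporter attached to \Traversable would produce.

```php
<?php
declare(strict_types=1);

// Hypothetical Traversable class whose state (a range) differs from its output (the values).
final class NumberRange implements IteratorAggregate
{
    public function __construct(public int $start, public int $end) {}

    public function getIterator(): Generator
    {
        for ($i = $this->start; $i <= $this->end; $i++) {
            yield $i;
        }
    }
}

$range = new NumberRange(1, 3);

// State: the public properties, as a normal object serialization would see them.
$state = json_encode(get_object_vars($range));

// Output: the iterated values; false discards keys so the result is a list.
$output = json_encode(iterator_to_array($range, false));
```

Treating such classes as normal objects by default keeps their round trip intact, since the state can be rebuilt from start and end, whereas the output cannot be turned back into a NumberRange without extra knowledge.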
Another concern: if you have a lazy iterable property, presumably you want it to generate lazily so you never have the whole thing in memory at once. However, if you're serializing to a string then you're going to have the whole thing in memory at some point anyway. So, do we do anything different?
Does anything change in the above if we're talking about an object with a bunch of properties, only some of which are iterable, rather than a single wrapper iterable object? I... don't know. I don't think so, but I don't know.
I think I need to noodle on this for a bit to figure out how to best handle it. There's probably some odd edge cases we haven't thought of.
I am currently between jobs, so if you are able to sponsor my time, that would increase its priority. :smile_cat: Off the top of my head, I'd estimate this at around 5-8 hours of work to flesh out the details. (Though as usual, no plan survives first contact with the enemy.)
Some preliminary work in #18. Still want to noodle a bit more, but there's good progress there.
Resolved by #18