json-machine icon indicating copy to clipboard operation
json-machine copied to clipboard

Recursive iteration

Open halaxa opened this issue 4 years ago • 13 comments

Enable iteration inside iteration to lazily iterate any structure.

foreach (RecursiveJsonMachine::fromFile('universe.json', '/results') as $gallaxy) {
    foreach($gallaxy as $solarSystem) {
        foreach($solarSystem as $solarSystemObject) {
            // process star, planet, asteroid ...
        }
    }
}

~The solution lies in yield from.~ (It doesn't. yield from also tries to rewind the generator. It must be a good old iteration via Iterator methods.) "Child" generator will be given a reference to the top level SyntaxCheckedTokens/Tokens generator and will ~yield from~ yield from it until the end of its iteration level. Then the control will be returned back to the higher-level generator. All will be happening on the same instance of SyntaxCheckedTokens/Tokens

How this will reflect in public API must be well thought through.

halaxa avatar Nov 10 '20 20:11 halaxa

I like (and need) this. How is this going? Can I help? Thanks

pfructuoso avatar Oct 20 '22 07:10 pfructuoso

Thank you. No progress whatsoever :) The idea persists though.

halaxa avatar Oct 20 '22 09:10 halaxa

Could a nested JSON pointer - as explained in the readme - help you in the meantime?

halaxa avatar Oct 20 '22 10:10 halaxa

I tried to implement it and turned this Issue into PR. Can you try it and throw in some suggestions?

halaxa avatar Oct 22 '22 18:10 halaxa

Codecov Report

Attention: 2 lines in your changes are missing coverage. Please review.

Comparison is base (5147f38) 100.00% compared to head (01fc434) 99.67%. Report is 1 commits behind head on master.

Files Patch % Lines
src/Parser.php 95.65% 2 Missing :warning:

:exclamation: Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files
@@              Coverage Diff              @@
##              master      #36      +/-   ##
=============================================
- Coverage     100.00%   99.67%   -0.33%     
- Complexity       185      220      +35     
=============================================
  Files             17       19       +2     
  Lines            526      607      +81     
=============================================
+ Hits             526      605      +79     
- Misses             0        2       +2     

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

codecov-commenter avatar Oct 22 '22 18:10 codecov-commenter

I ponder that a better name for this feature might be something like lazy_vectors rather than recursive. Because for example lazy_scalars option (as stream resources) may arrive later. Moreover, recursive is somehow vague and does not exactly express what the option does. Any ideas on the name anyone? Ping @fwolfsjaeger @cerbero90

halaxa avatar Oct 22 '22 18:10 halaxa

Hey @halaxa, I was looking into a very similar solution a couple of days ago. Great timing!

I'd say that what we are leveraging here are technically nested iterators, so probably naming them as what they actually are may cause less confusion.

I see that you generate iterators for objects too, wouldn't iterators for arrays be enough?

cerbero90 avatar Oct 24 '22 07:10 cerbero90

I agree that recursive is not the right word to use. As @cerbero90 suggested, nested iterator would make sense.

Also, by returning an actual Iterator object it would be possible to use SPL iterators to further filter the data.

fwolfsjaeger avatar Oct 24 '22 08:10 fwolfsjaeger

So the name of the feature and the option would be nested iterators?

The point about not iteratig objects made me think of two optinons instead of just one. One for arrays as iterators and second for objects as iterators. That way users can decide what they want.

Also, what about providing a method called for example $nestedIterator->materialize() so users can decide which level/item should be accessible via object/array?

The nested iterator should also be RecursiveIterator, not just Generator instance so it's compatible with spl recursive iterators.

halaxa avatar Oct 24 '22 10:10 halaxa

Looking at the recursive example in the readme in this PR, I think a very useful method might also be $user->advanceTo('friends') with support for syntax sugar $user['friends']. The example would then simplify to something like

foreach ($users as $user) {
    foreach ($user['friends'] as $friend) {
        $friend = $friend->materialize(); // or maybe $friend->decode()?
        $friend['name'], $friend['avatar'];
    }
}

halaxa avatar Oct 24 '22 11:10 halaxa

First of all, thanks for get back to this. It would help a lot. Regarding the naming, nested iterators, makes sense to me.

The example you show in previous commit with syntax sugar looks promising. However, the ideal would be being able to materialize one item without materialize its soons. EG:

foreach ($users as $user) {
    echo $user['name']; // $user['name']->materialize() should work too
    foreach ($user['friends'] as $friend) { // 'friends' instanceof Traversable, not an array/object
        $friend = $friend->materialize();
        $friend['name'], $friend['avatar'];
    }
}

So, I would differenciate between materialize() and decode():

  • materialize: In case of an array/object, if items/props aren't parsed, only scalar data types.
  • decode: Equals to json_decode()

From here, it shouldn't be difficult decode omiting one or more props $user->decode(['omit' => ['friends']])

pfructuoso avatar Oct 25 '22 15:10 pfructuoso

Thanks for the feedback.

I'm not sure about $user['name']->materialize() as it's expected to return string and not an object.

As for the decode() method, I like it less and less. Its first meaning - simply decoding a value - is superfluous as decoding is implicit and transparent from the moment user specifies a decoder option. Its possible second meaning - similarity to the word materialize - may be misleading. Especially so, because I plan to stack decoding as another generator on top of Parser, and then the decode option might be null as PassthruDecoder will be a thing of the past. What would the user think the method decode() does when they explicitly specified no decoding?

What I'm currently wondering about is the format of the key of advanceTo($key). Should it be in the expected decoded format according to a decoder provided or should it be always in pure JSON string format, so it does not change when a decoder is swapped?

Additionally, if we later stack decoding on top of Parser, the decoding traversable should know methods advanceTo() and materialize(). Probably not. Not SRP enough. So we can put another generator on top of it to handle those syntax sugar methods. Unfortunately, the problems don't end there. Too late to explain myself for today. Good night ;)

halaxa avatar Oct 25 '22 20:10 halaxa

So, after some thinking about it, I would lean towards the separate RecursiveItems facade class (similar to Items) to wrap this functionality in. It will better fit the single responsibility principle. It itself may also implement PHP's RecursiveIterator and Items won't become a mess with the additional recursive option or with some other API extensions that might come in the future. Any takes on this?

halaxa avatar Mar 12 '23 20:03 halaxa