Eagerizing through ListStreams and GeneratorStreams
Hi @elronbandel , I had this in mind re the issue you assigned to me. Is this in the direction you had in mind? If yes, I will clean the celebration by adding a function to standard.py, maybe, that sticks the filler in between the steps of a given BaseRecipe object.
The way i thought about it is that every base operator StreamInstanceOperator StreamOprator MultiStreamOperator FieldOperator will have two opeartors they use: StreamingFieldOperator and EagerFieldOperator
so the call of each will look like:
def __call__(multi_stream):
if settings.eager_mode:
return EagerFieldOperator.__call__(multi_stream)
else:
return StreamingFieldOperator.__call__(multi_stream)
Thanks @elronbandel , do you see a way to only modify those four ancestor-operators? or need to change all their descendants: each and every operator in operators.py?
Codecov Report
Attention: Patch coverage is 60.49383% with 32 lines in your changes missing coverage. Please review.
Project coverage is 91.37%. Comparing base (
a62c359) to head (117f6fd). Report is 23 commits behind head on main.
:exclamation: Current head 117f6fd differs from pull request most recent head a590fe8
Please upload reports for the commit a590fe8 to get more accurate results.
| Files | Patch % | Lines |
|---|---|---|
| src/unitxt/stream.py | 46.42% | 30 Missing :warning: |
| src/unitxt/splitters.py | 75.00% | 1 Missing :warning: |
| tests/library/test_recipe.py | 87.50% | 1 Missing :warning: |
Additional details and impacted files
@@ Coverage Diff @@
## main #843 +/- ##
==========================================
- Coverage 91.62% 91.37% -0.25%
==========================================
Files 104 104
Lines 10996 11247 +251
==========================================
+ Hits 10075 10277 +202
- Misses 921 970 +49
:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.
Hi @elronbandel ,
I add one commit here that makes test_recipe pass, except one case that I can not explain, where the recipe_metadata does not come the same twice in eager_mode. perhaps you can understand.. maybe the random generator (that is part of that field) is not the same object in same address?
I do not find test_recipe in any of the logs. Perhaps my fault. In my dev environment, it passes.
So I digged into the code, debugging from test_recipe.
All cases that reached, eventually, to this Exception that bothers you, @elronbandel , stemmed, indeed from "disappearing" streams.
Some of them disappear for years now, we simply did not look for them. I added a check, upon the return from each recipe() in test_recipe() that prints out all the streams it carries with it, and what is their status. And sure enough - many streams are missing in action... The other source for the exception, are streams that recently began to disappear - with the new tests of Yoav, that pushes a new multi-stream into an existing recipe, replacing existing streams (that are thus gone).
Here is a tweaked test_recipe that reveals all the classically disappearing streams:
Hi @elronbandel , I hope this is closer to what you had in mind
covered by the better version: https://github.com/IBM/unitxt/pull/888