unitxt icon indicating copy to clipboard operation
unitxt copied to clipboard

Eagerizing through ListStreams and GeneratorStreams

Open dafnapension opened this issue 1 year ago • 2 comments

Hi @elronbandel , I had this in mind re the issue you assigned to me. Is this in the direction you had in mind? If yes, I will clean the celebration by adding a function to standard.py, maybe, that sticks the filler in between the steps of a given BaseRecipe object.

dafnapension avatar May 17 '24 07:05 dafnapension

The way i thought about it is that every base operator StreamInstanceOperator StreamOprator MultiStreamOperator FieldOperator will have two opeartors they use: StreamingFieldOperator and EagerFieldOperator

so the call of each will look like:

def __call__(multi_stream):
    if settings.eager_mode:
         return EagerFieldOperator.__call__(multi_stream)
    else:
         return StreamingFieldOperator.__call__(multi_stream)

elronbandel avatar May 19 '24 07:05 elronbandel

Thanks @elronbandel , do you see a way to only modify those four ancestor-operators? or need to change all their descendants: each and every operator in operators.py?

dafnapension avatar May 19 '24 09:05 dafnapension

Codecov Report

Attention: Patch coverage is 60.49383% with 32 lines in your changes missing coverage. Please review.

Project coverage is 91.37%. Comparing base (a62c359) to head (117f6fd). Report is 23 commits behind head on main.

:exclamation: Current head 117f6fd differs from pull request most recent head a590fe8

Please upload reports for the commit a590fe8 to get more accurate results.

Files Patch % Lines
src/unitxt/stream.py 46.42% 30 Missing :warning:
src/unitxt/splitters.py 75.00% 1 Missing :warning:
tests/library/test_recipe.py 87.50% 1 Missing :warning:
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #843      +/-   ##
==========================================
- Coverage   91.62%   91.37%   -0.25%     
==========================================
  Files         104      104              
  Lines       10996    11247     +251     
==========================================
+ Hits        10075    10277     +202     
- Misses        921      970      +49     

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

codecov[bot] avatar May 26 '24 21:05 codecov[bot]

Hi @elronbandel , I add one commit here that makes test_recipe pass, except one case that I can not explain, where the recipe_metadata does not come the same twice in eager_mode. perhaps you can understand.. maybe the random generator (that is part of that field) is not the same object in same address? I do not find test_recipe in any of the logs. Perhaps my fault. In my dev environment, it passes.

dafnapension avatar May 26 '24 21:05 dafnapension

So I digged into the code, debugging from test_recipe. All cases that reached, eventually, to this Exception that bothers you, @elronbandel , stemmed, indeed from "disappearing" streams. Some of them disappear for years now, we simply did not look for them. I added a check, upon the return from each recipe() in test_recipe() that prints out all the streams it carries with it, and what is their status. And sure enough - many streams are missing in action... The other source for the exception, are streams that recently began to disappear - with the new tests of Yoav, that pushes a new multi-stream into an existing recipe, replacing existing streams (that are thus gone). Here is a tweaked test_recipe that reveals all the classically disappearing streams:

test_recipe.TXT

dafnapension avatar Jun 03 '24 18:06 dafnapension

Hi @elronbandel , I hope this is closer to what you had in mind

dafnapension avatar Jun 04 '24 20:06 dafnapension

covered by the better version: https://github.com/IBM/unitxt/pull/888

dafnapension avatar Jun 06 '24 09:06 dafnapension