skrub Better handling of cases where a deferred function returns more than one value

Following up on a question on the skrub discord.

I have a function like this, which returns more than one value:

import skrub

test = skrub.var("test", [1, 2])

@skrub.deferred
def process_test_data(test):
    left = test[0]
    right = test[1]
    return left, right

I cannot unpack the result directly because an exception is raised:

left, right  = test.skb.apply_func(process_test_data)

gives

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[23], line 2
      1 # %%
----> 2 left, right  = test.skb.apply_func(process_test_data)

File ~/Projects/work/skrub/skrub/_data_ops/_data_ops.py:593, in DataOp.__iter__(self)
    592 def __iter__(self):
--> 593     raise TypeError(
    594         "This object is a DataOp that will be evaluated later, "
    595         "when your learner runs. So it is not possible to eagerly "
    596         "iterate over it now."
    597     )

TypeError: This object is a DataOp that will be evaluated later, when your learner runs. So it is not possible to eagerly iterate over it now.

Instead, I have to assign the result to a different variable, then unpack that:

res = test.skb.apply_func(process_test_data)
# Indexing the DataOp works where unpacking does not
left = res[0]
right = res[1]

combine = left + right
combine
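When the number of returned values is known up front, the indexing workaround can be wrapped in a small helper. The sketch below is only an illustration: `unpack_n` is a hypothetical name, not part of skrub, and it relies solely on integer indexing, so it is demonstrated here on a plain tuple rather than on a DataOp.

```python
def unpack_n(result, n):
    """Return a tuple with the first n items of `result`, using only indexing.

    Hypothetical helper, not part of skrub. It never iterates over `result`,
    so it should work with any object that supports `result[i]`.
    """
    return tuple(result[i] for i in range(n))


# Demonstrated on a plain tuple; the caller has to supply the count explicitly.
left, right = unpack_n((1, 2), 2)
combine = left + right
```

The explicit `n` is the price of the workaround: the caller states how many values to expect, which is exactly the information a deferred object does not have before evaluation.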

How should we handle this use of deferred functions?

One option is to leave the functionality as is and explain in the user guide how to unpack the returned tuple.
Another is to modify deferred so that, when a function returns more than one value, each value is wrapped in its own DataOp and the resulting tuple can be unpacked directly.

I don't remember if deferred functions are intended to return only one value, or if we simply have never prepared examples with more than one value.

Aug 24 '25 15:08 rcap107

Expanding on this after discussing with other devs.

Unpacking means iterating over the result and assigning each item to a name. Iterating over a DataOp is not possible because a DataOp cannot know what it would iterate over, or how many items there are, until it is evaluated. The same problem occurs with something like n, p = skrub.X().shape or a, b = skrub.var('t', (1, 2)).
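A toy model can make the asymmetry between indexing and iterating concrete. The `Lazy` class below is a hypothetical stand-in written for this illustration; it is not skrub's implementation and makes no claim about how DataOp works internally.

```python
class Lazy:
    """Toy deferred value (hypothetical; NOT skrub's DataOp)."""

    def __init__(self, compute):
        # Zero-argument callable that is only run on evaluation.
        self._compute = compute

    def __getitem__(self, key):
        # Indexing can be recorded without knowing the value:
        # it simply builds another deferred node.
        return Lazy(lambda: self._compute()[key])

    def __iter__(self):
        # Unpacking needs the number of items right now, which is unknown
        # before evaluation. Without this method, Python would fall back to
        # calling __getitem__(0), __getitem__(1), ... and never stop,
        # because __getitem__ above never raises IndexError.
        raise TypeError("cannot iterate over a lazy value before it is evaluated")

    def evaluate(self):
        return self._compute()


pair = Lazy(lambda: (1, 2))

# Direct unpacking fails eagerly, before anything is computed.
try:
    left, right = pair
except TypeError as exc:
    error = str(exc)

# Indexing stays lazy and works.
left, right = pair[0], pair[1]
total = left.evaluate() + right.evaluate()
```

In this sketch the explicit `__iter__` is what turns a silent infinite loop into an immediate, readable error.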

The gist of it is that this is a "wontfix" issue because the problem has no easy solution. I will add a note on this specific case to the user guide so that people are aware of the workaround.

Aug 25 '25 12:08 rcap107

Can we have a good error message?

Aug 25 '25 12:08 GaelVaroquaux

Yes, absolutely

Aug 25 '25 13:08 rcap107

Maybe it could be reworded a bit, but we already have a dedicated error message for this:

>>> import skrub
>>> iter(skrub.var('a'))
Traceback (most recent call last):
    ...
TypeError: This object is a DataOp that will be evaluated later, when your learner runs. So it is not possible to eagerly iterate over it now.

Dec 06 '25 18:12 jeromedockes

This issue can be closed once we add a small paragraph to the control_flow.rst file in the documentation that explains the issue a bit. There should be:

a brief explanation of the situation
how to address it
the snippet of code I wrote above as an example

This should give the users an idea of how to deal with this situation.

Dec 15 '25 15:12 rcap107