marshmallow icon indicating copy to clipboard operation
marshmallow copied to clipboard

RFC: Remove Nested(many=True) to keep only the List(Nested()) form

Open lafrech opened this issue 7 years ago • 40 comments

See discussion in https://github.com/marshmallow-code/marshmallow/issues/493#issuecomment-380468027.

We have two ways to do the same thing. They sometimes behave differently, which generally means one of them is buggy.

IMHO, and from what I have seen from users around me, it is more natural for a new user to write List(Nested) than Nested(many). Besides, it is consistent with Dict(values=Nested).

I realize this is quite a breaking change, both from user perspective, because Nested(many=True) is probably the most commonly used form, and from developer perspective, because it involves a lot of code changes (but also a lot of code removal, which is nice).

There could be things that can be done with Nested(many=True) but not with List(Nested), because in the latter case, the nested Schema has access to the whole data, while in the former, it is called on each element. Perhaps some pre_load edge cases might suffer from the change. Any real-life example, anyone?

Anyway, since the question was raised in a comment in another issue, I figured I'd give it more visibility.

lafrech avatar Apr 19 '18 16:04 lafrech

Is there any documentation outlining the differences between these two? I find the distinction confusing as well.

seansfkelley avatar May 14 '18 20:05 seansfkelley

I don't think there is supposed to be a difference.

deckar01 avatar May 14 '18 20:05 deckar01

I think deprecating Nested(many=True) is a good idea. Would someone be up for doing an analysis of the current differences between the two APIs?

sloria avatar May 30 '18 15:05 sloria

Do you mean deprecate in 2.X and remove in 3 or deprecate in 3?

Here are a few differences:

lafrech avatar May 30 '18 19:05 lafrech

@lafrech I was thinking of deprecating in 3. As you pointed out, Nested(many=True) is the probably the more common usage, and removing it would break many users' code.

sloria avatar May 30 '18 19:05 sloria

How would a self-referencing schema work with this? In other words, how would we express

friends = Nested('self', many=True, exclude=('friends',))

sloria avatar Jun 11 '18 17:06 sloria

I would expect friends = List(Nested('self', exclude=('friends',))) to work the same way, but it currently errors. Using a reference to the schema with python 3 works, so I suspect this is just an edge case that hasn't been tested.

deckar01 avatar Jun 11 '18 19:06 deckar01

Perhaps some pre_load edge cases might suffer from the change. Any real-life example, anyone?

I've got production code that relies on the fact that a nested field with many=True and @pre_dump(pass_many=True) on its schema will execute this pre_dump only once.

The benefit is making sure expensive code that is needed to be executed before nested fields can be dumped is not repeated for each element in the fields.List.

I don't really see this as an edge case though, pretty sure doing something to the whole set of related objects before dumping/loading them is a common pattern.

ex5 avatar Aug 13 '18 12:08 ex5

Currently a list of nested fields does not propagate the schema context correctly, but nested many does. See #424 and #922.

deckar01 avatar Aug 22 '18 15:08 deckar01

Yep, I just updated the list above.

lafrech avatar Aug 22 '18 15:08 lafrech

I saw the ref on the issue, but I couldn't figure out where it came from. Thanks.

deckar01 avatar Aug 22 '18 15:08 deckar01

@anka-sirota yes it seems like a common use case, but it very well could be done on parent schema or by subclassing List and doing custom pre_dump there. All that is possible without handling a separate many case, which I would I agree with @lafrech, just source of inconsistent in the library.

Looking at bigger picture, that this issue already have been here for couple months now, and there still few bugs before many=True can be complete eliminated from the library, I would suggest just calling it deprecated in marshmallow 3 release. Such move will:

  • prevent further delay of marshmallow 3
  • make it so that there is more interest in List(Nested) and that it works consistently with Nested(many=True)

rooterkyberian avatar Aug 31 '18 16:08 rooterkyberian

@lafrech another difference to be added to the list https://github.com/marshmallow-code/marshmallow/issues/946

rooterkyberian avatar Sep 11 '18 13:09 rooterkyberian

@rooterkyberian, I was just searching for this thread to reference your issue.

lafrech avatar Sep 11 '18 13:09 lafrech

All differences listed above are either fixed or on their way to be fixed. Once this is done, we should be able to deprecate Nested(many=True).

lafrech avatar Dec 04 '18 10:12 lafrech

There is also a difference between Nested(Schema, many=True) and List(Nested(Schema)) when loading partial data. Below tests are showing the difference in behaviour when using both methods:

from marshmallow import Schema
from marshmallow.fields import List, Nested, Str


class Employee(Schema):
    name = Str(required=True)
    title = Str(required=True)


def test_nested_partial_load_with_list():
    class Company(Schema):
        employees = List(Nested(Employee))

    payload = {
        'employees': [{
            'name': 'Arthur',
        }],
    }
    # Partial loading with list type shouldn't generate any errors.
    result = Company().load(payload, partial=True)
    assert result['employees'][0]['name'] == 'Arthur'


def test_nested_partial_load_with_nested_many():
    class Company(Schema):
        employees = Nested(Employee, many=True)

    payload = {
        'employees': [{
            'name': 'Arthur',
        }],
    }
    # Partial loading with Nested(many=True) type shouldn't generate any errors.
    result = Company().load(payload, partial=True)
    assert result['employees'][0]['name'] == 'Arthur'

Partial loading in nested schemas was introduced in version 3.0.0rc1 in this PR: #849. Problem is that when a field is of type List, it is not getting the same kwargs as Nested so it can't ignore missing data. I think it was even mentioned in referenced discussion.

bartelemi avatar Dec 29 '18 15:12 bartelemi

Thans @0xpsilocybe, I think this is the object of @sloria's comment in #1066. I shall update the PR to propagate all needed kwargs to Nested.

lafrech avatar Dec 29 '18 15:12 lafrech

What about the Pluck(many=True) use case ?

timc13 avatar Mar 25 '19 19:03 timc13

Removing many would allow a few lines of explicit many handling to be removed from Pluck. The rest of the changes would automatically be inherited from Nested.

deckar01 avatar Mar 27 '19 14:03 deckar01

@lafrech Is there anything else to be addressed before we can deprecate Nested(many=True)? I think we'll want to investigate https://github.com/marshmallow-code/marshmallow/issues/779#issuecomment-396354987 .

In any case, I don't think we should block on this for releasing 3.0. We can always deprecate within the 3.x line.

sloria avatar Jun 15 '19 21:06 sloria

@sloria :+1: on depreciating stuff rather than stacking the deck with more BC breaks. I'd much rather have 8 releases with 8 breaking changes than 8 breaking changes in one release...

I don't know about anyone else, but we've got hundreds - maybe even a thousand schemas to migrate. I'd rather take that in strides!

fuhrysteve avatar Jun 17 '19 14:06 fuhrysteve

@fuhrysteve, about this, please see the conversation in https://github.com/marshmallow-code/marshmallow/issues/928.

lafrech avatar Jun 17 '19 14:06 lafrech

Changed the milestone on this since there are still further considerations to discuss (deprecating Nested(many=True)), but I don't think there's any action items to complete before 3.0 final.

sloria avatar Jul 07 '19 18:07 sloria

Indeed, deprecation can be done in 3.x so it doesn't have to hold 3.0.

I sent https://github.com/marshmallow-code/marshmallow/pull/1290 with a fix for that last List(Nested(...)) glitch we identified above (https://github.com/marshmallow-code/marshmallow/issues/779#issuecomment-396354987).

lafrech avatar Jul 08 '19 08:07 lafrech

Before we officially recommend List(Nested(Schema)), it would be good to do some benchmarking and profiling. A quick-n-dirty benchmark shows that List(Nested(Schema)) is ~35% slower than Nested(Schema, many=True): https://gist.github.com/sloria/2e0b141b3d05fe0dafbdeafd294ebae8

sloria avatar Jul 08 '19 20:07 sloria

I might try to dig into a perf dump and see where the time is going. That seems like a lot of time to just be caused by an extra layer of indirection. I wonder if Nested isn't doing something it should be or if List is doing something it doesn't need.

deckar01 avatar Jul 15 '19 16:07 deckar01

I ran the benchmarks above using python 3.7.2 against the dev branch. I had to make a copy of Schema to get sane cProfile output, because recursion merges calls into a single total and hides the problem.

Nested(many)

Screen Shot 2019-07-16 at 11 03 23 AM Screen Shot 2019-07-16 at 11 04 30 AM

List(Nested)

Screen Shot 2019-07-16 at 11 03 42 AM Screen Shot 2019-07-16 at 11 04 44 AM

It looks like the slowdown is caused by Nested(many) calling schema.dump(many) once and getting to loop on schema._serialize(), but List(Nested) having to loop on schema.dump().

About 20% of that slowdown is caused by calls to _has_processors() which is cacheable and ErrorStore() which could possibly be removed or refactored as a shared instance. Optimizing these would also make schema.dump() calls slightly faster in general.

The rest of the slow down seems to be memory allocation and conditional evaluation in dump(). The easiest way to fix that is probably to probe the list's inner field type and allow it to hit the optimized schema path once via Nested._serialize(many).

deckar01 avatar Jul 16 '19 16:07 deckar01

#1304 makes dump() ~18% faster for both paths. #1306 makes dump() ~21% faster for List(Nested), which makes the two paths run in effectively the same amount of time (< 1%).

deckar01 avatar Jul 17 '19 19:07 deckar01

Great.

Is there the same kind of gap between Nested(many) and List(Nested) when loading?

lafrech avatar Jul 17 '19 19:07 lafrech

Ya, load()has the same ~35% slowdown when comparing List(Nested) to Nested(many). The same strategy should work there.

Edit: I added the fix for load() in #1306 and confirmed that it makes them run in the same time without slowing down Nested(many).

deckar01 avatar Jul 17 '19 19:07 deckar01