nautobot-app-ssot icon indicating copy to clipboard operation
nautobot-app-ssot copied to clipboard

Prototype of `bulk_create`

Open abates opened this issue 1 year ago • 10 comments

abates avatar Sep 25 '24 11:09 abates

Anyone have any thoughts on this approach? @jdrew82 @Kircheneer

abates avatar Sep 25 '24 14:09 abates

Is the intent here to improve performance? Do you have any benchmarks if so?

glennmatthews avatar Sep 25 '24 16:09 glennmatthews

Is the intent here to improve performance? Do you have any benchmarks if so?

It is indeed an attempt to improve performance. We have an SSoT sync that pulls in ~ 80k locations (5 levels) with the majority of the locations (about 65k) at the lowest level (furthest from the root). With just normal contrib models and validated_save it takes about 72 hours to do an initial sync. With the bulk create approach it takes between 10 and 15 minutes.

abates avatar Sep 25 '24 17:09 abates

I understand that a similar bulk_delete operation could be defined too. isn't it?

chadell avatar Sep 26 '24 07:09 chadell

I understand that a similar bulk_delete operation could be defined too. isn't it?

I believe we can expand this work for bulk updates and deletes.

abates avatar Sep 26 '24 12:09 abates

I understand that a similar bulk_delete operation could be defined too. isn't it?

I believe we can expand this work for bulk updates and deletes.

maybe bulk_update would be harder as I imagine every item will have different data to update.

chadell avatar Sep 26 '24 12:09 chadell

@glennmatthews @jdrew82 @Kircheneer I'd like to proceed with a cleaner prototype with knobs to enable/disable/tweak this functionality. My question to you is:

Should bulk methods be part of the adapter, or part of the model? I think we will have to use the adapter in some way in order to have a place to store the list of objects needing bulk create/update/delete etc, but the entry point could be class methods like we already have in the models.

abates avatar Sep 26 '24 19:09 abates

I can sort of visualize the idea of each model class storing a list of objects to be bulk processed, rather than storing those on the adapter, but that probably gets tricky when you have parent/child relationships added to the mix?

If we implement this via model methods (which conceptually feels preferable to me), would we still need a final pass in the adapter to handle any leftovers at the end?

glennmatthews avatar Sep 26 '24 19:09 glennmatthews

@glennmatthews @jdrew82 @Kircheneer

I'd like to proceed with a cleaner prototype with knobs to enable/disable/tweak this functionality. My question to you is:

Should bulk methods be part of the adapter, or part of the model? I think we will have to use the adapter in some way in order to have a place to store the list of objects needing bulk create/update/delete etc, but the entry point could be class methods like we already have in the models.

When I've done it the list is on the Nautobot adapter as the bulk operations take place in sync_complete().

jdrew82 avatar Sep 26 '24 19:09 jdrew82

I can sort of visualize the idea of each model class storing a list of objects to be bulk processed, rather than storing those on the adapter, but that probably gets tricky when you have parent/child relationships added to the mix?

I haven't even considered parent/child relationships, so I need to test that.

If we implement this via model methods (which conceptually feels preferable to me), would we still need a final pass in the adapter to handle any leftovers at the end?

I agree that implementing this in the model class feels cleaner to me, but you hit the nail on the head about a final pass. The nice thing with putting everything in the Adapter is we can leverage sync_complete. Perhaps we can still leverage that method and simply iterate the models.

abates avatar Sep 26 '24 19:09 abates

I've made some updates and included tests as well. I want to tackle bulk deletes now.

As I'm working on this a concern has come to my mind. Diffsync really looks at things from a sequential perspective. The base models all have create/update/delete methods that actually change the state of the in-memory pydantic object. The bulk operations all deal with the django object. When a django object is added to the bulk operation queue, the pydantic base model method (create/update/delete) is called before the actual data is updated in the database. This includes calling the clean method. This means that one of these operations could fail when the bulk changes are committed, but the diffsync model has already been marked as successful.

I'm not sure the best way to approach solving this problem.

abates avatar Oct 17 '24 16:10 abates

@jdrew82 @Kircheneer could you incorporate this proposal?

chadell avatar Dec 05 '24 05:12 chadell

Surely at some point, yes. Definitely a good idea.

Kircheneer avatar Dec 05 '24 07:12 Kircheneer

Yeah, I think adding the bulk_create functionality for sure makes sense but I'm a bit unsure of how we'd want to do the delete. @abates makes a good point in that we need to delete objects in a particular order to ensure you don't have issues with cascading dependent objects but how you do that without knowing the models ahead of time is difficult to try to resolve. Maybe just doing the top_level in reverse?

jdrew82 avatar Dec 05 '24 14:12 jdrew82