nautobot-app-ssot Prototype of `bulk

Sep 25 '24 11:09 abates

Anyone have any thoughts on this approach? @jdrew82 @Kircheneer

Sep 25 '24 14:09 abates

Is the intent here to improve performance? Do you have any benchmarks if so?

Sep 25 '24 16:09 glennmatthews

Is the intent here to improve performance? Do you have any benchmarks if so?

It is indeed an attempt to improve performance. We have an SSoT sync that pulls in ~ 80k locations (5 levels) with the majority of the locations (about 65k) at the lowest level (furthest from the root). With just normal contrib models and validated_save it takes about 72 hours to do an initial sync. With the bulk create approach it takes between 10 and 15 minutes.

Sep 25 '24 17:09 abates

I understand that a similar bulk_delete operation could be defined too. isn't it?

Sep 26 '24 07:09 chadell

I understand that a similar bulk_delete operation could be defined too. isn't it?

I believe we can expand this work for bulk updates and deletes.

Sep 26 '24 12:09 abates

I understand that a similar bulk_delete operation could be defined too. isn't it?

I believe we can expand this work for bulk updates and deletes.

maybe bulk_update would be harder as I imagine every item will have different data to update.

Sep 26 '24 12:09 chadell

@glennmatthews @jdrew82 @Kircheneer I'd like to proceed with a cleaner prototype with knobs to enable/disable/tweak this functionality. My question to you is:

Should bulk methods be part of the adapter, or part of the model? I think we will have to use the adapter in some way in order to have a place to store the list of objects needing bulk create/update/delete etc, but the entry point could be class methods like we already have in the models.

Sep 26 '24 19:09 abates

I can sort of visualize the idea of each model class storing a list of objects to be bulk processed, rather than storing those on the adapter, but that probably gets tricky when you have parent/child relationships added to the mix?

If we implement this via model methods (which conceptually feels preferable to me), would we still need a final pass in the adapter to handle any leftovers at the end?

Sep 26 '24 19:09 glennmatthews

@glennmatthews @jdrew82 @Kircheneer

I'd like to proceed with a cleaner prototype with knobs to enable/disable/tweak this functionality. My question to you is:

Should bulk methods be part of the adapter, or part of the model? I think we will have to use the adapter in some way in order to have a place to store the list of objects needing bulk create/update/delete etc, but the entry point could be class methods like we already have in the models.

When I've done it the list is on the Nautobot adapter as the bulk operations take place in sync_complete().

Sep 26 '24 19:09 jdrew82

I can sort of visualize the idea of each model class storing a list of objects to be bulk processed, rather than storing those on the adapter, but that probably gets tricky when you have parent/child relationships added to the mix?

I haven't even considered parent/child relationships, so I need to test that.

If we implement this via model methods (which conceptually feels preferable to me), would we still need a final pass in the adapter to handle any leftovers at the end?

I agree that implementing this in the model class feels cleaner to me, but you hit the nail on the head about a final pass. The nice thing with putting everything in the Adapter is we can leverage sync_complete. Perhaps we can still leverage that method and simply iterate the models.

Sep 26 '24 19:09 abates

I've made some updates and included tests as well. I want to tackle bulk deletes now.

As I'm working on this a concern has come to my mind. Diffsync really looks at things from a sequential perspective. The base models all have create/update/delete methods that actually change the state of the in-memory pydantic object. The bulk operations all deal with the django object. When a django object is added to the bulk operation queue, the pydantic base model method (create/update/delete) is called before the actual data is updated in the database. This includes calling the clean method. This means that one of these operations could fail when the bulk changes are committed, but the diffsync model has already been marked as successful.

I'm not sure the best way to approach solving this problem.

Oct 17 '24 16:10 abates

@jdrew82 @Kircheneer could you incorporate this proposal?

Dec 05 '24 05:12 chadell

Surely at some point, yes. Definitely a good idea.

Dec 05 '24 07:12 Kircheneer

Yeah, I think adding the bulk_create functionality for sure makes sense but I'm a bit unsure of how we'd want to do the delete. @abates makes a good point in that we need to delete objects in a particular order to ensure you don't have issues with cascading dependent objects but how you do that without knowing the models ahead of time is difficult to try to resolve. Maybe just doing the top_level in reverse?

Dec 05 '24 14:12 jdrew82

Prototype of `bulk_create`