django-auth-adfs icon indicating copy to clipboard operation
django-auth-adfs copied to clipboard

Optimize update_user_groups to a fixed number of DB queries.

Open tim-schilling opened this issue 3 years ago • 6 comments

By using Django's bulk_create and set operations we can reduce the time complexity of this function from two sorts to none and from 3 + (2ish * (claim_groups - existing _group_names) OR (claim_groups - existing _group_names) ) DB queries to a constant 3.

The downside of this approach is that the bulk_create operation will not cause the pre_save and post_save signals to be raised when creating the Group instances.

tim-schilling avatar Feb 08 '22 22:02 tim-schilling

Kudos, SonarCloud Quality Gate passed!    Quality Gate passed

Bug A 0 Bugs
Vulnerability A 0 Vulnerabilities
Security Hotspot A 0 Security Hotspots
Code Smell A 0 Code Smells

No Coverage information No Coverage information
0.0% 0.0% Duplication

sonarqubecloud[bot] avatar Feb 08 '22 22:02 sonarqubecloud[bot]

Codecov Report

Merging #220 (0914b94) into master (a2e9b37) will decrease coverage by 0.2%. The diff coverage is 70.0%.

@@           Coverage Diff            @@
##           master    #220     +/-   ##
========================================
- Coverage    85.6%   85.3%   -0.3%     
========================================
  Files          11      11             
  Lines         500     492      -8     
========================================
- Hits          428     420      -8     
  Misses         72      72             
Impacted Files Coverage Δ
django_auth_adfs/backend.py 84.3% <70.0%> (-0.8%) :arrow_down:

codecov[bot] avatar Feb 08 '22 22:02 codecov[bot]

Hi @tim-schilling! I'll look into this later today. Thank you 🚀

As a first thought I don't mind those signals not being kicked off, but I think the discussion of a major version bump should be had. What do you think?

Personally I don't believe the group creation signal is mentioned anywhere and not really officially supported, but I'd like to hear opinions. Efficiency on these things is a far more important concern to me.

CC @sondrelg

JonasKs avatar Feb 09 '22 07:02 JonasKs

Hmm this is a tough one.

I doubt there are many projects that rely on signals based off of these events. At the same time, if there are, the outcome could be pretty bad for them if they upgrade and this change flies under the radar. I think if we accept the change, we should probably bump major versions.

I'd normally advocate not having too high of a bar for major bumps - I think upgrading often is good generally, but maybe it's worth asking what the performance impact of the reduced time complexity is here. Have you benchmarked it @tim-schilling? If it's negligible in practice and we lose signals, maybe it wouldn't be a net benefit to users. I'd be happy to help try test it later this week if you want 🙂

And it's been a while since I dived into the signals API, but is it possible to add these back manually? This line from the Django project seems promising. If so, perhaps that would remove the downside.

sondrelg avatar Feb 09 '22 08:02 sondrelg

As a first thought I don't mind those signals not being kicked off, but I think the discussion of a major version bump should be had. What do you think?

I think a major upgrade is reasonable. This change would be easy to miss and in the right circumstances have major impacts on an integration.

Have you benchmarked it @tim-schilling?

I have not. I have thought about it more and it does seem relatively limited. The groups claim from AD will at most have 200 elements. I'll try to put something together so we have a better idea of what it would mean.

And it's been a while since I dived into the signals API, but is it possible to add these back manually? This line from the Django project seems promising. If so, perhaps that would remove the downside.

Unfortunately I don't think that's going to be feasible. I suspect the django devs would have supported this if they could have. The problem is that we don't know which objects are being created before running bulk_create. In Django 3.2+, it returns the created instances, but even then we'd be past the point where pre_save is helpful.

I could create a configuration option that allows the signals to be raised when MIRROR_GROUPS is enabled to allow devs to choose moving forward. Or that could get added if someone finds that they wouldn't be able to upgrade because of that lack of functionality.

tim-schilling avatar Feb 09 '22 16:02 tim-schilling

I think a major upgrade is reasonable. This change would be easy to miss and in the right circumstances have major impacts on an integration.

Alright, let's do that if you want to merge this, based on your benchmark findings. :+1:

JonasKs avatar Feb 10 '22 14:02 JonasKs

I'm closing this for now. I may revisit it in the future.

tim-schilling avatar Aug 24 '22 18:08 tim-schilling