pylint-django
pylint-django copied to clipboard
[Question] Hints on best practices / what to avoid
Greetings,
I have been a user of Pylint-Django for about a year and I really appreciate how this tool gives me a hand when it comes to code quality.
I wanted to ask if it would be possible to implement some suggestions on best practices and how to avoid some common pitfalls that may come from being in a rush or just plain carelessness (I am guilty of both).
For instance:
- Creating/updating models inside a for loop could have bad performance, instead
bulk_create
/bulk_update
could work better; - Querying models inside a for loop should be avoided, a single query using the
__in=
operator could be better performing.
This are a couple examples on top of my head that gave me a bit of a headache when debugging performance issues on a software I was coding. Having a tool remind me about these possible performance hogs could have saved me a lot of effort.
Thank you for your attention.
At the top of my head I think these are definitely doable from a technical POV. However I think most if the maintainers and contributors are quite busy ATM to be able to devote significant time to new features. Although the patterns seem relatively simple implementation and managing tests & corner cases (in order to actually have confidence that it works) become non-trivial.
However you may find the following resources helpful and give it a try yourself:
- https://www.youtube.com/watch?v=3CkSKUNMLJc
- https://github.com/kiwitcms/Kiwi/tree/master/kiwi_lint
From what I can see your pattern is:
For 1) for loop statement, then calling .save()
inside of it - both easy to match. Bonus points for checking if the variable we call .save() on inherits from Model.
For 2) For loop + QuerySet - that's a bit tricky b/c there are many methods which return a QuerySet and/or it could be coming in outside the loop as a variable, result of a function, etc. A hack could be to search for .filter
indiscriminately inside the loop body.
Thanks for your response, no worries about being busy: it's understandable.
I will keep these suggestions in mind for a rainy day, It could be fun to delve into ASTs and write something others may be able to use.
For future reference, I'll add that the "is it a queryset" detection is a bit fuzzy - https://github.com/PyCQA/pylint-django/blob/master/pylint_django/augmentations/init.py#L550 - because as @atodorov says, it's not super easy to know for sure that it is a Queryset.
The AST parsing is all done by astroid so the docs there would help too understanding the Node class and similar.
Conceptually, "am I in a loop, am I calling .get
or .create
a lot on something that looks like a queryset" is very doable.
I'm slowly studying how PyLint checkers are written and in the meantime I got another couple of ideas while working at my day job:
- Using
queryset.all()
in a for loop may be slower than usingqueryset.iterator()
- Using
queryset.all()
in a list comprehension may be slower than usingqueryset.iterator()
Hopefully I'll be able to integrate Pylint with a "git version" of Pylint-Django without too many headaches.
Thank you all for the resources and helpful tips!