pylint-django icon indicating copy to clipboard operation
pylint-django copied to clipboard

[Question] Hints on best practices / what to avoid

Open Penaz91 opened this issue 2 years ago • 4 comments

Greetings,

I have been a user of Pylint-Django for about a year and I really appreciate how this tool gives me a hand when it comes to code quality.

I wanted to ask if it would be possible to implement some suggestions on best practices and how to avoid some common pitfalls that may come from being in a rush or just plain carelessness (I am guilty of both).

For instance:

  • Creating/updating models inside a for loop could have bad performance, instead bulk_create/bulk_update could work better;
  • Querying models inside a for loop should be avoided, a single query using the __in= operator could be better performing.

This are a couple examples on top of my head that gave me a bit of a headache when debugging performance issues on a software I was coding. Having a tool remind me about these possible performance hogs could have saved me a lot of effort.

Thank you for your attention.

Penaz91 avatar Aug 24 '22 15:08 Penaz91

At the top of my head I think these are definitely doable from a technical POV. However I think most if the maintainers and contributors are quite busy ATM to be able to devote significant time to new features. Although the patterns seem relatively simple implementation and managing tests & corner cases (in order to actually have confidence that it works) become non-trivial.

However you may find the following resources helpful and give it a try yourself:

  • https://www.youtube.com/watch?v=3CkSKUNMLJc
  • https://github.com/kiwitcms/Kiwi/tree/master/kiwi_lint

From what I can see your pattern is:

For 1) for loop statement, then calling .save() inside of it - both easy to match. Bonus points for checking if the variable we call .save() on inherits from Model.

For 2) For loop + QuerySet - that's a bit tricky b/c there are many methods which return a QuerySet and/or it could be coming in outside the loop as a variable, result of a function, etc. A hack could be to search for .filter indiscriminately inside the loop body.

atodorov avatar Aug 24 '22 19:08 atodorov

Thanks for your response, no worries about being busy: it's understandable.

I will keep these suggestions in mind for a rainy day, It could be fun to delve into ASTs and write something others may be able to use.

Penaz91 avatar Aug 25 '22 09:08 Penaz91

For future reference, I'll add that the "is it a queryset" detection is a bit fuzzy - https://github.com/PyCQA/pylint-django/blob/master/pylint_django/augmentations/init.py#L550 - because as @atodorov says, it's not super easy to know for sure that it is a Queryset.

The AST parsing is all done by astroid so the docs there would help too understanding the Node class and similar.

Conceptually, "am I in a loop, am I calling .get or .create a lot on something that looks like a queryset" is very doable.

carlio avatar Aug 25 '22 09:08 carlio

I'm slowly studying how PyLint checkers are written and in the meantime I got another couple of ideas while working at my day job:

  • Using queryset.all() in a for loop may be slower than using queryset.iterator()
  • Using queryset.all() in a list comprehension may be slower than using queryset.iterator()

Hopefully I'll be able to integrate Pylint with a "git version" of Pylint-Django without too many headaches.

Thank you all for the resources and helpful tips!

Penaz91 avatar Sep 01 '22 13:09 Penaz91