django-dag
django-dag copied to clipboard
query optimization
This isn't a bug or problem per se- but while doing some query optimizations in my project, I realized that the descendants_tree() (and I assume ancestor_tree and ans_set/des_set) command(s) uses a large number of queries to complete. I'm wondering if there is an easy way to cut down the number of queries for the nested structure. I had something in mind like a select_related() query that could at least cut down the number of db hits. For now, the rest of my project is too important to overlook, so it may be a while until I can get back to it and work on a pull request. Thanks for this project!
I agree, there could be a problem with large datasets and deep trees.
I'm not going to fix it right now, but using a different approach (something like MPTT [1] does) might worth considering.
[1] https://github.com/django-mptt/django-mptt/
Alright. I'll keep you updated if I come up with any improvements. This project is working great for my needs, thanks for your work.
http://www.postgresql.org/docs/8.4/static/queries-with.html is an approach which would be very useful for this, and shouldn't be too hard to adapt for. However, it's not going to be portable -- MySQL and Sqlite don't support this query unless I'm mistaken.
I'm about to begin work on this. Given the non-portability of this, I'm thinking of forking this and renaming it (django-cte-dag?) so that django-dag users who need to deal with deep graphs can benefit from Postgresql's recursive queries, while django-dag can continue to serve users of other databases with simpler needs.
It's worth noting that for small to medium-sized graphs you could use transitive closures, and there's a library out there called django-directed-acyclic-graph which does that.
Hi @elpaso, thanks for this package. Solves my use case with slight modifications. I want it to be scalable upto ~100K node and edges. I just read your comment -
I'm not going to fix it right now, but using a different approach (something like MPTT [1] does) might worth considering.
I am curious to know more about your approach.
Also, did anyone else work on optimizing queries?( I'm using MySQL database.)
@a1Gupta Hi, I'm not actively working on this project anymore, but I'll be happy to review any PR, provided that it applies to general use cases of the library.
@elpaso Hi, I am trying to find an efficient way to do it. I've added a post on reddit for the same. You can also add your ideas there.