django-q
django-q copied to clipboard
Indexes in the table django_q_schedule
Hi,
We're getting bad performance when the below query is executed every 30s.
Could you please tell me if we can have any indexes in the table django_q_schedule?
(Because the query is executed inside django-q, so we have no choice but to make indexes.)
Query: https://github.com/Koed00/django-q/blob/master/django_q/cluster.py#L589-L594
Schedule.objects.select_for_update()
.exclude(repeats=0)
.filter(next_run__lt=timezone.now())
.filter(
db.models.Q(cluster__isnull=True) | db.models.Q(cluster=Conf.PREFIX)
)
Raw query:
SELECT
`django_q_schedule`.`id`,
`django_q_schedule`.`name`,
`django_q_schedule`.`func`,
`django_q_schedule`.`hook`,
`django_q_schedule`.`args`,
`django_q_schedule`.`kwargs`,
`django_q_schedule`.`schedule_type`,
`django_q_schedule`.`minutes`,
`django_q_schedule`.`repeats`,
`django_q_schedule`.`next_run`,
`django_q_schedule`.`task`
FROM
`django_q_schedule`
WHERE
(NOT (`django_q_schedule`.`repeats` = 0)
AND `django_q_schedule`.`next_run` < '2021-10-28 16:35:26.460229')
ORDER BY `django_q_schedule`.`next_run` ASC
FOR UPDATE
The number of records of our database is increasing rapidly and the query causes bad performance to RDS:
mysql> select count(1) from django_q_schedule;
+----------+
| count(1) |
+----------+
| 341107 |
+----------+
1 row in set (0.36 sec)
Our django-q configuration:
QConfiguration = {
"name": "name",
"workers": NUMBER_OF_WORKERS,
"recycle": 500,
"timeout": 600,
"compress": True,
"save_limit": 250,
"queue_limit": 500,
"cpu_affinity": 1,
"label": "Django Q",
"retry": 10000,
"redis": {
"host": REDIS_HOST,
"port": REDIS_PORT,
"db": 0,
"password": REDIS_PWD,
"ssl": True,
},
}
django-q version:
django-q==1.2.1
Thank you very much,
Could you please tell me if we can have any indexes in the table django_q_schedule?
I don't believe so, other than an index on the primary key. Here's what SHOW INDEX shows on my system:
mysql> SHOW INDEX FROM django_q_schedule;
+-------------------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+-------------------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| django_q_schedule | 0 | PRIMARY | 1 | id | A | 0 | NULL | NULL | | BTREE | | |
+-------------------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
1 row in set (0.00 sec)
The number of records of our database is increasing rapidly and the query causes bad performance to RDS:
I'm surprised that you have so many entries in the schedule table. How are you using the schedule feature?
I'm surprised that you have so many entries in the schedule table. How are you using the schedule feature?
@nickodell,
- We're working on a large number of records(let's say [item]),
- each item requires a schedule with a different next_run value.
- Some require to run ONCE, some require to run DAILY, some need to run today but some need to run in the future(maybe for the next year)
- The number of [item] is increasing rapidly over time, that is why we have a large number of records in
django_q_schedule.