django-q icon indicating copy to clipboard operation
django-q copied to clipboard

Indexes in the table django_q_schedule

Open prd-hai-huynh opened this issue 3 years ago • 2 comments

Hi,

We're getting bad performance when the below query is executed every 30s. Could you please tell me if we can have any indexes in the table django_q_schedule? (Because the query is executed inside django-q, so we have no choice but to make indexes.)

Query: https://github.com/Koed00/django-q/blob/master/django_q/cluster.py#L589-L594

Schedule.objects.select_for_update()
                .exclude(repeats=0)
                .filter(next_run__lt=timezone.now())
                .filter(
                    db.models.Q(cluster__isnull=True) | db.models.Q(cluster=Conf.PREFIX)
                )

Raw query:

SELECT 
    `django_q_schedule`.`id`,
    `django_q_schedule`.`name`,
    `django_q_schedule`.`func`,
    `django_q_schedule`.`hook`,
    `django_q_schedule`.`args`,
    `django_q_schedule`.`kwargs`,
    `django_q_schedule`.`schedule_type`,
    `django_q_schedule`.`minutes`,
    `django_q_schedule`.`repeats`,
    `django_q_schedule`.`next_run`,
    `django_q_schedule`.`task`
FROM
    `django_q_schedule`
WHERE
    (NOT (`django_q_schedule`.`repeats` = 0)
        AND `django_q_schedule`.`next_run` < '2021-10-28 16:35:26.460229')
ORDER BY `django_q_schedule`.`next_run` ASC
FOR UPDATE

The number of records of our database is increasing rapidly and the query causes bad performance to RDS:

mysql> select count(1) from django_q_schedule;
+----------+
| count(1) |
+----------+
|   341107 |
+----------+
1 row in set (0.36 sec)

Our django-q configuration:

QConfiguration = {
    "name": "name",
    "workers": NUMBER_OF_WORKERS,
    "recycle": 500,
    "timeout": 600,
    "compress": True,
    "save_limit": 250,
    "queue_limit": 500,
    "cpu_affinity": 1,
    "label": "Django Q",
    "retry": 10000,
    "redis": {
        "host": REDIS_HOST,
        "port": REDIS_PORT,
        "db": 0,
        "password": REDIS_PWD,
        "ssl": True,
    },
}

django-q version:

django-q==1.2.1

Thank you very much,

prd-hai-huynh avatar Oct 28 '21 18:10 prd-hai-huynh

Could you please tell me if we can have any indexes in the table django_q_schedule?

I don't believe so, other than an index on the primary key. Here's what SHOW INDEX shows on my system:

mysql> SHOW INDEX FROM django_q_schedule;
+-------------------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table             | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+-------------------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| django_q_schedule |          0 | PRIMARY  |            1 | id          | A         |           0 |     NULL | NULL   |      | BTREE      |         |               |
+-------------------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
1 row in set (0.00 sec)

The number of records of our database is increasing rapidly and the query causes bad performance to RDS:

I'm surprised that you have so many entries in the schedule table. How are you using the schedule feature?

nickodell avatar Oct 28 '21 19:10 nickodell

I'm surprised that you have so many entries in the schedule table. How are you using the schedule feature?

@nickodell,

  1. We're working on a large number of records(let's say [item]),
  2. each item requires a schedule with a different next_run value.
  3. Some require to run ONCE, some require to run DAILY, some need to run today but some need to run in the future(maybe for the next year)
  4. The number of [item] is increasing rapidly over time, that is why we have a large number of records in django_q_schedule.

prd-hai-huynh avatar Oct 28 '21 20:10 prd-hai-huynh