
queue names limited to 43 characters

Open ashleyheath opened this issue 1 year ago • 7 comments

It seems Postgres is compiled by default with an identifier size limit of 63 bytes (NAMEDATALEN - 1).

procrastinate_notify_queue() performs

PERFORM pg_notify('procrastinate_queue#' || NEW.queue_name, NEW.task_name);

Since the 'procrastinate_queue#' prefix takes up 20 of those characters, that means there's a practical limit of 43 ASCII characters on the queue_name.
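For illustration, the arithmetic works out like this (a quick sketch; the constant names here are just local, mirroring Postgres's default build where NAMEDATALEN is 64):

```python
# Postgres truncates identifiers (including NOTIFY channel names) to
# NAMEDATALEN - 1 = 63 bytes with the default build configuration.
PG_MAX_IDENTIFIER = 63

# Prefix used by procrastinate_notify_queue() when building the channel name.
PREFIX = "procrastinate_queue#"

# Remaining room for the queue name itself.
max_queue_name_length = PG_MAX_IDENTIFIER - len(PREFIX)
print(max_queue_name_length)  # 43
```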

While technically someone could build their Postgres instance with a larger identifier size, in practice I expect this won't be the case, so it feels like it would be sensible to either:

  • enforce that queue names are <= 43 characters in length
  • skip the use of pg_notify where queue names are too long (and allow workers to pick up jobs with longer queue names exclusively through polling)

Given a UUID is 36 characters, 43 is pretty limiting if I want to create 'thing'-specific queues (e.g. email-customer/00000000-0000-0000-0000-000000000000), so it certainly feels useful to allow longer queue names, but I appreciate that would be a breaking change from the existing behaviour.

Keen to get thoughts from people.

ashleyheath avatar May 21 '24 13:05 ashleyheath

Alternatively, we could use a hash instead of the actual queue name. That would leave a lot of room for various values.

ewjoachim avatar May 21 '24 13:05 ewjoachim

That's certainly an interesting idea. I'd be worried that changing the queue name in the jobs table would hinder observability and debugging, but if we could use a hash only in the pg_notify publish and subscribe calls, that might do the job nicely.

ashleyheath avatar May 22 '24 07:05 ashleyheath

Ah yes, of course, I'm only talking about the "notification" name.

Would you like to make a PR for that?

>>> base64.b32encode(hashlib.sha1(b"my_queue").digest()).decode().lower()
'756gnpdwk3lna3zna4toqq4eh4dhu4r6'
>>> len(_)
32

(this gives 32^32 = 2^160 possibilities, so roughly 10^48, meaning you'd have about the same probability of collision as you and a friend each independently picking a random atom on the planet Earth and both picking the same atom)
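To sanity-check those numbers (just arithmetic, nothing procrastinate-specific — 32 base32 characters carry 5 bits each):

```python
# 32 base32 characters encode 32 * 5 = 160 bits, so 32 ** 32 == 2 ** 160.
n = 32 ** 32
assert n == 2 ** 160
print(f"{n:.3e}")  # 1.462e+48
```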

And we should log the mapping between the queue name and the hash either at the start of the worker, or when we listen (level debug).
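A listen-side sketch of what that could look like (the function name, prefix constant and logger here are hypothetical illustrations, not procrastinate's actual API):

```python
import base64
import hashlib
import logging

logger = logging.getLogger(__name__)

PREFIX = "procrastinate_queue#"

def channel_for_queue(queue_name: str) -> str:
    """Derive a fixed-length NOTIFY channel name from an arbitrary queue name."""
    digest = base64.b32encode(hashlib.sha1(queue_name.encode()).digest()).decode().lower()
    # 20-char prefix + 32-char digest = 52 chars, safely under the 63-byte limit.
    channel = PREFIX + digest
    # Log the queue-name-to-channel mapping for debugging.
    logger.debug("queue %r maps to channel %r", queue_name, channel)
    return channel
```

With this, `channel_for_queue("my_queue")` yields `procrastinate_queue#756gnpdwk3lna3zna4toqq4eh4dhu4r6`, matching the REPL example above, and the channel length is constant regardless of how long the queue name is.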

ewjoachim avatar May 22 '24 08:05 ewjoachim

@ewjoachim did you envisage doing the hash from the Python code then (as opposed to from within the SQL function)? And if so, adding a new column to procrastinate_jobs to hold the hashed queue name?

ashleyheath avatar May 29 '24 10:05 ashleyheath

Ah, I was thinking about the listen part, where it's equivalent to do it in Python or in SQL. But yes, of course, we need an SQL implementation for the notify part, so let's do that in SQL.

ewjoachim avatar May 29 '24 14:05 ewjoachim

Doing a little digging, it seems the digest function in Postgres is actually part of the separate pgcrypto extension.

So the options seem to be:

  • Require people to install the pgcrypto extension, and hash the queue name on LISTEN (in Python) and when calling pg_notify
  • Hash the queue name in Python, and add a column to record the hashed queue name

I don't have strong preferences, but I can imagine that some users may prefer the second due to restrictions on what they can install. @ewjoachim any thoughts?

ashleyheath avatar May 31 '24 08:05 ashleyheath

Looks like the hashtext function is available without the need for extensions. It's apparently not guaranteed to be stable across PG versions, but we don't need it to be extremely stable: nothing will break if listen/notify doesn't work during a PG upgrade.

ewjoachim avatar Jun 07 '24 22:06 ewjoachim