django-celery-monitor

Save args, kwargs with JSON encoding

Open fredley opened this issue 7 years ago • 8 comments

Currently, calling a task with positional and keyword arguments results in something like this:

>>> task.args
[True]
>>> task.kwargs
{'arg1': 'some_string', 'arg2': False}

This is a bit of a pain when parsing this information, in particular when sending it to a JavaScript frontend, since it's almost JSON but not quite: it uses Python literal syntax (single-quoted strings, True/False).

Could there be a setting to enable saving this information as properly encoded JSON? e.g.

>>> task.args
[true]
>>> task.kwargs
{"arg1": "some_string", "arg2": false}
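The gap between the two outputs above is just Python literal syntax versus JSON syntax. A quick standard-library illustration using the same values (this only shows the formatting difference; it is not how the monitor stores anything):

```python
import json

args = [True]
kwargs = {"arg1": "some_string", "arg2": False}

# repr() produces Python literal syntax: single quotes, True/False, None.
print(repr(args))    # [True]
print(repr(kwargs))  # {'arg1': 'some_string', 'arg2': False}

# json.dumps() produces JSON: double quotes, true/false, null.
print(json.dumps(args))    # [true]
print(json.dumps(kwargs))  # {"arg1": "some_string", "arg2": false}
```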

fredley, Jan 31 '18, 09:01

I just came across this, and I think args, kwargs and result should all be saved as JSON-encoded strings. I accept that some data cannot be encoded as JSON; for those values, just stringify anything that causes json.JSONEncoder.default() to raise a TypeError.
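A minimal sketch of that fallback with the standard library. The encoder class `StringifyingJSONEncoder` and the helper `encode_task_call` are hypothetical names for illustration, not part of django-celery-monitor, and the `when` date value is made-up example data:

```python
import json
from datetime import date


class StringifyingJSONEncoder(json.JSONEncoder):
    """JSON encoder that falls back to str() for values JSON cannot represent."""

    def default(self, o):
        try:
            # The base implementation raises TypeError for unsupported types.
            return super().default(o)
        except TypeError:
            return str(o)


def encode_task_call(args, kwargs):
    """Hypothetical helper: return (args_json, kwargs_json) as JSON strings."""
    return (
        json.dumps(list(args), cls=StringifyingJSONEncoder),
        json.dumps(dict(kwargs), cls=StringifyingJSONEncoder),
    )


args_json, kwargs_json = encode_task_call(
    (True,), {"arg1": "some_string", "arg2": False, "when": date(2018, 10, 7)}
)
print(args_json)    # [true]
print(kwargs_json)  # {"arg1": "some_string", "arg2": false, "when": "2018-10-07"}
```

The shorthand `json.dumps(value, default=str)` behaves the same way here; the subclass just makes the fallback explicit and reusable.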

I'd be happy to submit a PR for this if deemed acceptable.

ShaheedHaque, Feb 14 '18, 13:02

@ShaheedHaque That sounds fantastic to me if you're happy to do that. I'm not a project maintainer, though, so it would be good to hear from @jezdez whether this is something he would merge. It might be good to add a flag so that anyone's existing parsing of these values isn't broken.

fredley, Feb 14 '18, 13:02

Indeed. And actually, I guess there is a conversation to be had about results, since AFAIK a result can be any Python value, not necessarily a JSON object or array (e.g. a bare int).

ShaheedHaque, Feb 14 '18, 13:02

Task arguments can be any valid Python type and will only be serialized with the configured task serializer when sent between Celery clients and workers.

The goal of this package is to use the task and worker event state to conduct monitoring, and that event state provides its values verbatim, without serialization, by design. In other words, we're erring on the side of correctness over convenience. Converting the arguments to JSON while storing them also carries an additional operational risk: if, for example, the conversion fails, it could prevent the task state from being updated in the database and leave the monitoring data out of sync.

There are a few options to get what you want nevertheless (with the caveat that you'd be on your own):

  • subclass django_celery_monitor.camera.Camera, override the update_task method (calling the parent update_task method first to keep the usual functionality), and store the arguments as JSON (or whatever form is convenient for you) in a separate data store (e.g. a separate data model)
  • we could add a Django signal to this package (e.g. celery_task_monitored) so you can do option 1 without subclassing; the rest stays the same
  • post-process task state updates using Django's post_save signal, convert the arguments to the format you require, and store them in a separate table
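Options 1 and 3 both come down to keeping a second, converted copy alongside the monitor's own record. Below is a framework-free sketch of that idea with the operational risk described above in mind: the conversion is wrapped so that a failure is logged and swallowed instead of propagating into whatever code path updates the task state. `save_json_copy` is a hypothetical helper and the plain dict `store` stands in for a separate data model; neither is part of this package's API.

```python
import json
import logging

logger = logging.getLogger(__name__)


def save_json_copy(task_id, args, kwargs, store):
    """Hypothetical helper: best-effort JSON copy of args/kwargs.

    `store` stands in for a separate data model. Returns True if the copy
    was saved, False if the arguments could not be JSON-encoded.
    """
    try:
        record = {"args": json.dumps(args), "kwargs": json.dumps(kwargs)}
    except (TypeError, ValueError):
        # Never let a conversion failure escape into the task-state update;
        # log it and skip the JSON copy for this task.
        logger.warning("Could not JSON-encode arguments for task %s", task_id)
        return False
    store[task_id] = record
    return True


store = {}
print(save_json_copy("abc", [True], {"arg2": False}, store))  # True
print(save_json_copy("def", [object()], {}, store))           # False (warning logged)
print(store)  # {'abc': {'args': '[true]', 'kwargs': '{"arg2": false}'}}
```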

jezdez, Feb 14 '18, 15:02

Thanks for the quick response. Is there a way to know, for a given event, which serializer was used? I don't see a content_type field in the model, for example.

ShaheedHaque, Feb 14 '18, 18:02

@jezdez I just added this debug logging to camera.py:

 @@ -85,6 +85,7 @@
                  (task.worker.hostname, task.worker),
              )
  
 +        logger.warning('type(task.kwargs)={}: {}'.format(type(task.kwargs), task.kwargs))
          defaults = {
              'name': task.name,
              'args': task.args,

And the resulting debug output indicates that, IIUC, kwargs has already been coerced into a string even before being written to the TextField in the database:

2018-02-14 20:05:12,823 [WARNING] django_celery_monitor.camera: type(task.kwargs)=<class 'str'>: {'client': 8, 'company': 3, 'frequency': 'w1', 'next_T': '2018-10-07'}
2018-02-14 20:05:14,850 [WARNING] django_celery_monitor.camera: type(task.kwargs)=<class 'NoneType'>: None
2018-02-14 20:05:14,854 [WARNING] django_celery_monitor.camera: type(task.kwargs)=<class 'str'>: {'client': 8, 'company': 3, 'frequency': 'w1', 'next_T': '2018-10-07'}
2018-02-14 20:05:14,881 [WARNING] django_celery_monitor.camera: type(task.kwargs)=<class 'NoneType'>: None

Given that kwargs definitely started life as a dict, and what you confirmed about the intent being to pass the values through losslessly, this suggests that some unexpected string coercion is going on, right?

Also, given that the value is being stored in a database TextField, are we certain that the stored value would not be reduced to a string by virtue of being stored like this?

ShaheedHaque, Feb 14 '18, 20:02

I've come across this before, I think. The camera receives args and kwargs already coerced into strings, so your options are either to parse the JSON-ish Python repr string into actual JSON (what I'm doing at the moment), or to change Celery, presumably quite fundamentally, so that the values arrive at the camera as JSON-encoded strings to begin with.
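A minimal sketch of the first option, assuming the stored args/kwargs text is a Python literal repr like the ones in the debug log above (or None, which also shows up there). `ast.literal_eval` parses literals without executing code, and `json.dumps` re-encodes the result. Reprs that are not plain JSON-compatible literals, such as `datetime.date(2018, 10, 7)` or a set, make one of the two steps fail, so the sketch falls back to storing the raw text as a JSON string. `repr_to_json` is a hypothetical name.

```python
import ast
import json


def repr_to_json(text):
    """Hypothetical helper: convert a Python-literal repr string to JSON.

    Falls back to encoding the raw text as a JSON string when it is not
    a plain, JSON-compatible Python literal.
    """
    if text is None:
        return "null"
    try:
        value = ast.literal_eval(text)  # parses literals only, no code execution
        return json.dumps(value)
    except (ValueError, SyntaxError, TypeError):
        # e.g. "datetime.date(2018, 10, 7)" (not a literal) or "{1, 2}" (a set)
        return json.dumps(text)


print(repr_to_json("{'client': 8, 'company': 3, 'frequency': 'w1', 'next_T': '2018-10-07'}"))
# {"client": 8, "company": 3, "frequency": "w1", "next_T": "2018-10-07"}
print(repr_to_json("[True]"))  # [true]
print(repr_to_json(None))      # null
```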

fredley, Feb 15 '18, 10:02

Yes, that's what I've concluded/done too. Maybe close the issue?

ShaheedHaque, Feb 15 '18, 10:02