django-celery-results icon indicating copy to clipboard operation
django-celery-results copied to clipboard

task_kwargs not JSON

Open padiauj opened this issue 5 years ago • 20 comments

The values in task_kwargs are stored as a text representation of a dictionary rather than a JSON string. (fields have single quotes rather than double quotes)

padiauj avatar Sep 11 '19 19:09 padiauj

I have a PR to fix this: #78 , but not merged yet. They will wait until celery 5, because it is not backward compatible.

arnau126 avatar Feb 28 '20 11:02 arnau126

not celery 5 that will be in django-celery-results 2.0

auvipy avatar Feb 29 '20 03:02 auvipy

@arnau126 @auvipy Was this not merged into master already? I have django-celery-results installed from master but it seems this is still the case.

ZviBaratz avatar Nov 12 '20 15:11 ZviBaratz

both of you should try the new 2.0 release & share your feedback

auvipy avatar Nov 20 '20 03:11 auvipy

@ZviBaratz , you're right, not working :( And this is because of #91, which was submitted after #78 but merged first.

#91 gave kwargsrepr (which is already casted to string using repr) preference over kwargs (still a dict), so the changes I did does nothing.

I've tried to set task_protocol=1 to disable the use of kwargsrepr (https://docs.celeryproject.org/en/latest/userguide/tasks.html#hiding-sensitive-information-in-arguments), but still not working because request still has the attribute kwargsrepr (but with None value).

All of this has made me realize that #91 introduced a bug: if task_protocol=1, we are always storing null (or "None") to task_kwargs instead of the actual kwargs.

To fix this we need to change:

        task_kwargs = getattr(request,
                              'kwargsrepr', getattr(request, 'kwargs', None))

to:

        if getattr(request, 'kwargsrepr', None) is not None:
            task_kwargs = request.kwargsrepr
        else:
            task_kwargs = getattr(request, 'kwargs', None)

I will submit a PR for this (fix task_protocol=1). And I will dig into task_protocol=2 and kwargsrepr (which is the default since celery 4).

arnau126 avatar Nov 20 '20 08:11 arnau126

@ZviBaratz , the solution I've found, without changing anything from django-celery-results, is calling all tasks like this:

from celery import shared_task

@shared_task
def debug_task(x, y=0, s=True):
    if s:
        return x + y
    else:
        return x - y

args = (3, )
kwargs = {'y': 2, 's': True}
debug_task.apply_async(
    args=args, kwargs=kwarg,
    argsrepr=args, kwargsrepr=kwargs,
)

Because if you don't specify argsrepr and kwargsrepr args, celery generates them using repr. (https://github.com/celery/celery/blob/8c5e9888ae10288ae1b2113bdce6a4a41c47354b/celery/app/amqp.py#L314)

arnau126 avatar Nov 20 '20 11:11 arnau126

I've come up with some solutions:

  1. Do nothing, just always call tasks adding argsrepr=args, kwargsrepr=kwargs (keep in mind that we're duplicating data and messages have a max length...), or use task_protocol=1 instead of 2 (we're losing a bunch of improvements)

  2. Add a setting to Celery to be able to disable the argsrepr/kwargsrepr feature. Acceptable if you don't have sensitive data, and you need to log args and kwargs in json format.

  3. Add a setting to Celery to be able to specify which function do you want to use to generate argsrepr and kwargsrepr (default: the currently used saferepr), and store them to TaskResult just as they come (which means undo #78 ).

@auvipy , @thedrow how do you see it? If options 2 or 3 look good to you (or both), I will do a PR. Any comment is welcomed.

arnau126 avatar Nov 20 '20 12:11 arnau126

@arnau126 I've tried installing again from master and experimenting a bit, I still can't seem to be able to return a dict. My particular use case is with celery.starmap tasks created using a task.chunks() call, in which I need access to the underlying iterated task's name. It is included in the Task Named Arguments (under 'task'), but I can't access it conveniently.

ZviBaratz avatar Dec 29 '20 14:12 ZviBaratz

I still can't seem to be able to return a dict.

I know. It only works with task_protocol=1 which is not the default.

As I said in my previous comment, the solution is not in this package but in Celery. Is Celery who casts task_args and task_kwargs to string using repr (if task_protocol=2). So when the message arrives to DatabaseBackend they already are strings and we cannot serialize them as json.

There's nothing we can do from django-celery-results .

arnau126 avatar Jan 13 '21 11:01 arnau126

I still can't seem to be able to return a dict.

I know. It only works with task_protocol=1 which is not the default.

As I said in my previous comment, the solution is not in this package but in Celery. Is Celery who casts task_args and task_kwargs to string using repr (if task_protocol=2). So when the message arrives to DatabaseBackend they already are strings and we cannot serialize them as json.

There's nothing we can do from django-celery-results .

can you suggest where in celery this needed to be changed or atleast point to any issue?

auvipy avatar Jan 13 '21 12:01 auvipy

@auvipy

These are my suggestions:

  1. Add a setting to Celery to be able to disable the argsrepr/kwargsrepr feature. Acceptable if you don't have sensitive data, and you need to log args and kwargs in json format.

  2. Add a setting to Celery to be able to specify which function do you want to use to generate argsrepr and kwargsrepr (default: the currently used saferepr), and store them to TaskResult just as they come (which means undo #78 ).

I've never contributed to Celery, but I'll take a look. Thats why I asked for your opinion:

@auvipy , @thedrow how do you see it? If options 2 or 3 look good to you (or both), I will do a PR. Any comment is welcomed.

arnau126 avatar Jan 13 '21 12:01 arnau126

And I think that the settings I suggest should be used here: https://github.com/celery/celery/blob/8c5e9888ae10288ae1b2113bdce6a4a41c47354b/celery/app/amqp.py#L314

and in some other place like https://github.com/celery/celery/blob/7288147d65a32b726869ed887d99e4bfd8c070e2/celery/worker/strategy.py#L81, maybe...

arnau126 avatar Jan 13 '21 12:01 arnau126

I'm sorry that I'm unable to contribute in solving this (due to current lack of time and experience with celery's codebase), however, I would be happy to help in validating a fix and/or any other trivial tasks that may come up. Thank you @arnau126 and @auvipy for taking this on, IMO this is an essential feature of django-celery-results.

ZviBaratz avatar Jan 13 '21 13:01 ZviBaratz

@auvipy , @ZviBaratz I've done this PR: https://github.com/celery/celery/pull/6595 Not tested. Just to show my suggestion, and get feedback from you and other users.

arnau126 avatar Jan 13 '21 16:01 arnau126

In case this helps anyone - Here's a Django Admin Action I've created to Retry Celery Tasks directly from Django Admin - https://gist.github.com/sameerkumar18/e792fa6026e063eaef695b3c271959e6 In simple terms, this overrides Django-celery-results' Admin registration to add an Admin action for retrying tasks.

sameerkumar18 avatar Dec 16 '21 21:12 sameerkumar18

Is this issue fixed? I am running into issues with kwargs of the form:

some_task.apply_async(kwargs={'run_as_task': True, ...})  # or similar via send_task, delay etc

This gets stored as the string "{'run_as_task': True}". which cannot be json decoded (apart from the quotes issue) due to True not being a valid JSON identifier.

sdrabblescripta avatar Dec 05 '23 16:12 sdrabblescripta

difference between v1 and v2, make it much weirder... in v1, i can use eval_literal directly, in v2, it become a json string of py's representation?? image

Pandaaaa906 avatar Feb 01 '24 10:02 Pandaaaa906

Is there a working fix by now? This issue is open for over 3 years now.

My current workarround is to use this custom function whenever I need get my task_kwargs:

import json
from django_celery_results.models import TaskResult


def get_taskresult_kwargs_as_dict(instance: TaskResult) -> dict:
    """
    Convert the task_kwargs field of a TaskResult instance to a dictionary/json object
    :param instance: TaskResult
    :return: dict
    """

    args = json.loads((instance.task_kwargs))
    if isinstance(args, str):
        args = args.replace("'", '"')
        args = json.loads(args)
    return args

args = get_taskresult_kwargs_as_dict(instance)
args['my_key']

kimdre avatar Feb 12 '24 20:02 kimdre

Is there a working fix by now? This issue is open for over 3 years now.

My current workarround is to use this custom function whenever I need get my task_kwargs:

import json
from django_celery_results.models import TaskResult


def get_taskresult_kwargs_as_dict(instance: TaskResult) -> dict:
    """
    Convert the task_kwargs field of a TaskResult instance to a dictionary/json object
    :param instance: TaskResult
    :return: dict
    """

    args = json.loads((instance.task_kwargs))
    if isinstance(args, str):
        args = args.replace("'", '"')
        args = json.loads(args)
    return args

args = get_taskresult_kwargs_as_dict(instance)
args['my_key']

Thank you for this. I was manually working through this when I found your code. I've modified it slightly to help with the problem @sdrabblescripta mentioned with 'True' values also breaking json.loads().

I replaced args = args.replace("'", '"') with args = args.replace("'", '"').replace(" True,", ' "True",').

programmylife avatar Apr 17 '24 15:04 programmylife