django-celery-results
django-celery-results copied to clipboard
task_kwargs not JSON
The values in task_kwargs are stored as a text representation of a dictionary rather than a JSON string. (fields have single quotes rather than double quotes)
I have a PR to fix this: #78 , but not merged yet. They will wait until celery 5, because it is not backward compatible.
not celery 5 that will be in django-celery-results 2.0
@arnau126 @auvipy
Was this not merged into master already? I have django-celery-results
installed from master
but it seems this is still the case.
both of you should try the new 2.0 release & share your feedback
@ZviBaratz , you're right, not working :( And this is because of #91, which was submitted after #78 but merged first.
#91 gave kwargsrepr
(which is already casted to string using repr
) preference over kwargs
(still a dict
), so the changes I did does nothing.
I've tried to set task_protocol=1
to disable the use of kwargsrepr
(https://docs.celeryproject.org/en/latest/userguide/tasks.html#hiding-sensitive-information-in-arguments), but still not working because request
still has the attribute kwargsrepr
(but with None
value).
All of this has made me realize that #91 introduced a bug: if task_protocol=1
, we are always storing null
(or "None"
) to task_kwargs
instead of the actual kwargs.
To fix this we need to change:
task_kwargs = getattr(request,
'kwargsrepr', getattr(request, 'kwargs', None))
to:
if getattr(request, 'kwargsrepr', None) is not None:
task_kwargs = request.kwargsrepr
else:
task_kwargs = getattr(request, 'kwargs', None)
I will submit a PR for this (fix task_protocol=1
).
And I will dig into task_protocol=2
and kwargsrepr
(which is the default since celery 4).
@ZviBaratz , the solution I've found, without changing anything from django-celery-results
, is calling all tasks like this:
from celery import shared_task
@shared_task
def debug_task(x, y=0, s=True):
if s:
return x + y
else:
return x - y
args = (3, )
kwargs = {'y': 2, 's': True}
debug_task.apply_async(
args=args, kwargs=kwarg,
argsrepr=args, kwargsrepr=kwargs,
)
Because if you don't specify argsrepr
and kwargsrepr
args, celery generates them using repr
.
(https://github.com/celery/celery/blob/8c5e9888ae10288ae1b2113bdce6a4a41c47354b/celery/app/amqp.py#L314)
I've come up with some solutions:
-
Do nothing, just always call tasks adding
argsrepr=args, kwargsrepr=kwargs
(keep in mind that we're duplicating data and messages have a max length...), or use task_protocol=1 instead of 2 (we're losing a bunch of improvements) -
Add a setting to
Celery
to be able to disable the argsrepr/kwargsrepr feature. Acceptable if you don't have sensitive data, and you need to log args and kwargs in json format. -
Add a setting to
Celery
to be able to specify which function do you want to use to generate argsrepr and kwargsrepr (default: the currently usedsaferepr
), and store them to TaskResult just as they come (which means undo #78 ).
@auvipy , @thedrow how do you see it? If options 2 or 3 look good to you (or both), I will do a PR. Any comment is welcomed.
@arnau126 I've tried installing again from master
and experimenting a bit, I still can't seem to be able to return a dict. My particular use case is with celery.starmap
tasks created using a task.chunks()
call, in which I need access to the underlying iterated task's name. It is included in the Task Named Arguments
(under 'task'), but I can't access it conveniently.
I still can't seem to be able to return a dict.
I know. It only works with task_protocol=1
which is not the default.
As I said in my previous comment, the solution is not in this package but in Celery
.
Is Celery
who casts task_args
and task_kwargs
to string using repr
(if task_protocol=2).
So when the message arrives to DatabaseBackend
they already are strings and we cannot serialize them as json.
There's nothing we can do from django-celery-results
.
I still can't seem to be able to return a dict.
I know. It only works with
task_protocol=1
which is not the default.As I said in my previous comment, the solution is not in this package but in
Celery
. IsCelery
who caststask_args
andtask_kwargs
to string usingrepr
(if task_protocol=2). So when the message arrives toDatabaseBackend
they already are strings and we cannot serialize them as json.There's nothing we can do from
django-celery-results
.
can you suggest where in celery this needed to be changed or atleast point to any issue?
@auvipy
These are my suggestions:
Add a setting to
Celery
to be able to disable the argsrepr/kwargsrepr feature. Acceptable if you don't have sensitive data, and you need to log args and kwargs in json format.Add a setting to
Celery
to be able to specify which function do you want to use to generate argsrepr and kwargsrepr (default: the currently usedsaferepr
), and store them to TaskResult just as they come (which means undo #78 ).
I've never contributed to Celery, but I'll take a look. Thats why I asked for your opinion:
@auvipy , @thedrow how do you see it? If options 2 or 3 look good to you (or both), I will do a PR. Any comment is welcomed.
And I think that the settings I suggest should be used here: https://github.com/celery/celery/blob/8c5e9888ae10288ae1b2113bdce6a4a41c47354b/celery/app/amqp.py#L314
and in some other place like https://github.com/celery/celery/blob/7288147d65a32b726869ed887d99e4bfd8c070e2/celery/worker/strategy.py#L81, maybe...
I'm sorry that I'm unable to contribute in solving this (due to current lack of time and experience with celery
's codebase), however, I would be happy to help in validating a fix and/or any other trivial tasks that may come up. Thank you @arnau126 and @auvipy for taking this on, IMO this is an essential feature of django-celery-results
.
@auvipy , @ZviBaratz I've done this PR: https://github.com/celery/celery/pull/6595 Not tested. Just to show my suggestion, and get feedback from you and other users.
In case this helps anyone - Here's a Django Admin Action I've created to Retry Celery Tasks directly from Django Admin - https://gist.github.com/sameerkumar18/e792fa6026e063eaef695b3c271959e6 In simple terms, this overrides Django-celery-results' Admin registration to add an Admin action for retrying tasks.
Is this issue fixed? I am running into issues with kwargs of the form:
some_task.apply_async(kwargs={'run_as_task': True, ...}) # or similar via send_task, delay etc
This gets stored as the string "{'run_as_task': True}"
. which cannot be json decoded (apart from the quotes issue) due to True
not being a valid JSON identifier.
difference between v1 and v2, make it much weirder...
in v1, i can use eval_literal directly, in v2, it become a json string of py's representation??
Is there a working fix by now? This issue is open for over 3 years now.
My current workarround is to use this custom function whenever I need get my task_kwargs:
import json
from django_celery_results.models import TaskResult
def get_taskresult_kwargs_as_dict(instance: TaskResult) -> dict:
"""
Convert the task_kwargs field of a TaskResult instance to a dictionary/json object
:param instance: TaskResult
:return: dict
"""
args = json.loads((instance.task_kwargs))
if isinstance(args, str):
args = args.replace("'", '"')
args = json.loads(args)
return args
args = get_taskresult_kwargs_as_dict(instance)
args['my_key']
Is there a working fix by now? This issue is open for over 3 years now.
My current workarround is to use this custom function whenever I need get my task_kwargs:
import json from django_celery_results.models import TaskResult def get_taskresult_kwargs_as_dict(instance: TaskResult) -> dict: """ Convert the task_kwargs field of a TaskResult instance to a dictionary/json object :param instance: TaskResult :return: dict """ args = json.loads((instance.task_kwargs)) if isinstance(args, str): args = args.replace("'", '"') args = json.loads(args) return args args = get_taskresult_kwargs_as_dict(instance) args['my_key']
Thank you for this. I was manually working through this when I found your code. I've modified it slightly to help with the problem @sdrabblescripta mentioned with 'True' values also breaking json.loads()
.
I replaced args = args.replace("'", '"')
with args = args.replace("'", '"').replace(" True,", ' "True",')
.