django-elasticsearch-dsl
Implement the Celery signal processor
The CelerySignalProcessor allows automatic updates to the index as delayed background tasks, using Celery.
NB: We cannot process deletes as background tasks. By the time the Celery worker picks up the delete job, the model instance would already be deleted. We could get around this by configuring Celery to use pickle and sending the whole object to the worker, but pickle opens the application up to security concerns.
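For context, here is a minimal sketch of the approach, assuming the task and class names below (they are illustrative, not necessarily the PR's exact code):

```python
# Sketch only: defer index updates to a Celery worker by sending just the
# primary key, and keep deletes synchronous for the reason given above.
from celery import shared_task
from django.apps import apps

from django_elasticsearch_dsl.registries import registry
from django_elasticsearch_dsl.signals import RealTimeSignalProcessor


@shared_task
def registry_update_task(pk, app_label, model_name):
    """Re-fetch the instance inside the worker and reindex it."""
    model = apps.get_model(app_label, model_name)
    try:
        registry.update(model.objects.get(pk=pk))
    except model.DoesNotExist:
        pass  # Row vanished before the worker ran; nothing to index.


class CelerySignalProcessor(RealTimeSignalProcessor):
    """Reuses the real-time processor's signal wiring, but defers save
    handling to Celery. Deletes fall through to the synchronous parent
    implementation, since the row is gone by the time a worker runs."""

    def handle_save(self, sender, instance, **kwargs):
        # Only the primary key crosses the broker, so no pickled model
        # instance is ever sent to the worker.
        registry_update_task.delay(
            instance.pk, instance._meta.app_label, instance._meta.model_name
        )
```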
#34
This is still a work in progress. I need help writing the tests around the new signal processor.
@CorrosiveKid do you still need help with tests?
@andreyrusanov Yes please, if possible. I still haven't had the time to look at it after my initial commit.
@CorrosiveKid could you give me write permissions to your repository, so I can update the PR?
@andreyrusanov Done!
@ur001 @CorrosiveKid @sabricot
we have an issue with the tests here. Signals cannot be properly attached to the models used for tests: they get connected, but are never triggered. What I've tried:
- I used the override_settings decorator (the old signals were still triggered anyway, unless I removed them explicitly)
- tried to run DEDConfig().ready() again (with the signal_processor field set to None)
- tried to explicitly disconnect the old signals and connect the new ones. I also tried lots of minor changes, but nothing works.

The new signals connect without issue, but when something subsequently happens to a model, the signal dispatcher doesn't find any receiver for it. If anybody can take a look or has an idea how to fix it, please do!
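One possible explanation and workaround (untested sketch): Django's Signal.disconnect() matches receivers by identity, so tearing down a freshly constructed processor, or re-running ready(), does not detach the receivers that DEDConfig connected at startup. Going through the app config's own instance should:

```python
# Sketch: swap processors per test via the app config's own instance,
# since disconnect() only matches the exact receiver that was connected.
# The CelerySignalProcessor import path is assumed, not confirmed.
from django.apps import apps
from django.test import TestCase
from elasticsearch_dsl.connections import connections

from django_elasticsearch_dsl.signals import CelerySignalProcessor


class CelerySignalProcessorTests(TestCase):
    def setUp(self):
        self.app_config = apps.get_app_config('django_elasticsearch_dsl')
        self.app_config.signal_processor.teardown()  # detach startup receivers
        self.processor = CelerySignalProcessor(connections)

    def tearDown(self):
        self.processor.teardown()
        self.app_config.signal_processor.setup()  # restore for other tests
```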
I committed the workaround I made. It is really hacky in places and I have serious doubts it will be approved. If it isn't, I or @CorrosiveKid can just revert it; if it is, I will resolve the conflicts and add extra docs.
any update on this branch?
@CorrosiveKid I see one issue with the current implementation: it enqueues a task for every save, while it could index multiple objects in one go using the elasticsearch_dsl.helpers.bulk helper. This is not simple to implement, though, as it would need per-model queues that collect objects of the same type and flush them to a bulk update at some interval.
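For illustration, one shape that batching could take (sketch only; a real implementation would want an atomic queue such as a Redis set rather than the non-atomic cache round-trip used here):

```python
# Sketch: collect pending pks per model and let a periodic task (e.g. a
# celery beat entry) flush each model's batch in one bulk update.
from celery import shared_task
from django.apps import apps
from django.core.cache import cache

from django_elasticsearch_dsl.registries import registry


def queue_for_bulk_update(instance):
    key = 'ded_pending_%s.%s' % (instance._meta.app_label,
                                 instance._meta.model_name)
    pending = cache.get(key, set())
    pending.add(instance.pk)
    cache.set(key, pending, None)  # not atomic; fine only as a sketch


@shared_task
def bulk_update_task(app_label, model_name):
    key = 'ded_pending_%s.%s' % (app_label, model_name)
    pending = cache.get(key, set())
    if not pending:
        return
    cache.delete(key)
    model = apps.get_model(app_label, model_name)
    # Passing a queryset lets the registry index everything via the
    # bulk helper instead of one request per object.
    registry.update(model.objects.filter(pk__in=pending))
```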
@CorrosiveKid I suggest using transaction.on_commit to prevent a race condition: the worker can pick up the task before the transaction has committed, in which case it indexes stale data.
```python
transaction.on_commit(lambda: self.registry_update_task.delay(pk, app_label, model_name))
transaction.on_commit(lambda: self.registry_update_related_task.delay(pk, app_label, model_name))
```
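Folded into the handler from the earlier sketch, that would look something like this (still illustrative names):

```python
from functools import partial

from django.db import transaction


class CelerySignalProcessor(RealTimeSignalProcessor):
    def handle_save(self, sender, instance, **kwargs):
        # Enqueue only after the surrounding transaction commits; otherwise
        # the worker can run first and index data that never became visible.
        transaction.on_commit(partial(
            registry_update_task.delay,
            instance.pk,
            instance._meta.app_label,
            instance._meta.model_name,
        ))
```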
Is this branch still in development? Do you need any help?
@vonmaster Any help is greatly appreciated, I haven't had the time to look at it yet.
@CorrosiveKid any update on this branch?
It's quite stale at this point since I haven't had the time to pursue this :(
@CorrosiveKid if you do ever end up finishing this, you could handle deletes by getting the model class and creating a new instance of the model without saving it, like `get_model(app_label, model_name)(pk=pk)`, then passing that to `registry.delete`.
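Something along these lines (sketch; the task name is illustrative):

```python
# Sketch: an unsaved instance carrying only the pk is enough for the
# registry to work out which document to delete.
from celery import shared_task
from django.apps import apps

from django_elasticsearch_dsl.registries import registry


@shared_task
def registry_delete_task(pk, app_label, model_name):
    model = apps.get_model(app_label, model_name)
    instance = model(pk=pk)  # never saved; exists only to carry the pk
    registry.delete(instance, raise_on_error=False)
```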
Quick check @safwanrahman: does this still work on the latest package version? Are there any breaking changes? Not sure how this PR is being tested, but I think it'd be invaluable to re-run the Travis tests by force-pushing master and resolving the conflicts.
@Andrew-Chen-Wang This PR is actually out of date. It needs to be rebased against master and ported to the new code!
Would love to see this picked up and finished!
Please take a look. https://github.com/django-es/django-elasticsearch-dsl/pull/414
I rebased it against master and implemented the Celery delete and delete_related tasks.
#414 is in a better position currently.