modin
                                
                                
                                
                                    modin copied to clipboard
                            
                            
                            
                        Modin not working with Django celery
I tried Ray and Dask and both failed to work from celery task.
Is there any way to use Modin inside Django celery task ?
For Dask the error was:
AssertionError: daemonic processes are not allowed to have children
[2024-03-12 09:13:07,640: INFO/ForkPoolWorker-7] Closing Nanny at 'tcp://127.0.0.1:38131'. Reason: nanny-instantiate-failed
                                    
                                    
                                    
                                
Hi @Abhi5h3k, thanks for opening this question! I am afraid it is not possible to use Modin with an engine like Ray, Dask or MPI inside a celery task. Why you are seeing the error is the following. When running a celery app, you create some celery worker processes depending usually on CPUs available, which will execute your tasks. Inside a celery task you use Modin, which spawns its own worker processes depending on the engine chosen to distribute operations on data. Sometimes, it is not possible. Read a more detailed answer on this matter on this post. You could try to initiatilize a Modin engine beforehand and use Modin objects inside your celery tasks.
import ray
ray.init(...)
import modin.pandas as pd
df = pd.DataFrame(...)
...
However, as I said, both celery and Modin will spawn their own worker processes, which may lead to oversubscription. Also, I am not sure if that will work at all. Modin transparently distributes the data and computation so that you can continue using the same pandas API while being able to work with more data faster. You just need to replace the import import pandas as pd to import modin.pandas as pd and Modin will take care of execution. So no need to use any other distributed tool (like celery in your case) to process operations in parallel alongside with Modin.