Explore general-purpose use cases
From an offline discussion with @dgasmith, it sounds like a large portion of this code is fairly general and abstracts over several task-worker frameworks (e.g. Dask, Parsl, Fireworks, etc.).
Would it make sense to explore making portions of this package more general-purpose than quantum chemistry? I have use cases in mind like performing small analysis tasks on many trajectory files with freud. The granularity of these small tasks would be well-suited to this package's task execution interface, I think.
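To illustrate the granularity I have in mind, here is a minimal sketch of the fan-out pattern, using Python's `concurrent.futures` as a stand-in for this package's task execution interface. The `analyze_trajectory` function is a hypothetical stub (a real task would load the file and compute something with freud); all names here are assumptions, not part of any existing API.

```python
from concurrent.futures import ThreadPoolExecutor

def analyze_trajectory(path):
    """Stub for a small per-trajectory analysis task.

    A real task might load the trajectory and compute an RDF or an
    order parameter with freud; here we just return a placeholder
    result keyed by the input file.
    """
    return {"trajectory": path, "result": 1.0}

def run_analysis(paths, max_workers=4):
    # Fan out one small task per trajectory file -- exactly the
    # granularity a general-purpose task layer would need to
    # schedule and collect efficiently.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(analyze_trajectory, paths))

if __name__ == "__main__":
    results = run_analysis([f"traj_{i}.dcd" for i in range(8)])
    print(len(results))
```

The point is that each task is short and independent, so the value of the framework is in dispatching and tracking many of them, not in any single computation.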
If this makes sense, perhaps the task layer could be branded as Fractal and the quantum chemistry applications built on Fractal could be branded as QCFractal.
I currently use the signac data management framework for my research, and it may make sense to utilize the task-worker components as an execution engine that signac-flow could use for processing its workflows.
I think this would be great to do as well. Can you help direct anyone else with similar questions or thoughts to this thread? That way we can properly assess which parts we should move out and what that operation would require, and gauge the overall interest if we did pull out the central queue.
I would love to see this moved out into its own project. I am interested in developing tools that would ideally scale efficiently on arbitrary resources, from single machines to multiple clusters.
I am also interested in building a central repository for MD data that will eventually feed a machine learning application. It would be immensely useful if the software I'm building automatically deposited relevant information into a central database as it ran, so that after (hopefully) many different users have applied it to their own use cases, we would begin to build up a standardized library of diverse training data. Is QCFractal the appropriate tool with which to implement this sort of thing?
I'm tagging a few folks that would benefit from this:
- @SimonBoothroyd has built a distributed property estimation framework for assessing and optimizing molecular mechanics force fields, and could benefit from the distributed execution capabilities of the QCFractal engine across different sites with GPUs
- @jaimergp and @dominicrufa are working on distributed alchemical free energy calculations, and could potentially benefit from distributing those calculations across GPU compute resources at different sites
Excellent! @SimonBoothroyd @jaimergp can you both answer a few questions:
- For the control and generation of new tasks (e.g., Python functions), what workflow tools do you use?
- How much data needs to be moved on and off compute clusters per task? (Ranges like 1e3 - 5 kB are useful.)
- Approximately how long does each task run, and on approximately how many resources (e.g., cores, GPUs, memory)?
- Do you require storage of all task input and output?