Supporting Ignite and using Ceph S3 as working directory
Dear nextflow developers
As stated in issue (https://github.com/nextflow-io/nextflow/issues/2564) opened by @cnexcale, we are interested in using nextflow in combination with the nextflow feature that allows us to use OpenStack S3 (backed by Ceph) for the working directory. After trying out multiple executors, we figured out that a patched version of the ignite executor is the solution for our use case (using HEAD at commit 2188d51a0f4866ad4249cc5887080390f80cee85):
- prevent “missing plugin” exception when creating S3FileSystem during workdir initialization
- nf-amazon: prevent sending empty x-amz-tagging headers in putObject requests
- nf-ignite: debatable patch: load session config in Ignite BaseTask before deserializing Task payload
- nf-ignite: fixed missing log instance due to shadowing in derived class
- nf-ignite: added .command.env to unstaging scripts after process exec
Using ignite might also be a way for us to run Nextflow pipelines on a hybrid cloud setup. However, we noticed that with commit d4dc1cfe13fec4bd3db5736fc494fe4fedd5e60c the ignite executor seems to be deprecated, and we wonder what the reasons are for not supporting it anymore. Is there a technical reason that makes it hard to continue the support, or is it that the nextflow community is not using it? We might be interested in supporting it ourselves (at least we would try).
We are aware that storing the Session config in an IgBaseTask object is far too hacky for a “real” fix. It shouldn’t be a task’s concern to manage configs, but in some way or another the AWS credentials and configuration must be present on/published to the worker daemons to correctly initialize an S3FileSystem in order to handle S3 interactions. Furthermore, the current “fix” comes with the (severe) drawback of not being able to use closures in the Session config. Having closures, e.g. for dynamic error strategies, led to problems with serialization of said config object. So this state is not desirable as a long-term solution and only serves the current proof of concept.
Hello, contributions are very welcome. I've commented on some of those changes.
Regarding the support for Apache Ignite, we realised that there's very little usage and that it's very complex to use and maintain. For this reason, it was decided not to include it in the core packages anymore and not to support it "officially".
However, the full code is still available in this repository. If you or somebody else wants to step in and take care of maintaining it, we would be happy to keep the integration possible.
> However, the full code is still available in this repository. If you or somebody else wants to step in and take care of maintaining it, we would be happy to keep the integration possible.
Yes, we would like to try to continue the ignite maintenance. Can you please explain what you mean by "keep the integration possible"? We noticed that in the nextflow documentation it now reads that "ignite is now no longer supported".
I mean, if somebody in the community steps in to maintain and support the ignite module, the core team can collaborate to guarantee it will work in future versions of nextflow.
Hello, after some more testing and extending the nf-ignite plugin functionality, we're still inclined to use the Ignite plugin for object storage / S3-based working directories.
So we tested the integration of our plugin fork into the 22.04.5 stable release and could get it to work with these changes to the Nextflow core (basically a minimal rollback of this commit).
I guess something like that would qualify as "keep the integration possible"?
We would be very happy if it were possible to use the Ignite plugin in current or upcoming releases! This would allow us to at least set up a custom build which integrates the plugin fork with some use-case-specific features, e.g. a useMasterAsCompute flag.
And of course we are willing to contribute our nf-ignite features to the official repo if there's a demand.
Kind regards Lukas
This is possible; however, the changes should be limited to the nextflow launcher and this snippet. Does this make sense to you?
> And of course we are willing to contribute our nf-ignite features to the official repo if there's a demand.
That's very welcome.
Yes, the snippet you referenced was the minimum of changes needed in order to run nextflow node -bg and start an Ignite daemon node. That'd be great, thanks!
Picking up on this issue because we had some problems during testing related to #3136.
As discussed in the PR, using export NXF_PLUGINS_DEFAULT=nf-ignite; nextflow node -bg makes it possible to start a Nextflow Ignite daemon node, even with the most recent Nextflow versions (edge).
Unfortunately, when starting a workflow with this setup an exception is thrown/logged by the Daemon node:
com.esotericsoftware.kryo.KryoException: Encountered unregistered class ID: 25
Serialization trace:
inputFiles (nextflow.processor.TaskBean)
at com.esotericsoftware.kryo.util.DefaultClassResolver.readClass(DefaultClassResolver.java:119)
at com.esotericsoftware.kryo.Kryo.readClass(Kryo.java:641)
at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:752)
at com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:143)
at com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:21)
at com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:679)
at com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:106)
at com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:528)
at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:761)
at com.esotericsoftware.kryo.Kryo$readClassAndObject$2.call(Unknown Source)
at nextflow.util.KryoHelper.deserialize(SerializationHelper.groovy:182)
at nextflow.util.KryoHelper.deserialize(SerializationHelper.groovy)
at nextflow.executor.IgBaseTask.deserialize(IgBaseTask.groovy:112)
at nextflow.executor.IgBaseTask.call(IgBaseTask.groovy:136)
at nextflow.scheduler.SchedulerAgent$AgentProcessor.runTask0(SchedulerAgent.groovy:361)
at nextflow.scheduler.SchedulerAgent$AgentProcessor$1.run(SchedulerAgent.groovy:350)
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:829)
We could verify that this error does not occur when applying the changes proposed in the PR #3136.
Do you, as project maintainers, see any way to enable the support for Ignite again, e.g. as proposed in said PR, even if only at source code level?
Or do you have any idea how to mitigate this issue?
You may want to try using the following variable:
NXF_PLUGINS_DEFAULT=nf-ignite,nf-amazon
It should behave exactly the same as the linked change. If it still fails, please include the full .nextflow.log file.
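For reference, a minimal sketch of how a daemon node could be started with both plugins preloaded. It only combines the variable above with the nextflow node -bg command mentioned earlier in this thread. The `command -v` guard and the fallback message are illustrative additions and assume `nextflow` is on the PATH.

```shell
#!/bin/sh
# Preload both plugins on the daemon node (value taken from this thread).
export NXF_PLUGINS_DEFAULT=nf-ignite,nf-amazon

# Start the Ignite daemon in the background only if nextflow is installed.
# This guard is an illustrative addition, not part of the original advice.
if command -v nextflow >/dev/null 2>&1; then
    nextflow node -bg
else
    echo "nextflow not found on PATH; would run: nextflow node -bg"
fi
```

Because the variable is exported, it has to be set in the environment of every daemon node, not only on the machine that launches the workflow.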
Thanks for the quick reply and the hint, I can confirm that this works!
Verified with two daemon nodes in a cluster, one started with NXF_PLUGINS_DEFAULT=nf-ignite,nf-amazon, the other with NXF_PLUGINS_DEFAULT=nf-ignite.
The former processed tasks successfully, the latter again threw the exception and crashed the workflow.
Thanks again, we're hoping this fix will also work in future NF versions, that would be great!
Other changes, like the change of the method signature of the Session.onShutdown hook, can be resolved from within the nf-ignite plugin.
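If someone wants to repeat the two-node check, a small helper along these lines might make it easier to see which daemon hit the deserialization error. This is only a sketch: the function name `check_kryo_error` is hypothetical, and it assumes each daemon writes a .nextflow.log file in its launch directory, which may differ in other setups.

```shell
#!/bin/sh
# Report whether a Nextflow log contains the Kryo deserialization error
# quoted in this thread. The function name and the default log path are
# illustrative assumptions.
check_kryo_error() {
    log_file="${1:-.nextflow.log}"
    if [ ! -f "$log_file" ]; then
        echo "missing"
    elif grep -q 'KryoException: Encountered unregistered class ID' "$log_file"; then
        echo "failed"
    else
        echo "ok"
    fi
}
```

Calling `check_kryo_error /path/to/.nextflow.log` on each node prints `failed` when the exception was logged, `ok` when it was not, and `missing` when no log file exists at that path.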
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
My understanding is that the issue was solved.