
Supporting Ignite and using Ceph S3 as working directory

Open pbelmann opened this issue 2 years ago • 6 comments

Dear nextflow developers

As stated in the issue (https://github.com/nextflow-io/nextflow/issues/2564) opened by @cnexcale, we are interested in using Nextflow with OpenStack S3 (backed by Ceph) as the working directory. After trying out multiple executors, we found that a patched version of the ignite executor is the solution for our use case (using HEAD at commit 2188d51a0f4866ad4249cc5887080390f80cee85):

Using ignite might also be a way for us to run Nextflow pipelines on a hybrid cloud setup. However, we noticed that with commit d4dc1cfe13fec4bd3db5736fc494fe4fedd5e60c the ignite executor seems to be deprecated, and we wonder what the reasons are for not supporting it anymore. Is there a technical reason that makes it hard to continue the support, or is it that the Nextflow community is not using it? We might be interested in supporting it ourselves (at least we would try).

We are aware that the fix of storing the Session config in an IgBaseTask object is far too hacky for a “real” fix. It shouldn’t be a task’s concern to manage configs, but in some way or another the AWS credentials and configuration must be present on, or published to, the worker daemons to correctly initialize an S3FileSystem in order to handle S3 interactions. Furthermore, the current “fix” comes with the (severe) drawback of not being able to use closures in the Session config. Having closures, e.g. for dynamic error strategies, led to problems with serialization of said config object. So this state is not desirable as a long-term solution and only serves as the current proof of concept.

pbelmann avatar Feb 22 '22 19:02 pbelmann

Hello, contributions are very welcome. I've commented on some of those changes.

Regarding the support for Apache Ignite, we realised that there's very little usage and that it's very complex to use and maintain. For this reason, it was decided to no longer include it in the core packages and not to support it "officially".

However, the full code is still available in this repository. If you or somebody else wants to step in and take care of maintaining it, we would be happy to keep the integration possible.

pditommaso avatar Feb 23 '22 15:02 pditommaso

However, the full code is still available in this repository. If you or somebody else wants to step in and take care of maintaining it, we would be happy to keep the integration possible.

Yes, we would like to try to continue the ignite maintenance. Can you please explain what you mean by "keep the integration possible"? We noticed that in the nextflow documentation it now reads that "ignite is now no longer supported".

pbelmann avatar Jul 18 '22 12:07 pbelmann

I mean, if somebody in the community steps in to maintain and support the ignite module, the core team can collaborate to guarantee it will work in future versions of nextflow.

pditommaso avatar Jul 22 '22 10:07 pditommaso

Hello, after some more testing and extending the nf-ignite plugin functionality, we're still inclined to use the Ignite plugin for object storage / S3-based working directories.

So we tested the integration of our plugin fork into the 22.04.5 stable release and could get it to work with these changes to the Nextflow core (basically a minimal rollback of this commit).

I guess something like that would qualify as "keep the integration possible"?

We would be very happy if it were possible to use the Ignite plugin in current or upcoming releases! This would allow us to at least set up a custom build that integrates the plugin fork with some use-case-specific features, e.g. a useMasterAsCompute flag.

And of course we are willing to contribute our nf-ignite features to the official repo if there's demand.

Kind regards Lukas

cnexcale avatar Aug 08 '22 06:08 cnexcale

This is possible, however the changes should be limited to the nextflow launcher and this snippet. Does this make sense to you?

And of course we are willing to contribute our nf-ignite features to the official repo if there's demand.

that's very welcome

pditommaso avatar Aug 09 '22 13:08 pditommaso

Yes, the snippet you referenced was the minimum of changes in order to run nextflow node -bg and start an Ignite daemon node. That'd be great, thanks!

cnexcale avatar Aug 10 '22 13:08 cnexcale

Picking up on this issue because we had some problems during testing related to #3136.

As discussed in the PR, using export NXF_PLUGINS_DEFAULT=nf-ignite; nextflow node -bg makes it possible to start a Nextflow Ignite daemon node, even with the most recent Nextflow versions (edge).

Unfortunately, when starting a workflow with this setup an exception is thrown/logged by the Daemon node:

com.esotericsoftware.kryo.KryoException: Encountered unregistered class ID: 25
Serialization trace:
inputFiles (nextflow.processor.TaskBean)
        at com.esotericsoftware.kryo.util.DefaultClassResolver.readClass(DefaultClassResolver.java:119)
        at com.esotericsoftware.kryo.Kryo.readClass(Kryo.java:641)
        at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:752)
        at com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:143)
        at com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:21)
        at com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:679)
        at com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:106)
        at com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:528)
        at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:761)
        at com.esotericsoftware.kryo.Kryo$readClassAndObject$2.call(Unknown Source)
        at nextflow.util.KryoHelper.deserialize(SerializationHelper.groovy:182)
        at nextflow.util.KryoHelper.deserialize(SerializationHelper.groovy)
        at nextflow.executor.IgBaseTask.deserialize(IgBaseTask.groovy:112)
        at nextflow.executor.IgBaseTask.call(IgBaseTask.groovy:136)
        at nextflow.scheduler.SchedulerAgent$AgentProcessor.runTask0(SchedulerAgent.groovy:361)
        at nextflow.scheduler.SchedulerAgent$AgentProcessor$1.run(SchedulerAgent.groovy:350)
        at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
        at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
        at java.base/java.lang.Thread.run(Thread.java:829)

We could verify that this error does not occur when applying the changes proposed in the PR #3136.

As project maintainers, do you see any way to enable support for Ignite again, e.g. as proposed in said PR, even if only at source code level?

Or do you have any idea how to mitigate this issue?

cnexcale avatar Oct 03 '22 20:10 cnexcale

You may want to try using the following variable:

NXF_PLUGINS_DEFAULT=nf-ignite,nf-amazon 

It should behave exactly the same as the linked change. If it still fails, please include the full .nextflow.log file.
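
To check quickly whether a node hit the exception reported above before attaching the log, something like the following could be used. This is only a sketch: `log_has_kryo_error` is a hypothetical helper and not part of Nextflow, and the default path assumes the log is the `.nextflow.log` file in the current directory.

```shell
#!/usr/bin/env bash
# Sketch: report whether a Nextflow log contains the Kryo error from this thread.
# log_has_kryo_error is a hypothetical helper, not a Nextflow command.

log_has_kryo_error() {
  # $1: path to the log file (assumed default: .nextflow.log in the current directory).
  # A missing or unreadable file is treated the same as "no error found".
  grep -q 'KryoException: Encountered unregistered class ID' "${1:-.nextflow.log}" 2>/dev/null
}

if log_has_kryo_error; then
  echo "Kryo deserialization error found; check NXF_PLUGINS_DEFAULT on this node"
else
  echo "no Kryo deserialization error found"
fi
```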

pditommaso avatar Oct 04 '22 18:10 pditommaso

Thanks for the quick reply and the hint, I can confirm that this works!

Verified with two daemon nodes in a cluster, one started with NXF_PLUGINS_DEFAULT=nf-ignite,nf-amazon, the other with NXF_PLUGINS_DEFAULT=nf-ignite. The former processed tasks successfully; the latter again threw the exception and crashed the workflow.
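
Since a single node started without nf-amazon was enough to crash the workflow, it may help to guard the daemon start-up on every node. The following is a minimal sketch of that idea: `require_plugins` is a hypothetical helper (not part of Nextflow), and it assumes the variable holds a comma-separated list of plugin IDs, optionally with an `@version` suffix.

```shell
#!/usr/bin/env bash
# Sketch: start an Ignite daemon node only if both plugins are configured.
# require_plugins is a hypothetical helper, not part of Nextflow.

export NXF_PLUGINS_DEFAULT=nf-ignite,nf-amazon

# Return 0 only if every named plugin appears in NXF_PLUGINS_DEFAULT.
# Assumes a comma-separated list of IDs, each optionally followed by @version.
require_plugins() {
  for wanted in "$@"; do
    case ",${NXF_PLUGINS_DEFAULT}," in
      *",${wanted},"* | *",${wanted}@"*) ;;
      *) echo "missing plugin: ${wanted}" >&2; return 1 ;;
    esac
  done
}

if require_plugins nf-ignite nf-amazon; then
  if command -v nextflow >/dev/null 2>&1; then
    nextflow node -bg   # same command as used earlier in this thread
  else
    echo "nextflow not found on PATH; skipping daemon start" >&2
  fi
fi
```

Running the same script on each machine in the cluster would keep the plugin list identical across daemon nodes.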

Thanks again, we're hoping this fix will also work in future NF versions, that would be great! Other changes, like the changed method signature of the Session.onShutdown hook, can be resolved from within the nf-ignite plugin.

cnexcale avatar Oct 05 '22 10:10 cnexcale

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale[bot] avatar Mar 18 '23 09:03 stale[bot]

My understanding is that the issue was solved.

pditommaso avatar Mar 18 '23 11:03 pditommaso