nextflow icon indicating copy to clipboard operation
nextflow copied to clipboard

Enhance the azcopy options configuration scope

Open abhi18av opened this issue 1 year ago • 4 comments

This PR builds upon the available settings with the core azcopy tool which is used for uploads and downloads in the context of Azure Blob storage service.

abhi18av avatar Aug 03 '22 07:08 abhi18av

@pditommaso this PR is ready from my side - but depends on a sister PR https://github.com/nextflow-io/azcopy-tool/pull/1 where I've mentioned the major benefit of the update since we source the azcopy tool from that repo.

UPDATE: Keeping this as draft so as not to merge accidentally before the sister PR.

abhi18av avatar Aug 04 '22 09:08 abhi18av

Quick update:

I've had to remove the options which are not compatible with the older 10.8.0 version of azcopy which we use and now the behavior is as expected on the Azure side

The following config with custom azcopy opts are reflected on the generated .command.run files ✅

      azcopy {
          blockSize        = "17"
          blobTier         = "Cool"
          putMD5           = true
          overwrite        = false
          checkMD5         = "NoCheck"
          copyToolInstallMode= 'task'
      }


The only problem remains to be solved is the last test in AzBashLibTest , for which @jorgeaguileraseqera has graciously offered to help (thanks!)

https://github.com/nextflow-io/nextflow/blob/db89af1e4a5f2021972a2bdf2e0bb2aed997245e/plugins/nf-azure/src/test/nextflow/cloud/azure/file/AzBashLibTest.groovy#L163

abhi18av avatar Aug 04 '22 12:08 abhi18av

The unit tests for the code and the functional test (via rnaseq-nf) are passing now

(base) bash-5.0$ ../../nextflow/launch.sh run main.nf -profile azb 
N E X T F L O W  ~  version 22.08.0-edge
Launching `main.nf` [fervent_descartes] DSL2 - revision: 4ba66eb0c8
 R N A S E Q - N F   P I P E L I N E
 ===================================
 transcriptome: /Users/eklavya/projects/code/nextflow_on_azure/rnaseq-nf/data/ggal/ggal_1_48850000_49020000.Ggal71.500bpflank.fa
 reads        : /Users/eklavya/projects/code/nextflow_on_azure/rnaseq-nf/data/ggal/ggal_gut_{1,2}.fq
 outdir       : az://batch-jobs/rnaseq-nf-publishdir
 
Uploading local `bin` scripts folder to az://batch-jobs/rnaseq-nf-workdir/tmp/7e/03ec1928a1e8327db898dd00687297/bin
[e3/17ed0c] process > RNASEQ:INDEX (ggal_1_48850000_49020000) [100%] 1 of 1 ✔
[7e/54d1f2] process > RNASEQ:FASTQC (FASTQC on ggal_gut)      [100%] 1 of 1 ✔
[5d/3cbff4] process > RNASEQ:QUANT (ggal_gut)                 [100%] 1 of 1 ✔
[04/c40c43] process > MULTIQC                                 [100%] 1 of 1 ✔
Staging foreign file: /Users/eklavya/projects/code/nextflow_on_azure/rnaseq-nf/multiqc

Done! Open the following report in your browser --> az://batch-jobs/rnaseq-nf-publishdir/multiqc_report.html

Completed at: 05-Aug-2022 11:26:07
Duration    : 1m 37s
CPU hours   : (a few seconds)
Succeeded   : 4


abhi18av avatar Aug 05 '22 09:08 abhi18av

@pditommaso , thanks for the review. I have accommodated the change requests now.

abhi18av avatar Aug 09 '22 07:08 abhi18av

@pditommaso , I've made the MD5 (check/put) behavior as the default and resolved all merge conflicts.

Shall we merge ahead?

The next round of changes would be done only after the azcopy tool has been upgraded via https://github.com/nextflow-io/azcopy-tool/pull/1 (WIP)

abhi18av avatar Sep 26 '22 13:09 abhi18av

:warning: 7 God Classes were detected by Lift in this project. Visit the Lift web console for more details.

sonatype-lift[bot] avatar Oct 01 '22 18:10 sonatype-lift[bot]

@pditommaso , shall we take this merge forward?

abhi18av avatar Oct 24 '22 10:10 abhi18av

@pditommaso , shall we merge the PR with this edge release cycle?

abhi18av avatar Nov 16 '22 12:11 abhi18av

Abi, there's the usual problem your PRs do not run the tests and I've tested locally and it fails

pditommaso avatar Nov 18 '22 10:11 pditommaso

Abi, there's the usual problem your PRs do not run the tests and I've tested locally and it fails

🤔 this is curious. Let me take a look why there's no automated testing here.

abhi18av avatar Nov 29 '22 11:11 abhi18av

We lost the momentum on this. Closing it because I don't know the status of this PR

pditommaso avatar Jul 05 '23 09:07 pditommaso

True 😞

But quick question, do you think that this effort would be a useful addition since fusion is now available for Azure as well?

abhi18av avatar Jul 06 '23 02:07 abhi18av

That's another good point. The ideal goal is to not rely anymore on azcopy and use instead fusion in the mid-term

pditommaso avatar Jul 06 '23 07:07 pditommaso

I see, in that case let me know if there's a need for the retouches on this PR and then I can make the iteration and circle back. Otherwise, I agree, good opportunity to showcase fusionfs for this 👍

abhi18av avatar Jul 06 '23 08:07 abhi18av