
Switch from alevin to alevin-fry (with simpleaf)

Open: fmalmeida opened this pull request

PR checklist

  • [ ] This comment contains a description of changes (with reason).
  • [ ] If you've fixed a bug or added code that should be tested, add tests!
    • [ ] If you've added a new tool, have you followed the pipeline conventions in the contribution docs?
    • [ ] If necessary, also make a PR on the nf-core/scrnaseq branch on the nf-core/test-datasets repository.
  • [ ] Make sure your code lints (nf-core lint).
  • [ ] Ensure the test suite passes (nextflow run . -profile test,docker --outdir <OUTDIR>).
  • [ ] Usage Documentation in docs/usage.md is updated.
  • [ ] Output Documentation in docs/output.md is updated.
  • [ ] CHANGELOG.md is updated.
  • [ ] README.md is updated (including new tool citations and authors/contributors).

Work on issue #93

fmalmeida, Aug 18 '22 10:08

nf-core lint overall result: Passed ✅ ⚠️

Posted for pipeline commit 298233d

✅ 157 tests passed
❗ 1 test had warnings

❗ Test warnings:

  • pipeline_todos - TODO string in methods_description_template.yml: #Update the HTML below to your prefered methods description, e.g. add publication citation for this pipeline

✅ Tests passed:

Run details

  • nf-core/tools version 2.6
  • Run at 2022-10-05 12:44:32

github-actions[bot], Aug 18 '22 10:08

Hi @grst and @rob-p,

Now that we have the SIMPLEAF_INDEX and SIMPLEAF_QUANT modules working, and before going ahead with the implementation, I think it is a good idea to have a careful review and brainstorm on what the workflow and parameters should look like.

For example, now that SIMPLEAF_INDEX also builds the txp2gene maps, how will we handle those? How can we restructure the workflow to check whether the map comes from the user, from simpleaf, or from gffread?

https://github.com/nf-core/scrnaseq/blob/07136b05d5b5e9a60ef9568d495ed18dec8ef31d/subworkflows/local/alevin.nf#L48-L56

Also, in the module itself, what is the best way to set the chemistry/protocol? Should it be done as before, with the values coming from the function in lib/?

So it would be good to have a careful review of all that.

😄 🚀

fmalmeida, Aug 23 '22 14:08

Hi @fmalmeida,

Regarding the txp2gene, I think it’s important to use the one that comes from simpleaf, especially when the user is making use of a splici reference (the recommended default for using alevin-fry). This is because the txp2gene will look different from the standard “2 column” map, in that each transcript is assigned a gene of origin as well as an annotated splicing status (spliced or unspliced). When the index is made in --ref mode (i.e. when the user is not actually invoking pyroe to build a splici transcriptome, but instead just indexing a given transcriptome), then we might expect a user-provided txp2gene file to be given. I expect that the former mode of operation will be much more common, but presumably the latter should be supported?
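As a rough illustration of the two layouts, the Python sketch below reads a tab-separated txp2gene map and accepts either the standard two columns (transcript, gene) or a third column carrying the splicing status. The function name and the example IDs are hypothetical, and the `S`/`U` status codes are an assumption about the splici map rather than something stated in this thread.

```python
import csv
import io
from typing import Dict, Optional, TextIO, Tuple


def read_t2g(handle: TextIO) -> Dict[str, Tuple[str, Optional[str]]]:
    """Map each transcript ID to (gene ID, splicing status or None).

    Accepts tab-separated rows with two columns (transcript, gene) or
    three columns (transcript, gene, status). Status values are passed
    through unchecked.
    """
    t2g: Dict[str, Tuple[str, Optional[str]]] = {}
    for row in csv.reader(handle, delimiter="\t"):
        if not row:
            continue  # skip blank lines
        if len(row) == 2:
            txp, gene = row
            t2g[txp] = (gene, None)  # standard two-column map
        elif len(row) == 3:
            txp, gene, status = row
            t2g[txp] = (gene, status)  # splici-style map with status
        else:
            raise ValueError(f"expected 2 or 3 columns, got {len(row)}: {row!r}")
    return t2g


# Example: one spliced and one unspliced entry for the same gene (made-up IDs).
splici_map = read_t2g(io.StringIO("tx1\tgeneA\tS\ntx1-U\tgeneA\tU\n"))
```

Keeping the status as an optional third field lets downstream code treat both layouts uniformly and only branch where the spliced/unspliced distinction actually matters.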

Regarding the chemistry — alevin-fry (and therefore simpleaf) supports quite a few as well as a fairly general “generic geometry” that lets it process pretty much any protocol with known barcode and UMI lengths and offsets. How are the chemistry parameters currently handled in the pipeline among the different tools?
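For the generic geometry, here is a minimal Python sketch. It assumes the custom geometry syntax used by simpleaf/alevin-fry, in which `1{b[16]u[12]x:}2{r:}` describes 10x Chromium v3 (read 1: a 16 bp barcode followed by a 12 bp UMI; read 2: biological sequence). `geometry_string` is a hypothetical helper that only covers the barcode-then-UMI layout. The exact grammar should be checked against the simpleaf documentation before relying on it.

```python
def geometry_string(bc_len: int, umi_len: int, bc_offset: int = 0) -> str:
    """Build a custom read geometry string (hypothetical helper).

    Assumes read 1 holds the barcode followed directly by the UMI,
    optionally after bc_offset leading bases to skip, and read 2 holds
    the biological sequence.
    """
    if bc_len <= 0 or umi_len <= 0 or bc_offset < 0:
        raise ValueError("lengths must be positive and the offset non-negative")
    skip = f"x[{bc_offset}]" if bc_offset else ""  # discard leading bases
    read1 = f"{skip}b[{bc_len}]u[{umi_len}]x:"  # barcode, UMI, ignore the rest
    return f"1{{{read1}}}2{{r:}}"  # r: = rest of read 2 is biological sequence


# 10x Chromium v3: 16 bp barcode + 12 bp UMI on read 1.
print(geometry_string(16, 12))  # prints 1{b[16]u[12]x:}2{r:}
```

Deriving the string from barcode/UMI lengths and offsets, rather than hard-coding one string per protocol, is what would let the pipeline support protocols beyond the named chemistries.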

rob-p, Aug 24 '22 18:08

Hi @rob-p,

About txp2gene

What you said makes total sense. I will adjust it accordingly: use the txp2gene from simpleaf by default, but expect a user-provided one when indexing a given transcriptome (--ref).

About chemistry

Great to hear that it is so flexible. We may then need a bit more work to define it properly in the pipeline. For now, this is how chemistries are handled in the pipeline:

https://github.com/nf-core/scrnaseq/blob/07136b05d5b5e9a60ef9568d495ed18dec8ef31d/lib/WorkflowScrnaseq.groovy#L71-L155

fmalmeida, Aug 25 '22 10:08

Hi @rob-p ,

I just updated how txp2gene is handled based on your suggestions. Please check whether you agree:

  1. When indexing a prebuilt transcriptome given with --transcript_fasta, which is passed to simpleaf as --refseq, we expect a prebuilt txp2gene file as well:

https://github.com/nf-core/scrnaseq/blob/4d3c5c520321a44775d8e30ac82e2db483284f4a/subworkflows/local/alevin.nf#L34-L37

  2. Then the pipeline runs normally. After SIMPLEAF_INDEX, we load the correct txp2gene, choosing between the user-provided one and the one produced by simpleaf:

https://github.com/nf-core/scrnaseq/blob/4d3c5c520321a44775d8e30ac82e2db483284f4a/subworkflows/local/alevin.nf#L50-L53
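The selection rule in these steps can be summarised as a small decision function. This is a Python sketch of one reading of the discussion above, not the pipeline's Nextflow code. The function name and file names are hypothetical.

```python
from typing import Optional


def select_txp2gene(
    ref_mode: bool, user_t2g: Optional[str], simpleaf_t2g: Optional[str]
) -> str:
    """Return the txp2gene map to pass on after indexing (hypothetical helper).

    ref_mode=True means a given transcriptome was indexed directly (--ref),
    so there is no splici map from simpleaf and the user must supply one.
    Otherwise (splici index), use the map simpleaf produced, since it
    carries the spliced/unspliced status.
    """
    if ref_mode:
        if not user_t2g:
            raise ValueError("a user-provided txp2gene map is required in --ref mode")
        return user_t2g
    if not simpleaf_t2g:
        raise ValueError("expected simpleaf to produce a txp2gene map")
    return simpleaf_t2g
```

Failing loudly when the expected map is missing is deliberate: silently falling back to a two-column map from gffread would lose the splicing status in splici mode.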

fmalmeida, Aug 29 '22 09:08

Hm, AlevinQC fails:

  Error in checkAlevinInputFiles(baseDir) : 
    Input directory not compatible with Salmon v0.14 or newer (without external whitelist), the following required file(s) are missing or malformed:
  /home/runner/work/scrnaseq/scrnaseq/work/dc/3d69e08ce60e590be85661861d8830/Sample_X_alevin_results/alevin/raw_cb_frequency.txt
  /home/runner/work/scrnaseq/scrnaseq/work/dc/3d69e08ce60e590be85661861d8830/Sample_X_alevin_results/alevin/featureDump.txt
  /home/runner/work/scrnaseq/scrnaseq/work/dc/3d69e08ce60e590be85661861d8830/Sample_X_alevin_results/aux_info/meta_info.json
  /home/runner/work/scrnaseq/scrnaseq/work/dc/3d69e08ce60e590be85661861d8830/Sample_X_alevin_results/aux_info/alevin_meta_info.json
  /home/runner/work/scrnaseq/scrnaseq/work/dc/3d69e08ce60e590be85661861d8830/Sample_X_alevin_results/cmd_info.json
  /home/runner/work/scrnaseq/scrnaseq/work/dc/3d69e08ce60e590be85661861d8830/Sample_X_alevin_results/alevin/whitelist.txt

apeltzer, Sep 13 '22 08:09

Yes, I think this is where I stopped last time. It seems that the directory from simpleaf is not what alevinqc expects, or I did something wrong either in the command or when selecting what to pass to alevinqc.

But I haven’t checked it further.

fmalmeida, Sep 13 '22 08:09

Is the latest alevinqc being used?

rob-p, Sep 13 '22 08:09

Yes, I think so. The modules are pinned to alevinqc version 1.10.

fmalmeida, Sep 15 '22 19:09

Let me ping @csoneson here. There was recently an update to AlevinQC to allow it to work in the case when alevin-fry was used in unfiltered mode (the mode being used by default here). Previously, it only worked when knee filtering was used (as some of the graphs didn't make sense in the other case).

rob-p, Sep 15 '22 20:09

The current release version of alevinQC is 1.12.1, the devel version is 1.13.2. The support for processing alevin-fry output was added in v1.11.1, so v1.10 is not going to work, unfortunately. Also note that in order to read alevin-fry output, you'll need to use the "fry" functions (e.g., alevinFryQCReport() rather than alevinQCReport(), see https://csoneson.github.io/alevinQC/reference/index.html).
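Since fry support was added in alevinQC v1.11.1, a simple version guard could catch this before the report step runs. The following is a Python sketch; the helper names are hypothetical, and the threshold is the one given above.

```python
from typing import Tuple

# alevinQC version that added support for alevin-fry output (per the comment above).
MIN_FRY_VERSION: Tuple[int, ...] = (1, 11, 1)


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn '1.12.1' into (1, 12, 1); assumes purely numeric dot-separated parts."""
    return tuple(int(part) for part in version.split("."))


def supports_alevin_fry(version: str) -> bool:
    """True if this alevinQC version can read alevin-fry output."""
    return parse_version(version) >= MIN_FRY_VERSION
```

On the R side, the equivalent check would be `packageVersion("alevinQC") >= "1.11.1"`.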

csoneson, Sep 15 '22 21:09

So this is the problem: we are indeed using the latest version available on Bioconda, but it turns out that the recipe is not up to date with the package.

fmalmeida, Sep 16 '22 06:09

That seems to be it. We should check on Bioconda whether we can update the recipe easily, then just do that and hopefully resolve the issue :-)

apeltzer, Sep 16 '22 07:09

I made a PR for getting 1.12.x on bioconda. https://github.com/bioconda/bioconda-recipes/pull/37106

apeltzer, Sep 23 '22 14:09

The AlevinQC PR just got merged into Bioconda, thanks to @rob-p. We should now be able to use the new AlevinQC version, which should fix this and make all tests here pass 👍🏻

apeltzer, Sep 23 '22 16:09

Python linting (black) is failing

To keep the code consistent with lots of contributors, we run automated code consistency checks. To fix this CI test, please run:

  • Install black: pip install black
  • Fix formatting errors in your pipeline: black .

Once you push these changes the test should pass, and you can hide this comment 👍

We highly recommend setting up Black in your code editor so that this formatting is done automatically on save. Ask about it on Slack for help!

Thanks again for your contribution!

github-actions[bot], Sep 26 '22 09:09

Hi folks,

I now need some help to properly reconfigure the workflow (or the modules) so that simpleaf generates the permitDir and the whitelist is used properly. As the current error message shows, this is what is still missing for alevinqc to work:

 Error in checkAlevinFryInputFiles(mapDir = mapDir, permitDir = permitDir,  : 
    Input directory not compatible with alevin-fry v0.5.0 or newer, the following required file(s) are missing or malformed:
  /home/runner/work/scrnaseq/scrnaseq/work/4b/e29e1c493cc1c9184c2618b3992674/Sample_X_alevin_results/all_freq.bin
  /home/runner/work/scrnaseq/scrnaseq/work/4b/e29e1c493cc1c9184c2618b3992674/Sample_X_alevin_results/permit_freq.bin
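A quick pre-flight check for the files listed in this error could look like the Python sketch below. The file names come from the error message. The helper itself is hypothetical and only covers the alevin-fry v0.5.0+ layout.

```python
from pathlib import Path
from typing import List

# Files the alevinQC error above reports as required in permitDir
# for alevin-fry v0.5.0 or newer.
REQUIRED_PERMIT_FILES = ("all_freq.bin", "permit_freq.bin")


def missing_permit_files(permit_dir: str) -> List[str]:
    """List required permit-list files that are absent from permit_dir."""
    base = Path(permit_dir)
    return [name for name in REQUIRED_PERMIT_FILES if not (base / name).is_file()]
```

An empty result only means the files exist in the directory passed as `permitDir`. It does not validate their contents.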

fmalmeida, Sep 26 '22 13:09

Hello, I am testing the alevin-fry branch and came across this error message at the indexing step: Error: pyroe failed to return succesfully ExitStatus(unix_wait_status(256)) This might help you determine what is going wrong.

Khajidu, Sep 29 '22 09:09

Oh, I found an inconsistency in the /modules/local/simpleaf_quant.nf file (at the very beginning, lines 5-8):

conda (params.enable_conda ? 'bioconda::simpleaf=0.4.0' : null)
   container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ?
       'https://depot.galaxyproject.org/singularity/simpleaf:0.5.1--h9f5acd7_0' :
       'quay.io/biocontainers/simpleaf:0.5.1--h9f5acd7_0' }"

The version should be the same everywhere, right?
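To catch this kind of drift automatically, one could compare the version pinned in the conda directive with the versions in the container tags. Below is a hypothetical Python sketch, checked against the snippet above. The regexes assume the `bioconda::simpleaf=X` and `simpleaf:X--build` patterns shown there.

```python
import re
from typing import Set

# Version in the conda directive, e.g. 'bioconda::simpleaf=0.4.0'.
CONDA_RE = re.compile(r"bioconda::simpleaf=([\w.]+)")
# Version in the container tags, e.g. 'simpleaf:0.5.1--h9f5acd7_0'.
CONTAINER_RE = re.compile(r"simpleaf:([\w.]+)--")


def pinned_versions(module_text: str) -> Set[str]:
    """Collect every simpleaf version pinned in a module's text."""
    return set(CONDA_RE.findall(module_text)) | set(CONTAINER_RE.findall(module_text))


def versions_consistent(module_text: str) -> bool:
    """True when conda and container directives pin exactly one version."""
    return len(pinned_versions(module_text)) == 1


# The header quoted above, trimmed to the version-bearing parts.
SNIPPET = """
conda (params.enable_conda ? 'bioconda::simpleaf=0.4.0' : null)
    'https://depot.galaxyproject.org/singularity/simpleaf:0.5.1--h9f5acd7_0' :
    'quay.io/biocontainers/simpleaf:0.5.1--h9f5acd7_0'
"""
print(versions_consistent(SNIPPET))  # prints False
```

Run over both simpleaf_index.nf and simpleaf_quant.nf, a check like this would flag the mismatch in either module.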

Khajidu, Sep 29 '22 09:09

Hi @Khajidu ,

Yes, they should be the same.

fmalmeida, Sep 29 '22 10:09

Oh, same inconsistency in simpleaf_index.nf...

Khajidu, Sep 29 '22 10:09

Now, to finalise this, we should address this:

https://github.com/nf-core/scrnaseq/pull/139#issuecomment-1258005319

Does anyone have an idea?

fmalmeida, Sep 29 '22 11:09

Tested the pipeline again just in case, and still have the same error:

Hello, I am testing the alevin-fry branch and came across this error message at the indexing step: Error: pyroe failed to return succesfully ExitStatus(unix_wait_status(256)) This might help you determine what is going wrong.

Khajidu, Sep 29 '22 11:09

I have no idea what could cause this error.

How are you calling the pipeline? Using docker, conda or singularity profile? Using the test dataset or your own?

Could you try, e.g., using the test and docker profiles?

fmalmeida, Sep 29 '22 11:09

I am using the singularity profile (same as when testing with kallisto) on my own data; testing the test profile now.

Khajidu, Sep 29 '22 11:09

Hi @rob-p ,

Pinging you to check whether you have any idea what the error @Khajidu is experiencing may be related to.

fmalmeida, Sep 29 '22 11:09

The test profile worked at the indexing step, but failed to submit one of the two counting jobs.

Khajidu, Sep 29 '22 11:09

And now that I removed cached files, the sample sheet check also failed on my sample sheet (but not on the test sample sheet).

Edit: it worked now O_o

Khajidu, Sep 29 '22 12:09

Strange that it did not work. I just tried again and it ran until the error reported here: https://github.com/nf-core/scrnaseq/pull/139#issuecomment-1258005319

gitpod /workspace/scrnaseq/test (alevin-fry) $ nextflow run ../main.nf -profile test,docker --outdir results_simpleaf --aligner alevin
Picked up JAVA_TOOL_OPTIONS:  -Xmx3489m
Picked up JAVA_TOOL_OPTIONS:  -Xmx3489m
N E X T F L O W  ~  version 22.04.0
Launching `../main.nf` [jolly_gilbert] DSL2 - revision: 7b2f2dd260

WARN: Found unexpected parameters:
* --simpleaf_rlen: 91
- Ignore this warning: params.schema_ignore_params = "simpleaf_rlen" 



------------------------------------------------------
                                        ,--./,-.
        ___     __   __   __   ___     /,-._.--~'
  |\ | |__  __ /  ` /  \ |__) |__         }  {
  | \| |       \__, \__/ |  \ |___     \`-._,-`-,
                                        `._,._,'
  nf-core/scrnaseq v2.0.1dev
------------------------------------------------------
Core Nextflow options
  runName                   : jolly_gilbert
  containerEngine           : docker
  launchDir                 : /workspace/scrnaseq/test
  workDir                   : /workspace/scrnaseq/test/work
  projectDir                : /workspace/scrnaseq
  userName                  : gitpod
  profile                   : test,docker
  configFiles               : /workspace/scrnaseq/nextflow.config

Input/output options
  input                     : https://github.com/nf-core/test-datasets/raw/scrnaseq/samplesheet-2-0.csv
  outdir                    : results_simpleaf

Reference genome options
  genome_fasta              : https://github.com/nf-core/test-datasets/raw/scrnaseq/reference/GRCm38.p6.genome.chr19.fa
  gtf                       : https://github.com/nf-core/test-datasets/raw/scrnaseq/reference/gencode.vM19.annotation.chr19.gtf

Kallisto/BUS Options
  bustools_correct          : true

Institutional config options
  config_profile_name       : Test profile
  config_profile_description: Minimal test dataset to check pipeline function

Max job request options
  max_cpus                  : 2
  max_memory                : 6.GB
  max_time                  : 6.h

!! Only displaying parameters that differ from the pipeline defaults !!
------------------------------------------------------
If you use nf-core/scrnaseq for your analysis please cite:

* The nf-core framework
  https://doi.org/10.5281/zenodo.4643461

* Software dependencies
  https://github.com/nf-core/scrnaseq/blob/master/CITATIONS.md
------------------------------------------------------
executor >  local (12)
[b1/4cd473] process > NFCORE_SCRNASEQ:SCRNASEQ:INPUT_CHECK:SAMPLESHEET_CHECK (samplesheet-2-0.csv)                [100%] 1 of 1 ✔
[07/db5078] process > NFCORE_SCRNASEQ:SCRNASEQ:FASTQC_CHECK:FASTQC (Sample_X)                                     [100%] 2 of 2 ✔
[11/21b9df] process > NFCORE_SCRNASEQ:SCRNASEQ:SCRNASEQ_ALEVIN:SIMPLEAF_INDEX (gencode.vM19.annotation.chr19.gtf) [100%] 1 of 1 ✔
[fc/69b37d] process > NFCORE_SCRNASEQ:SCRNASEQ:SCRNASEQ_ALEVIN:SIMPLEAF_QUANT (Sample_X)                          [100%] 2 of 2 ✔
[72/760952] process > NFCORE_SCRNASEQ:SCRNASEQ:SCRNASEQ_ALEVIN:ALEVINQC (Sample_X)                                [100%] 2 of 2, failed: 2 ✘
[3f/d69355] process > NFCORE_SCRNASEQ:SCRNASEQ:MTX_CONVERSION:MTX_TO_H5AD (Sample_X)                              [100%] 2 of 2 ✔
[-        ] process > NFCORE_SCRNASEQ:SCRNASEQ:MTX_CONVERSION:CONCAT_H5AD                                         -
[-        ] process > NFCORE_SCRNASEQ:SCRNASEQ:MTX_CONVERSION:MTX_TO_SEURAT (Sample_X)                            -
[-        ] process > NFCORE_SCRNASEQ:SCRNASEQ:CUSTOM_DUMPSOFTWAREVERSIONS                                        -
[-        ] process > NFCORE_SCRNASEQ:SCRNASEQ:MULTIQC                                                            -

Error executing process > 'NFCORE_SCRNASEQ:SCRNASEQ:SCRNASEQ_ALEVIN:ALEVINQC (Sample_Y)'

Caused by:
  Process `NFCORE_SCRNASEQ:SCRNASEQ:SCRNASEQ_ALEVIN:ALEVINQC (Sample_Y)` terminated with an error exit status (1)

Command executed:

  #!/usr/bin/env Rscript
  require(alevinQC)
  alevinFryQCReport(
      mapDir = "Sample_Y_alevin_results/af_map",
      quantDir = "Sample_Y_alevin_results/af_quant",
      permitDir= "Sample_Y_alevin_results",
      sampleId = "Sample_Y",
      outputFile = "alevin_report_Sample_Y.html",
      outputFormat = "html_document",
      outputDir = "./",
      forceOverwrite = TRUE
  )
  
  yaml::write_yaml(
      list(
          'NFCORE_SCRNASEQ:SCRNASEQ:SCRNASEQ_ALEVIN:ALEVINQC'=list(
              'alevinqc' = paste(packageVersion('alevinQC'), collapse='.')
          )
      ),
      "versions.yml"
  )

Command exit status:
  1

Command output:
  (empty)

Command error:
  Loading required package: alevinQC
  Registered S3 method overwritten by 'GGally':
    method from   
    +.gg   ggplot2
  Error in checkAlevinFryInputFiles(mapDir = mapDir, permitDir = permitDir,  : 
    Input directory not compatible with alevin-fry v0.5.0 or newer, the following required file(s) are missing or malformed:
  /workspace/scrnaseq/test/work/93/2e0426d631c6121ad6de37bc27497d/Sample_Y_alevin_results/all_freq.bin
  /workspace/scrnaseq/test/work/93/2e0426d631c6121ad6de37bc27497d/Sample_Y_alevin_results/permit_freq.bin
  
  Input directory not compatible with alevin-fry v0.4.3 or newer, the following required file(s) are missing or malformed:
  /workspace/scrnaseq/test/work/93/2e0426d631c6121ad6de37bc27497d/Sample_Y_alevin_results/all_freq.tsv
  
  Calls: alevinFryQCReport -> .alevinQCReport -> checkAlevinFryInputFiles
  Execution halted

Work dir:
  /workspace/scrnaseq/test/work/21/91c20894dbd6fdc07d759b0e88338c

Tip: when you have fixed the problem you can continue the execution adding the option `-resume` to the run command line


One more CTRL+C to force exit
Adieu

fmalmeida, Sep 29 '22 12:09

I gave it 500G and got the same error as before, at the same step.

Khajidu, Sep 29 '22 13:09