crisprseq Implementation of nf-gpt into crisprseq pipeline.

This is a very early-stage implementation of the nf-gpt plugin in the pipeline. It can currently handle a list of genes extracted from the drugZ, bagel2 or mle module output and pass it to the plugin.
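A minimal sketch of that step, not the code in this PR: take the top-ranked genes from a hit table and build the query string that would be handed to the plugin. The `GENE` column name, the tab-separated layout, the example rows and the helper names `top_genes` and `build_prompt` are assumptions for illustration; the question text is the default of `params.gpt_mle_question` shown in the lint output.

```python
import csv
import io


def top_genes(table_text, gene_column="GENE", amount=100):
    """Return the first `amount` gene names from a tab-separated hit table.

    Assumes the table is already sorted by rank; `gene_column` is a
    hypothetical column name and may differ between modules.
    """
    reader = csv.DictReader(io.StringIO(table_text), delimiter="\t")
    genes = []
    for row in reader:
        if len(genes) >= amount:
            break
        genes.append(row[gene_column])
    return genes


def build_prompt(question, genes):
    """Join the question and the gene list into one query string."""
    return f"{question} {', '.join(genes)}"


# Made-up example rows, only to show the data flow.
example_table = "GENE\tnormZ\nTP53\t-5.2\nBRCA1\t-4.8\nKRAS\t3.9\n"
prompt = build_prompt(
    "What genes are known to have pan-effects on cancer?",
    top_genes(example_table, amount=2),
)
print(prompt)
```

Keeping the table parsing and the prompt building in separate functions would let each module (drugZ, bagel2, mle) supply its own column name and question without touching the rest.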

You can run the changes using: `export OPENAI_API_KEY=your-api-key` followed by `nextflow run . -profile test_screening,docker --outdir test -dump-channels --gpt_interpretation drugz,mle,bagel2,rra -resume`

DISCLAIMER! You will need a functioning OpenAI API key for this to work.

Since this is a very early version with only the bare minimum of functionality, a lot of adjustments still have to be made (in order of priority):

integrate results into MULTIQC report!
update local module with nf-core template
add check for gene amount
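For the last item, one possible shape of such a check, as a sketch only: validate the requested gene amount and cap it at the number of genes actually available before the query is built. The function name `check_gene_amount` and the choice to cap rather than fail are assumptions, not behaviour of the current module.

```python
def check_gene_amount(requested, available):
    """Validate a requested gene amount and cap it at what is available."""
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(requested, bool) or not isinstance(requested, int):
        raise TypeError(f"gene amount must be an integer, got {requested!r}")
    if requested < 1:
        raise ValueError(f"gene amount must be at least 1, got {requested}")
    # Cap instead of failing so a short hit list does not stop the run.
    return min(requested, available)


print(check_gene_amount(100, 42))   # fewer genes available than requested
print(check_gene_amount(100, 500))  # requested amount fits
```

Capping keeps the pipeline running on small screens; raising an error instead would be the stricter alternative.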

Keep in mind that this implementation does not aim to achieve 100% correct results from the GPT model; instead, it aims to build a foundation for integrating (future) LLMs into an nf-core pipeline.

If you have any suggestions or change requests, please let me know and I will try to adjust the code.

Sep 02 '24 11:09 LeonHornich

[!WARNING] Newer version of the nf-core template is available.

Your pipeline is using an old version of the nf-core template: 3.0.2. Please update your pipeline to the latest version.

For more documentation on how to update your pipeline, please see the nf-core documentation and Synchronisation documentation.

Sep 02 '24 11:09 github-actions[bot]

`nf-core lint` overall result: Failed :x:

Posted for pipeline commit 05c4d56

+| ✅ 235 tests passed       |+
#| ❔   3 tests were ignored |#
!| ❗   5 tests had warnings |!
-| ❌   1 tests failed       |-

:x: Test failures:

actions_ci - Minimum pipeline NF version '23.04.0' is not tested in '.github/workflows/ci.yml'

:heavy_exclamation_mark: Test warnings:

nextflow_config - Config manifest.version should end in dev: 2.3.0
pipeline_todos - TODO string in methods_description_template.yml: #Update the HTML below to your preferred methods description, e.g. add publication citation for this pipeline
pipeline_todos - TODO string in main.nf: Optionally add in-text citation tools to this list.
pipeline_todos - TODO string in main.nf: Optionally add bibliographic entries to this list.
pipeline_todos - TODO string in main.nf: Only uncomment below if logic in toolCitationText/toolBibliographyText has been filled!

:grey_question: Tests ignored:

files_exist - File is ignored: conf/test.config
files_exist - File is ignored: conf/test_full.config
files_unchanged - File ignored due to lint config: .github/PULL_REQUEST_TEMPLATE.md

:white_check_mark: Tests passed:

files_exist - File found: .gitattributes
files_exist - File found: .gitignore
files_exist - File found: .nf-core.yml
files_exist - File found: .editorconfig
files_exist - File found: .prettierignore
files_exist - File found: .prettierrc.yml
files_exist - File found: CHANGELOG.md
files_exist - File found: CITATIONS.md
files_exist - File found: CODE_OF_CONDUCT.md
files_exist - File found: LICENSE or LICENSE.md or LICENCE or LICENCE.md
files_exist - File found: nextflow_schema.json
files_exist - File found: nextflow.config
files_exist - File found: README.md
files_exist - File found: .github/.dockstore.yml
files_exist - File found: .github/CONTRIBUTING.md
files_exist - File found: .github/ISSUE_TEMPLATE/bug_report.yml
files_exist - File found: .github/ISSUE_TEMPLATE/config.yml
files_exist - File found: .github/ISSUE_TEMPLATE/feature_request.yml
files_exist - File found: .github/PULL_REQUEST_TEMPLATE.md
files_exist - File found: .github/workflows/branch.yml
files_exist - File found: .github/workflows/ci.yml
files_exist - File found: .github/workflows/linting_comment.yml
files_exist - File found: .github/workflows/linting.yml
files_exist - File found: assets/email_template.html
files_exist - File found: assets/email_template.txt
files_exist - File found: assets/sendmail_template.txt
files_exist - File found: assets/nf-core-crisprseq_logo_light.png
files_exist - File found: conf/modules.config
files_exist - File found: docs/images/nf-core-crisprseq_logo_light.png
files_exist - File found: docs/images/nf-core-crisprseq_logo_dark.png
files_exist - File found: docs/output.md
files_exist - File found: docs/README.md
files_exist - File found: docs/README.md
files_exist - File found: docs/usage.md
files_exist - File found: main.nf
files_exist - File found: assets/multiqc_config.yml
files_exist - File found: conf/base.config
files_exist - File found: conf/igenomes.config
files_exist - File found: .github/workflows/awstest.yml
files_exist - File found: .github/workflows/awsfulltest.yml
files_exist - File found: modules.json
files_exist - File not found check: .github/ISSUE_TEMPLATE/bug_report.md
files_exist - File not found check: .github/ISSUE_TEMPLATE/feature_request.md
files_exist - File not found check: .github/workflows/push_dockerhub.yml
files_exist - File not found check: .markdownlint.yml
files_exist - File not found check: .nf-core.yaml
files_exist - File not found check: .yamllint.yml
files_exist - File not found check: bin/markdown_to_html.r
files_exist - File not found check: conf/aws.config
files_exist - File not found check: docs/images/nf-core-crisprseq_logo.png
files_exist - File not found check: lib/Checks.groovy
files_exist - File not found check: lib/Completion.groovy
files_exist - File not found check: lib/NfcoreTemplate.groovy
files_exist - File not found check: lib/Utils.groovy
files_exist - File not found check: lib/Workflow.groovy
files_exist - File not found check: lib/WorkflowMain.groovy
files_exist - File not found check: lib/WorkflowCrisprseq.groovy
files_exist - File not found check: parameters.settings.json
files_exist - File not found check: pipeline_template.yml
files_exist - File not found check: Singularity
files_exist - File not found check: lib/nfcore_external_java_deps.jar
files_exist - File not found check: .travis.yml
nextflow_config - Config variable found: manifest.name
nextflow_config - Config variable found: manifest.nextflowVersion
nextflow_config - Config variable found: manifest.description
nextflow_config - Config variable found: manifest.version
nextflow_config - Config variable found: manifest.homePage
nextflow_config - Config variable found: timeline.enabled
nextflow_config - Config variable found: trace.enabled
nextflow_config - Config variable found: report.enabled
nextflow_config - Config variable found: dag.enabled
nextflow_config - Config variable found: process.cpus
nextflow_config - Config variable found: process.memory
nextflow_config - Config variable found: process.time
nextflow_config - Config variable found: params.outdir
nextflow_config - Config variable found: params.input
nextflow_config - Config variable found: params.validationShowHiddenParams
nextflow_config - Config variable found: params.validationSchemaIgnoreParams
nextflow_config - Config variable found: manifest.mainScript
nextflow_config - Config variable found: timeline.file
nextflow_config - Config variable found: trace.file
nextflow_config - Config variable found: report.file
nextflow_config - Config variable found: dag.file
nextflow_config - Config variable (correctly) not found: params.nf_required_version
nextflow_config - Config variable (correctly) not found: params.container
nextflow_config - Config variable (correctly) not found: params.singleEnd
nextflow_config - Config variable (correctly) not found: params.igenomesIgnore
nextflow_config - Config variable (correctly) not found: params.name
nextflow_config - Config variable (correctly) not found: params.enable_conda
nextflow_config - Config timeline.enabled had correct value: true
nextflow_config - Config report.enabled had correct value: true
nextflow_config - Config trace.enabled had correct value: true
nextflow_config - Config dag.enabled had correct value: true
nextflow_config - Config manifest.name began with nf-core/
nextflow_config - Config variable manifest.homePage began with https://github.com/nf-core/
nextflow_config - Config dag.file ended with .html
nextflow_config - Config variable manifest.nextflowVersion started with >= or !>=
nextflow_config - Config params.custom_config_version is set to master
nextflow_config - Config params.custom_config_base is set to https://raw.githubusercontent.com/nf-core/configs/master
nextflow_config - Lines for loading custom profiles found
nextflow_config - nextflow.config contains configuration profile test
nextflow_config - Config default value correct: params.umi_bin_size= 1
nextflow_config - Config default value correct: params.medaka_model= r941_min_high_g303
nextflow_config - Config default value correct: params.aligner= minimap2
nextflow_config - Config default value correct: params.vsearch_minseqlength= 55
nextflow_config - Config default value correct: params.vsearch_maxseqlength= 57
nextflow_config - Config default value correct: params.vsearch_id= 0.99
nextflow_config - Config default value correct: params.min_reads= 30.0
nextflow_config - Config default value correct: params.min_targeted_genes= 3.0
nextflow_config - Config default value correct: params.bagel_reference_essentials= https://raw.githubusercontent.com/hart-lab/bagel/master/CEGv2.txt
nextflow_config - Config default value correct: params.bagel_reference_nonessentials= https://raw.githubusercontent.com/hart-lab/bagel/master/NEGv1.txt
nextflow_config - Config default value correct: params.hit_selection_iteration_nb= 1000.0
nextflow_config - Config default value correct: params.custom_config_version= master
nextflow_config - Config default value correct: params.custom_config_base= https://raw.githubusercontent.com/nf-core/configs/master
nextflow_config - Config default value correct: params.max_cpus= 16
nextflow_config - Config default value correct: params.max_memory= 128.GB
nextflow_config - Config default value correct: params.max_time= 240.h
nextflow_config - Config default value correct: params.publish_dir_mode= copy
nextflow_config - Config default value correct: params.max_multiqc_email_size= 25.MB
nextflow_config - Config default value correct: params.validate_params= true
nextflow_config - Config default value correct: params.validationSchemaIgnoreParams= genomes,igenomes_base
nextflow_config - Config default value correct: params.pipelines_testdata_base_path= https://raw.githubusercontent.com/nf-core/test-datasets/
nextflow_config - Config default value correct: params.gpt_drugz_gene_amount= 100
nextflow_config - Config default value correct: params.gpt_drugz_question= Which of the following genes enhance or supress drug activity?
nextflow_config - Config default value correct: params.gpt_mle_gene_amount= 100
nextflow_config - Config default value correct: params.gpt_mle_question= What genes are known to have pan-effects on cancer?
nextflow_config - Config default value correct: params.gpt_bagel2_gene_amount= 100
nextflow_config - Config default value correct: params.gpt_bagel_question= What can you tell me about these genes in the context of functional genomics?
nextflow_config - Config default value correct: params.gpt_rra_gene_amount= 100
nextflow_config - Config default value correct: params.gpt_rra_question= What genes are known to have pan-effects on cancer?
files_unchanged - .gitattributes matches the template
files_unchanged - .prettierrc.yml matches the template
files_unchanged - CODE_OF_CONDUCT.md matches the template
files_unchanged - LICENSE matches the template
files_unchanged - .github/.dockstore.yml matches the template
files_unchanged - .github/CONTRIBUTING.md matches the template
files_unchanged - .github/ISSUE_TEMPLATE/bug_report.yml matches the template
files_unchanged - .github/ISSUE_TEMPLATE/config.yml matches the template
files_unchanged - .github/ISSUE_TEMPLATE/feature_request.yml matches the template
files_unchanged - .github/workflows/branch.yml matches the template
files_unchanged - .github/workflows/linting_comment.yml matches the template
files_unchanged - .github/workflows/linting.yml matches the template
files_unchanged - assets/email_template.html matches the template
files_unchanged - assets/email_template.txt matches the template
files_unchanged - assets/sendmail_template.txt matches the template
files_unchanged - assets/nf-core-crisprseq_logo_light.png matches the template
files_unchanged - docs/images/nf-core-crisprseq_logo_light.png matches the template
files_unchanged - docs/images/nf-core-crisprseq_logo_dark.png matches the template
files_unchanged - docs/README.md matches the template
files_unchanged - .gitignore matches the template
files_unchanged - .prettierignore matches the template
actions_ci - '.github/workflows/ci.yml' is triggered on expected events
actions_awstest - '.github/workflows/awstest.yml' is triggered correctly
actions_awsfulltest - .github/workflows/awsfulltest.yml is triggered correctly
actions_awsfulltest - .github/workflows/awsfulltest.yml does not use -profile test
readme - README Nextflow minimum version badge matched config. Badge: 23.04.0, Config: 23.04.0
readme - README Zenodo placeholder was replaced with DOI.
pipeline_name_conventions - Name adheres to nf-core convention
template_strings - Did not find any Jinja template strings (289 files)
schema_lint - Schema lint passed
schema_lint - Schema title + description lint passed
schema_lint - Input mimetype lint passed: 'text/csv'
schema_params - Schema matched params returned from nextflow config
system_exit - No System.exit calls found
actions_schema_validation - Workflow validation passed: ci.yml
actions_schema_validation - Workflow validation passed: awsfulltest.yml
actions_schema_validation - Workflow validation passed: linting_comment.yml
actions_schema_validation - Workflow validation passed: release-announcements.yml
actions_schema_validation - Workflow validation passed: awsfulltest_screening.yml
actions_schema_validation - Workflow validation passed: linting.yml
actions_schema_validation - Workflow validation passed: download_pipeline.yml
actions_schema_validation - Workflow validation passed: awstest.yml
actions_schema_validation - Workflow validation passed: clean-up.yml
actions_schema_validation - Workflow validation passed: fix-linting.yml
actions_schema_validation - Workflow validation passed: branch.yml
merge_markers - No merge markers found in pipeline files
modules_json - Only installed modules found in modules.json
multiqc_config - assets/multiqc_config.yml found and not ignored.
multiqc_config - assets/multiqc_config.yml contains report_section_order
multiqc_config - assets/multiqc_config.yml contains export_plots
multiqc_config - assets/multiqc_config.yml contains report_comment
multiqc_config - assets/multiqc_config.yml follows the ordering scheme of the minimally required plugins.
multiqc_config - assets/multiqc_config.yml contains a matching 'report_comment'.
multiqc_config - assets/multiqc_config.yml contains 'export_plots: true'.
modules_structure - modules directory structure is correct 'modules/nf-core/TOOL/SUBTOOL'
base_config - conf/base.config found and not ignored.
modules_config - conf/modules.config found and not ignored.
modules_config - ORIENT_REFERENCE found in conf/modules.config and Nextflow scripts.
modules_config - CAT_FASTQ found in conf/modules.config and Nextflow scripts.
modules_config - PEAR found in conf/modules.config and Nextflow scripts.
modules_config - BAGEL2_BF found in conf/modules.config and Nextflow scripts.
modules_config - BAGEL2_PR found in conf/modules.config and Nextflow scripts.
modules_config - BAGEL2_FC found in conf/modules.config and Nextflow scripts.
modules_config - DRUGZ found in conf/modules.config and Nextflow scripts.
modules_config - BAGEL2_GRAPH found in conf/modules.config and Nextflow scripts.
modules_config - FASTQC found in conf/modules.config and Nextflow scripts.
modules_config - FIND_ADAPTERS found in conf/modules.config and Nextflow scripts.
modules_config - MAGECK_COUNT found in conf/modules.config and Nextflow scripts.
modules_config - MAGECK_MLE found in conf/modules.config and Nextflow scripts.
modules_config - MAGECK_TEST found in conf/modules.config and Nextflow scripts.
modules_config - MAGECK_GRAPHRRA found in conf/modules.config and Nextflow scripts.
modules_config - MAGECK_FLUTEMLE found in conf/modules.config and Nextflow scripts.
modules_config - HITSELECTION found in conf/modules.config and Nextflow scripts.
modules_config - HITSELECTION_MLE found in conf/modules.config and Nextflow scripts.
modules_config - HITSELECTION_BAGEL2 found in conf/modules.config and Nextflow scripts.
modules_config - HITSELECTION_RRA found in conf/modules.config and Nextflow scripts.
modules_config - VENNDIAGRAM found in conf/modules.config and Nextflow scripts.
modules_config - MAGECK_MLE_DAY0 found in conf/modules.config and Nextflow scripts.
modules_config - CRISPRCLEANR_NORMALIZE found in conf/modules.config and Nextflow scripts.
modules_config - CUTADAPT found in conf/modules.config and Nextflow scripts.
modules_config - CUTADAPT_FIVE_PRIME found in conf/modules.config and Nextflow scripts.
modules_config - CUTADAPT_THREE_PRIME found in conf/modules.config and Nextflow scripts.
modules_config - SEQTK_SEQ_MASK found in conf/modules.config and Nextflow scripts.
modules_config - EXTRACT_UMIS found in conf/modules.config and Nextflow scripts.
modules_config - VSEARCH_CLUSTER found in conf/modules.config and Nextflow scripts.
modules_config - VSEARCH_SORT found in conf/modules.config and Nextflow scripts.
modules_config - PREPROCESSING_SUMMARY found in conf/modules.config and Nextflow scripts.
modules_config - MATRICESCREATION found in conf/modules.config and Nextflow scripts.
modules_config - MINIMAP2_ALIGN_UMI_1 found in conf/modules.config and Nextflow scripts.
modules_config - MINIMAP2_ALIGN_UMI_2 found in conf/modules.config and Nextflow scripts.
modules_config - RACON_1 found in conf/modules.config and Nextflow scripts.
modules_config - RACON_2 found in conf/modules.config and Nextflow scripts.
modules_config - MEDAKA found in conf/modules.config and Nextflow scripts.
modules_config - SEQTK_SEQ_FATOFQ found in conf/modules.config and Nextflow scripts.
modules_config - CLUSTERING_SUMMARY found in conf/modules.config and Nextflow scripts.
modules_config - MINIMAP2_ALIGN_ORIGINAL found in conf/modules.config and Nextflow scripts.
modules_config - ALIGNMENT_SUMMARY found in conf/modules.config and Nextflow scripts.
modules_config - SAMTOOLS_INDEX found in conf/modules.config and Nextflow scripts.
modules_config - TEMPLATE_REFERENCE found in conf/modules.config and Nextflow scripts.
modules_config - MINIMAP2_ALIGN_TEMPLATE found in conf/modules.config and Nextflow scripts.
modules_config - CIGAR_PARSER found in conf/modules.config and Nextflow scripts.
modules_config - CRISPRSEQ_PLOTTER found in conf/modules.config and Nextflow scripts.
modules_config - MULTIQC found in conf/modules.config and Nextflow scripts.
nfcore_yml - Repository type in .nf-core.yml is valid: pipeline
nfcore_yml - nf-core version in .nf-core.yml is set to the latest version: 2.14.1

Run details

nf-core/tools version 2.14.1
Run at 2024-09-19 11:53:17

Sep 02 '24 11:09 github-actions[bot]

Hi @LaurenceKuhl do you know if someone can help me out with this? I still think that the template update messed up some things and I have been unable to get the pr back up and running. The gpt implementation seems to not be the cause and the fork contains some error which I am having trouble fixing as it points to files all over the pipeline. If you don't have the time that's also no problem 😄

Mar 02 '25 10:03 LeonHornich

Hello @LeonHornich, the changes in the dev branch replaced nf-validation with nf-schema. When the conflicts were resolved while merging dev into this PR, some duplications were added (see https://github.com/nf-core/crisprseq/pull/193/files#diff-61ea3cf4e947c6ba28f51817a80d81e89b8f806db0de65d874149cc075ac3c6dR47 for example). I haven't checked whether there are more of these errors, but you can go through the code changes and make sure that we are only using nf-schema now.
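A quick way to go through the code for leftovers could look like the sketch below. It only does a plain text search for the string `nf-validation`; the helper name `find_leftovers` and the list of file suffixes are assumptions for illustration, and it will not spot duplicated blocks that only mention nf-schema.

```python
from pathlib import Path


def find_leftovers(root, needle="nf-validation",
                   suffixes=(".nf", ".config", ".json", ".yml")):
    """Yield (path, line_number, line) for every line that mentions `needle`."""
    for path in sorted(Path(root).rglob("*")):
        # Only look at regular files with one of the chosen suffixes.
        if not path.is_file() or path.suffix not in suffixes:
            continue
        text = path.read_text(encoding="utf-8", errors="replace")
        for number, line in enumerate(text.splitlines(), start=1):
            if needle in line:
                yield path, number, line.strip()
```

Calling `list(find_leftovers("."))` from the pipeline root would return the remaining mentions with their line numbers.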

Mar 05 '25 15:03 mirpedrol

> It's looking good! I left some comments and suggestions. Could you provide a screenshot of an example result? It would also be good to add it to the documentation.

Hi, thanks a lot for taking a look at it and giving some suggestions. I have looked over the feedback and most of it seems to be smaller changes. It also looks like there are some changes to code I don't remember touching, pretty much everything that is not directly connected to my implementation (e.g. https://github.com/nf-core/crisprseq/pull/193#discussion_r2016875389 ). If I had to guess, this is due to the template update and my initial issues with the changes that came with it :) . I will take a look at them and hopefully patch everything back together. Sorry for the confusion on that end.

Mar 29 '25 19:03 LeonHornich

Hi @mirpedrol, I finally had some time and took care of your suggestions. It should be mostly done by now. It turns out I originally had a Python code formatter installed that would mess up the internal structure of the files. That's turned off now.

I carried over almost all of your suggestions. The only thing I have been holding back on is the data parser. I don't think the data parser module would be suited for an nf-core module as it currently works, since it is tailored more or less specifically to the target modules of the screening analysis. I do see the advantage of generating it using `nf-core modules create`, but for the sake of finally merging this PR I would rather have a look at this in the future. Let me know if you think otherwise; then I can also look into it beforehand.

Lastly, I wanted to include the GPT outputs in the MultiQC reports. This is a bit of a tricky one. While the output is a .txt file, it does not quite match the requirements that would allow me to use it in MultiQC (from what I can see now): it is a raw text file and not tabular. Phil Ewels suggested asking the GPT model to generate HTML-formatted output, but due to the primitive structure of nf-gpt I don't think I can 100% enforce that 🤔 . I can only include that request in the query string, and I am not sure that always yields a functional outcome. I would personally wait with this until either nf-gpt is developed further or maybe until MultiQC supports the file. Again, I would be interested in your opinion on this.

May 19 '25 15:05 LeonHornich

Hi @LeonHornich, I will have a last look at the PR today. I agree with you that we can merge this now and work on "nf-coreizing" the data parser in a separate PR. Regarding MultiQC, I would rather be sure that the pipeline completes 100% of the time than risk an error when we try to generate the HTML file and pass it to MultiQC. What you could do is parse this txt file programmatically and format it in a way that is compatible with MultiQC. What does the text file look like?
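One way to read that suggestion, sketched under assumptions: escape the raw text deterministically and wrap it in a small HTML fragment, written to a file whose name ends in `_mqc.html` so that MultiQC's custom content feature can pick it up as a report section. The section id, the section title and the helper name `text_to_mqc_html` are made up for illustration, and the header comment layout should be checked against the MultiQC custom content documentation before relying on it.

```python
import html


def text_to_mqc_html(raw_text, section_id="gpt_interpretation",
                     section_name="GPT interpretation"):
    """Wrap free text in an HTML fragment with a MultiQC-style header comment."""
    # Blank lines separate paragraphs; empty chunks are dropped.
    paragraphs = [p.strip() for p in raw_text.split("\n\n") if p.strip()]
    # Escaping means characters such as < or & in the model output
    # cannot break the surrounding report markup.
    body = "\n".join(f"<p>{html.escape(p)}</p>" for p in paragraphs)
    header = (
        "<!--\n"
        f"id: '{section_id}'\n"
        f"section_name: '{section_name}'\n"
        "-->\n"
    )
    return header + body + "\n"


example = "TP53 is a tumour suppressor.\n\nKRAS <oncogene> & friends."
fragment = text_to_mqc_html(example)
print(fragment)
```

Because the conversion does not depend on the model following any formatting request, it should not add a new failure mode to the pipeline, whatever text comes back.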

May 20 '25 11:05 mirpedrol

looking at the failing tests. Does this PR change something in the multiqc_fastqc.txt file?

May 20 '25 12:05 mirpedrol

> looking at the failing tests. Does this PR change something in the multiqc_fastqc.txt file?

It shouldn't. I am a little clueless as to why this is not passing.

May 20 '25 12:05 LeonHornich

I re-ran the tests and the failure seems to be consistent, so I would suggest updating the failing snapshots with the md5 that we see here 🤞

May 20 '25 14:05 mirpedrol

> I re-ran the tests and the failure seems to be consistent, so I would suggest updating the failing snapshots with the md5 that we see here 🤞

How and where can I do that? If this is something done on GitHub, I am not sure if I have the rights for it.

May 21 '25 11:05 LeonHornich

I have quickly done it by copying the md5 from the error message in the CI. You can always run nf-test locally with `nf-test test`; both the tests and the snapshot files are under the `tests` directory.
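To compare a local output file with what the CI reports, the checksum can also be computed by hand. The sketch below assumes that file entries in the snapshot follow a `<file name>:md5,<digest>` layout; treat that layout and the helper name `snapshot_entry` as assumptions and compare the result with the existing snapshot files.

```python
import hashlib
from pathlib import Path


def snapshot_entry(path):
    """Return '<file name>:md5,<hex digest>' for the file at `path`."""
    file_path = Path(path)
    # Hash the raw bytes so line endings and encoding are not altered.
    digest = hashlib.md5(file_path.read_bytes()).hexdigest()
    return f"{file_path.name}:md5,{digest}"
```

If the string differs between two runs on the same input, the file is probably not stable enough to be checked by md5 at all.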

May 21 '25 12:05 mirpedrol

Hmm, seems like similar tests are failing here https://github.com/nf-core/crisprseq/pull/247 . So something on the dev branch might not be working?

May 22 '25 10:05 LeonHornich

Hi @LeonHornich, I solved the issue in #247 by skipping the "unstable" files in the snapshot. When we merge the code from that PR you can update your branch from dev, and this should also be enough to fix your failing checks.

May 27 '25 13:05 matbonfanti

> Hi @LeonHornich, I solved the issue in #247 by skipping the "unstable" files in the snapshot. When we merge the code from that PR you can update your branch from dev, and this should also be enough to fix your failing checks.

Amazing, thank you for the fix and letting me know

May 27 '25 13:05 LeonHornich

@LeonHornich I did the merge from dev myself, because there was a conflict in the snapshot to resolve... fingers crossed :-)

May 27 '25 13:05 matbonfanti

finally! feel free to merge...

Thanks for the great job!

May 27 '25 14:05 matbonfanti

Looks like everything passed. Thank you again @matbonfanti 👏🏻

May 27 '25 14:05 LeonHornich

I don't have permissions to merge this pr. Maybe @mirpedrol can do it? Thank you everyone, we can finally close this pr :)

May 27 '25 14:05 LeonHornich

I can do it, no problem!

May 27 '25 14:05 matbonfanti

Nice to see this merged! 👏 🚀

May 28 '25 07:05 mirpedrol
