crisprseq icon indicating copy to clipboard operation
crisprseq copied to clipboard

Implementation of nf-gpt into crisprseq pipeline.

Open LeonHornich opened this issue 1 year ago • 5 comments

This is a very early stage implementation of the nf-gpt plugin into the pipeline. It can currently handles a list of genes extracted from the drugZ, bagel2 or mle module output and parses it to the plugin.

You can run the changes using: export OPENAI_API_KEY=your-api-key nextflow run . -profile test_screening,docker --outdir test -dump-channels --gpt_interpretation drugz,mle,bagel2,rra -resume

DISCLAIMER! You will need a functioning Open AI api key. for this to work.

Since this is very early version and only comes with the bare minimum of functionality there are a lot of adjustment that have to me made (in order of priority):

  • integrate results into MULTIQC report!
  • update local module with nf-core template
  • add check for gene amount

Keep in mind this implementation does not aim to achieve 100% correct results from the gpt model, but instead is to build a foundational implementation for (future) LLMs into a nf-core pipeline.

If you have any suggestions or requested changes please let me know and i will try to adjust the code.

LeonHornich avatar Sep 02 '24 11:09 LeonHornich

[!WARNING] Newer version of the nf-core template is available.

Your pipeline is using an old version of the nf-core template: 3.0.2. Please update your pipeline to the latest version.

For more documentation on how to update your pipeline, please see the nf-core documentation and Synchronisation documentation.

github-actions[bot] avatar Sep 02 '24 11:09 github-actions[bot]

nf-core lint overall result: Failed :x:

Posted for pipeline commit 05c4d56

+| ✅ 235 tests passed       |+
#| ❔   3 tests were ignored |#
!| ❗   5 tests had warnings |!
-| ❌   1 tests failed       |-

:x: Test failures:

  • actions_ci - Minimum pipeline NF version '23.04.0' is not tested in '.github/workflows/ci.yml'

:heavy_exclamation_mark: Test warnings:

  • nextflow_config - Config manifest.version should end in dev: 2.3.0
  • pipeline_todos - TODO string in methods_description_template.yml: #Update the HTML below to your preferred methods description, e.g. add publication citation for this pipeline
  • pipeline_todos - TODO string in main.nf: Optionally add in-text citation tools to this list.
  • pipeline_todos - TODO string in main.nf: Optionally add bibliographic entries to this list.
  • pipeline_todos - TODO string in main.nf: Only uncomment below if logic in toolCitationText/toolBibliographyText has been filled!

:grey_question: Tests ignored:

  • files_exist - File is ignored: conf/test.config
  • files_exist - File is ignored: conf/test_full.config
  • files_unchanged - File ignored due to lint config: .github/PULL_REQUEST_TEMPLATE.md

:white_check_mark: Tests passed:

Run details

  • nf-core/tools version 2.14.1
  • Run at 2024-09-19 11:53:17

github-actions[bot] avatar Sep 02 '24 11:09 github-actions[bot]

Hi @LaurenceKuhl do you know if someone can help me out with this? I still think that the template update messed up some things and I have been unable to get the pr back up and running. The gpt implementation seems to not be the cause and the fork contains some error which I am having trouble fixing as it points to files all over the pipeline. If you don't have the time that's also no problem 😄

LeonHornich avatar Mar 02 '25 10:03 LeonHornich

Hello @LeonHornich, the changes in the dev branch replaced nf-validation by nf-schema, when resolving the conflicts while merging dev to this PR some duplications were added (see https://github.com/nf-core/crisprseq/pull/193/files#diff-61ea3cf4e947c6ba28f51817a80d81e89b8f806db0de65d874149cc075ac3c6dR47 for example). I haven't checked if there are more of this errors, but you can go through the code changes and make sure that we are only using nf-schema now.

mirpedrol avatar Mar 05 '25 15:03 mirpedrol

It's looking good! I left some comments and suggestionts. Could you provide a screenshot of an example result? It would also be good to add it to the documentation.

Hi, thanks a lot for taking a look at it and giving some suggestions. I have looked over the feedback and it looks like most of it are smaller changes. It also looks like there are some changes to code I don't remember touching (pretty much everything that is not directly connected to my implementation (e.g. https://github.com/nf-core/crisprseq/pull/193#discussion_r2016875389 ). If I had to guess this is due to the template update and my initial issues with the changes coming with it :) . I will take a look at them and hopefully path everything back together. Sorry for the confusion on that end.

LeonHornich avatar Mar 29 '25 19:03 LeonHornich

Hi @mirpedrol , I finally had some time and took care of your suggestions. Should be mostly done by now. Turns out I originally had some python code formatter installed that would mess up the internal structure of the files. That's turned off now.

I carried over almost all of your suggestions. The only thing I have been holding back on is the data parser. I don't think the data parser module would be suited for a nf-core module as it currently work and is tailored specifically for the target modules of the screening analysis. More or less. I do see the advantage of generating it using nf-core modules create but for the sake of finally merging this pr maybe I will have a look at this in the future. Let me know if you think otherwise then I can also look into it prior.

Lastly, I wanted to include the gpt outputs in the multiqc reports. This is a bit of a tricky one. While the output is a .txt file it does not quiet match the requirements that would allow me to use it in multiqc (from what I can see now). It is a raw text file and not tabular. Phil Ewels suggested to ask the gpt to generate a html formated output but due to the primitive structure of nf-gpt I don't think I can 100% enforce that 🤔 . I can only include that request in the query string. Not sure if that always yields a functional outcome. I would personally wait with this until either nf-gpt is developed further or maybe until multiqc supports the file? Again, would be interested in your opinion on this.

LeonHornich avatar May 19 '25 15:05 LeonHornich

Hi @LeonHornich, I will have a last look to the PR today. I agree with you that we can merge this now and work on "nf-coreizing" the data parser in a separate PR. Regarding multiqc, I would rather be sure that the pipeline completes 100% of the times, than risking an error when we try to generate the HTML file and pass it to multiQC. What you could do is parse this txt file programmatically and format it in a way that is compatible with MultiQC. How does the text file look like?

mirpedrol avatar May 20 '25 11:05 mirpedrol

looking at the failing tests. Does this PR change something in the multiqc_fastqc.txt file?

mirpedrol avatar May 20 '25 12:05 mirpedrol

looking at the failing tests. Does this PR change something in the multiqc_fastqc.txt file?

It shouldn't, I am a little clueless as for why this is not passing

LeonHornich avatar May 20 '25 12:05 LeonHornich

I re-run the tests and it seems to be consistent, so I would suggest to update the failing snapshots with the md5 that we see here 🤞

mirpedrol avatar May 20 '25 14:05 mirpedrol

I re-run the tests and it seems to be consistent, so I would suggest to update the failing snapshots with the md5 that we see here 🤞

How and where can I do that? If this is something done on the github, I am not sure if I have the rights for it.

LeonHornich avatar May 21 '25 11:05 LeonHornich

I have quickly done it by copying the md5 from the error message in the CI. You can always run nf-test with nf-test tests, both the tests and snapshot files are under the tests directory.

mirpedrol avatar May 21 '25 12:05 mirpedrol

Hmm, seems like similar tests are failing here https://github.com/nf-core/crisprseq/pull/247 . So something on the dev branch might not be working?

LeonHornich avatar May 22 '25 10:05 LeonHornich

Hi @LeonHornich, I solved the issue in #247 by skipping the "unstable" files in the snapshot. When we merge the code from that PR you can update your branch from dev, and this should also be enough to fix your failing checks.

matbonfanti avatar May 27 '25 13:05 matbonfanti

Hi @LeonHornich, I solved the issue in #247 by skipping the "unstable" files in the snapshot. When we merge the code from that PR you can update your branch from dev, and this should also be enough to fix your failing checks.

Amazing, thank you for the fix and letting me know

LeonHornich avatar May 27 '25 13:05 LeonHornich

@LeonHornich I did the merge from dev myself, because there was conflict in the snaphot to solve... finger crossed :-)

matbonfanti avatar May 27 '25 13:05 matbonfanti

finally! feel free to merge...

Thanks for the great job!

matbonfanti avatar May 27 '25 14:05 matbonfanti

Looks like everything passed. Thank you again @matbonfanti 👏🏻

LeonHornich avatar May 27 '25 14:05 LeonHornich

I don't have permissions to merge this pr. Maybe @mirpedrol can do it? Thank you everyone, we can finally close this pr :)

LeonHornich avatar May 27 '25 14:05 LeonHornich

I can do it, no problem!

matbonfanti avatar May 27 '25 14:05 matbonfanti

Nice to see this merged! 👏 🚀

mirpedrol avatar May 28 '25 07:05 mirpedrol