
database choice

Open tijeco opened this issue 3 years ago • 1 comments

Database choice

The current version does not seem to allow choosing which databases to download or use, as mentioned in #48, so I have drafted this PR.

My goal is for the databases in use to be declared explicitly, so each database gets its own flag in the arguments.json file for both run and download. To download only mitelman, for example, you could run fusion_report download --use_mitelman true database_output, or pass any combination of --use_cosmic, --use_mitelman, --use_fusiongdb and --use_fusiongdb2. For my purposes, I only wanted mitelman, fusiongdb and fusiongdb2, so I can now run the following:

fusion_report download  --use_mitelman true --use_fusiongdb true --use_fusiongdb2 true fusionreport_download
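As a rough illustration of the flag scheme described above, here is a minimal Python sketch of one opt-in flag per database that takes an explicit true/false value. It is not the code from this PR: the helper names (str_to_bool, build_parser, selected_databases) and the default of false are assumptions made for the example, and it uses plain argparse rather than the project's arguments.json loader.

```python
import argparse

# Database names taken from the flags mentioned in this PR.
DATABASES = ["cosmic", "mitelman", "fusiongdb", "fusiongdb2"]


def str_to_bool(value):
    """Convert a command-line string such as 'true' or 'false' to a bool."""
    lowered = value.lower()
    if lowered in ("true", "1", "yes"):
        return True
    if lowered in ("false", "0", "no"):
        return False
    raise argparse.ArgumentTypeError(f"expected true or false, got {value!r}")


def build_parser():
    """Build a hypothetical 'download' parser with one opt-in flag per database."""
    parser = argparse.ArgumentParser(prog="fusion_report download")
    parser.add_argument("output", help="directory for the downloaded databases")
    for name in DATABASES:
        parser.add_argument(
            f"--use_{name}",
            type=str_to_bool,
            default=False,  # assumption: a database is skipped unless requested
            help=f"download the {name} database (true/false)",
        )
    return parser


def selected_databases(args):
    """Return the databases whose --use_<name> flag was set to true."""
    return [name for name in DATABASES if getattr(args, f"use_{name}")]


if __name__ == "__main__":
    # Same arguments as the download command shown above.
    args = build_parser().parse_args(
        [
            "--use_mitelman", "true",
            "--use_fusiongdb", "true",
            "--use_fusiongdb2", "true",
            "fusionreport_download",
        ]
    )
    print(selected_databases(args))
```

Taking an explicit true/false value mirrors the commands in this description; a store_true flag would be shorter to type but would not match the syntax shown here.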

Further, to run on the test dataset, I can use the following:

fusion_report run "test" test_output fusionreport_download/ \
  --use_mitelman true --use_fusiongdb true --use_fusiongdb2 true \
  --arriba tests/test_data/arriba.tsv \
  --dragen tests/test_data/dragen.tsv \
  --ericscript tests/test_data/ericscript.tsv \
  --fusioncatcher tests/test_data/fusioncatcher.txt \
  --pizzly tests/test_data/pizzly.tsv \
  --squid tests/test_data/squid.txt \
  --starfusion tests/test_data/starfusion.tsv \
  --jaffa tests/test_data/jaffa.csv \
  --allow-multiple-gene-symbols

I also included a conda environment file. I used it with a Jupyter notebook to play around with the library, so I thought it might be useful to others as well.

Let me know what you think.

Checklist

  • [x] Specify in detail the change
  • [ ] Make sure to follow guidelines in docs when adding database/tool
  • [ ] Documentation in docs is updated
  • [ ] CHANGELOG.md is updated
  • [ ] README is updated

tijeco • May 27 '22 21:05

Hi @tijeco, I understand why you would prefer to choose your own databases. We initially designed it to use all of them, because otherwise you have to specify a weight for each database separately.
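To make the weighting concern concrete, here is a minimal Python sketch of what happens when only a subset of databases is selected: either each weight is supplied by hand, or the remaining weights are rescaled so they still sum to 1. The weight values and the renormalize helper are hypothetical and chosen only for illustration; they are not fusion-report's actual weights or scoring code.

```python
# Hypothetical per-database weights, for illustration only.
# These are NOT the values fusion-report uses.
DEFAULT_WEIGHTS = {
    "cosmic": 40,
    "mitelman": 30,
    "fusiongdb": 20,
    "fusiongdb2": 10,
}


def renormalize(weights, selected):
    """Rescale the weights of the selected databases so they sum to 1."""
    unknown = set(selected) - set(weights)
    if unknown:
        raise ValueError(f"unknown databases: {sorted(unknown)}")
    total = sum(weights[name] for name in selected)
    if total == 0:
        raise ValueError("at least one database with a non-zero weight is required")
    return {name: weights[name] / total for name in selected}


if __name__ == "__main__":
    # Dropping cosmic leaves three databases whose shares are rescaled.
    subset = ["mitelman", "fusiongdb", "fusiongdb2"]
    for name, share in renormalize(DEFAULT_WEIGHTS, subset).items():
        print(f"{name}: {share:.3f}")
```

Automatic rescaling is only one possible policy; requiring the user to pass explicit weights, as described in the comment above, is the other.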

matq007 • Jun 09 '22 09:06

It is really nice. I implemented similar options, but from the negative side: you specify the databases you don't want instead of the ones you want.

rannick • Oct 04 '24 07:10

https://github.com/Clinical-Genomics/fusion-report/pull/77

rannick • Oct 04 '24 07:10
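The two selection styles discussed in this thread, opt-in (name the databases you want) and opt-out (name the databases you don't want), can be compared with a small Python sketch. The function names are hypothetical and this is not code from either pull request; the database list is taken from the flags mentioned in the PR description.

```python
# Database names taken from the flags mentioned in the PR description.
ALL_DATABASES = ("cosmic", "mitelman", "fusiongdb", "fusiongdb2")


def _check_names(names):
    """Reject database names that are not in ALL_DATABASES."""
    unknown = set(names) - set(ALL_DATABASES)
    if unknown:
        raise ValueError(f"unknown databases: {sorted(unknown)}")


def resolve_opt_in(wanted):
    """Opt-in style: nothing is used unless it is explicitly requested."""
    _check_names(wanted)
    return [db for db in ALL_DATABASES if db in wanted]


def resolve_opt_out(unwanted):
    """Opt-out style: everything is used unless it is explicitly excluded."""
    _check_names(unwanted)
    return [db for db in ALL_DATABASES if db not in unwanted]


if __name__ == "__main__":
    # Both calls describe the same selection: every database except cosmic.
    print(resolve_opt_in({"mitelman", "fusiongdb", "fusiongdb2"}))
    print(resolve_opt_out({"cosmic"}))
```

The practical difference is the default: with opt-out, running with no flags keeps the original behaviour of using every database, while opt-in requires each database to be named.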