tools-iuc
Request for tool wrap for BBtools
We are trying to deploy our pipeline on Galaxy so more people can use it, and we would need to use the BBtools suite. If it can be wrapped into Galaxy, that would be greatly appreciated!
@xinhuang66 is there any particular functionality you want from BBTools? This could help focus work on a wrapper.
Hi, thank you for getting back to me. We would love to use bbduk.sh, bbmap.sh and callvariants.sh in the BBtools suite.
Thank you!
Xin Huang, PhD | Senior Bioinformatician Discovery Research | Gene Therapy Program (https://gtp.med.upenn.edu/), Perelman School of Medicine, University of Pennsylvania
Check us out on LinkedIn: https://www.linkedin.com/company/gene-therapy-program-perelman-school-of-medicine-university-of-pennsylvania?trk=ppro_cprof
CONFIDENTIALITY NOTICE This e-mail message and any documents accompanying this e-mail transmission contain confidential and/or proprietary information from the University of Pennsylvania and shall not be used, disclosed, or reproduced, in whole or in part, without the prior written consent of the University of Pennsylvania. Title to this document and all information contained herein remains at all times in The University of Pennsylvania. If you have received this transmission in error, please reply to the sender advising of the error and delete the message and any accompanying documents from your system immediately.
From: pvanheus | Sent: Thursday, May 20, 2021 1:21 PM | To: galaxyproject/tools-iuc | Cc: Huang, Xin; Mention | Subject: Re: [galaxyproject/tools-iuc] Request for tool wrap for BBtools (#3682)
I'll work on these - please let me know if work has already started on them.
Thank you very much, Greg!
All the best,
From: Greg Von Kuster | Sent: Wednesday, September 22, 2021 1:52 PM | To: galaxyproject/tools-iuc | Cc: Huang, Xin; Mention | Subject: Re: [galaxyproject/tools-iuc] Request for tool wrap for BBtools (#3682)
Here is some text from the BBMap docs (https://jgi.doe.gov/data-and-tools/bbtools/bb-tools-user-guide/bbmap-guide/). This implies that we'll need a new BBMap data manager tool to install indexes, so I just want to confirm that this is where I should start. Thoughts from the IUC?
BBMap must index a reference before mapping to it, which is relatively fast. By default, it will write this index to disk so that it can be loaded more quickly next time, but this can be suppressed with the “nodisk” flag. The index is written to the location /ref/. In other words, if you run BBMap from the location /bob/work/, then the directory /bob/work/ref/ will be created and an index written to it; if there is already an index at that location which matches the reference you are using, the existing index will be loaded. If it does not match, a new index will be written.
Not sure if this is what's being asked, but BBMap can build the index itself and either save it to disk (bbmap.sh in=reads.fq ref=A.fa) or generate it on the fly without saving it (bbmap.sh in=reads.fq ref=A.fa nodisk).
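For a wrapper, the two invocations mentioned above differ only in the trailing `nodisk` flag. A minimal sketch of how a script might assemble either form follows; the function name `bbmap_command` and its `save_index` parameter are hypothetical, and only `in=`, `ref=` and `nodisk` come from this thread.

```python
import shlex


def bbmap_command(reads, reference, save_index=True):
    """Build a bbmap.sh argument list (hypothetical helper, not part of BBTools).

    With save_index=False the "nodisk" flag is appended, so the index is
    generated on the fly and not written to disk.
    """
    args = ["bbmap.sh", f"in={reads}", f"ref={reference}"]
    if not save_index:
        args.append("nodisk")
    return args


# Print both variants as shell-ready strings.
print(shlex.join(bbmap_command("reads.fq", "A.fa")))
print(shlex.join(bbmap_command("reads.fq", "A.fa", save_index=False)))
```

Returning a list rather than a string keeps the arguments safe to pass to `subprocess.run` without shell quoting issues.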
Thanks,
From: Greg Von Kuster | Sent: Thursday, September 23, 2021 9:59 AM | To: galaxyproject/tools-iuc | Cc: Huang, Xin; Mention | Subject: Re: [galaxyproject/tools-iuc] Request for tool wrap for BBtools (#3682)
@xinhuang66 it looks like you're using an email client or something to post comments here, and this is creating a lot of unwanted signature text as well as the initial comment to which you are responding. Can you comment directly here to keep the unwanted text from the conversation?
Yes, I've seen that, but I want to confirm the IUC's stance on this - I don't think we want an index rebuilt each time the tool is executed. I was wondering what the indexes look like, so I just built one from a FASTA reference. This command...
$ bbmap.sh in=03-1057_S10_L001_R1_001.fastq.gz ref=NC_002945v4.fasta
...will build a directory structure with 2 folders (genome and index folders) and multiple files in addition to the index itself, which looks like this:
chr1_index_k13_c8_b1.block
chr1_index_k13_c8_b1.block2.gz
Executing the same command a 2nd time will display this message:
NOTE: Ignoring reference file because it already appears to have been processed.
NOTE: If you wish to regenerate the index, please manually delete ref/genome/1/summary.txt
So it seems that we may have to link a fake `/ref/genome/1/summary.txt` file into the current job working directory so that tool execution will not rebuild the index. That is, if the IUC would advise a data manager tool in this case rather than having the index built for each tool execution.
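To make the idea concrete, here is a rough sketch of the check and the linking step. It assumes the marker file named in the NOTE message above (`ref/genome/1/summary.txt`) is what BBMap looks for, and, as a variation on the suggestion above, it links an entire previously built `ref/` directory rather than a fake summary file. Both helper names are hypothetical, and whether BBMap accepts a symlinked index has not been verified here.

```python
import os
from pathlib import Path

# Marker file named in BBMap's NOTE message; relative to the working directory.
SUMMARY = Path("ref") / "genome" / "1" / "summary.txt"


def index_present(work_dir):
    """Return True if work_dir contains the marker file for an existing index."""
    return (Path(work_dir) / SUMMARY).is_file()


def link_prebuilt_index(prebuilt_ref_dir, work_dir):
    """Symlink a prebuilt ref/ directory into work_dir (hypothetical helper).

    Returns True if the marker file is visible through the link afterwards.
    """
    target = Path(work_dir) / "ref"
    if not target.exists():
        os.symlink(Path(prebuilt_ref_dir).resolve(), target, target_is_directory=True)
    return index_present(work_dir)
```

A job script could call `link_prebuilt_index` before running `bbmap.sh` and fall back to a fresh index build when it returns `False`.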
How long does it take to build the index? For some recent mappers this is completely negligible, so we don't, for instance, have a tool data table for minimap2.
Before you invest a lot of time, what is the license of these tools? I remember this was an academic-only license many years ago.
I'm sorry for the unwanted text. Thank you for working on this! If it will only be used by our group, please hold off on the development, because we are not likely to migrate our whole pipeline onto Galaxy in the short term. I made this request when we were trying to test something in the summer for interns.
@mvdbeek Yeah, that's what I was wondering, so thanks!
Building the index seems to be pretty fast. Here are the times to build the index for 2 different references; the size of each is displayed.
$ ll NC_002945v4.fasta
-rw-rw-r-- 1 greg greg 4412149 Jan 9 2020 NC_002945v4.fasta
Loaded Reference: 0.003 seconds.
Loading index for chunk 1-1, build 1
No index available; generating from reference genome: /home/greg/work/kapur/tmp/bbmap/ref/index/1/chr1_index_k13_c8_b1.block
Indexing threads started for block 0-1
Indexing threads finished for block 0-1
Generated Index: 1.497 seconds.
Analyzed Index: 2.316 seconds.
Cleared Memory: 0.118 seconds.
Processing reads in single-ended mode.
$ ll hg38.fa
-rw-r--r-- 1 greg greg 3273481604 Sep 23 11:30 hg38.fa
Loaded Reference: 0.011 seconds.
Loading index for chunk 1-8, build 1
No index available; generating from reference genome: ~/tmp/bbmap/ref/index/1/chr1-3_index_k13_c2_b1.block
No index available; generating from reference genome: ~/tmp/bbmap/ref/index/1/chr8_index_k13_c2_b1.block
No index available; generating from reference genome: ~/tmp/bbmap/ref/index/1/chr4-7_index_k13_c2_b1.block
Indexing threads started for block 0-3
Indexing threads started for block 4-7
Indexing threads started for block 8-8
Indexing threads finished for block 8-8
Indexing threads finished for block 0-3
Indexing threads finished for block 4-7
Generated Index: 177.579 seconds.
Finished Writing: 3.780 seconds.
Analyzed Index: 6.756 seconds.
Cleared Memory: 0.130 seconds.
Processing reads in single-ended mode.
So should I go ahead and just allow the tool to build the index with each execution, saving us from maintaining another data manager tool?
By the way, I ran this on my Linux desktop, not a high-end compute node.
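As a rough way to compare the two runs, the timed phases reported in the logs above can be totalled with a short script. This is only a sketch: the regular expression and the function name `total_seconds` are assumptions, and the log excerpts are limited to the timing lines shown in this thread.

```python
import re

# Matches lines such as "Generated Index: 177.579 seconds."
TIMING = re.compile(r"^(?P<phase>[A-Za-z ]+):\s+(?P<seconds>\d+\.\d+) seconds\.$")


def total_seconds(log_text):
    """Sum every 'Phase: N seconds.' line in a BBMap log excerpt."""
    total = 0.0
    for line in log_text.splitlines():
        match = TIMING.match(line.strip())
        if match:
            total += float(match.group("seconds"))
    return total


# Timing lines copied from the NC_002945v4.fasta run.
small = """Loaded Reference: 0.003 seconds.
Generated Index: 1.497 seconds.
Analyzed Index: 2.316 seconds.
Cleared Memory: 0.118 seconds."""

# Timing lines copied from the hg38.fa run.
hg38 = """Loaded Reference: 0.011 seconds.
Generated Index: 177.579 seconds.
Finished Writing: 3.780 seconds.
Analyzed Index: 6.756 seconds.
Cleared Memory: 0.130 seconds."""

print(f"NC_002945v4.fasta: {total_seconds(small):.3f} s")
print(f"hg38.fa: {total_seconds(hg38):.3f} s")
```

On these numbers the small bacterial reference costs about 4 seconds of indexing overhead per run, while hg38 costs about 188 seconds, a little over 3 minutes.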
The BBmap package is already available in the bioconda channel.
Also, the license for the tools is here: https://github.com/BioInfoTools/BBMap/blob/master/license.txt. It is open source, subject to conditions. Here is the text.
BBTools Copyright (c) 2014, The Regents of the University of California, through Lawrence Berkeley National Laboratory (subject to receipt of any required approvals from the U.S. Dept. of Energy). All rights reserved.
Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:
(1) Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
(2) Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
(3) Neither the name of the University of California, Lawrence Berkeley National Laboratory, U.S. Dept. of Energy nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
You are under no obligation whatsoever to provide any bug fixes, patches, or upgrades to the features, functionality or performance of the source code ("Enhancements") to anyone; however, if you choose to make your Enhancements available either publicly, or directly to Lawrence Berkeley National Laboratory, without imposing a separate written license agreement for such Enhancements, then you hereby grant the following license: a non-exclusive, royalty-free perpetual license to install, use, modify, prepare derivative works, incorporate into other computer software, distribute, and sublicense such enhancements or derivative works thereof, in binary and source code form.
First tool - BBMap is here https://github.com/galaxyproject/tools-iuc/pull/3993
Awesome, thank you!
callvariants is now available. https://github.com/galaxyproject/tools-iuc/pull/4006
bbduk tool is here https://github.com/galaxyproject/tools-iuc/pull/4017
Fantastic, thank you!
Also, BBNorm has been requested.