FALCON_unzip
blasr 5.2 fundamentally incompatible with current FALCON_UNZIP?
I compiled the newest blasr 5.2 on my Debian. After some tinkering it worked fine.
Then I wanted to run FALCON_UNZIP (latest version).
First, I figured out that blasr 5.2 now uses two hyphens for its options. The most recent FALCON_UNZIP does not support this; it uses one hyphen. So I changed the respective Python scripts, and that worked.
Second it tells me:
ERROR: --sam is no longer supported, use --bam, then translate from .bam to .sam when I call blasr.
So it seems that I cannot use blasr 5.2 with the most recent FALCON_UNZIP? Or is there an "easy" hack?
Otherwise I might have to compile blasr 5.1.
Currently, I only test the code against a particular SMRTanalysis build. If you like, you can add a samtools step to convert the sam files to bam files. I need to discuss a long-term solution with the other developers before we decide what to do.
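A minimal sketch of such a samtools step, going in the direction the blasr 5.2 error message asks for (BAM output from blasr, then translated to SAM). The function names and file names are hypothetical, and it assumes `samtools` is on the PATH; it is not part of FALCON_unzip.

```python
import subprocess


def bam_to_sam_command(bam_path, sam_path):
    """Build the samtools call that writes a BAM file out as SAM text.

    "samtools view -h" emits the header plus the alignments as SAM;
    "-o" names the output file. Paths are placeholders.
    """
    return ["samtools", "view", "-h", "-o", sam_path, bam_path]


def bam_to_sam(bam_path, sam_path):
    """Run the conversion; raises CalledProcessError if samtools fails."""
    subprocess.check_call(bam_to_sam_command(bam_path, sam_path))
```

Calling `bam_to_sam("aln.bam", "aln.sam")` right after a `blasr ... --bam` run would then hand a SAM file to whatever downstream script still expects one.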
OK many thanks. I will see how I solve the problem.
So which specific version of blasr do you use then? Or which SMRTanalysis build?
I guess you meant the SMRTanalysis build you mentioned in the FAQ? I am running it with this version of blasr now and it works like a charm.
Two related questions.
#1 Is unzip also compatible with smrtlink_3.1.0.180439? I always get the error 'Failure: The Quiver algorithm requires an alignment file containing standard (non-CCS) reads.' when running variantCaller within the fc_quiver.py script. I converted my *.bax.h5 files with bax2bam beforehand.
#2 If not, what is the best smrtlink build to use, and where can I download it?
Thanks!
On Thu, Aug 11, 2016 at 9:56 PM, MichaelsGITIGIT [email protected] wrote:
I guess you meant the SMRTanalysis build you mentioned in the FAQ? I am running it with this version of blasr now and it works like a charm.
Hi all, Due to popular demand, the --sam flag is back in Blasr (version 5.3), with a little twist though: SAM output is now implemented via pbbam.
The Blasr invocation hasn't changed, but it now must be built with pbbam support, and the best way to do that is to use pitchfork, where pbbam is enabled by default.
https://github.com/search?q=org%3APacificBiosciences+pitchfork
Please download our new environment and try re-running your tasks. I'll try to help in case of problems. The SAM output might be slightly different, for example in the order of some fields (it's now "bam-compliant", so to speak), but that shouldn't have any negative influence.
Regards, Vladimir
If somebody has already used Blasr version 5.3 (which has the --sam option back), please post results here.
I'm using 5.3:
$ blasr --version
blasr 5.3.061bd35
I've been running with this (above) blasr, installed from pitchfork (within the last two weeks), and I'm still getting errors because there are scripts calling blasr with a single dash for full-word options, i.e. 'blasr [...] -hitPolicy [...]', which aren't accepted by blasr 5.3.
I've tried to find the template scripts which contain this text, and change them to use two dash characters, but I must be missing something; they keep coming back. Or is it possible some of these calls are from binary files? I'm not seeing complaints about '--sam' ... but I'm not sure that I wouldn't if I sorted out the single dash problem.
Sincerely, Sanity "Unzipping" In Davis
Can you post the entire command, with all the flags etc., and the output?
Please take a look at the blasr wiki page for what info needs to be provided when reporting a problem:
https://github.com/PacificBiosciences/blasr/wiki
Yah that was dumb - I should have posted it. It was just the call to blasr in unzip.py, but with single dashes instead of double dashes. I called 'fc_unzip.py fc_unzip.cfg' to get to that point, and to restart failed jobs. But in any event I got past that problem by replacing all the flags (i.e. '--hitPolicy' replacing '-hitPolicy') in the Python virtualenv that I'm using, in the unzip.py here:
~/Python_venv/lib/python2.7/site-packages/falcon_unzip-0.1.0-py2.7.egg/falcon_unzip/unzip.py
It just took me a while to find that particular copy of unzip.py.
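One way to find out which copy of a module the interpreter actually loads is to ask Python directly. This is a generic sketch, not something FALCON_unzip provides; `module_file` is a hypothetical helper, and it has to be run with the same interpreter (here, the same virtualenv) that runs the pipeline.

```python
import importlib


def module_file(name):
    """Return the path of the file a module is loaded from, or None.

    Built-in modules have no __file__, hence the getattr fallback.
    """
    module = importlib.import_module(name)
    return getattr(module, "__file__", None)


if __name__ == "__main__":
    # Demonstrated on a standard-library module; inside the virtualenv,
    # module_file("falcon_unzip.unzip") should point at the installed copy.
    print(module_file("json"))
```

Editing the file that this prints avoids patching a source checkout that the virtualenv never imports.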
Now I'm seeing a variety of other errors, some dealing with Perl libraries older than the current Perl in use, and some dealing with incorrect importing from __future__ ...
I'm not sure where the FAQ (mentioned above) is. Does it spell out what versions are needed for everything FALCON_unzip needs to run?
Hi Joseph,
Thanks for posting the additional info. When we migrated the blasr command-line notation from single to double dash, some scripts "escaped" the tests. I'll take a closer look at them. I am surprised it shows up only now.
Indeed, the invocation of blasr in unzip.py at line 200 still uses the old single-dash notation. A fix is on its way. Regarding SAM output, if you are using an updated version built with pitchfork (pbbam support), blasr --sam should work with no problem, as in 5.1 and before. Please let me know if the whole pipeline with --sam works (after you fix the Perl problems), so the whole issue can be closed.
I think the current sticking point is independent of blasr ... I'm getting errors in the calling hasm.sh, which I'll research and then post as a separate issue if I can't find anything on it. So I think, from my perspective, this problem's solved. (many thanks)
I found another single dash, full word option issue ... in:
Python_venv/lib/python2.7/site-packages/falcon_unzip-0.1.0-py2.7.egg/falcon_unzip/run_quiver.py
on line 211, pbalign is called with almost all word options having two dashes, except for -useQuality:
--algorithmOptions=-useQuality --maxHits=1 --hitPolicy=random --seed=1\
... should be:
--algorithmOptions=--useQuality --maxHits=1 --hitPolicy=random --seed=1\
Is there a recommended version of blasr to use with Falcon-unzip? In our SMRT analysis package we have blasr 1.3.1.142244 but when running a Falcon-unzip command it reports "ERROR: -noSplitSubreads is not a valid option."
I tried an install of the most recent blasr but encountered the bug described in this issue, with single/double dash inconsistencies.
It's easiest to use the latest blasr, built via pitchfork. We probably need to add the -- into FALCON-unzip. Could you submit a PR? I don't plan to start unzip integration for a couple weeks, and Jason is pretty busy.
I am using the Pitchfork version and the error remains. When running blasr, it says that -noSplitSubreads is not a valid option. Is there any way we can fix this to --noSplitSubreads in the unzip code?
I listed two locations that have this single-dash error, in comments above. I changed both and it didn't bother me after that, but it's possible I forgot one. If you fix both and still run into the problem, I'd be interested!
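To check for any remaining spots, a small scan over the script text can help. The following is a heuristic sketch, not a FALCON_unzip tool: `find_single_dash_options` is a hypothetical name, and the pattern simply flags a single dash followed by two or more letters, so it can also report unrelated text such as `ls -la`.

```python
import re

# A dash that is not preceded by another dash or a word character,
# followed by at least two letters: "-hitPolicy" matches, "--sam" and "-x" do not.
SINGLE_DASH_LONG = re.compile(r"(?<![-\w])-[A-Za-z]{2,}")


def find_single_dash_options(text):
    """Return (line_number, option) pairs for suspicious single-dash options."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for option in SINGLE_DASH_LONG.findall(line):
            hits.append((lineno, option))
    return hits


if __name__ == "__main__":
    sample = (
        "blasr reads.fasta ref.fasta -hitPolicy random --sam\n"
        "pbalign --algorithmOptions=-useQuality --maxHits=1\n"
    )
    for lineno, option in find_single_dash_options(sample):
        print(lineno, option)
```

Running it over the contents of the installed unzip.py and run_quiver.py (read with `open(path).read()`) lists candidate lines to inspect by hand.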
Fixed it but when I ran it gave this error:
ERROR: -clipping is not a valid option.
so this should also be changed... I am not sure about the other options. I will give it a try and go step by step :)
OK, it seems that every blasr option in unzip uses - where -- is needed. All of them should be replaced with --.
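As an illustration of that replacement, here is a rough sketch that rewrites a whitespace-separated command string. `to_double_dash` is a hypothetical helper, not the fix applied in FALCON_unzip. It only touches tokens that start with a single dash followed by two or more letters, so one-letter options are left alone, and it does not reach options embedded in another argument (the `--algorithmOptions=-useQuality` case needs a manual edit).

```python
import re

# Token starts with exactly one dash and then at least two letters.
LONG_OPTION_SINGLE_DASH = re.compile(r"^-[A-Za-z]{2,}")


def to_double_dash(command):
    """Prefix single-dash long options with a second dash.

    Splitting and re-joining normalizes runs of whitespace to single spaces.
    """
    fixed = []
    for token in command.split():
        if LONG_OPTION_SINGLE_DASH.match(token):
            token = "-" + token
        fixed.append(token)
    return " ".join(fixed)


if __name__ == "__main__":
    # File names are placeholders; the flags are the ones reported in this thread.
    print(to_double_dash("blasr reads.fasta ref.fasta -noSplitSubreads -hitPolicy random --sam"))
```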
Ok. I will provide a setting to select double-dash.
But you could run into other problems. If you are running blasr from pitchfork, then you might also have other dependencies from pitchfork. FALCON-integrate chooses a consistent contour via git-submodules, so you could become confused about which version of which component you are actually using. That's why we have not yet begun to support more recent versions of smrttools with FALCON_unzip. So you will have to deal with those issues on your own as the support burden is too large. (Our Sequel customers receive a tarball of self-consistent versions, and we will soon include FALCON_unzip in that tarball, experimentally.)
Do you understand the problem I've outlined?
Hi @pb-cdunn ,
Thanks for the reply. Concerning the versioning and other dependency issues: at the moment I have changed the directory in the cfg file to the Pitchfork bin, since we are facing some system admin issues with the smartportal and access issues to the bin folder. Anyway, changing it to pitchfork as mentioned made it work perfectly for now. Of course I changed all the - to -- in both locations (blasr command in unzip).
Hopefully you will start the support and things become clearer and more consistent than before.
I have now these files before the polishing:
So I assume that the unzip part is done and I just need to run quiver to polish both the p and h contigs independently, right?
Thanks a lot
Yes, fc_quiver.py should work now.
Thanks @pb-cdunn
Just one more thing: if I run quiver by myself on each of the p and h contigs independently instead of using fc_quiver.py, is it the same? Or are there things inside fc_quiver.py that are different?
Thanks in advance.
See falcon_unzip/run_quiver.py.
Hi,
I am sorry to ask again, but I went through the code and it was not completely clear to me.
So besides -x 5 -X 120 -q 20 for quiver and --minAccuracy=0.75 --minLength=50 --minAnchorSize=12 --maxDivergence=30 --concordant --algorithm=blasr --algorithmOptions=-useQuality --maxHits=1 --hitPolicy=random --seed=1 for pbalign, is there any difference between running quiver via run_quiver and running Quiver by itself?
I ran quiver by itself on our genome successfully, but run_quiver.py runs into several errors. So is it advised to run unzip first to separate the haplotypes and then run quiver separately to polish each of the P and H files?
Thanks a lot
Since you are using Falcon, I assume you have a diploid genome to assemble. Quiver itself is not designed for diploid assembly, so the results of polishing a diploid assembly with quiver are not guaranteed (see "What happens when the sample is a mixture, or diploid?" in https://github.com/PacificBiosciences/GenomicConsensus/blob/master/doc/FAQ.rst). Consider that if you polish the two haplotypes separately, you may actually use reads from one haplotype to polish the other. Falcon seems to track the reads used for assembling each haplotype, and run_quiver.py uses the reads of each haplotype to polish that haplotype separately, thus giving better results. I'm also a Falcon user and ran into this question before. I finally managed to polish using run_quiver, and it actually gave better results. I hope my answer helps. This is my understanding of Falcon and quiver; please point it out if I'm wrong.
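The idea of polishing each haplotype only with its own reads can be pictured with a toy grouping step. This is only a conceptual sketch and not how run_quiver.py is implemented: the read-to-contig mapping, the read names, and the contig labels below are made up for illustration.

```python
from collections import defaultdict


def group_reads_by_contig(read_to_contig):
    """Group read IDs by the contig (primary or haplotig) they were assigned to."""
    groups = defaultdict(list)
    for read_id, contig_id in sorted(read_to_contig.items()):
        groups[contig_id].append(read_id)
    return dict(groups)


if __name__ == "__main__":
    # Hypothetical assignments: two reads on a primary contig, one on a haplotig.
    assignments = {
        "read1": "000000F",
        "read2": "000000F_001",
        "read3": "000000F",
    }
    for contig_id, reads in sorted(group_reads_by_contig(assignments).items()):
        print(contig_id, reads)
```

Each group would then be aligned and polished against its own contig only, which is what keeps reads from one haplotype from influencing the consensus of the other.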
Dear Danshu,
Thanks a lot for the hints. That was exactly what I was looking for: whether run_quiver.py is tweaked and differs from running quiver itself. Yes, we have a diploid and highly heterozygous genome, and my next question was going to be how quiver corrects and polishes the contigs if it uses the other haplotype's info (answered now, THANKS).