FALCON_unzip

blasr 5.2 fundamentally incompatible with current FALCON_UNZIP?

Open: ghost opened this issue 8 years ago, 27 comments

I compiled the newest blasr, 5.2, on my Debian machine. After some tinkering it worked fine.

Then I wanted to run FALCON_UNZIP (latest version).

First, I figured out that blasr 5.2 now uses two hyphens for its options. The most recent FALCON_UNZIP does not support this; it uses one hyphen. So I changed the respective Python scripts, and that worked.

Second, when I call blasr, it tells me:

ERROR: --sam is no longer supported, use --bam, then translate from .bam to .sam

So it seems that I cannot use blasr 5.2 with the most recent FALCON_UNZIP? Or is there an "easy" hack?

Otherwise I might have to compile blasr 5.1.

ghost, Aug 08 '16
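Here is a minimal sketch of the script change described above. It assumes the blasr command is held as a single string before it is run. `to_double_dash` is a hypothetical helper, not part of FALCON_unzip. It rewrites every multi-letter option that starts with a single dash, including one embedded after `=` (as in `--algorithmOptions=-useQuality`, which comes up later in this thread). It leaves single-letter flags and hyphens inside file names alone. Apply it only to blasr command strings, since other tools may still expect single-dash options.

```python
import re

# A single dash that starts a token (or follows '=' or a quote) and is followed by a
# multi-letter option name. The negative lookbehind skips options that already have
# two dashes and hyphens inside words such as 'my-reads.fasta'.
_SINGLE_DASH_LONG = re.compile(r'(?<![\w-])-([A-Za-z][A-Za-z0-9]+)')


def to_double_dash(cmd):
    """Return cmd with single-dash, multi-letter options rewritten as '--name'."""
    return _SINGLE_DASH_LONG.sub(r'--\1', cmd)


if __name__ == "__main__":
    old = "blasr reads.fasta ref.fasta -noSplitSubreads -hitPolicy randombest"
    print(to_double_dash(old))
```

The regex is a heuristic. It does not parse shell quoting, so review the rewritten command before relying on it.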

Currently, I only test the code against a particular SMRTanalysis build. If you like, you can add a samtools step to convert between the BAM and SAM files. I need to discuss a long-term solution with the other developers before we decide what to do.

pb-jchin, Aug 08 '16
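Here is a sketch of the conversion the blasr error message asks for: run blasr with --bam, then turn the BAM into SAM. It assumes samtools is on PATH, and the file names are placeholders. `samtools view -h` keeps the header and `-o` names the output file. The sketch uses `subprocess.check_call` so that it also runs under the Python 2.7 virtualenvs that appear later in this thread.

```python
import subprocess


def bam_to_sam_cmd(bam_path, sam_path):
    """Build the samtools command that converts a BAM file to SAM, header included."""
    return ["samtools", "view", "-h", "-o", sam_path, bam_path]


def bam_to_sam(bam_path, sam_path):
    """Run the conversion; raises CalledProcessError if samtools exits non-zero."""
    subprocess.check_call(bam_to_sam_cmd(bam_path, sam_path))


if __name__ == "__main__":
    # Placeholder names: blasr's --bam output and the SAM file a downstream step reads.
    print(" ".join(bam_to_sam_cmd("aln.bam", "aln.sam")))
```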

OK, many thanks. I will see how to solve the problem. So which specific version of blasr do you use? Or which SMRTanalysis build?

ghost, Aug 09 '16

I guess you mean the SMRTanalysis build mentioned in the FAQ? I am now running with the blasr from that build, and it works like a charm.

ghost, Aug 11 '16

Two related questions.

#1 Is unzip also compatible with smrtlink_3.1.0.180439? I always get the error 'Failure: The Quiver algorithm requires an alignment file containing standard (non-CCS) reads.' when running variantCaller within the fc_quiver.py script. I converted my *.bax.h5 files with bax2bam beforehand.

#2 If not, which SMRT Link build is best to use, and where can I download it?

Thanks!


BenjaminSchwessinger, Aug 11 '16

Hi all, due to popular demand, the --sam flag is back in blasr (version 5.3), with a little twist: SAM output is now implemented via pbbam.

The blasr invocation hasn't changed, but blasr must now be built with pbbam support. The best way to do that is to use pitchfork, where pbbam is enabled by default.

https://github.com/search?q=org%3APacificBiosciences+pitchfork

Please download our new environment and try re-running your tasks. I'll try to help in case of problems. The SAM output might differ slightly, for example in the order of some fields (it's now "BAM-compliant", so to speak), but that shouldn't have any negative effect.

Regards, Vladimir

vrainish-pacbio, Sep 14 '16

If anybody has already used blasr 5.3 (which has the --sam option back), please post the results here.

vrainish-pacbio, Sep 20 '16

I'm using 5.3:

$ blasr --version
blasr 5.3.061bd35

I've been running the blasr shown above, installed from pitchfork within the last two weeks. I'm still getting errors because some scripts call blasr with a single dash for full-word options (i.e. 'blasr [...] -hitPolicy [...]'), and blasr 5.3 doesn't accept those.

I've tried to find the template scripts that contain this text and change them to use two dashes, but I must be missing something; the single-dash calls keep coming back. Or is it possible that some of these calls come from binary files? I'm not seeing complaints about '--sam', but I might once I sort out the single-dash problem.

Sincerely, Sanity "Unzipping" In Davis

jfass, Oct 03 '16
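One way to hunt down single-dash calls that "keep coming back" is to scan the Python files that are actually installed, for example a virtualenv's site-packages. The scan looks for single-dash, multi-letter options on lines that mention blasr. This is a heuristic sketch, and `find_single_dash_options` and `scan_tree` are hypothetical helpers. A command split across several lines is only caught on the lines that contain a keyword, and zipped eggs are not searched.

```python
import io
import os
import re

# Single-dash, multi-letter option (also after '=' or a quote), as in '-hitPolicy'.
_SINGLE_DASH_LONG = re.compile(r'(?<![\w-])-([A-Za-z][A-Za-z0-9]+)')


def find_single_dash_options(source, keywords=("blasr", "algorithmOptions")):
    """Return (line_number, option) pairs for suspicious options in source text."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        # Only inspect lines that look like they build a blasr/pbalign command.
        if any(k in line for k in keywords):
            for m in _SINGLE_DASH_LONG.finditer(line):
                hits.append((lineno, m.group(0)))
    return hits


def scan_tree(root):
    """Walk root and print hits for every .py file found."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(".py"):
                path = os.path.join(dirpath, name)
                with io.open(path, encoding="utf-8", errors="replace") as fh:
                    for lineno, opt in find_single_dash_options(fh.read()):
                        print("%s:%d: %s" % (path, lineno, opt))
```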

Can you post the entire command, with all the flags etc., and the output?

Please take a look at the blasr wiki page for the information that should be provided when reporting a problem:

https://github.com/PacificBiosciences/blasr/wiki

vrainish-pacbio, Oct 03 '16

Yeah, that was dumb; I should have posted it. It was just the call to blasr in unzip.py, but with single dashes instead of double dashes. I ran 'fc_unzip.py fc_unzip.cfg' to get to that point and to restart failed jobs. In any event, I got past that problem by replacing all the single-dash flags (e.g. '-hitPolicy' with '--hitPolicy'). I made the change in the copy of unzip.py inside the Python virtualenv I'm using, here:

~/Python_venv/lib/python2.7/site-packages/falcon_unzip-0.1.0-py2.7.egg/falcon_unzip/unzip.py

It just took me a while to find that particular copy of unzip.py.

Now I'm seeing a variety of other errors. Some involve Perl libraries older than the Perl currently in use, and some involve incorrect imports from future.

I'm not sure where the FAQ (mentioned above) is. Does it spell out which versions of each component FALCON_unzip needs to run?

jfass, Oct 04 '16
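To find the copy of unzip.py that actually runs, ask the virtualenv's own interpreter. Here is a small sketch using `importlib.import_module`, which exists in both Python 2.7 and 3. Note that importing a module runs its top-level code. Under Python 2, `__file__` may point at the compiled `.pyc` next to the `.py`.

```python
import importlib


def module_path(name):
    """Import a module by dotted name and return the file it was loaded from."""
    module = importlib.import_module(name)
    return module.__file__
```

From a shell, the equivalent one-liner is `python -c "import falcon_unzip.unzip as m; print(m.__file__)"`, run with the virtualenv's python.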

Hi Joseph,

Thanks for posting the additional info. When we migrated the blasr command-line notation from single to double dashes, some scripts "escaped" the tests. I'll take a closer look at them. I am surprised this is only showing up now.

vrainish-pacbio, Oct 04 '16

Indeed, the blasr invocation at line 200 of unzip.py still uses the old single-dash notation. A fix is on its way. Regarding SAM output: if you are using an updated version built with pitchfork (with pbbam support), blasr --sam should work without problems, as in 5.1 and before. Please let me know whether the whole pipeline works with --sam (after you fix the Perl problems), so the issue can be closed.

vrainish-pacbio, Oct 04 '16

I think the current sticking point is independent of blasr: I'm getting errors when hasm.sh is called. I'll research them and post a separate issue if I can't find anything. So from my perspective, this problem is solved (many thanks).

jfass, Oct 04 '16

I found another single-dash, full-word option issue, in:

Python_venv/lib/python2.7/site-packages/falcon_unzip-0.1.0-py2.7.egg/falcon_unzip/run_quiver.py

On line 211, pbalign is called with almost all word options using two dashes, except for -useQuality:

--algorithmOptions=-useQuality --maxHits=1 --hitPolicy=random --seed=1\

... should be:

--algorithmOptions=--useQuality --maxHits=1 --hitPolicy=random --seed=1\

jfass, Oct 06 '16
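The `=` form matters when the value itself starts with dashes. This assumes pbalign reads its options with Python's argparse, which this thread does not confirm. With `--algorithmOptions=--useQuality`, argparse passes `--useQuality` through as the option's value. The space-separated `--algorithmOptions --useQuality` would be rejected, because argparse treats a bare token starting with `--` as another option. The parser below is a hypothetical stand-in that knows only two pbalign-style option names; it is not pbalign's real parser.

```python
import argparse


def parse_pbalign_like(argv):
    """Parse argv with a toy parser that only knows two pbalign-style options."""
    parser = argparse.ArgumentParser(prog="pbalign-sketch")
    parser.add_argument("--algorithmOptions", default="")
    parser.add_argument("--maxHits", type=int)
    return parser.parse_args(argv)


if __name__ == "__main__":
    ns = parse_pbalign_like(["--algorithmOptions=--useQuality", "--maxHits=1"])
    # The aligner options arrive intact as a plain string value.
    print(repr(ns.algorithmOptions), ns.maxHits)
```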

Is there a recommended version of blasr to use with Falcon-unzip? Our SMRT Analysis package has blasr 1.3.1.142244, but when running a Falcon-unzip command it reports "ERROR: -noSplitSubreads is not a valid option."

I tried an install of the most recent blasr but encountered the bug described in this issue, with single/double dash inconsistencies.

tpshea2, Nov 01 '16
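Before debugging flag errors like this one, it helps to confirm which blasr the pipeline actually picks up. This sketch assumes version strings shaped like the two in this thread: `blasr 5.3.061bd35` from pitchfork and `1.3.1.142244` from the SMRT Analysis package. It also assumes the installed blasr accepts `--version`, as the 5.3 build above does. The helper names are hypothetical. The code only reports the major and minor numbers; it does not encode which versions take single- or double-dash options.

```python
import re
import subprocess

# First "<digits>.<digits>" in the text, e.g. "5.3" in "blasr 5.3.061bd35".
_VERSION = re.compile(r'(\d+)\.(\d+)')


def parse_blasr_version(text):
    """Return (major, minor) parsed from a version string, or None if none is found."""
    m = _VERSION.search(text)
    if m is None:
        return None
    return int(m.group(1)), int(m.group(2))


def installed_blasr_version(exe="blasr"):
    """Run '<exe> --version' and parse its output (assumes exe is on PATH)."""
    out = subprocess.check_output([exe, "--version"], stderr=subprocess.STDOUT)
    return parse_blasr_version(out.decode("utf-8", "replace"))
```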

It's easiest to use the latest blasr, built via pitchfork. We probably need to switch FALCON-unzip to the -- notation. Could you submit a PR? I don't plan to start unzip integration for a couple of weeks, and Jason is pretty busy.

pb-cdunn, Nov 01 '16

I am using the pitchfork version and the error remains: when blasr runs, it says that -noSplitSubreads is not a valid option. Is there any way we can change this to --noSplitSubreads in the unzip code?

bostanict, Nov 30 '16

I listed two locations that have this single-dash error, in comments above. I changed both and it didn't bother me after that, but it's possible I forgot one. If you fix both and still run into the problem, I'd be interested!

jfass, Nov 30 '16

I fixed it, but when I ran it again it gave this error:

ERROR: -clipping is not a valid option.

So this should also be changed. I am not sure about the other options; I will try them step by step :)

bostanict, Dec 01 '16

OK, it seems that all the blasr options in unzip use - instead of --. All of them should be replaced with --.

bostanict, Dec 01 '16

Ok. I will provide a setting to select double-dash.

But you could run into other problems. If you are running blasr from pitchfork, then you might also have other dependencies from pitchfork. FALCON-integrate chooses a consistent contour via git-submodules, so you could become confused about which version of which component you are actually using. That's why we have not yet begun to support more recent versions of smrttools with FALCON_unzip. So you will have to deal with those issues on your own as the support burden is too large. (Our Sequel customers receive a tarball of self-consistent versions, and we will soon include FALCON_unzip in that tarball, experimentally.)

Do you understand the problem I've outlined?

pb-cdunn, Dec 01 '16
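Here is a sketch of what the double-dash setting mentioned above could look like. Option names are stored without dashes and the prefix is chosen once. `blasr_args` and its `double_dash` parameter are hypothetical names for illustration, not the setting FALCON_unzip actually added. The names passed in are the multi-letter options discussed in this thread.

```python
def blasr_args(options, double_dash=True):
    """Turn (name, value) pairs into an argument list.

    name is the bare option name (e.g. 'noSplitSubreads'); value None means a flag.
    """
    prefix = "--" if double_dash else "-"
    args = []
    for name, value in options:
        args.append(prefix + name)
        if value is not None:
            args.append(str(value))
    return args


if __name__ == "__main__":
    opts = [("noSplitSubreads", None), ("hitPolicy", "randombest"), ("nproc", 8)]
    print(blasr_args(opts))                     # double-dash style (blasr 5.2/5.3 here)
    print(blasr_args(opts, double_dash=False))  # older single-dash style
```

Keeping the names dash-free means the choice lives in one place, instead of being spread across every command string in unzip.py and run_quiver.py.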

Hi @pb-cdunn,

Thanks for the reply. Regarding the versioning and other dependency issues: for now, I have changed the directory in the cfg file to the pitchfork bin. We are facing some system admin issues with SMRT Portal and access issues to its bin folder. Anyway, changing it to pitchfork as mentioned made it work perfectly for now. Of course, I changed all the - to -- in both locations (the blasr command in unzip).

Hopefully you will start supporting this, and things will become clearer and more consistent than before.

I now have these files before polishing:

[image]

So I assume that the unzip step is done and I just need to run Quiver to polish the p and h contigs independently, right?

Thanks a lot

bostanict, Dec 05 '16

Yes, fc_quiver.py should work now.

pb-cdunn, Dec 05 '16

Thanks @pb-cdunn

One more thing: if I run Quiver myself on the p and h contigs independently instead of using fc_quiver.py, is it the same? Or does fc_quiver.py do things differently?

Thanks in advance!

bostanict, Dec 05 '16

See falcon_unzip/run_quiver.py.

pb-cdunn, Dec 05 '16

Hi,

I am sorry to ask again, but I went through the code and it was not completely clear to me.

So, besides -x 5 -X 120 -q 20 for Quiver and --minAccuracy=0.75 --minLength=50 --minAnchorSize=12 --maxDivergence=30 --concordant --algorithm=blasr --algorithmOptions=-useQuality --maxHits=1 --hitPolicy=random --seed=1 for pbalign, is there any difference between running Quiver through run_quiver.py and running Quiver by itself?

I ran Quiver on our genome successfully by itself, but run_quiver.py runs into several errors. So is it advisable to run unzip first to separate the haplotypes, and then run Quiver separately to polish each of the p and h files?

Thanks a lot

bostanict, Dec 08 '16
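For reference, the parameters listed in the question above can be assembled into the two per-haplotype commands roughly as follows. This is a sketch, not a copy of run_quiver.py. The positional layout `pbalign <reads> <reference> <output>` and the variantCaller flags `--algorithm=quiver`, `-r` and `-o` are assumptions about those tools' usual interfaces, and all file names are placeholders. The algorithmOptions value uses the double-dash form suggested earlier in this thread for blasr 5.x. The sketch does not capture what run_quiver.py adds beyond these flags, namely which reads go to which haplotype; the reply below discusses that.

```python
# pbalign options quoted in the question, with "--useQuality" in the double-dash form.
PBALIGN_OPTS = [
    "--minAccuracy=0.75", "--minLength=50", "--minAnchorSize=12",
    "--maxDivergence=30", "--concordant", "--algorithm=blasr",
    "--algorithmOptions=--useQuality", "--maxHits=1",
    "--hitPolicy=random", "--seed=1",
]
# Quiver thresholds quoted in the question: -x 5 -X 120 -q 20.
QUIVER_OPTS = ["-x", "5", "-X", "120", "-q", "20"]


def polish_cmds(reads, reference, prefix):
    """Return (pbalign_cmd, variantcaller_cmd) for one contig set.

    Assumed interfaces: 'pbalign <reads> <reference> <output>' and
    'variantCaller --algorithm=quiver -r <reference> -o <output> <alignments>'.
    """
    aligned = prefix + ".aligned.bam"
    pbalign = ["pbalign"] + PBALIGN_OPTS + [reads, reference, aligned]
    quiver = (["variantCaller", "--algorithm=quiver"] + QUIVER_OPTS
              + ["-r", reference, "-o", prefix + ".polished.fasta", aligned])
    return pbalign, quiver


if __name__ == "__main__":
    # Placeholder file names for one haplotype's reads and contigs.
    for cmd in polish_cmds("h_reads.bam", "h_ctg.fasta", "h_ctg"):
        print(" ".join(cmd))
```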

Since you are using Falcon, I assume you have a diploid genome to assemble. Quiver itself is not designed for diploid assembly, so the results of polishing a diploid assembly with Quiver are not guaranteed (see "What happens when the sample is a mixture, or diploid?" in https://github.com/PacificBiosciences/GenomicConsensus/blob/master/doc/FAQ.rst). If you polish the two haplotypes separately yourself, you may actually use reads from one haplotype to polish the other. Falcon seems to track the reads used to assemble each haplotype, and run_quiver.py uses each haplotype's reads to polish that haplotype separately, which gives better results. I'm also a Falcon user, and I ran into this question before. I finally managed to polish with run_quiver.py, and it does give better results. I hope my answer helps. This is my understanding of Falcon and Quiver, so please point it out if I'm wrong.

danshu, Dec 08 '16
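The idea danshu describes is to polish each haplotype only with the reads assigned to it. In code terms, this amounts to grouping read IDs by the contig they were phased to, then running the polishing step once per group. Here is a minimal sketch. The `(read_id, contig_id)` pairs and the contig names are hypothetical, since the thread does not say how FALCON_unzip stores this assignment.

```python
from collections import defaultdict


def partition_reads(assignments):
    """Group (read_id, contig_id) pairs into {contig_id: set_of_read_ids}."""
    by_contig = defaultdict(set)
    for read_id, contig_id in assignments:
        by_contig[contig_id].add(read_id)
    return dict(by_contig)


if __name__ == "__main__":
    # Hypothetical phasing result: which contig each read was assigned to.
    pairs = [("r1", "p_ctg_1"), ("r2", "h_ctg_1"), ("r3", "p_ctg_1")]
    for contig_id, reads in sorted(partition_reads(pairs).items()):
        # Each group would then be aligned and polished against its own contig only.
        print(contig_id, sorted(reads))
```

Polishing each group on its own avoids the cross-haplotype mixing that danshu warns about, where reads from one haplotype are used to polish the other.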

Dear Danshu,

Thanks a lot for the hints. That was exactly what I wanted to know: whether run_quiver.py is tweaked and differs from running Quiver by itself. Yes, we have a diploid and highly heterozygous genome. My next question was going to be how Quiver corrects and polishes the contigs when it uses information from the other haplotype (answered now, THANKS).

bostanict, Dec 08 '16