C3POa
Number of repeats used for consensus
I have generated consensus sequences for different datasets using C3POa. I am trying to do some stats by establishing a correlation between the number of subreads and the accuracy of the consensus. When I split the output file based on the information in the header of each consensus sequence in the C3POa output, I noticed that there is a jump from "1" to "3", with no sequences labeled "2" in any of my output files. I have checked my input file and I have data that should fall into the "2" category. I am not sure why this is happening or if I am misunderstanding the output file. Thanks for your assistance!
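For reference, the per-header split described above could be sketched roughly as follows. The header layout is an assumption made for illustration (underscore-separated fields with the repeat count second from the end), and `count_by_repeats`, `field`, and `sep` are hypothetical names; check them against the headers in your own output before relying on the counts.

```python
from collections import Counter
import io


def count_by_repeats(fasta_lines, field=-2, sep="_"):
    """Count consensus reads per repeat number.

    Assumption: each header is `sep`-separated and the repeat count is the
    field at index `field` (second from the end by default). Adjust both
    to match the headers actually written by your C3POa version.
    """
    counts = Counter()
    for line in fasta_lines:
        if line.startswith(">"):
            parts = line[1:].strip().split(sep)
            counts[int(parts[field])] += 1
    return counts


# Made-up headers, only to show the shape of the result.
example = io.StringIO(
    ">readA_12.3_5000_1_1200\nACGT\n"
    ">readB_11.8_7400_3_1100\nACGT\n"
    ">readC_13.0_6900_3_1150\nACGT\n"
)
counts = count_by_repeats(example)
for n in sorted(counts):
    print(n, counts[n])
```

A missing bin then shows up as a repeat number that never appears as a key in `counts`.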
What version of C3POa are you using? If you're running something older, I suggest updating to the latest version (v2.2.2). I haven't seen this come up in my test dataset. This is what I see when I plot out the accuracy per coverage bin:
I think what's probably happening is there's a bug in the consensus script that's used for pairwise consensus calling. As far as I know, there shouldn't be any problems with it in the most updated version. If you're on the latest C3POa version and you're still not seeing any reads with a coverage of 2, add `.get()` to the `apply_async` call on line 247. This will disable threading for the consensus calling and it will actually show you the errors.
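To illustrate why this helps, here is a minimal sketch using Python's standard `multiprocessing.Pool`. It is not the C3POa code: `call_consensus`, `run_silent`, and `run_loud` are hypothetical stand-ins. Without `.get()`, an exception raised in a worker is stored on the result object and never surfaces; with `.get()`, each call blocks until it finishes and re-raises the worker's exception in the parent process.

```python
import multiprocessing as mp

# "fork" keeps this sketch runnable as a single script on Linux/macOS;
# it is not available on Windows.
CTX = mp.get_context("fork")


def call_consensus(read_id):
    # Hypothetical worker that fails for one input.
    if read_id == 2:
        raise ValueError(f"consensus failed for read {read_id}")
    return read_id * 10


def run_silent(read_ids):
    # No .get(): the failing task simply produces no result and no traceback.
    results = []
    with CTX.Pool(2) as pool:
        for r in read_ids:
            pool.apply_async(call_consensus, (r,), callback=results.append)
        pool.close()
        pool.join()
    return sorted(results)


def run_loud(read_ids):
    # .get() blocks on each task in turn and re-raises worker exceptions.
    with CTX.Pool(2) as pool:
        return [pool.apply_async(call_consensus, (r,)).get() for r in read_ids]


if __name__ == "__main__":
    print(run_silent([1, 2, 3]))  # the result for read 2 is silently absent
    try:
        run_loud([1, 2, 3])
    except ValueError as err:
        print("surfaced:", err)
```

Because `.get()` waits for each task before the next one is submitted, the work effectively runs one task at a time, so it is best used as a temporary debugging change.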
Thanks for your answer. I am not using the latest version of C3POa. I was trying to install the latest version, but it seems that I have an issue installing "pyabpoa". When I use either of the two commands that you indicate to install the different packages, I get the following warning (I am sorry if it's something simple, I am fairly new to this. I am using Ubuntu 18.04):
Do you have Cython installed? `pip3 install --user Cython` should do the trick. To cover all of your bases, try `pip3 install --user --upgrade Cython setuptools wheel`. Then you can try to install pyabpoa using pip. If that doesn't work, you can clone the abPOA repo and run `make install_py`.
Thanks for the suggestions. Installation worked properly! I have started running some data that I ran on previous versions but I am having some issues.
When I use -q to filter the input file this warning is displayed:
When I remove -q, C3POa starts running but the run finishes after only a few minutes (this is really fast compared to the previous version, in which the same dataset takes a few hours). When I checked the output, the "R2C2_consensus.fasta" file is really small, with only a few sequences. The log file shows that only a few sequences are actually filtered compared to the total number of sequences (I have filtered sequences by size previously):
This is the command line I am using to run C3POa:
Once again, thank you for your assistance.
Can you follow the debug step seen here: https://github.com/rvolden/C3POa/issues/17#issuecomment-783469536
For some reason Python multiprocessing doesn't like passing back errors, so it'll just die silently instead of complaining.
I followed the debug step. The following error was displayed:
Seems to be a problem with pyabpoa. Can you verify that your install is working correctly? It may have installed, but it could still run into runtime errors.
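One way to check this outside of C3POa is to run a tiny consensus call directly, as in the sketch below. It assumes the `msa_aligner` / `msa(..., out_cons=True, out_msa=False)` usage shown in the abPOA README; `check_pyabpoa` is a hypothetical helper name and the sequences are made up.

```python
def check_pyabpoa():
    """Return (ok, message) after trying a tiny consensus call with pyabpoa."""
    try:
        import pyabpoa as pa

        aligner = pa.msa_aligner()
        # Made-up test sequences; the third differs from the others at one base.
        seqs = [
            "ACGTTGCAACGTTAGCCATGGATCCGATTACAGGCTTAAC",
            "ACGTTGCAACGTTAGCCATGGATCCGATTACAGGCTTAAC",
            "ACGTTGCAACGTAAGCCATGGATCCGATTACAGGCTTAAC",
        ]
        result = aligner.msa(seqs, out_cons=True, out_msa=False)
        return True, f"pyabpoa OK, consensus: {result.cons_seq}"
    except Exception as err:
        # Covers a missing module as well as Python-level runtime errors.
        return False, f"pyabpoa problem: {type(err).__name__}: {err}"


ok, message = check_pyabpoa()
print(message)
```

A crash inside the compiled extension itself would not be caught by the `except` clause, so a run that ends without printing either message also points to a broken install.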
Thanks for your help. Indeed, the problem was with the pyabpoa install. Now C3POa is running properly, but when I try to use -q 9 it says unrecognized argument. When I use -h, the -q argument is not available. When I do not include it, C3POa runs without any issues.
Yeah, we took out that option since ONT qscores are mostly nonsensical.