C3POa Number of repeats used for consensus

I have generated consensus sequences for different datasets using C3POa. I am trying to do some stats by establishing a correlation between the number of subreads and the accuracy of the consensus. When I split the output file based on the information in the header of each consensus sequence in the C3POa output, I noticed that there is a jump from "1" to "3", without any sequences with "2", in all my output files. I have checked my input file and I have data that should fall into the "2" category. I am not sure why this is happening or if I am misunderstanding the output file. Thanks for your assistance!
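For reference, a minimal sketch of the kind of per-repeat tally described above. It assumes the repeat count is the second-to-last underscore-separated field of each consensus header (name_averageQuality_originalLength_numRepeats_subreadLength); that layout and the example headers are assumptions for illustration, so check them against your own output before relying on the numbers.

```python
from collections import Counter


def repeat_count(header: str) -> int:
    # Assumed header layout (not confirmed in this thread):
    # >name_averageQuality_originalLength_numRepeats_subreadLength
    fields = header.lstrip(">").strip().split("_")
    return int(fields[-2])


def count_by_repeats(fasta_lines):
    # Tally consensus reads per repeat count, looking only at header lines.
    counts = Counter()
    for line in fasta_lines:
        if line.startswith(">"):
            counts[repeat_count(line)] += 1
    return counts


# Hypothetical headers, made up for this example.
example = [
    ">read1_12.3_5000_1_900", "ACGT",
    ">read2_11.8_4200_3_1100", "ACGT",
    ">read3_10.1_3900_3_1000", "ACGT",
]
counts = count_by_repeats(example)
# Repeat counts below the maximum that never occur in the file.
missing = [n for n in range(1, max(counts) + 1) if n not in counts]
print(dict(counts), missing)
```

Indexing from the end of the header keeps the tally working when the read name itself contains underscores.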

Mar 31 '21 10:03 OscarT32

What version of C3POa are you using? If you're running something older, I suggest updating to the latest version (v2.2.2). I haven't seen this come up in my test dataset. This is what I see when I plot out the accuracy per coverage bin:

[image: 2repswarm]

I think what's probably happening is that there's a bug in the consensus script that's used for pairwise consensus calling. As far as I know, there shouldn't be any problems with it in the latest version. If you're on the latest C3POa version and you're still not seeing any reads with a coverage of 2, add .get() to the apply_async call on line 247. This effectively disables threading for the consensus calling, because each call now waits for its result, and it will actually show you the errors.
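A minimal sketch of why .get() helps; this is not C3POa's actual code. The worker call_consensus and its failure on exactly two subreads are hypothetical, and ThreadPool is used here only because it shares the apply_async interface with multiprocessing.Pool and is simpler to run as a standalone example.

```python
from multiprocessing.pool import ThreadPool


def call_consensus(subreads):
    # Hypothetical worker: pretend the pairwise case is the one that breaks.
    if len(subreads) == 2:
        raise ValueError("pairwise consensus failed")
    return len(subreads)


def run_jobs(jobs, debug=False):
    done, errors = [], []
    pool = ThreadPool(2)
    for job in jobs:
        handle = pool.apply_async(call_consensus, (job,))
        if debug:
            # .get() blocks until the job finishes and re-raises its exception.
            try:
                done.append(handle.get())
            except ValueError as exc:
                errors.append(str(exc))
    pool.close()
    pool.join()
    return done, errors


jobs = [["A"], ["A", "B"], ["A", "B", "C"]]
silent = run_jobs(jobs)            # the failure is stored on the handle and never seen
loud = run_jobs(jobs, debug=True)  # the failure surfaces in the caller
print(silent)
print(loud)
```

When debugging for real you would leave the exception uncaught so that the full traceback is printed; it is caught here only to keep the example running to the end.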

Mar 31 '21 18:03 rvolden

Thanks for your answer. I am not using the latest version of C3POa. I was trying to install the latest version but it seems that I have an issue installing "pyabpoa". When I use either of the two commands that you indicate to install the different packages, I get the following warning (I am sorry if it's something simple, I am fairly new to this. I am using Ubuntu 18.04):

pyabpoa

Apr 01 '21 13:04 OscarT32

Do you have Cython installed? pip3 install --user Cython should do the trick. To cover all of your bases, try pip3 install --user --upgrade Cython setuptools wheel. Then you can try to install pyabpoa using pip. If that doesn't work, you can clone the abPOA repo and run make install_py.

Apr 01 '21 23:04 rvolden

Thanks for the suggestions. Installation worked properly! I have started running some data that I ran on previous versions but I am having some issues.

When I use -q to filter the input file this warning is displayed:

When I remove -q, C3POa starts running but the run finishes after only a few minutes (this is really fast compared to the previous version, in which the same dataset takes a few hours). When I checked the output, the "R2C2_consensus.fasta" file is really small, with only a few sequences. The log file shows that only a few sequences are actually filtered compared to the total number of sequences (I have filtered sequences by size previously):

This is the command line I am using to run C3POa:

Once again, thank you for your assistance.

Apr 03 '21 17:04 OscarT32

Can you follow the debug step seen here: https://github.com/rvolden/C3POa/issues/17#issuecomment-783469536

For some reason Python multiprocessing doesn't like passing back errors, so it'll just die silently instead of complaining.

Apr 03 '21 17:04 rvolden

I followed the debug step. The following error was displayed:

Apr 03 '21 19:04 OscarT32

Seems to be a problem with pyabpoa. Can you verify that your install is working correctly? It may have installed, but it could still run into runtime errors.
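One way to check this is a small smoke test that imports pyabpoa and builds a consensus from a few short sequences. This is only a sketch: it assumes the msa_aligner / msa(..., out_cons=True, out_msa=False) interface and the cons_seq result attribute described in the abPOA README, and the helper name and toy sequences are made up for illustration.

```python
import importlib


def smoke_test_pyabpoa():
    # Step 1: does the module import at all?
    try:
        pa = importlib.import_module("pyabpoa")
    except Exception as exc:
        return False, f"import failed: {exc!r}"
    # Step 2: does a tiny alignment actually run? (interface assumed from the abPOA README)
    try:
        aligner = pa.msa_aligner()
        seqs = ["ACGTACGTAC", "ACGTTCGTAC", "ACGTACGAAC"]  # toy sequences
        result = aligner.msa(seqs, out_cons=True, out_msa=False)
        consensus = result.cons_seq[0]
    except Exception as exc:
        return False, f"runtime failure: {exc!r}"
    return bool(consensus), f"consensus: {consensus}"


ok, message = smoke_test_pyabpoa()
print(ok, message)
```

A failure at the import step points to the installation itself, while a failure at the alignment step matches the "installed but breaks at runtime" case.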

Apr 05 '21 20:04 rvolden

Thanks for your help. Indeed, the problem was with the pyabpoa install. Now C3POa is running properly, but when I try to use -q 9 it reports an unrecognized argument. When I use -h, the -q argument is not listed. When I do not include it, C3POa runs without any issues.

Apr 16 '21 10:04 OscarT32

Yeah, we took out that option since ONT qscores are mostly nonsensical.

Apr 16 '21 17:04 rvolden
