zUMIs

Bugfix: corrects for errors in counting with special characters

Open TomKellyGenetics opened this issue 3 years ago • 2 comments

Tested with published SmartSeq3 data from ArrayExpress. Compatible with the latest versions of STAR (2.7.9a) and samtools (1.7). Samtools idxstats reports unmapped reads under the special character "*" in place of a chromosome name. This fails silently, because "*" is not permitted as a factor level, and leads to problems counting reads/UMIs later on. With this fix, these entries are removed and unmapped reads are not counted, which restores counting of UMI and internal reads for SmartSeq3.
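To illustrate the idea (not the actual change in this PR, which lives in the zUMIs R code), here is a minimal Python sketch of dropping the "*" entry from idxstats output before counting. The function name `mapped_counts` and the example numbers are hypothetical; the only thing taken from samtools is the four-column, tab-separated idxstats layout (reference name, sequence length, mapped reads, unmapped reads).

```python
# Made-up idxstats output for illustration. Columns: reference name,
# sequence length, mapped read segments, unmapped read segments.
# The final "*" row holds reads that are not assigned to any reference.
EXAMPLE_IDXSTATS = (
    "chr1\t248956422\t1200\t3\n"
    "chr2\t242193529\t800\t1\n"
    "*\t0\t0\t450\n"
)


def mapped_counts(idxstats_text):
    """Return {reference_name: mapped_reads}, skipping the "*" row.

    Hypothetical helper: it mirrors the intent of the fix (unmapped
    reads are removed and not counted), not the zUMIs implementation.
    """
    counts = {}
    for line in idxstats_text.splitlines():
        fields = line.split("\t")
        if len(fields) < 4:
            continue  # ignore blank or malformed lines
        name, mapped = fields[0], fields[2]
        if name == "*":
            continue  # unmapped reads: drop instead of counting
        counts[name] = int(mapped)
    return counts


counts = mapped_counts(EXAMPLE_IDXSTATS)
print(counts)
```

Filtering on the name before any grouping step means the downstream counting never sees "*" as a category, so nothing depends on how that character is handled later.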

FYI: the Docker container for v2.9.2 is also out of date. It requires installing R >= 4.0, samtools, and STAR, as well as many missing dependencies.

TomKellyGenetics Jun 23 '21 05:06

Hi, sorry to say I haven't had time to look into this. I'm not aware of any issue with idxstats, so I'll need to dive into that before merging the changes. If you have a more detailed description of what you found there, it'd be appreciated.

As for Docker, it shouldn't require any dependencies/installation; the conda environment zUMIs brings from GitHub should work great within Docker. But thanks for the code snippet to build the Docker image programmatically, that's really useful!

Best, Christoph

cziegenhain Jun 30 '21 04:06

Sorry, our server was shut down for maintenance (while I was on leave), so I can't access the Docker container where I tested this version. I'm trying to get it back up to test it again, but it may take a while.

I didn't mean to add the Dockerfile to the PR, but I'm happy to share it. Basically, these are the steps from my command history that I had to run to get it working the first time.

As a correction: I've checked, and the Docker build installs samtools 1.7.

TomKellyGenetics Jun 30 '21 06:06