seq2science
BUG: Indexing of Spur_5.0 assembly fails
Describe the bug
When making an index for assembly Spur_5.0, the rule trackhub_index fails:
NC_001453.1 is not found in chromosome sizes file
To Reproduce
I ran the atac-seq pipeline with many samples, but I guess this should be enough to reproduce:
sample assembly
GSM2546188 Spur_5.0
Expected behavior
No failure! Is this a genomepy+annotation thingy? Or is it on our side?
This happens with immature genomes/annotations.
I suggest sticking with the older assembly for a while longer, but we could make the function "skip groups with errors" by using the flag -allErrors. What would you prefer?
What would that do? I guess in my case I would prefer it to ignore the annotation, since the alignment is more important than the annotation visualization in the UCSC trackhub.
Since the rule is trackhub_index, I guess that any genes on that contig would be unsearchable on UCSC. So perhaps it's fine to ignore the error then...
I don't know if that is what you proposed. What would -allErrors do?
-allErrors - skip groups with errors rather than aborting.
Useful for getting information about as many errors as possible.
I think it means it just skips the lines in the annotation that cause an error.
Spur_5.0 has different contig names in the genome fasta and the annotation. NCBI will have to fix this, or you must manually change them :(
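To see which contig names differ, a quick check along these lines might help. This is only a minimal sketch and not part of seq2science or genomepy: the function names are made up for illustration, and it assumes a tab-separated chromosome sizes file (name, length) and a tab-separated GTF/GFF annotation with the contig name in the first column. The contig names contig_A and contig_B in the example are invented; NC_001453.1 is the one from the error message above.

```python
import io


def sizes_contigs(handle):
    """Collect contig names from a sizes file (assumed format: name<TAB>length)."""
    return {line.split("\t")[0].strip() for line in handle if line.strip()}


def annotation_contigs(handle):
    """Collect contig names from the first column of a GTF/GFF-style annotation."""
    contigs = set()
    for line in handle:
        # Skip comment/header lines and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        contigs.add(line.split("\t", 1)[0].strip())
    return contigs


def missing_contigs(sizes_handle, annotation_handle):
    """Return annotation contigs that are absent from the sizes file, sorted."""
    return sorted(annotation_contigs(annotation_handle) - sizes_contigs(sizes_handle))


# Tiny in-memory example (hypothetical data, not the real Spur_5.0 files).
example_sizes = io.StringIO("contig_A\t1000\ncontig_B\t500\n")
example_gtf = io.StringIO(
    "#!hypothetical header\n"
    'contig_A\tsrc\tgene\t1\t100\t.\t+\t.\tgene_id "g1";\n'
    'NC_001453.1\tsrc\tgene\t1\t50\t.\t+\t.\tgene_id "g2";\n'
)
example_missing = missing_contigs(example_sizes, example_gtf)
print(example_missing)
```

With real files, the two `io.StringIO` objects would be replaced by `open(...)` handles on the sizes file and the annotation. Any name this reports is one that would need to be renamed or dropped before indexing.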
I would like to keep this one open (at least for now), until I have a solution for myself.