seq2science

BUG: Indexing of Spur_5.0 assembly fails

Open Maarten-vd-Sande opened this issue 5 years ago • 7 comments

Describe the bug: When making an index for assembly Spur_5.0, rule trackhub_index fails: NC_001453.1 is not found in chromosome sizes file

To Reproduce: I ran the ATAC-seq pipeline with many samples, but I guess this should be enough to reproduce:

sample              assembly
GSM2546188   Spur_5.0

Expected behavior: No failure! Is this a genomepy+annotation thingy? Or is it on our side?

Maarten-vd-Sande, Jul 23 '20 09:07

This happens with immature genomes/annotations.

I suggest sticking with the older assembly for a while longer, but we could make the function "skip groups with errors" using flag -allErrors. What would you prefer?

siebrenf, Jul 23 '20 10:07

What would that do? I guess in my case I would prefer it to ignore the annotation, since the alignment is more important than the annotation visualization in the UCSC trackhub.

Maarten-vd-Sande, Jul 23 '20 10:07

Since the rule is trackhub_index, I guess that any genes on that contig are unsearchable on UCSC. So perhaps it's fine to ignore the error then...

siebrenf, Jul 23 '20 10:07

I don't know if that is what you proposed. What would -allErrors do?

Maarten-vd-Sande, Jul 23 '20 10:07

-allErrors - skip groups with errors rather than aborting.
      Useful for getting infomation about as many errors as possible.

I think it means it just skips the lines in the annotation that cause an error.

siebrenf, Jul 23 '20 10:07

Spur_5.0 has different contig names in the genome fasta and the annotation. NCBI will have to fix this, or you must manually change them :(

siebrenf, Aug 03 '20 13:08
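
A quick way to see how far the two sets of contig names diverge is to compare column 1 of the annotation against the names in the chromosome sizes file. The sketch below is only an illustration, assuming a tab-separated GTF/GFF annotation and a two-column sizes file. The function names and the toy contig `example_contig_1` are made up for this example; only NC_001453.1 comes from the error reported above.

```python
import io


def contigs_in_sizes(handle):
    """Return the contig names (first column) of a chromosome sizes file."""
    return {line.split()[0] for line in handle if line.strip()}


def contigs_in_annotation(handle):
    """Return the contig names (first column) of a GTF/GFF file.

    Comment lines (starting with '#') and blank lines are skipped.
    """
    names = set()
    for line in handle:
        if line.startswith("#") or not line.strip():
            continue
        names.add(line.split("\t", 1)[0])
    return names


def missing_contigs(sizes_handle, annotation_handle):
    """List contigs used in the annotation but absent from the sizes file."""
    known = contigs_in_sizes(sizes_handle)
    used = contigs_in_annotation(annotation_handle)
    return sorted(used - known)


# Toy input standing in for real files (hypothetical content).
sizes = io.StringIO("example_contig_1\t1000\n")
annotation = io.StringIO(
    "#!example header\n"
    'example_contig_1\tsrc\tgene\t1\t100\t.\t+\t.\tgene_id "g1";\n'
    'NC_001453.1\tsrc\tgene\t1\t100\t.\t+\t.\tgene_id "g2";\n'
)
print(missing_contigs(sizes, annotation))
```

With real files, the two `io.StringIO` objects would be replaced by `open(...)` handles on the sizes file and the annotation. An empty list would mean every annotated contig has a size entry.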

I would like to keep this one open (at least for now), until I have a solution for myself.

Maarten-vd-Sande, Aug 03 '20 13:08
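
One possible stopgap, in line with preferring the alignment over the annotation, is to drop the annotation records whose contig is absent from the chromosome sizes file before building the index. This is a minimal sketch, assuming a tab-separated GTF/GFF annotation and a two-column sizes file. `load_known_contigs` and `filter_annotation` are hypothetical helpers written for this example, not part of seq2science or genomepy, and `example_contig_1` is a made-up name.

```python
import io


def load_known_contigs(sizes_handle):
    """Return the contig names (first column) of a chromosome sizes file."""
    return {line.split()[0] for line in sizes_handle if line.strip()}


def filter_annotation(annotation_handle, known, out_handle):
    """Copy annotation lines whose contig is in `known` to `out_handle`.

    Comment and blank lines are copied unchanged.
    Returns the number of records that were dropped.
    """
    dropped = 0
    for line in annotation_handle:
        if line.startswith("#") or not line.strip():
            out_handle.write(line)
            continue
        contig = line.split("\t", 1)[0]
        if contig in known:
            out_handle.write(line)
        else:
            dropped += 1
    return dropped


# Toy input standing in for real files (hypothetical content).
known = load_known_contigs(io.StringIO("example_contig_1\t1000\n"))
annotation = io.StringIO(
    'example_contig_1\tsrc\tgene\t1\t100\t.\t+\t.\tgene_id "g1";\n'
    'NC_001453.1\tsrc\tgene\t1\t100\t.\t+\t.\tgene_id "g2";\n'
)
filtered = io.StringIO()
n_dropped = filter_annotation(annotation, known, filtered)
print(n_dropped, "record(s) dropped")
```

Genes on dropped contigs would not be searchable in the trackhub, so the returned count is worth checking before relying on the filtered file. Renaming the contigs instead would keep those genes, but it needs a mapping between the two naming schemes that this thread does not provide.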