
Reference genome mismatch due to lowercase sequence

Open • gabrielle-y opened this issue 1 year ago • 1 comment

https://github.com/tkzeng/Pangolin/blob/5cf94b8db938c658391b4305cd7ce33297d44ff7/pangolin/pangolin.py#LL110C1-L111C1

I'm trying to run Pangolin with the UCSC hg38 genome, which contains lowercase (soft-masked) sequence. As a result, the if statement at line 110 triggers this warning: "[Line 64] WARNING, skipping variant: Mismatch between FASTA (ref base: g) and variant file (ref base: G)." I tried converting seq to uppercase with a built-in Python function, but that did not resolve the issue.

It would be helpful if the script supported lowercase sequences. If I resolve this in the meantime, I will update the issue with the solution.

gabrielle-y, Jun 07 '23 06:06

Found the issue: we had to re-run pip install so that the edited pangolin.py was actually picked up. After that, appending .upper() to line 103 resolved the error. We have not tested downstream implications.
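For anyone hitting the same warning, here is a minimal sketch of the idea behind the fix. It is not the actual Pangolin code: the function name `ref_matches`, the `window` string, and the offsets are hypothetical and only illustrate comparing the FASTA reference base with the variant's REF allele after uppercasing both.

```python
def ref_matches(window: str, offset: int, variant_ref: str) -> bool:
    """Return True if the FASTA window matches the variant REF allele at `offset`.

    Both sides are uppercased first, so soft-masked (lowercase) bases in the
    FASTA compare equal to the uppercase alleles usually found in a VCF.
    Hypothetical helper for illustration only; not part of Pangolin.
    """
    fasta_ref = window[offset:offset + len(variant_ref)]
    return fasta_ref.upper() == variant_ref.upper()


# Hypothetical window of reference sequence with mixed case.
window = "acgtGgtacA"

print(ref_matches(window, 2, "G"))   # lowercase 'g' in the FASTA, uppercase 'G' in the variant
print(ref_matches(window, 5, "GT"))  # multi-base REF allele over lowercase sequence
print(ref_matches(window, 0, "G"))   # a genuine mismatch ('a' vs 'G') is still reported
```

Uppercasing at comparison time (or once, when the sequence is read) keeps genuine mismatches detectable while ignoring case differences. Whether the uppercased sequence affects anything downstream in Pangolin is untested here.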

gabrielle-y, Jun 07 '23 07:06