Fix annotate metadata with numeric strains

Open · huddlej opened this issue 2 years ago · 1 comment

Description of proposed changes

Fix annotate metadata with numeric strains by setting the dtype of the "strain" column in the sequence index to "string".
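The effect of this change can be illustrated with a minimal sketch. It assumes the sequence index is a tab-delimited file read with pandas; the sample data, the `read_index` helper, and the `metadata_strains` list are hypothetical and are not taken from the actual script. Without an explicit dtype, pandas infers an integer column for all-numeric strain names, so they no longer compare equal to the string names in the metadata.

```python
import io

import pandas as pd

# Hypothetical sequence index whose strain names are purely numeric.
SEQUENCE_INDEX_TSV = "strain\tlength\n1234\t29903\n5678\t29850\n"

# Hypothetical strain names as they would appear in string-typed metadata.
metadata_strains = ["1234", "5678"]


def read_index(tsv_text, strain_dtype=None):
    """Read a sequence index, optionally forcing the dtype of "strain"."""
    dtype = {"strain": strain_dtype} if strain_dtype else None
    return pd.read_csv(io.StringIO(tsv_text), sep="\t", dtype=dtype)


# Default behavior: pandas infers an integer dtype for numeric strain names.
inferred = read_index(SEQUENCE_INDEX_TSV)

# With the explicit dtype, strain names stay strings.
explicit = read_index(SEQUENCE_INDEX_TSV, strain_dtype="string")

# Integer strains never equal the metadata's string strains ...
inferred_matches = [s for s in metadata_strains if s in set(inferred["strain"])]
# ... while string strains match as expected.
explicit_matches = [s for s in metadata_strains if s in set(explicit["strain"])]

print(inferred["strain"].dtype, inferred_matches)
print(explicit["strain"].dtype, explicit_matches)
```

Forcing the dtype at read time, rather than converting afterwards, keeps names such as `0123` intact, since an integer round trip would drop the leading zero.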

Related issue(s)

Fixes #948

Testing

  • [x] Added functional test for issue, confirmed that test failed, and confirmed that fix in c49a5a6 makes the test pass
  • [x] Tested by CI

huddlej · May 25 '22 20:05

CI fails on the "push" check because this PR branches from a version of the workflow that still uses the old Nextalign/Nextclade commands, while CI runs in the latest Docker image, which ships the new Nextalign/Nextclade:

python3 scripts/sanitize_sequences.py \
    --sequences data/example_sequences.fasta.gz \
    --strip-prefixes hCoV-19/ SARS-CoV-2/ \
    --output /dev/stdout 2> logs/sanitize_sequences_gisaid.txt \
    | nextalign \
    --jobs=2 \
    --reference defaults/reference_seq.fasta \
    --genemap defaults/annotation.gff \
    --genes ORF1a,ORF1b,S,ORF3a,E,M,ORF6,ORF7a,ORF7b,ORF8,N,ORF9b \
    --sequences /dev/stdin \
    --output-dir results/translations \
    --output-basename seqs_gisaid \
    --output-fasta results/aligned_gisaid.fasta \
    --output-insertions results/insertions_gisaid.tsv > logs/align_gisaid.txt 2>&1;
xz -2 -T 2 results/aligned_gisaid.fasta;
xz -2 -T 2 results/translations/seqs_gisaid*.fasta

CI passes on the "pull" check, which tests the result of merging this PR into master, where the alignment commands match the Docker environment:

python3 scripts/sanitize_sequences.py \
    --sequences data/example_sequences.fasta.gz \
    --strip-prefixes hCoV-19/ SARS-CoV-2/ \
    --output /dev/stdout 2> logs/sanitize_sequences_gisaid.txt \
    | nextalign run \
    --jobs=2 \
    --reference defaults/reference_seq.fasta \
    --genemap defaults/annotation.gff \
    --output-translations results/translations/seqs_gisaid.gene.{gene}.fasta \
    --output-fasta results/aligned_gisaid.fasta \
    --output-insertions results/insertions_gisaid.tsv > logs/align_gisaid.txt 2>&1;
xz -2 -T 2 results/aligned_gisaid.fasta;
xz -2 -T 2 results/translations/seqs_gisaid.gene.*.fasta

Given that the latter passes and is the check that matters, we should be safe to merge.

huddlej · Aug 04 '22 22:08