augur clades only partially assigns clade information

Open cimendes opened this issue 2 years ago • 4 comments

Current Behavior

When running the augur clades command, the JSON file it produces assigns clades to only some of the nodes; the rest are shown as "unassigned". With the --reference option, all nodes are set to "unassigned".

Expected behavior

All branches should be assigned their correct clade.

How to reproduce

I'm using the following docker container: quay.io/biocontainers/augur:22.0.2--pyhdfd78af_0

With the following command: augur clades --tree kilifi_H3N2_new_docker_timetree.nwk --mutations kilifi_H3N2_new_docker_nt_muts.json kilifi_H3N2_new_docker_aa_muts.json --clades clades_h3n2_ha.tsv --output-node-data test_clades.json

Here are all the input and output files: augur_clade_input_output.zip

with the test_clades.json having the following content:

{
  "branches": {
    "NODE_0000006": {
      "labels": {
        "clade": "3C.2a"
      }
    },
    "SRR11445940_A_HA_H3": {
      "labels": {
        "clade": "3C.2a1"
      }
    }
  },
  "generated_by": {
    "program": "augur",
    "version": "22.0.2"
  },
  "nodes": {
    "100734_A_HA_H3": {
      "clade_membership": "unassigned"
    },
    "100954_A_HA_H3": {
      "clade_membership": "unassigned"
    },
    "109275_A_HA_H3": {
      "clade_membership": "unassigned"
    },
    "109292_A_HA_H3": {
      "clade_membership": "unassigned"
    },
    "109342_A_HA_H3": {
      "clade_membership": "unassigned"
    },
    "109562_A_HA_H3": {
      "clade_membership": "unassigned"
    },
    "109630_A_HA_H3": {
      "clade_membership": "unassigned"
    },
    "109974_A_HA_H3": {
      "clade_membership": "unassigned"
    },
    "110108_A_HA_H3": {
      "clade_membership": "unassigned"
    },
    "115485_A_HA_H3": {
      "clade_membership": "unassigned"
    },
    "115722_A_HA_H3": {
      "clade_membership": "unassigned"
    },
    "115833_A_HA_H3": {
      "clade_membership": "unassigned"
    },
    "115863_A_HA_H3": {
      "clade_membership": "unassigned"
    },
    "116143_A_HA_H3": {
      "clade_membership": "unassigned"
    },
    "116165_A_HA_H3": {
      "clade_membership": "unassigned"
    },
    "116225_A_HA_H3": {
      "clade_membership": "unassigned"
    },
    "116281_A_HA_H3": {
      "clade_membership": "unassigned"
    },
    "116354_A_HA_H3": {
      "clade_membership": "unassigned"
    },
    "116389_A_HA_H3": {
      "clade_membership": "unassigned"
    },
    "124408_A_HA_H3": {
      "clade_membership": "unassigned"
    },
    "124728_A_HA_H3": {
      "clade_membership": "3C.2a"
    },
    "133124_A_HA_H3": {
      "clade_membership": "3C.2a"
    },
    "133619_A_HA_H3": {
      "clade_membership": "3C.2a"
    },
    "134526_A_HA_H3": {
      "clade_membership": "3C.2a"
    },
    "134927_A_HA_H3": {
      "clade_membership": "3C.2a"
    },
    "135010_A_HA_H3": {
      "clade_membership": "3C.2a"
    },
    "135156_A_HA_H3": {
      "clade_membership": "3C.2a"
    },
    "135379_A_HA_H3": {
      "clade_membership": "3C.2a"
    },
    "135553_A_HA_H3": {
      "clade_membership": "3C.2a"
    },
    "135676_A_HA_H3": {
      "clade_membership": "3C.2a"
    },
    "92804_A_HA_H3": {
      "clade_membership": "unassigned"
    },
    "93547_A_HA_H3": {
      "clade_membership": "unassigned"
    },
    "94414_A_HA_H3": {
      "clade_membership": "unassigned"
    },
    "99056_A_HA_H3": {
      "clade_membership": "unassigned"
    },
    "NODE_0000000": {
      "clade_membership": "unassigned"
    },
    "NODE_0000002": {
      "clade_membership": "unassigned"
    },
    "NODE_0000003": {
      "clade_membership": "unassigned"
    },
    "NODE_0000005": {
      "clade_membership": "unassigned"
    },
    "NODE_0000006": {
      "clade_membership": "3C.2a"
    },
    "NODE_0000007": {
      "clade_membership": "3C.2a"
    },
    "NODE_0000008": {
      "clade_membership": "3C.2a"
    },
    "NODE_0000010": {
      "clade_membership": "3C.2a"
    },
    "NODE_0000011": {
      "clade_membership": "3C.2a"
    },
    "NODE_0000012": {
      "clade_membership": "3C.2a"
    },
    "NODE_0000013": {
      "clade_membership": "3C.2a"
    },
    "NODE_0000016": {
      "clade_membership": "3C.2a"
    },
    "NODE_0000017": {
      "clade_membership": "3C.2a"
    },
    "NODE_0000018": {
      "clade_membership": "unassigned"
    },
    "NODE_0000019": {
      "clade_membership": "unassigned"
    },
    "NODE_0000020": {
      "clade_membership": "unassigned"
    },
    "NODE_0000021": {
      "clade_membership": "unassigned"
    },
    "NODE_0000023": {
      "clade_membership": "unassigned"
    },
    "NODE_0000025": {
      "clade_membership": "unassigned"
    },
    "NODE_0000028": {
      "clade_membership": "unassigned"
    },
    "NODE_0000029": {
      "clade_membership": "unassigned"
    },
    "NODE_0000030": {
      "clade_membership": "unassigned"
    },
    "NODE_0000032": {
      "clade_membership": "unassigned"
    },
    "NODE_0000033": {
      "clade_membership": "unassigned"
    },
    "NODE_0000034": {
      "clade_membership": "unassigned"
    },
    "NODE_0000035": {
      "clade_membership": "unassigned"
    },
    "SRR11445892_A_HA_H3": {
      "clade_membership": "3C.2a"
    },
    "SRR11445940_A_HA_H3": {
      "clade_membership": "3C.2a1"
    },
    "SRR11445941_A_HA_H3": {
      "clade_membership": "3C.2a"
    },
    "SRR13443360_A_HA_H3": {
      "clade_membership": "unassigned"
    }
  }
}
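To see at a glance how many nodes ended up in each clade, the node-data JSON can be tallied with a few lines of Python. This is a minimal sketch that assumes only the "nodes" / "clade_membership" layout shown above; the helper name summarize_clades and the four-node inline sample are illustrative, not part of augur.

```python
import json
from collections import Counter


def summarize_clades(node_data: dict) -> Counter:
    """Count how many nodes carry each clade_membership value."""
    return Counter(
        attrs.get("clade_membership", "missing")
        for attrs in node_data.get("nodes", {}).values()
    )


# Small inline sample in the same shape as test_clades.json (illustrative subset).
sample = json.loads("""
{
  "nodes": {
    "100734_A_HA_H3": {"clade_membership": "unassigned"},
    "124728_A_HA_H3": {"clade_membership": "3C.2a"},
    "SRR11445940_A_HA_H3": {"clade_membership": "3C.2a1"},
    "NODE_0000000": {"clade_membership": "unassigned"}
  }
}
""")

counts = summarize_clades(sample)
for clade, n in counts.most_common():
    print(f"{clade}\t{n}")
```

To run it on the real output, load the file with json.load(open("test_clades.json")) and pass the result to summarize_clades instead of the inline sample.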

cimendes avatar Jun 14 '23 11:06 cimendes

Hi @cimendes,

This is the expected behavior of augur clades when a node does not carry the amino acid and nucleotide mutations that match your clade definitions.

I suspect you need to update the coordinates within clades_h3n2_ha.tsv. Currently, it is an exact copy of the H3N2 clades.tsv from the seasonal-flu repo, which was created based on the seasonal-flu repo's reference.fasta and genemap.gff.

If you look at the seasonal-flu's genemap.gff, it has different start/end coordinates than the coordinates listed for your reference in reference_h3n2_ha.gb.
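One rough way to check for this kind of mismatch is to list the clade-defining alleles that never show up as a mutation on any branch. The sketch below is only a diagnostic aid, not how augur clades itself works. It assumes the clades file is tab-separated with clade, gene, site and alt columns, and that mutations are available per node as strings such as "N144S" grouped by gene name. The function names, the clade "X1" and the sample mutations are all hypothetical.

```python
import csv
import io
import re

# Matches strings like "N144S": ancestral state, position, derived state.
MUTATION = re.compile(r"^([A-Z*-])(\d+)([A-Z*-])$")


def observed_alleles(mutations_by_node):
    """Collect every (gene, site, derived allele) seen on any branch."""
    seen = set()
    for genes in mutations_by_node.values():
        for gene, muts in genes.items():
            for mut in muts:
                m = MUTATION.match(mut)
                if m:
                    seen.add((gene, int(m.group(2)), m.group(3)))
    return seen


def unmatched_definitions(clades_tsv_text, mutations_by_node):
    """Return clade-defining rows whose allele is never observed as a mutation."""
    seen = observed_alleles(mutations_by_node)
    rows = csv.DictReader(io.StringIO(clades_tsv_text), delimiter="\t")
    return [
        (r["clade"], r["gene"], int(r["site"]), r["alt"])
        for r in rows
        if (r["gene"], int(r["site"]), r["alt"]) not in seen
    ]


# Hypothetical clade definitions and per-node mutations, for illustration only.
clades_tsv = "clade\tgene\tsite\talt\nX1\tHA1\t144\tS\nX1\tHA1\t159\tY\n"
mutations = {
    "NODE_A": {"HA1": ["N144S"]},
    "tip_1": {"HA1": ["F193S"]},
}

missing = unmatched_definitions(clades_tsv, mutations)
for row in missing:
    print(row)
```

A row reported here is only a hint: a defining allele may already be present in the root sequence without appearing as a mutation on any branch. If nearly every row is reported, though, differing gene names or coordinates between the clade definitions and the reference annotation are a likely cause.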

joverlee521 avatar Jun 14 '23 17:06 joverlee521

Also note that the --reference option is not a supported feature yet. You should have seen this warning when you tried to use this option.

It is unexpected, though, that using the --reference option affected your output. That sounds like a bug that should be fixed!

joverlee521 avatar Jun 14 '23 17:06 joverlee521

Just coming back to this issue:

  1. The samples we have are older H3N2s (2009-2015), and are just for training purposes. We wanted a good study, with raw reads available and some metadata.
  2. Here is a sample HA sequence: 109342_HA.fasta.zip
  3. From an explanation by @corneliusroemer, these older sequences should get the clade "unassigned", which is what happens when I use the Nextclade web version with the reference "CY163680".
  4. However, when I use the reference "EPI1857216", I get a 3C clade for that sample, which appears to be incorrect, as the original paper reports clade 7.
  5. Shouldn't both references give the same clade output, or in which cases should one be used over the other?

jrotieno avatar Jun 22 '23 08:06 jrotieno

Hi @jrotieno, the issue you are running into is slightly different. Nextclade uses its own clade assignment algorithm, which is separate from the augur clades command.

As noted in the Clade assignment section:

Nextclade assigns the clade of the nearest reference node found during the Phylogenetic placement step.

Since the two references use different reference trees, they could potentially assign different clades to the same sample.
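As a toy illustration of why the reference tree matters, the sketch below assigns a query the clade of whichever reference node differs from it by the fewest mutations. This is a deliberately simplified stand-in, not Nextclade's actual placement algorithm. The function name nearest_node_clade, the two small "trees" and all mutation sets are made up for the example.

```python
def nearest_node_clade(query_muts, reference_nodes):
    """Return the clade of the reference node closest to the query.

    Distance is the number of mutations present in one set but not the other.
    Ties go to the node listed first.
    """
    def distance(node):
        return len(query_muts ^ node["mutations"])

    return min(reference_nodes, key=distance)["clade"]


# Hypothetical query and two hypothetical reference trees, as flat node lists.
query = {"N144S", "F159Y"}

tree_a = [
    {"clade": "unassigned", "mutations": set()},
    {"clade": "3C", "mutations": {"N144S", "F159Y", "K160T"}},
]
tree_b = [
    {"clade": "unassigned", "mutations": {"N144S"}},
    {"clade": "3C", "mutations": {"N144S", "F159Y", "K160T", "Q311H"}},
]

clade_a = nearest_node_clade(query, tree_a)
clade_b = nearest_node_clade(query, tree_b)
print(clade_a, clade_b)
```

In this toy setup the same query lands next to a "3C" node in one tree and next to an "unassigned" node in the other, simply because the two trees contain different nodes.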


in which cases should one be used over the other?

Others will definitely have more insight here, but older samples would require an older reference since they are aligned against the reference for mutation calling.

joverlee521 avatar Jun 26 '23 22:06 joverlee521