Filter Bad MLST Profiles from PubMLST

Open Nilad opened this issue 2 years ago • 0 comments

Hi,

Recently, I noticed that a few profiles, such as ST-2724 in the "Acinetobacter baumannii#1" schema or ST-2609 in the "Haemophilus influenzae" schema, contain the letter "N" in place of a numeric allele index.

So I get this error when I try to download these schemas:

ariba pubmlstget "Haemophilus influenzae" mlst_hinfluenza_test
WARNING: spades not found in path. Looked for spades.py
Traceback (most recent call last):
  File "/usr/local/bin/ariba", line 312, in <module>
    args.func(args)
  File "/usr/local/lib/python3.8/dist-packages/ariba/tasks/pubmlstget.py", line 11, in run
    preparer.run()
  File "/usr/local/lib/python3.8/dist-packages/ariba/pubmlst_ref_preparer.py", line 81, in run
    self.profile = mlst_profile.MlstProfile(profile_file, duplicate_warnings=True)
  File "/usr/local/lib/python3.8/dist-packages/ariba/mlst_profile.py", line 15, in __init__
    self._load_input_file()
  File "/usr/local/lib/python3.8/dist-packages/ariba/mlst_profile.py", line 29, in _load_input_file
    type_tuple = tuple(int(row[x]) for x in self.genes_list)
  File "/usr/local/lib/python3.8/dist-packages/ariba/mlst_profile.py", line 29, in <genexpr>
    type_tuple = tuple(int(row[x]) for x in self.genes_list)
ValueError: invalid literal for int() with base 10: 'N'

Maybe it would be possible to check whether a non-numeric value is present and skip the corresponding ST profile.
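To illustrate the idea, here is a minimal standalone sketch, not ariba's actual code. It assumes the profile file is tab-delimited with an `ST` column followed by one column per gene, and that columns such as `clonal_complex` or `species` are not genes. The function name `filter_profiles`, the helper `is_int`, and the example rows are all made up for illustration.

```python
import csv
import io


def is_int(value):
    """Return True if value can be parsed by int(), e.g. '12' but not 'N'."""
    try:
        int(value)
        return True
    except (TypeError, ValueError):
        return False


def filter_profiles(lines, non_gene_columns=("ST", "clonal_complex", "species")):
    """Split profile rows into those with all-numeric alleles and the rest.

    `lines` is any iterable of tab-delimited lines whose first line is a header.
    Returns (genes, kept_rows, skipped_sts).
    """
    reader = csv.DictReader(lines, delimiter="\t")
    # Assumption: every header column not listed in non_gene_columns is a gene.
    genes = [c for c in reader.fieldnames if c not in non_gene_columns]
    kept, skipped = [], []
    for row in reader:
        if all(is_int(row[g]) for g in genes):
            kept.append(row)
        else:
            # Remember which ST was dropped so a warning can be reported.
            skipped.append(row["ST"])
    return genes, kept, skipped


# Invented example data: the second profile has 'N' in place of an allele index.
example = (
    "ST\tadk\tatpG\tclonal_complex\n"
    "1\t1\t2\t\n"
    "2\tN\t5\t\n"
    "3\t4\t4\t\n"
)
genes, kept, skipped = filter_profiles(io.StringIO(example))
print("kept STs:", [row["ST"] for row in kept])
print("skipped STs:", skipped)
```

Something like this could run before the `int(row[x])` conversion shown in the traceback, printing a warning for each skipped ST instead of stopping with a `ValueError`.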

Thanks in advance.

Nilad · Aug 18 '22 10:08