iLearnPlus

Show a warning if the special FASTA header format is violated

Open kim-fehl opened this issue 3 years ago • 2 comments

In a large dataset of automatically downloaded sequences, some names can include the "|" symbol. I also append the class and train/test labels automatically. So when I try to analyze such a file, I get uninformative error messages like:

  • ValueError: could not convert string to float: 'P42577.2'
  • ValueError: invalid literal for int() with base 10: '6LPD'

which are caused by incorrect fasta headers:

  • P42577.2_sp|P42577.2|FRIS_LYMST|0|training
  • 6LPD_pdb|6LPD|F|1|training

A simple check when importing the file could show a warning to the user.
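
A minimal sketch of what such a check could look like. It assumes the expected header has exactly three "|"-separated fields (name, integer class label, and "training" or "testing"); that layout is inferred from the errors above, and the function names `check_header` and `warn_on_bad_headers` are hypothetical, not part of iLearnPlus:

```python
import warnings

# Assumed set of allowed values for the last header field.
ALLOWED_ROLES = {"training", "testing"}


def check_header(header):
    """Return a description of the problem with a FASTA header, or None if it looks fine.

    Assumes the format name|label|role with exactly three fields.
    """
    fields = header.lstrip(">").strip().split("|")
    if len(fields) != 3:
        return f"expected 3 '|'-separated fields, found {len(fields)}"
    name, label, role = fields
    try:
        int(label)
    except ValueError:
        return f"label {label!r} is not an integer"
    if role not in ALLOWED_ROLES:
        return f"role {role!r} is not one of {sorted(ALLOWED_ROLES)}"
    return None


def warn_on_bad_headers(lines):
    """Warn about every ill-formed header line; return their 1-based line numbers."""
    bad = []
    for lineno, line in enumerate(lines, start=1):
        if line.startswith(">"):
            problem = check_header(line)
            if problem is not None:
                warnings.warn(f"line {lineno}: {line.strip()!r}: {problem}")
                bad.append(lineno)
    return bad
```

With this sketch, a header such as `>6LPD_pdb|6LPD|F|1|training` is reported as having five fields instead of three, rather than failing later with a conversion error.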

kim-fehl, Apr 25 '21 20:04

Thank you for your suggestion. We will deal with this problem as soon as possible.

Superzchen, Apr 26 '21 01:04

I added a helpful error message for ill-formed headers in PR #3.

li6in9muyou, Nov 27 '22 15:11