iLearnPlus
Show a warning if the special FASTA header format is violated
In a large dataset of automatically downloaded sequences, some sequence names can contain the "|" symbol. I also append the class and train/test labels automatically, using "|" as the separator. As a result, when I try to analyze such a file, I get uninformative error messages like:
- ValueError: could not convert string to float: 'P42577.2'
- ValueError: invalid literal for int() with base 10: '6LPD'
which are caused by incorrect FASTA headers:
- P42577.2_sp|P42577.2|FRIS_LYMST|0|training
- 6LPD_pdb|6LPD|F|1|training
A simple check when importing the file could show a warning to the user.
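To illustrate the kind of check I mean, here is a minimal sketch. It is not taken from the iLearnPlus code. It assumes the expected header has exactly three "|"-separated fields (name, integer class label, and a role of either "training" or "testing"), and the function name `find_bad_headers` is hypothetical.

```python
def find_bad_headers(lines, sep="|", roles=("training", "testing")):
    """Return (line_number, header, reason) for every ill-formed FASTA header.

    Assumed header layout (an assumption, not confirmed by the tool's code):
        >name|integer_label|training   or   >name|integer_label|testing
    """
    problems = []
    for line_no, line in enumerate(lines, start=1):
        line = line.strip()
        if not line.startswith(">"):
            continue  # sequence lines and blank lines are not checked here
        fields = line[1:].split(sep)
        if len(fields) != 3:
            problems.append(
                (line_no, line, f"expected 3 '{sep}'-separated fields, found {len(fields)}")
            )
            continue
        _name, label, role = fields
        try:
            int(label)
        except ValueError:
            problems.append((line_no, line, f"class label {label!r} is not an integer"))
            continue
        if role not in roles:
            problems.append((line_no, line, f"role {role!r} is not one of {roles}"))
    return problems


if __name__ == "__main__":
    example = [
        ">P42577.2_sp|P42577.2|FRIS_LYMST|0|training",
        "MKTAYIAKQR",
        ">6LPD_pdb|6LPD|F|1|training",
        "MKTAYIAKQR",
        ">seq1|0|training",
        "MKTAYIAKQR",
    ]
    for line_no, header, reason in find_bad_headers(example):
        print(f"Warning: line {line_no}: {header}: {reason}")
```

With a check like this, both headers above would be reported with their line numbers instead of failing later with a ValueError.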
Thank you for your suggestion. We will deal with this problem as soon as possible.
I added a helpful error message for ill-formed headers in PR #3.