
The number of indices missing values for certain amino acids

Open koyurion opened this issue 3 years ago • 1 comments

Hi, thank you for your great work! I have a question about the number of indices that have missing values for certain amino acids.

According to the description in your paper, 13 indices were discarded, as shown below. However, when I check the aaindex1 source file, it seems that some missing values are filled with zero rather than "NA", for example "H KOEP990101" and "H ZASB820101", which confuses me.


image

image

koyurion avatar Jun 05 '21 10:06 koyurion

Hello,

Thanks for your interest in the DeepImmuno tool!

I am sorry for the confusion here. In the paper, I discarded the 13 indices that contain "NA" in the downloaded index file (https://www.genome.jp/ftp/db/community/aaindex/aaindex1), but kept the indices whose missing values are filled with 0. To answer your question completely, I will re-run the evaluation with all indices that have missing values removed and compare the result with the original model. I will also update the source file and clarify this in the README so that the logic is sound.

I hope this clears up at least some of the confusion, and I will keep you posted!
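To make the distinction above concrete, here is a minimal Python sketch (not the DeepImmuno code) that separates indices containing "NA" from indices containing an exact 0. It assumes the usual aaindex1 flat-file layout: an "H" line with the accession, an "I" header line followed by two rows of ten values, and "//" closing each entry. The function names and the DEMO accessions are made up for illustration, and an exact 0 is only flagged for manual review, since a zero can also be a genuine value.

```python
def parse_aaindex1(text):
    """Map each accession (H line) to its 20 values; 'NA' becomes None."""
    indices = {}
    accession, values, in_data = None, [], False
    for line in text.splitlines():
        if line.startswith("//"):          # end of the current entry
            if accession is not None and len(values) == 20:
                indices[accession] = values
            accession, values, in_data = None, [], False
        elif line.startswith("H "):        # accession line
            accession = line[2:].strip()
        elif line.startswith("I "):        # header of the value block
            in_data = True
        elif in_data:                      # the two rows of ten values
            values.extend(None if tok == "NA" else float(tok)
                          for tok in line.split())
    return indices


def classify(indices):
    """Return (accessions with any NA, accessions with any exact zero)."""
    has_na = sorted(k for k, v in indices.items()
                    if any(x is None for x in v))
    has_zero = sorted(k for k, v in indices.items()
                      if any(x == 0.0 for x in v))
    return has_na, has_zero


# Made-up entries (hypothetical accessions, not real AAindex data).
SAMPLE = """\
H DEMO000001
D Made-up complete index
I    A/L     R/K     N/M     D/F     C/P     Q/S     E/T     G/W     H/Y     I/V
     1.0     2.0     3.0     4.0     5.0     6.0     7.0     8.0     9.0    10.0
     1.5     2.5     3.5     4.5     5.5     6.5     7.5     8.5     9.5    10.5
//
H DEMO000002
D Made-up index with an explicit NA
I    A/L     R/K     N/M     D/F     C/P     Q/S     E/T     G/W     H/Y     I/V
     1.0     2.0      NA     4.0     5.0     6.0     7.0     8.0     9.0    10.0
     1.5     2.5     3.5     4.5     5.5     6.5     7.5     8.5     9.5    10.5
//
H DEMO000003
D Made-up index with a value stored as 0
I    A/L     R/K     N/M     D/F     C/P     Q/S     E/T     G/W     H/Y     I/V
     1.0     2.0     3.0     4.0     5.0     6.0     7.0     8.0     9.0    10.0
     1.5     2.5     3.5      0.     5.5     6.5     7.5     8.5     9.5    10.5
//
"""

has_na, has_zero = classify(parse_aaindex1(SAMPLE))
print("contain NA:", has_na)
print("contain an exact zero (review manually):", has_zero)
```

Running the same two functions on the downloaded aaindex1 file should list which accessions fall into each group, which is the comparison the question is about.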

Also, if you want to do some data cleaning on your end, here is how I did it: I first used the Linux csplit command to partition the aaindex.txt file, so that each index ends up in its own index{3-digit-numbers}.txt file in the same folder. The main processing script is then at: https://github.com/frankligy/DeepImmuno/blob/main/src/utils_get_afterpca.py#L203-L223

I hope this helps and saves you some time. Let me know if you have any other questions!
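For readers without csplit, the partitioning step can be approximated in Python. This is a rough sketch under stated assumptions, not the command used for the paper: it treats every line starting with "//" as the end of an entry, and the numbering (starting at 001) and the helper names split_entries and write_entries are hypothetical choices that may not match the files the linked script expects.

```python
import tempfile
from pathlib import Path


def split_entries(text):
    """Split aaindex-style text into entries, each ending with its '//' line."""
    entries, current = [], []
    for line in text.splitlines():
        current.append(line)
        if line.startswith("//"):
            entries.append("\n".join(current) + "\n")
            current = []
    return entries


def write_entries(text, out_dir):
    """Write each entry to index001.txt, index002.txt, ... (assumed naming)."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for i, entry in enumerate(split_entries(text), start=1):
        path = out_dir / f"index{i:03d}.txt"
        path.write_text(entry)
        paths.append(path)
    return paths


# Two made-up entries (hypothetical accessions) to exercise the split.
DEMO = "H DEMO000001\nD first\n//\nH DEMO000002\nD second\n//\n"

with tempfile.TemporaryDirectory() as tmp:
    written = write_entries(DEMO, tmp)
    names = [p.name for p in written]
    first = written[0].read_text()

print(names)
```

To use it on the real file, read aaindex.txt into a string and pass it to write_entries together with the target folder.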

Best, Frank

frankligy avatar Jun 06 '21 03:06 frankligy