biovec

```
raise Exception("Model has never trained this n-gram: " + ngram)
Exception: Model has never trained this n-gram: WNA
```

Open • devhimd19 opened this issue 3 years ago • 3 comments

[Attachment: Screenshot from 2021-08-27 15-55-29]

devhimd19 • Aug 27 '21 10:08

Thank you for your report! The error means that the n-gram "WNA" was never seen during training: the corpus the model was trained on (the UniProt one) does not contain that sequence. You will have to build your own corpus and train the model on it yourself.
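To see where this error comes from, here is a minimal sketch in plain Python of how a ProtVec-style model appears to cut a sequence into n-grams: three reading frames (offsets 0, 1, 2) of non-overlapping 3-grams, every one of which must be in the trained vocabulary. This mirrors our reading of biovec's `split_ngrams`/`to_vecs` behavior and is not the library's own code; `missing_ngrams` is a hypothetical helper that lists the n-grams a vocabulary lacks, so a sequence can be checked before `to_vecs` is called.

```python
from typing import Container, List


def split_ngrams(seq: str, n: int = 3) -> List[List[str]]:
    """Split seq into n reading frames of non-overlapping n-grams.

    Frame k starts at offset k; a trailing partial n-gram is dropped.
    'AGAMQSASM' -> [['AGA', 'MQS', 'ASM'], ['GAM', 'QSA'], ['AMQ', 'SAS']]
    """
    frames = []
    for offset in range(n):
        shifted = seq[offset:]
        frames.append([shifted[i:i + n] for i in range(0, len(shifted) - n + 1, n)])
    return frames


def missing_ngrams(seq: str, vocab: Container[str], n: int = 3) -> List[str]:
    """Hypothetical helper: n-grams of seq absent from vocab, in first-seen order."""
    missing: List[str] = []
    for frame in split_ngrams(seq, n):
        for ngram in frame:
            if ngram not in vocab and ngram not in missing:
                missing.append(ngram)
    return missing


if __name__ == "__main__":
    vocab = {"AGA", "MQS", "ASM", "GAM", "QSA", "AMQ"}
    print(missing_ngrams("AGAMQSASM", vocab))  # ['SAS']
```

With a trained model, `pv.wv` could be passed as `vocab`, assuming ProtVec exposes gensim's usual `wv` attribute (a `KeyedVectors` object, which supports the `in` operator). An empty result means every n-gram of that sequence is known.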

kyu999 • Sep 03 '21 02:09

The corpus does contain WNA. Could you please look at the attached code and input file?
Attachments: output1.txt, window_13re.txt, biovec5.txt, Screenshot from 2021-09-03 11-02-12

I do get output, but the error is still raised.

devhimd19 • Sep 03 '21 05:09

Hi, @kyu999

I'm facing exactly the same error, but for the n-gram "KQE".

Here's my code snippet:

```python
pv = ProtVec('INPUT.FASTA', corpus_fname='OUTPUT.TXT', n=3)
pv["QAT"]
sequences = list(df[c])  # df[c] contains the AA sequences from which INPUT.FASTA was constructed
embeddings = []
for i in sequences:
    embed = pv.to_vecs(i)  # <- Error occurs here
    embeddings.append(embed)
```
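Because `to_vecs` stops at the first unseen n-gram, the loop above aborts without saying which sequence caused it. Below is a minimal sketch of a wrapper that records the failing rows instead; `embed_all` is a hypothetical name, and its `embed` argument stands in for `pv.to_vecs`.

```python
from typing import Any, Callable, Iterable, List, Optional, Tuple


def embed_all(
    sequences: Iterable[str], embed: Callable[[str], Any]
) -> Tuple[List[Optional[Any]], List[Tuple[int, str]]]:
    """Embed each sequence; on failure store None and record (row index, error message)."""
    embeddings: List[Optional[Any]] = []
    failures: List[Tuple[int, str]] = []
    for idx, seq in enumerate(sequences):
        try:
            embeddings.append(embed(seq))
        except Exception as exc:  # the n-gram error is raised as a plain Exception
            embeddings.append(None)
            failures.append((idx, str(exc)))
    return embeddings, failures
```

Usage would look like `embeddings, failures = embed_all(list(df[c]), pv.to_vecs)`, after which `failures` names each offending row and n-gram. The `None` placeholders keep rows aligned with `df`, but they must be dropped or handled before calling `np.asarray`.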

Here's the full code block, in case it helps:

```python
import numpy as np
import pandas as pd
from biovec.models import ProtVec

# data (CSV paths), cols (sequence column names) and colN (output column names)
# are assumed to be defined earlier in the script; they are not shown here.
for d in data:
    df = pd.read_csv(d)
    dN = d[:-4]  # strip a 4-character extension such as ".csv"
    for c in cols:
        count = 1
        # Write one FASTA record per row of column c
        with open('sequences_{a}_{b}.fasta'.format(a=c, b=dN), 'w') as f:
            for i in range(len(df)):
                print('>' + str(count) + '\n', df[c][i], file=f)
                count = count + 1
        # Build the corpus from the FASTA file and train ProtVec on it
        pv = ProtVec('sequences_{a}_{b}.fasta'.format(a=c, b=dN),
                     corpus_fname='output_{a}_{b}.txt'.format(a=c, b=dN), n=3)
        pv["QAT"]
        sequences = list(df[c])
        embeddings = []
        for i in sequences:
            embed = pv.to_vecs(i)
            embeddings.append(embed)
        embedding = np.asarray(embeddings)
        all_embeddings = np.reshape(embedding, newshape=(embedding.shape[0], 300))
        dF = pd.DataFrame(all_embeddings, columns=colN, dtype=object)
        dF['modification'] = df['modifications']
        dF.to_csv('dataset-{a}_{b}.model'.format(a=c, b=dN))
        pv.save('sequences_{a}_{b}.model'.format(a=c, b=dN))
```


Please help me get past this error.
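As a stopgap while a corpus that covers these sequences is prepared, one option is to skip unseen n-grams instead of raising. The sketch below is a hypothetical re-implementation, not part of biovec. It sums the vectors of known n-grams per reading frame and concatenates the frames. With 100-dimensional vectors, which the 300 in the reshape implies for n=3, it yields the same 300 values that reshape expects. This assumes `to_vecs` returns one summed vector per frame, as the reshape suggests. `lookup` can be any object supporting `in` and `[]`; passing `pv.wv` is an assumption about gensim's `KeyedVectors`.

```python
from typing import List, Mapping, Sequence


def to_vecs_skip_unknown(
    seq: str, lookup: Mapping[str, Sequence[float]], dim: int, n: int = 3
) -> List[float]:
    """Hypothetical fallback: per reading frame, sum vectors of known n-grams, skipping unseen ones.

    Returns the n frame vectors concatenated (length n * dim); a frame with
    no known n-grams stays all zeros.
    """
    flat: List[float] = []
    for offset in range(n):
        shifted = seq[offset:]
        total = [0.0] * dim
        for i in range(0, len(shifted) - n + 1, n):
            ngram = shifted[i:i + n]
            if ngram not in lookup:
                continue  # unseen n-gram: skipped instead of raising
            total = [t + float(v) for t, v in zip(total, lookup[ngram])]
        flat.extend(total)
    return flat
```

Skipping silently changes the embedding of every affected sequence, because its frame sums lose those terms. Retraining on a corpus built from your own sequences, as suggested earlier in the thread, remains the cleaner fix.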

AliASafdari • Aug 08 '22 13:08