g2p-seq2seq
Reg: Error during seq2seq model training.
Hi all, while training the model I got the following error. I have followed previous blog posts, but I couldn't solve the issue. I can see that my vocabulary is in ASCII format, and I am not sure why I am getting this error. Please help me figure out how to solve it. TensorFlow version: 1.3.0
Traceback (most recent call last):
File "/usr/local/bin/g2p-seq2seq", line 11, in
Hello, @ellurunaresh. Please clone the latest version of g2p-seq2seq (6.2.0a0). Also, it requires tensorflow>=1.5.0.
Actually, I can't update TensorFlow on my system. Can I solve this problem without upgrading?
In that case, can you please install tensorflow==1.5.0 only for your user (with the "--user" flag: pip install tensorflow==1.5.0 --user)?
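After that, a quick check (assuming you run g2p-seq2seq with the same Python interpreter) that the per-user install is the version actually being imported:
import tensorflow as tf
print(tf.__version__)  # should print 1.5.0 after the --user install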
OK sure. Thanks 😊
Hi, I am training a character-to-word-sequence model using the g2p approach, with a large vocabulary. New words appear at test time that do not exist in vocab.phoneme, and I get "UNK" for those unknown words.
- How do I handle "_UNK" during decoding? Is there any option or parameter so that it outputs the nearest string instead?
- During training, can I generate "embeddings" for all unknown words?
Please help me out with how to proceed. If anybody knows a solution, please share it.
Hello, @ellurunaresh
- How do I handle "_UNK" during decoding? Is there any option or parameter so that it outputs the nearest string instead?
- If you are working on the word boundary detection problem, as I mentioned in issue #126, you don't need to consider any decoded symbol except the "SPACE" symbol. The only information you have to use is the position of the "SPACE" symbol. For example, say you feed the following input sequence to the program for decoding: > goodafternoon
And let's say you receive the following decoded sequence with "UNK" symbols: decodes = ["g", "o", "o", "UNK", "SPACE", "a", "v", "t", "UNK", "r", "n", "o", "e", "n"]
You should take just the positions of the "SPACE" symbols in the decoded sequence:
space_positions = [sym_pos for sym_pos, sym in enumerate(decodes) if sym == 'SPACE']
In the above example, the "SPACE" symbol in decodes occurs at position 4 (zero-based):
print(space_positions)
[4]
So you should build the output sequence from the input sequence (not from the decoded sequence with "UNK" and other decoded symbols), and just add a white-space character at the positions where the "SPACE" symbol was found previously:
inputs = list("goodafternoon")  # the input character sequence from the example above
output_str = ""
for pos, sym in enumerate(inputs):
    if pos in space_positions:
        output_str += " "
    output_str += sym
print("Input:{}".format("".join(inputs)))
print("Output:{}".format(output_str))
- During training, can I generate "embeddings" for all unknown words?
Generating and using embeddings outside of tensor2tensor is problematic, because vocabularies are built not only from tokens but also from sub-tokens: https://github.com/tensorflow/tensor2tensor/issues/173
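For illustration only (a toy sketch, not tensor2tensor's actual implementation): a greedy longest-match sub-token encoding shows why per-word embeddings don't map cleanly onto such a vocabulary. An out-of-vocabulary word is encoded as several known sub-token pieces, each with its own embedding, so there is no single "word" slot to fill:
# Toy sketch (not tensor2tensor's implementation): greedy longest-match
# sub-token encoding. An OOV word is split into known sub-tokens, so
# there is no single word-level embedding slot for it.
subtoken_vocab = {"good", "after", "noon", "a", "e", "f", "g", "n", "o", "r", "t"}

def encode(word, vocab):
    pieces, start = [], 0
    while start < len(word):
        for end in range(len(word), start, -1):  # longest match first
            if word[start:end] in vocab:
                pieces.append(word[start:end])
                start = end
                break
        else:
            raise ValueError("no sub-token covers position {}".format(start))
    return pieces

print(encode("goodafternoon", subtoken_vocab))  # ['good', 'after', 'noon']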