g2p-seq2seq
Reg: Error during seq2seq model training.
Hi all, while training the model I got the following error. I have followed previous blog posts, but I couldn't solve the issue. I can see that my vocabulary is in ASCII format, and I am not sure why I am getting this error. Please help me figure out how to solve it. TensorFlow version: 1.3.0
Traceback (most recent call last):
File "/usr/local/bin/g2p-seq2seq", line 11, in
Hello, @ellurunaresh. Please clone the latest version of g2p-seq2seq (6.2.0a0). Also, it requires tensorflow>=1.5.0.
Actually, I can't update TensorFlow on my system. Can I solve this problem without upgrading?
In that case, can you please install tensorflow==1.5.0 only for your user (with the "--user" flag: pip install tensorflow==1.5.0 --user)?
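After that, a quick check (assuming you run g2p-seq2seq with the same Python interpreter) that the per-user install is the version actually being imported:
import tensorflow as tf
print(tf.__version__)  # should print 1.5.0 after the --user install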
OK sure. Thanks 😊
Hi, I am training a character-to-word-sequence model using the g2p approach, with a large vocabulary. New words appear at test time that do not exist in vocab.phoneme, and I get "UNK" for those unknown words.
- How do I handle "_UNK" during decoding? Is there any option or parameter so that it outputs the nearest string instead?
- During training, can I generate "embeddings" for all unknown words?
Please help me out with how to proceed. If anybody knows a solution, please share it.
Hello, @ellurunaresh
- How do I handle "_UNK" during decoding? Is there any option or parameter so that it outputs the nearest string instead?
- If you are working on the word boundary detection problem, as I mentioned in issue #126, you don't need to consider any decoded symbol except the "SPACE" symbol. The only information you have to use is the position of the "SPACE" symbol. For example, say you feed the following input sequence to the program for decoding: > goodafternoon
And let's say you receive the following decoded sequence with "UNK" symbols: decodes = ["g", "o", "o", "UNK", "SPACE", "a", "v", "t", "UNK", "r", "n", "o", "e", "n"]
You should take just the positions of the "SPACE" symbols in the decoded sequence:
space_positions = [sym_pos for sym_pos, sym in enumerate(decodes) if sym == 'SPACE']
In the above example, the "SPACE" symbol in decodes occurs at position 4 (zero-based):
print(space_positions)
[4]
So you should build the output sequence from the input sequence (not from the decoded sequence with "UNK" and other decoded symbols), and just add a white-space character at the positions where the "SPACE" symbol was found previously:
inputs = list("goodafternoon")  # the input character sequence from the example above
output_str = ""
for pos, sym in enumerate(inputs):
    if pos in space_positions:
        output_str += " "
    output_str += sym
print("Input:{}".format("".join(inputs)))
print("Output:{}".format(output_str))
- During training, can I generate "embeddings" for all unknown words?
Generating and using embeddings outside of tensor2tensor is problematic, because vocabularies are built not only from tokens but also from sub-tokens: https://github.com/tensorflow/tensor2tensor/issues/173
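For illustration only (a toy sketch, not tensor2tensor's actual implementation): a greedy longest-match sub-token encoding shows why per-word embeddings don't map cleanly onto such a vocabulary. An out-of-vocabulary word is encoded as several known sub-token pieces, each with its own embedding, so there is no single "word" slot to fill:
# Toy sketch (not tensor2tensor's implementation): greedy longest-match
# sub-token encoding. An OOV word is split into known sub-tokens, so
# there is no single word-level embedding slot for it.
subtoken_vocab = {"good", "after", "noon", "a", "e", "f", "g", "n", "o", "r", "t"}

def encode(word, vocab):
    pieces, start = [], 0
    while start < len(word):
        for end in range(len(word), start, -1):  # longest match first
            if word[start:end] in vocab:
                pieces.append(word[start:end])
                start = end
                break
        else:
            raise ValueError("no sub-token covers position {}".format(start))
    return pieces

print(encode("goodafternoon", subtoken_vocab))  # ['good', 'after', 'noon']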