ValueError: GpuJoin: Wrong inputs for input 0 related to inputs 0.!

Open dhirajmadan1 opened this issue 7 years ago • 6 comments

I receive the following error when I run sample.py on a model trained with VHRED.

Traceback (most recent call last):
  File "sample.py", line 114, in <module>
    main()
  File "sample.py", line 101, in main
    verbose=args.verbose)
  File "/dccstor/dhimadan1/keshav/hed-dlg-truncated/search.py", line 55, in sample_apply
    samples, costs = sample_logic(sampler, joined_context, **kwargs)
  File "/dccstor/dhimadan1/keshav/hed-dlg-truncated/search.py", line 193, in sample
    encoder_states = self.compute_encoding(context[:, indx_update_hs], reversed_context[:, indx_update_hs], self.max_len)
  File "/u/dhimadan/anaconda2/envs/tf1.1/lib/python2.7/site-packages/theano/compile/function_module.py", line 898, in __call__
    storage_map=getattr(self.fn, 'storage_map', None))
  File "/u/dhimadan/anaconda2/envs/tf1.1/lib/python2.7/site-packages/theano/gof/link.py", line 325, in raise_with_op
    reraise(exc_type, exc_value, exc_trace)
  File "/u/dhimadan/anaconda2/envs/tf1.1/lib/python2.7/site-packages/theano/compile/function_module.py", line 884, in __call__
    self.fn() if output_subset is None else
ValueError: GpuJoin: Wrong inputs for input 0 related to inputs 0.!
Apply node that caused the error: GpuJoin(TensorConstant{2}, GpuSubtensor{int64:int64:int8}.0, GpuJoin.0)
Toposort index: 280
Inputs types: [TensorType(int8, scalar), CudaNdarrayType(float32, 3D), CudaNdarrayType(float32, 3D)]
Inputs shapes: [(), (191, 10, 1000), (159, 10, 2000)]
Inputs strides: [(), (10000, 1000, 1), (20000, 2000, 1)]
Inputs values: [array(2, dtype=int8), 'not shown', 'not shown']
Outputs clients: [[GpuReshape{2}(GpuJoin.0, MakeVector{dtype='int64'}.0)]]

The error does not appear if I use the simple HRED model (without the latent variable sampling step). Kindly help.
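The shapes reported in the traceback above can be checked by hand: a join along axis 2 needs every other dimension to agree. Below is a minimal pure-Python sketch of that check. The helper name `join_mismatches` is hypothetical and is not part of Theano or of this repository.

```python
def join_mismatches(shapes, axis):
    """Return (dimension, sizes) pairs that would make a join along `axis` fail."""
    ndim = len(shapes[0])
    if any(len(s) != ndim for s in shapes):
        raise ValueError("all inputs must have the same number of dimensions")
    problems = []
    for dim in range(ndim):
        if dim == axis:
            continue  # the join axis is the only one allowed to differ
        sizes = [s[dim] for s in shapes]
        if len(set(sizes)) > 1:
            problems.append((dim, sizes))
    return problems


# Shapes copied from the "Inputs shapes" line of the traceback; the join axis is 2.
reported = [(191, 10, 1000), (159, 10, 2000)]
print(join_mismatches(reported, axis=2))  # [(0, [191, 159])]
```

The mismatch is on dimension 0 (191 against 159), not on the joined axis, so the two tensors disagree on their leading dimension rather than on hidden-state width.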

dhirajmadan1 avatar Aug 18 '17 06:08 dhirajmadan1

I have the same problem. Have you solved it? Please tell me! Thank you!

HITlilingzhi avatar Feb 05 '18 06:02 HITlilingzhi

I have found the problem. Line 87 of search.py sets "self.max_len = 160". If any of your test examples is longer than this default value, this error appears. I set self.max_len to the maximum length of the test contexts, and the error did not show up again.
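One way to find the value to use is to scan the test contexts before sampling. The sketch below assumes each context is already tokenized into a list; the names `longest_context` and `DEFAULT_MAX_LEN` are hypothetical, and only the value 160 comes from this thread.

```python
DEFAULT_MAX_LEN = 160  # value reported for search.py in this thread


def longest_context(contexts):
    """Length, in tokens, of the longest context (0 for an empty list)."""
    # `or [0]` keeps max() from failing on an empty list in Python 2 and 3.
    return max([len(c) for c in contexts] or [0])


# Two made-up contexts: one short, one longer than the default limit.
test_contexts = [["hello"] * 5, ["token"] * 191]
needed = longest_context(test_contexts)
if needed > DEFAULT_MAX_LEN:
    print("longest context has %d tokens, above the default of %d"
          % (needed, DEFAULT_MAX_LEN))
```

The resulting number is what would be assigned to self.max_len in place of 160. Whether an extra margin is needed on top of it is not established here.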

HITlilingzhi avatar Feb 05 '18 12:02 HITlilingzhi

I hit the same problem. Here is the log:


Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/theano/compile/function_module.py", line 903, in __call__
    self.fn() if output_subset is None else\
  File "pygpu/gpuarray.pyx", line 1502, in pygpu.gpuarray.pygpu_concatenate
  File "pygpu/gpuarray.pyx", line 427, in pygpu.gpuarray.array_concatenate
ValueError: b'Dimension mismatch. as[1]->dimensions[0] = 159, as[0]->dimensions[0] = 255'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "bin/sample.py", line 114, in <module>
    main()
  File "bin/sample.py", line 92, in main
    verbose=args.verbose,
  File "/home/cgsdfc/deployment/Models/Dialogue/HRED-VHRED/serban/search.py", line 49, in sample_apply
    samples, costs = sample_logic(sampler, joined_context, **kwargs)
  File "/home/cgsdfc/deployment/Models/Dialogue/HRED-VHRED/serban/search.py", line 197, in sample
    self.max_len)
  File "/usr/local/lib/python3.6/dist-packages/theano/compile/function_module.py", line 917, in __call__
    storage_map=getattr(self.fn, 'storage_map', None))
  File "/usr/local/lib/python3.6/dist-packages/theano/gof/link.py", line 325, in raise_with_op
    reraise(exc_type, exc_value, exc_trace)
  File "/usr/local/lib/python3.6/dist-packages/six.py", line 692, in reraise
    raise value.with_traceback(tb)
  File "/usr/local/lib/python3.6/dist-packages/theano/compile/function_module.py", line 903, in __call__
    self.fn() if output_subset is None else\
  File "pygpu/gpuarray.pyx", line 1502, in pygpu.gpuarray.pygpu_concatenate
  File "pygpu/gpuarray.pyx", line 427, in pygpu.gpuarray.array_concatenate
ValueError: b'Dimension mismatch. as[1]->dimensions[0] = 159, as[0]->dimensions[0] = 255'
Apply node that caused the error: GpuJoin(TensorConstant{2}, GpuSubtensor{int64:int64:int8}.0, GpuJoin.0)
Toposort index: 277
Inputs types: [TensorType(int8, scalar), GpuArrayType<None>(float32, 3D), GpuArrayType<None>(float32, 3D)]
Inputs shapes: [(), (255, 1, 1000), (159, 1, 500)]
Inputs strides: [(), (4000, 4000, 4), (2000, 2000, 4)]
Inputs values: [array(2, dtype=int8), 'not shown', 'not shown']
Outputs clients: [[GpuReshape{2}(GpuJoin.0, MakeVector{dtype='int64'}.0)]]

Backtrace when the node is created(use Theano flag traceback.limit=N to make it longer):
  File "bin/sample.py", line 114, in <module>
    main()
  File "bin/sample.py", line 92, in main
    verbose=args.verbose,
  File "/home/cgsdfc/deployment/Models/Dialogue/HRED-VHRED/serban/search.py", line 49, in sample_apply
    samples, costs = sample_logic(sampler, joined_context, **kwargs)
  File "/home/cgsdfc/deployment/Models/Dialogue/HRED-VHRED/serban/search.py", line 112, in sample
    self.compile()
  File "/home/cgsdfc/deployment/Models/Dialogue/HRED-VHRED/serban/search.py", line 84, in compile
    self.compute_encoding = self.model.build_encoder_function()
  File "/home/cgsdfc/deployment/Models/Dialogue/HRED-VHRED/serban/dialog_encoder_decoder.py", line 1622, in build_encoder_function
    hs_and_h_future = T.concatenate([hs_to_condition_latent_variable_on, h_future], axis=2)
  File "bin/sample.py", line 114, in <module>
    main()
  File "bin/sample.py", line 92, in main
    verbose=args.verbose,
  File "/home/cgsdfc/deployment/Models/Dialogue/HRED-VHRED/serban/search.py", line 49, in sample_apply
    samples, costs = sample_logic(sampler, joined_context, **kwargs)
  File "/home/cgsdfc/deployment/Models/Dialogue/HRED-VHRED/serban/search.py", line 112, in sample
    self.compile()
  File "/home/cgsdfc/deployment/Models/Dialogue/HRED-VHRED/serban/search.py", line 84, in compile
    self.compute_encoding = self.model.build_encoder_function()
  File "/home/cgsdfc/deployment/Models/Dialogue/HRED-VHRED/serban/dialog_encoder_decoder.py", line 1622, in build_encoder_function
    hs_and_h_future = T.concatenate([hs_to_condition_latent_variable_on, h_future], axis=2)

HINT: Use the Theano flag 'exception_verbosity=high' for a debugprint and storage map footprint of this apply node.
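The two flags named in the log above (exception_verbosity and traceback.limit) can be passed through the THEANO_FLAGS environment variable, which Theano reads when it is imported. A minimal sketch follows; the helper name `with_debug_flags` and the depth of 20 are illustrative choices, not values from this thread.

```python
import os


def with_debug_flags(existing, limit=20):
    """Append the two debugging flags to an existing THEANO_FLAGS string."""
    extra = [
        ("exception_verbosity", "high"),
        ("traceback.limit", str(limit)),  # depth is an arbitrary choice
    ]
    parts = [p for p in existing.split(",") if p]
    parts += ["%s=%s" % (key, value) for key, value in extra]
    return ",".join(parts)


# This must run before `import theano`, otherwise the flags are not picked up.
os.environ["THEANO_FLAGS"] = with_debug_flags(os.environ.get("THEANO_FLAGS", ""))
print(os.environ["THEANO_FLAGS"])
```

Flags already present in the environment (for example a device setting) are kept, and the debugging flags are appended after them.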

@HITlilingzhi your solution seems very promising. Could you share more details (a code snippet)?

cgsdfc avatar Apr 14 '19 15:04 cgsdfc

I think I've found a simple fix:

        self.max_len = len(context) + 1

See the full context. The idea is to set self.max_len to the length of the context currently being decoded, plus the magic number 1.

It works around the VHRED issue and the results on HRED are the same with or without this modification (I tested it). Thus I think it is reliable.

cgsdfc avatar Apr 15 '19 02:04 cgsdfc

I am sorry to report that the fix proposed in my previous comment breaks VHRED training, so it is not a fix at all. The failure log looks like this:

tent_utterance_approx_posterior = 0.0278
2019-04-27 03:50:51,591: bin/train.py:175: INFO: Wl_std_outlatent_utterance_approx_posterior = 1.0024
2019-04-27 03:50:51,591: bin/train.py:175: INFO: bl_std_outlatent_utterance_approx_posterior = 0.0074
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/theano/compile/function_module.py", line 903, in __call__
    self.fn() if output_subset is None else\
  File "pygpu/gpuarray.pyx", line 1502, in pygpu.gpuarray.pygpu_concatenate
  File "pygpu/gpuarray.pyx", line 427, in pygpu.gpuarray.array_concatenate
ValueError: b'Dimension mismatch. as[1]->dimensions[0] = 1, as[0]->dimensions[0] = 11'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "bin/train.py", line 662, in <module>
    main(args)
  File "bin/train.py", line 176, in main
    samples, costs = random_sampler.sample([[]], n_samples=1, n_turns=3)
  File "/home/cgsdfc/deployment/Models/HRED-VHRED/serban/search.py", line 49, in sample_apply
    samples, costs = sample_logic(sampler, joined_context, **kwargs)
  File "/home/cgsdfc/deployment/Models/HRED-VHRED/serban/search.py", line 197, in sample
    context[:, indx_update_hs], reversed_context[:, indx_update_hs], self.max_len)
  File "/usr/local/lib/python3.6/dist-packages/theano/compile/function_module.py", line 917, in __call__
    storage_map=getattr(self.fn, 'storage_map', None))
  File "/usr/local/lib/python3.6/dist-packages/theano/gof/link.py", line 325, in raise_with_op
    reraise(exc_type, exc_value, exc_trace)
  File "/usr/local/lib/python3.6/dist-packages/six.py", line 692, in reraise
    raise value.with_traceback(tb)
  File "/usr/local/lib/python3.6/dist-packages/theano/compile/function_module.py", line 903, in __call__
    self.fn() if output_subset is None else\
  File "pygpu/gpuarray.pyx", line 1502, in pygpu.gpuarray.pygpu_concatenate
  File "pygpu/gpuarray.pyx", line 427, in pygpu.gpuarray.array_concatenate
ValueError: b'Dimension mismatch. as[1]->dimensions[0] = 1, as[0]->dimensions[0] = 11'
Apply node that caused the error: GpuJoin(TensorConstant{2}, GpuSubtensor{int64:int64:int8}.0, GpuJoin.0)
Toposort index: 268
Inputs types: [TensorType(int8, scalar), GpuArrayType<None>(float32, 3D), GpuArrayType<None>(float32, 3D)]
Inputs shapes: [(), (11, 1, 1000), (1, 1, 2000)]
Inputs strides: [(), (4000, 4000, 4), (8000, 8000, 4)]
Inputs values: [array(2, dtype=int8), 'not shown', 'not shown']
Outputs clients: [[GpuReshape{2}(GpuJoin.0, MakeVector{dtype='int64'}.0)]]

Backtrace when the node is created(use Theano flag traceback.limit=N to make it longer):
  File "bin/train.py", line 662, in <module>
    main(args)
  File "bin/train.py", line 176, in main
    samples, costs = random_sampler.sample([[]], n_samples=1, n_turns=3)
  File "/home/cgsdfc/deployment/Models/HRED-VHRED/serban/search.py", line 49, in sample_apply
    samples, costs = sample_logic(sampler, joined_context, **kwargs)
  File "/home/cgsdfc/deployment/Models/HRED-VHRED/serban/search.py", line 113, in sample
    self.compile()
  File "/home/cgsdfc/deployment/Models/HRED-VHRED/serban/search.py", line 84, in compile
    self.compute_encoding = self.model.build_encoder_function()
  File "/home/cgsdfc/deployment/Models/HRED-VHRED/serban/dialog_encoder_decoder.py", line 1622, in build_encoder_function
    hs_and_h_future = T.concatenate([hs_to_condition_latent_variable_on, h_future], axis=2)
  File "bin/train.py", line 662, in <module>
    main(args)
  File "bin/train.py", line 176, in main
    samples, costs = random_sampler.sample([[]], n_samples=1, n_turns=3)
  File "/home/cgsdfc/deployment/Models/HRED-VHRED/serban/search.py", line 49, in sample_apply
    samples, costs = sample_logic(sampler, joined_context, **kwargs)
  File "/home/cgsdfc/deployment/Models/HRED-VHRED/serban/search.py", line 113, in sample
    self.compile()
  File "/home/cgsdfc/deployment/Models/HRED-VHRED/serban/search.py", line 84, in compile
    self.compute_encoding = self.model.build_encoder_function()
  File "/home/cgsdfc/deployment/Models/HRED-VHRED/serban/dialog_encoder_decoder.py", line 1622, in build_encoder_function
    hs_and_h_future = T.concatenate([hs_to_condition_latent_variable_on, h_future], axis=2)

HINT: Use the Theano flag 'exception_verbosity=high' for a debugprint and storage map footprint of this apply node.

cgsdfc avatar Apr 27 '19 07:04 cgsdfc

Could someone post a fix for this? I would very much appreciate it!

cgsdfc avatar Apr 27 '19 08:04 cgsdfc
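One untested idea that combines the two observations in this thread is to keep the original default of 160 as a floor and raise it only when the context is longer. For short contexts, such as the sample drawn during training, max_len then stays at the value the original code used. The function name `guarded_max_len` is hypothetical, the sketch has not been run against this repository, and it is an assumption that the training-time failure comes only from max_len dropping below the default.

```python
DEFAULT_MAX_LEN = 160  # default reported for search.py in this thread


def guarded_max_len(context_len, default=DEFAULT_MAX_LEN):
    """Raise max_len for long contexts, but never go below the default."""
    # context_len + 1 is the expression from the earlier comment in this thread.
    return max(default, context_len + 1)


for n in (1, 11, 159, 191, 255):
    print(n, guarded_max_len(n))
```

In search.py this would stand in for the unconditional assignment, with len(context) passed as context_len.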