
Need help with prediction on an unseen video

Open apoorvpatne10 opened this issue 5 years ago • 3 comments

When we pass a new video to the model for prediction with pretrained weights, I guess it comes down to this part of the code:

./predict evaluation/models/overlapped-weights368.h5 bbaf1n.mpg 


def predict(self, input_batch):
    return self.test_function([input_batch, 0])[0]  # the first 0 indicates test

@property
def test_function(self):
    # captures output of softmax so we can decode the output during visualization
    return K.function([self.input_data, K.learning_phase()], [self.y_pred, K.learning_phase()])

How does this code work? I'm stuck here. Here's the traceback:

Traceback (most recent call last):
  File "predict.py", line 65, in <module>
    video, result = predict(sys.argv[1], sys.argv[2])
  File "predict.py", line 58, in predict
    y_pred         = lipnet.predict(X_data)
  File "/home/apoorv/Work/TE_seminar/LipNet_Module/evaluation/model2.py", line 74, in predict
    return self.test_function([input_batch, 0])[0]  # the first 0 indicates test
  File "/home/apoorv/anaconda3/lib/python3.6/site-packages/keras/backend/tensorflow_backend.py", line 2715, in __call__
    return self._call(inputs)
  File "/home/apoorv/anaconda3/lib/python3.6/site-packages/keras/backend/tensorflow_backend.py", line 2671, in _call
session)
  File "/home/apoorv/anaconda3/lib/python3.6/site-packages/keras/backend/tensorflow_backend.py", line 2623, in _make_callable
    callable_fn = session._make_callable_from_options(callable_opts)
  File "/home/apoorv/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1471, in _make_callable_from_options
    return BaseSession._Callable(self, callable_options)
  File "/home/apoorv/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1425, in __init__
    session._session, options_ptr, status)
  File "/home/apoorv/anaconda3/lib/python3.6/site-packages/tensorflow/python/framework/errors_impl.py", line 528, in __exit__
    c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.InvalidArgumentError: batc1/keras_learning_phase:0 is both fed and fetched.
Exception ignored in: <bound method BaseSession._Callable.__del__ of <tensorflow.python.client.session.BaseSession._Callable object at 0x7fb91090bfd0>>
Traceback (most recent call last):
  File "/home/apoorv/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1455, in __del__
    self._session._session, self._handle, status)
  File "/home/apoorv/anaconda3/lib/python3.6/site-packages/tensorflow/python/framework/errors_impl.py", line 528, in __exit__
    c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.InvalidArgumentError: No such callable handle: 549825888

EDIT: I created a new virtual environment and manually installed the dependencies listed in setup.py. This worked, although I still wonder what changes would be needed to get it running with current Keras/TensorFlow versions. For now, it works.

apoorvpatne10 avatar Apr 18 '19 12:04 apoorvpatne10

@apoorvpatne10 It is clear that no tensor can be both fed and fetched at the same time, so the test_function property of the LipNet class is not well designed. It is possible that tf 1.0.1 allowed such strange code, but I suggest rewriting it like this:

    @property
    def test_function(self):
        # captures output of softmax so we can decode the output during visualization
        return K.function([self.input_data, K.learning_phase()], [self.y_pred])

I still need to verify this, but I think it will fix the problem.
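To see why the original version fails, here is a plain-Python toy (no TensorFlow required; the names are illustrative, not the repo's code) that mimics how a TF session callable rejects a tensor appearing in both the feed list and the fetch list — exactly what the original test_function does with K.learning_phase():

```python
def make_callable(feeds, fetches):
    """Toy stand-in for a TF session callable: a name may be fed or
    fetched, but not both -- mirroring the InvalidArgumentError above."""
    overlap = set(feeds) & set(fetches)
    if overlap:
        raise ValueError(f"{sorted(overlap)} is both fed and fetched")

    def call(values):
        env = dict(zip(feeds, values))
        # Fetches that were not fed just echo their own name in this toy.
        return [env.get(name, name) for name in fetches]

    return call

# Mirrors the broken version: learning_phase is fed *and* fetched.
try:
    make_callable(["input_data", "learning_phase"],
                  ["y_pred", "learning_phase"])
except ValueError as e:
    print(e)  # -> ['learning_phase'] is both fed and fetched

# Mirrors the fixed version: learning_phase is only fed.
fixed = make_callable(["input_data", "learning_phase"], ["y_pred"])
```

Note that the fix does not change the callers: predict still takes element [0] of the result, which remains y_pred.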

RomanSteinberg avatar Jul 06 '20 14:07 RomanSteinberg

Ok. Some comments on this code.

  1. Nowadays Keras allows you to skip compiling a model if you only need prediction.
  2. There is no need to use the test_function property from the original code; one can use model.predict instead.
  3. This code takes the whole video: it parses every frame from the video file and feeds all of them to the model at once. This is very bad because it exhausts your VRAM. The workaround is to analyze only short videos.
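On point 3, a common workaround is to run inference over fixed-size windows of frames rather than the whole clip at once. A minimal sketch (plain Python; predict_window is a hypothetical stand-in for the actual model call, and window=75 is an assumption based on LipNet's 3-second, 25 fps GRID clips):

```python
def predict_in_windows(frames, predict_window, window=75):
    """Yield one prediction per fixed-size window of decoded frames,
    so only one window's activations occupy (V)RAM at a time."""
    for start in range(0, len(frames), window):
        yield predict_window(frames[start:start + window])

# Example with a dummy "model" that just reports the window length:
frames = list(range(200))
sizes = list(predict_in_windows(frames, len))
print(sizes)  # -> [75, 75, 50]
```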

After those changes I ran this code using tf v1.15.0 with the tf.keras API (no need for the original Keras library). I should also mention that I didn't train anything; I just used the provided pretrained weights. My conclusion: it reads lips incorrectly. First, it outputs text even when the person in the video is not speaking. Secondly, it outputs irrelevant text for English phrases. Yes, my pronunciation is not academic, but the model did not guess a single word across my attempts.

RomanSteinberg avatar Jul 08 '20 13:07 RomanSteinberg

The pre-trained model also gives invalid results on videos I recorded myself.

lssily avatar Feb 25 '21 07:02 lssily