Youtube-8M-WILLOW

Issues when testing video/frame features

Status: Open. feiyun1265 opened this issue 7 years ago • 33 comments

Hi @antoine77340. I have downloaded the YouTube-8M dataset. Then I used the video/frame test folder to test your pretrained model, but I get an error when testing. The error output is as follows:

INFO:tensorflow:number of input files: 4096
INFO:tensorflow:loading meta-graph: pretrainedmodel/model.ckpt-310001.meta
INFO:tensorflow:restoring variables from pretrainedmodel/model.ckpt-310001
INFO:tensorflow:Error reported to Coordinator: <class 'tensorflow.python.framework.errors_impl.NotFoundError'>, ../YT8M/youtube-8m/features/validatelN.tfrecord
[[Node: train_input/ReaderReadV2_1 = ReaderReadV2[_device="/job:localhost/replica:0/task:0/cpu:0"](train_input/TFRecordReaderV2_1, train_input/input_producer)]]

Caused by op u'train_input/ReaderReadV2_1', defined at:
  File "inference.py", line 203, in <module>
    app.run()
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 44, in run
    _sys.exit(main(_sys.argv[:1] + flags_passthrough))
  File "inference.py", line 199, in main
    FLAGS.output_file, FLAGS.batch_size, FLAGS.top_k)
  File "inference.py", line 128, in inference
    saver = tf.train.import_meta_graph(meta_graph_location, clear_devices=True)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/saver.py", line 1577, in import_meta_graph
    **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/meta_graph.py", line 498, in import_scoped_meta_graph
    producer_op_list=producer_op_list)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/importer.py", line 287, in import_graph_def
    op_def=op_def)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 2395, in create_op
    original_op=self._default_original_op, op_def=op_def)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 1264, in __init__
    self._traceback = _extract_stack()

NotFoundError (see above for traceback): ../YT8M/youtube-8m/features/validatelN.tfrecord
[[Node: train_input/ReaderReadV2_1 = ReaderReadV2[_device="/job:localhost/replica:0/task:0/cpu:0"](train_input/TFRecordReaderV2_1, train_input/input_producer)]]

In addition, I started testing with the following command: python inference.py --output_file=test_video_v1.csv --input_data_pattern="video_test/test*.tfrecord" --model=NetVLADModelLF --train_dir=pretrainedmodel --frame_features=false --batch_size=1024 --base_learning_rate=0.0002 --netvlad_cluster_size=256 --netvlad_hidden_size=1024 --moe_l2=1e-6 --iterations=300 --learning_rate_decay=0.8 --netvlad_relu=False --gating=True --moe_prob_gating=True --run_once=True --top_k=50

Looking forward to your reply, thank you!

feiyun1265, Dec 13 '17

Hi, could you please try deleting the text file graph.pbtxt in the pretrainedmodel folder? Thank you.

antoine77340, Dec 13 '17

Hi @antoine77340. I have deleted graph.pbtxt, but the problem persists:

INFO:tensorflow:number of input files: 4096
INFO:tensorflow:loading meta-graph: pretrainedmodel/model.ckpt-310001.meta
INFO:tensorflow:restoring variables from pretrainedmodel/model.ckpt-310001
INFO:tensorflow:Error reported to Coordinator: <class 'tensorflow.python.framework.errors_impl.NotFoundError'>, ../YT8M/youtube-8m/features/validatevr.tfrecord
[[Node: train_input/ReaderReadV2 = ReaderReadV2[_device="/job:localhost/replica:0/task:0/cpu:0"](train_input/TFRecordReaderV2, train_input/input_producer)]]

feiyun1265, Dec 13 '17

Hmm, very strange. I don't understand why it tries to look in the ../YT8M/youtube-8m/features/ folder (this is where I store all my tfrecord files). I don't think this is the source of the error, but can you also try deleting all events.out.* files? And what if you put all validation and train tfrecord files in ../YT8M/youtube-8m/features/?

antoine77340, Dec 13 '17

Hi @antoine77340. First I deleted all events.out.* files, then put the test tfrecord files into ../YT8M/youtube-8m/features/. But I get the following error:

NotFoundError (see above for traceback): ../YT8M/youtube-8m/features/trainMr.tfrecord
[[Node: train_input/ReaderReadV2_1 = ReaderReadV2[_device="/job:localhost/replica:0/task:0/cpu:0"](train_input/TFRecordReaderV2_1, train_input/input_producer)]]

feiyun1265, Dec 13 '17

@feiyun1265 @antoine77340 I also ran into this problem and solved it. I'm still working on it, and I have now finished a complete function that takes a video as input and outputs labels. I can share it, but there are still some difficult problems: I think the results I get are not entirely right. It's strange, and I don't know why it happens.

```python
import csv
import os

import numpy
import tensorflow as tf  # TensorFlow 1.x API (tf.Session, tf.train.import_meta_graph)


class Vinfer(object):
    def __init__(self, model_path='public/'):
        self.train_dir = model_path
        self.batch_size = 1024
        self.top_k = 5
        self.check_point = -1  # -1 means: use the latest checkpoint in train_dir

        self.vocabulary = self.load_vocabulary()
        self.sess = self.load_model()

    def load_vocabulary(self):
        # vocabulary.csv: column 3 is Name, columns 5-7 are Vertical1..Vertical3.
        vc = []
        with open(os.path.join(self.train_dir, 'vocabulary.csv')) as f:
            for row in csv.reader(f):
                vc.append({'Name': row[3], 'V1': row[5], 'V2': row[6], 'V3': row[7]})
        return vc[1:]  # drop the header row

    def load_model(self):
        tf_config = tf.ConfigProto()
        tf_config.gpu_options.allow_growth = True
        sess = tf.Session(config=tf_config)
        latest_checkpoint = tf.train.latest_checkpoint(self.train_dir)
        if latest_checkpoint is None:
            raise Exception("unable to find a checkpoint at location: %s" % self.train_dir)
        if self.check_point < 0:
            meta_graph_location = latest_checkpoint + ".meta"
        else:
            latest_checkpoint = self.train_dir + "/model.ckpt-" + str(self.check_point)
            meta_graph_location = latest_checkpoint + ".meta"
        saver = tf.train.import_meta_graph(meta_graph_location, clear_devices=True)
        saver.restore(sess, latest_checkpoint)
        self.input_tensor = tf.get_collection("input_batch_raw")[0]
        self.num_frames_tensor = tf.get_collection("num_frames")[0]
        self.predictions_tensor = tf.get_collection("predictions")[0]

        def set_up_init_ops(variables):
            init_op_list = []
            # Iterate over a copy, because matching variables are removed from the list.
            for variable in list(variables):
                if "train_input" in variable.name:
                    # Local variables of the imported training input pipeline are set to 1.
                    init_op_list.append(tf.assign(variable, 1))
                    variables.remove(variable)
            # All remaining local variables are initialized normally.
            init_op_list.append(tf.variables_initializer(variables))
            return init_op_list

        sess.run(set_up_init_ops(tf.get_collection_ref(tf.GraphKeys.LOCAL_VARIABLES)))
        return sess

    def format_res(self, predictions):
        res = []
        for video_index in range(len(predictions)):
            # Indices of the top_k scores (unordered), then sort them by score.
            top_indices = numpy.argpartition(predictions[video_index], -self.top_k)[-self.top_k:]
            line = [[self.vocabulary[class_index], predictions[video_index][class_index]]
                    for class_index in top_indices]
            line = sorted(line, key=lambda p: -p[1])
            res.append(line)
        return res[0]  # only the first video of the batch is returned

    def inference(self, video_batch_val, num_frames_batch_val):
        # Inputs are fed directly, so no queue runners are started here.
        predictions_val, = self.sess.run(
            [self.predictions_tensor],
            {self.input_tensor: video_batch_val,
             self.num_frames_tensor: num_frames_batch_val})
        return self.format_res(predictions_val)
```

wincle, Dec 13 '17

@feiyun1265: I mean you should try moving the validation AND training tfrecords into this directory (not the test tfrecords). Could you please try that? I am sorry for all of these problems; I am actually not really a TensorFlow expert :(.

antoine77340, Dec 13 '17

Thanks for the analysis, @antoine77340 @wincle. I copied the validation and training tfrecords to ../YT8M/youtube-8m/features/. The previous error is gone, but now I get the following error:

  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1096, in _run
    % (np_val.shape, subfeed_t.name, str(subfeed_t.get_shape())))
ValueError: Cannot feed value of shape (160, 1024) for Tensor u'train_input/shuffle_batch_join:1', which has shape '(?, 300, 1152)'

feiyun1265, Dec 14 '17

@feiyun1265 The shapes don't match; just expand your input to the expected shape.

wincle, Dec 14 '17

@feiyun1265 And you should use VGGish to extract the audio features, concatenate them with the visual features, and then feed them to the model.

wincle, Dec 14 '17

I use the YouTube-8M dataset. Does the dataset have audio features, @wincle? And how do I use VGGish? Could you send me a link to its documentation? Thanks.

feiyun1265, Dec 14 '17

VGGish is here: https://github.com/tensorflow/models/tree/master/research/audioset. If you are using YouTube-8M, you don't need it. The image feature has 1024 dimensions and the audio feature has 128 dimensions; you should use them both.

wincle, Dec 14 '17
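To make wincle's suggestion concrete: the ValueError above says the model's input tensor has shape (?, 300, 1152), i.e. up to 300 frames per video with 1024 visual plus 128 audio features per frame (1024 + 128 = 1152). The value that was fed, (160, 1024), looks like one 1024-dimensional visual vector per video, so it probably needs to be replaced with frame-level rgb and audio features rather than reshaped. The numpy sketch below packs one video's per-frame features into that shape. The function name `to_model_input` is hypothetical, and the sketch assumes that visual features come before audio features, that the values are already dequantized floats, and that unused frames are zero-padded while the number of real frames is fed separately (as wincle's `Vinfer.inference` does with `num_frames_batch_val`).

```python
import numpy as np

MAX_FRAMES = 300                # from the expected shape (?, 300, 1152)
RGB_DIM, AUDIO_DIM = 1024, 128  # 1024 + 128 = 1152 (assumed order: rgb, then audio)


def to_model_input(rgb, audio, max_frames=MAX_FRAMES):
    """Pack the per-frame features of ONE video into a batch of size 1.

    rgb:   array of shape (n_frames, 1024)
    audio: array of shape (n_frames, 128)
    Returns (video_batch, num_frames) with shapes (1, max_frames, 1152) and (1,).
    """
    rgb = np.asarray(rgb, dtype=np.float32)
    audio = np.asarray(audio, dtype=np.float32)
    if rgb.ndim != 2 or rgb.shape[1] != RGB_DIM:
        raise ValueError("expected rgb of shape (n, %d), got %s" % (RGB_DIM, rgb.shape))
    if audio.ndim != 2 or audio.shape[1] != AUDIO_DIM:
        raise ValueError("expected audio of shape (n, %d), got %s" % (AUDIO_DIM, audio.shape))

    # Truncate to the shorter stream and to max_frames, then concatenate per frame.
    n = min(len(rgb), len(audio), max_frames)
    frames = np.concatenate([rgb[:n], audio[:n]], axis=1)  # shape (n, 1152)

    # Zero-pad the remaining frames; num_frames says how many frames are real.
    video_batch = np.zeros((1, max_frames, RGB_DIM + AUDIO_DIM), dtype=np.float32)
    video_batch[0, :n] = frames
    num_frames = np.array([n], dtype=np.int32)
    return video_batch, num_frames
```

The returned pair could then be passed as `video_batch_val` and `num_frames_batch_val` to wincle's `Vinfer.inference`.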

I will give it a try. Thank you very much, @wincle.

feiyun1265, Dec 14 '17

In model.ckpt-0.meta I got this picture: [image not preserved]

qingwa1990, Feb 02 '18

I met the same problem.

jiangzidong, Feb 09 '18

I think it's a bug in the YouTube-8M inference code. It should either 1) create the model the way eval does, or 2) directly use the saved model.

jiangzidong, Feb 11 '18

Dear @antoine77340, thank you for releasing your code! Very interesting! I encountered the same issue as @feiyun1265: tensorflow.python.framework.errors_impl.NotFoundError: ../YT8M/youtube-8m/features/trainGh.tfrecord; No such file or directory. I do not want to download all the train and validation tfrecords, since I use your released version of gatednetvladLF. I run inference on my local machine and do not have 1 TB of storage for the frame-level features. Do you have an idea of how I can proceed?

SharoneDayan, Apr 26 '18

@SharoneDayan I think maybe you should try freezing the model.

wincle, Apr 27 '18

Dear All, I encountered the same issue: tensorflow.python.framework.errors_impl.NotFoundError: ../YT8M/youtube-8m/features/trainGh.tfrecord; No such file or directory

After tracing inference.py, I think the inference process stops at the try-except-finally block in "def inference".

More specifically, I think the exception happened at : coord.should_stop().

Does anyone know for sure whether the exception happened in coord.should_stop()?

BR, JimmyYS

speculaas, May 17 '18

I also have the same problem when I try to run the inference code. To be precise, the output CSV file only has 1 or 2 videos with labels and then nothing. My error message follows:

INFO:tensorflow:num examples processed: 2 elapsed seconds: 2.13
Traceback (most recent call last):
  File "inference.py", line 203, in <module>
    app.run()
  File "/home/estathop/tensorflow/local/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 125, in run
    _sys.exit(main(argv))
  File "inference.py", line 199, in main
    FLAGS.output_file, FLAGS.batch_size, FLAGS.top_k)
  File "inference.py", line 172, in inference
    coord.join(threads)
  File "/home/estathop/tensorflow/local/lib/python2.7/site-packages/tensorflow/python/training/coordinator.py", line 389, in join
    six.reraise(*self._exc_info_to_raise)
  File "/home/estathop/tensorflow/local/lib/python2.7/site-packages/tensorflow/python/training/queue_runner_impl.py", line 252, in _run
    enqueue_callable()
  File "/home/estathop/tensorflow/local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1254, in _single_tensor_run
    results = self._call_tf_sessionrun(None, {}, fetch_list, [], None)
  File "/home/estathop/tensorflow/local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1409, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.NotFoundError: ../YT8M/youtube-8m/features/validatenp.tfrecord; No such file or directory
[[Node: train_input/ReaderReadV2_4 = ReaderReadV2[_device="/job:localhost/replica:0/task:0/device:CPU:0"](train_input/TFRecordReaderV2_4, train_input/input_producer)]]

estathop, May 17 '18

The question is why it is trying to look for tfrecords in that particular folder, "/YT8M/youtube-8m/features/". I can't trace where this happens in order to delete it.

estathop, May 17 '18

Dear All, I tried removing the training-related code snippet, as shown in the following screenshot.

The result is that the error no longer happens, but the program does not terminate.

[Screenshot: remove_tf-train, 2018-05-17 22-28-41]

Also, the output file given by

--output_file=test-gatednetvladLF-256k-1024-80-0002-300iter-norelu-basic-gatedmoe.csv

remained empty. Not even the header written by

out_file.write("VideoId,LabelConfidencePairs\n")

appears in it.

I hope I or someone else can figure out why.

BR, JimmyYS

speculaas, May 17 '18

num_examples_processed is initialized to 0. out_file.write("VideoId,LabelConfidencePairs\n") doesn't need to be printed; just check whether the CSV file contains those strings as two columns. You need the threads to perform the evaluation and the "while not" loop to iterate through every batch. I suspect that by eliminating the "while not" loop and coord.request_stop() you end up doing no computation at all.

Maybe tf.GraphKeys.LOCAL_VARIABLES is doing the damage; I read in the documentation that these are objects local to each machine. Maybe this is where the paths are saved?

What about tf.GraphKeys.MODEL_VARIABLES instead? What would that do?

estathop, May 18 '18

Dear estathop, embarrassingly, I don't understand TensorFlow well enough yet and need to study more. For now, after adding a few out_file.flush() calls, I get classification results in the specified output file. I tested this YouTube video: https://www.youtube.com/watch?v=3VUiz10w-aw and the result is:

$ cat prediction.2.csv
VideoId,LabelConfidencePairs
Hokuriku_E7_shinkansen.mkv,1 0.589722 62 0.283732 4 0.269821 11 0.041280 155 0.036597 10 0.020198 23 0.013597 121 0.012935 72 0.012433 12 0.009783 29 0.005635 17 0.004996 64 0.004927 50 0.004772 103 0.004416 20 0.004155 6 0.003413 633 0.003395 3764 0.002889 0 0.002232 1117 0.002042 28 0.002025 263 0.001938 152 0.001888 9 0.001886 68 0.001816 1348 0.001554 248 0.001477 82 0.001275 139 0.001250 330 0.001085 74 0.001053 337 0.000995 47 0.000988 92 0.000988 162 0.000959 2 0.000935 27 0.000923 70 0.000892 8 0.000811 25 0.000794 154 0.000785 286 0.000763 101 0.000738 69 0.000712 126 0.000711 148 0.000704 67 0.000664 234 0.000655 176 0.000643

where, looking up the label IDs in https://research.google.com/youtube8m/csv/vocabulary.csv, 1 0.589722 is Vehicle, 62 0.283732 is Train, 4 0.269821 is Car, 11 0.041280 is Motorsport, 155 0.036597 is Camera, and 10 0.020198 is Animal.

However, even though my output CSV now has results, the exception still happens as before:

INFO:tensorflow:restoring variables from /public/model.ckpt-310001
INFO:tensorflow:Restoring parameters from /public/model.ckpt-310001

2018-05-18 14:09:11.212298: W tensorflow/core/framework/allocator.cc:101] Allocation of 1140850688 exceeds 10% of system memory.
INFO:tensorflow:Error reported to Coordinator: <class 'tensorflow.python.framework.errors_impl.NotFoundError'>, ../YT8M/youtube-8m/features/trainkj.tfrecord; No such file or directory
[[Node: train_input/ReaderReadV2 = ReaderReadV2[_device="/job:localhost/replica:0/task:0/device:CPU:0"](train_input/TFRecordReaderV2, train_input/input_producer)]]
2018-05-18 14:09:13.653464: W tensorflow/core/framework/allocator.cc:101] Allocation of 1415577600 exceeds 10% of system memory.
INFO:tensorflow:num examples processed: 1 elapsed seconds: 1.45
Traceback (most recent call last):
  File "inference_no_train.py", line 208, in <module>
    app.run()
local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1409, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.NotFoundError: ../YT8M/youtube-8m/features/trainkj.tfrecord; No such file or directory
[[Node: train_input/ReaderReadV2 = ReaderReadV2[_device="/job:localhost/replica:0/task:0/device:CPU:0"](train_input/TFRecordReaderV2, train_input/input_producer)]]

BR, JimmyYS

speculaas, May 18 '18
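A small sketch around the output format shown above (the helper names are hypothetical, not taken from inference.py): `format_row` builds one VideoId,LabelConfidencePairs line with label IDs and scores separated by spaces, highest score first and six decimals as in speculaas's output; `write_rows` flushes after every line, so rows become visible in the file while the process is still running, even if it later hangs; `parse_row` and `label_names` go the other way and map IDs to names. `label_names` assumes vocabulary.csv has a header row containing `Index` and `Name` columns (wincle's code above reads Name from column 3).

```python
import csv


def format_row(video_id, scores, top_k=20):
    """Return 'VideoId,id score id score ...\\n' with the top_k scores, highest first."""
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:top_k]
    return "%s,%s\n" % (video_id, " ".join("%d %f" % (i, scores[i]) for i in top))


def write_rows(out_file, rows, top_k=20):
    """rows: iterable of (video_id, scores). Flushes after every line."""
    out_file.write("VideoId,LabelConfidencePairs\n")
    out_file.flush()
    for video_id, scores in rows:
        out_file.write(format_row(video_id, scores, top_k))
        # Hand the line to the OS now instead of waiting for the buffer or close().
        out_file.flush()


def parse_row(line):
    """'vid,1 0.5 62 0.25' -> ('vid', [(1, 0.5), (62, 0.25)])"""
    video_id, pairs = line.rstrip("\n").split(",", 1)
    tokens = pairs.split()
    return video_id, [(int(tokens[i]), float(tokens[i + 1]))
                      for i in range(0, len(tokens), 2)]


def label_names(vocabulary_lines):
    """Map label index -> name from vocabulary.csv (an open file or any iterable of lines)."""
    return {int(row["Index"]): row["Name"] for row in csv.DictReader(vocabulary_lines)}
```

For example, with `names = label_names(open("vocabulary.csv"))`, each data line of the prediction CSV (skip the header line) can be decoded as `[(names[i], s) for i, s in parse_row(line)[1]]`.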

No worries, I only recently started using TensorFlow too; I am not an expert either.

The problem is in the "try:" block. I managed to retrieve the same tfrecord indefinitely, and the program didn't crash; it just kept running forever. The problem appears when parsing the second tfrecord file. The error originates in the second iteration of the "while not coord.should_stop():" loop, specifically at video_id_batch_val, video_batch_val, num_frames_batch_val = sess.run([video_id_batch, video_batch, num_frames_batch]), but I don't understand why. I tried to print video_id_batch, for example, with its .eval() method (it is a tensor object), but nothing happened and cmd crashed.

estathop, May 18 '18
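The traceback posted earlier in this thread (coord.join(threads) re-raising from queue_runner_impl.py) suggests the NotFoundError is raised inside a queue-runner thread of the imported train_input pipeline, not by the prediction itself: `tf.train.start_queue_runners` starts every runner in the graph's `QUEUE_RUNNERS` collection, and `import_meta_graph` brings the training runners in along with the graph. One possible workaround, offered as an untested sketch and not as the repository's fix, is to start only the runners whose queue name does not begin with "train_input/". The filter itself is plain Python; the TensorFlow 1.x calls in the trailing comment (`tf.get_collection`, `tf.GraphKeys.QUEUE_RUNNERS`, `QueueRunner.create_threads`) are standard TF 1.x APIs, and the "train_input/" prefix is an assumption based on the node names in the error messages.

```python
def select_queue_runners(queue_runners, skip_prefix="train_input/"):
    """Keep only queue runners whose queue name does not start with skip_prefix.

    Works on any objects with a ``name`` attribute, such as TF 1.x
    tf.train.QueueRunner instances (QueueRunner.name is the name of its queue).
    """
    return [qr for qr in queue_runners if not qr.name.startswith(skip_prefix)]


# Hypothetical replacement for tf.train.start_queue_runners(...) in inference.py (TF 1.x):
#
#   coord = tf.train.Coordinator()
#   threads = []
#   for qr in select_queue_runners(tf.get_collection(tf.GraphKeys.QUEUE_RUNNERS)):
#       threads.extend(qr.create_threads(sess, coord=coord, daemon=True, start=True))
```

The idea is that the model input is already fed through feed_dict, so the training readers that point at ../YT8M/youtube-8m/features/ never need to run; whether this is sufficient for every checkpoint in this thread has not been verified.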

Hi All,

I am also getting a similar error when I try to run inference.py with the pretrained model released by Antoine. It says: tensorflow.python.framework.errors_impl.NotFoundError: ../YT8M/youtube-8m/features/train-C.tfrecord; No such file or directory

Has anyone solved this problem yet? Thanks in advance

puneetiitian, Jun 07 '18

Hi all, I ran into the same problem. Any solution would be much appreciated.

suhmily, Jul 09 '18

@antoine77340 Thanks for your work. When I use the pretrained model, I get: tensorflow.python.framework.errors_impl.NotFoundError: ../YT8M/youtube-8m/features/trainGh.tfrecord; No such file or directory. As far as I know, your workspace has a file named /YT8M/youtube-8m/features/trainGh.tfrecord. Please upload that file so that inference.py can run normally.

caocao1989, Aug 16 '18

Dear all, I met the same problem when using the pretrained model. Has anyone tried training their own model, and did the problem mentioned above not occur? I am hesitating about whether to train the model myself. Thanks in advance.

wenching33, Aug 22 '18

Dear all, just sharing some information. When I tried to train my own model using the scripts in this repository (https://github.com/antoine77340/Youtube-8M-WILLOW), there were error messages like "InvalidArgumentError: Name: , Context feature 'video_id' is required but could not be found". I changed readers.py to solve the problem.

Then I used my own trained model to run inference successfully (XXX.csv is produced and the prediction results are printed). I found a graph.pbtxt in the directory of my trained model; it also appears in the released pretrained model (if you download the pretrained WILLOW model and extract it, you can find it). Inside graph.pbtxt there is a node named "train_input/input_producer/Const" with an attribute whose key is "value". There I found string_val entries such as:

string_val: "/dataset/SP/Phil/frame/train/train0111.tfrecord"
string_val: "/dataset/SP/Phil/frame/train/train0580.tfrecord"

These are where I put my training data, which means the trained model is tied to its training data. So I guess the released pretrained model can only be used successfully with the training data the author used, which is not reasonable because those .tfrecord files are so big. Maybe one could modify the graph so that the node "train_input/input_producer/Const" no longer contains training-data-related information (I'm not sure this is feasible), or the author could just release a frozen .pb model.

wenching33, Aug 24 '18
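wenching33's observation can be checked locally. Below is a quick line scanner, an untested sketch that is not part of the repository, which pulls the string_val entries out of the train_input/input_producer/Const node in a graph.pbtxt text dump and reports which of those files do not exist on the current machine. It assumes the usual GraphDef text layout, where each top-level `node {` block starts at column 0 and its `name:` line sits inside the block; `baked_in_paths` and `missing_files` are hypothetical names.

```python
import os
import re

_STRING_VAL = re.compile(r'^\s*string_val:\s*"(.*)"\s*$')


def baked_in_paths(pbtxt_text, node_name="train_input/input_producer/Const"):
    """Return the string_val entries of one node in a GraphDef text dump."""
    paths = []
    inside = False
    for line in pbtxt_text.splitlines():
        if not inside:
            # The node's name line appears inside its top-level "node {" block.
            inside = line.strip() == 'name: "%s"' % node_name
            continue
        # An unindented line other than "}" starts the next top-level block.
        if line and not line[0].isspace() and line.strip() != "}":
            break
        match = _STRING_VAL.match(line)
        if match:
            paths.append(match.group(1))
    return paths


def missing_files(paths):
    """Return the paths that do not exist on this machine."""
    return [p for p in paths if not os.path.exists(p)]
```

For example, reading pretrainedmodel/graph.pbtxt and passing its text to `baked_in_paths` should, if wenching33's explanation holds, list the author's ../YT8M/youtube-8m/features/ files, and `missing_files` should then return all of them.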


Hi wenching33, how did you solve the problem "InvalidArgumentError: Name: , Context feature 'video_id' is required but could not be found"? Could you give some details? Thanks in advance!

chendengshuai, Oct 25 '18