mobile-vod-bottleneck-lstm

How does the net learn temporal information since shuffle is True in Dataloader?

Open jszgz opened this issue 5 years ago • 5 comments

jszgz • Nov 29 '19 12:11

Hi @jszgz, I noticed this problem too, so I wrote the script below. It creates a new sequence txt in which the videos appear in shuffled order, while the frames of each video stay in temporal order. When you train the model, please set 'shuffle' to False. Unfortunately, even with the new txt, I still cannot get a model with a higher mAP than the basenet. If you are interested, please try training one yourself.

#!/usr/bin/python3
"""Script for creating a text file that lists sequences of 10 frames, each taken from a single video.
Frames that contain no object are skipped, as was done in the official TensorFlow implementation.

Global Variables
----------------
dirs : list of all the training dataset folders
dirs_val : path to the val folder of the dataset
dirs_test : path to the test folder of the dataset

The resulting sequence list is shuffled at video level: the order of the videos is random,
while the frames of each video stay in temporal order.
"""
import os
import xml.etree.ElementTree as ET

import numpy as np

dirs = ['ILSVRC2015_VID_train_0000/',
		'ILSVRC2015_VID_train_0001/',
		'ILSVRC2015_VID_train_0002/',
		'ILSVRC2015_VID_train_0003/']
# Your path
dirs_val = ['../../../ILSVRC2015/Data/VID/val/']
dirs_test = ['../../../ILSVRC2015/Data/VID/test/']
dataset_path = '../../../ILSVRC2015/'


file_write_obj = open('train_VID_seqs_list_shuffle.txt','w')
seqs = []
# Collect every video folder as a path relative to Data/VID/train/
for train_dir in dirs:
	seq = os.listdir(os.path.join(dataset_path,'Data/VID/train/',train_dir))
	for item in seq:
		seqs.append(os.path.join(train_dir, item))



#index_del = np.random.choice(len(seqs),size=int(len(seqs)*0.9),replace=False)
#seqs = np.delete(seqs,index_del)
np.random.shuffle(seqs)
#print(seqs[0],seqs[1])
for seq in seqs:
	seq_path = os.path.join(dataset_path,'Data/VID/train/',seq)
	relative_path = seq
	image_list = np.sort(os.listdir(seq_path))
	count = 0
	filtered_image_list = []
	for image in image_list:
		image_id = image.split('.')[0]
		anno_file = image_id + '.xml'
		anno_path = os.path.join(dataset_path,'Annotations/VID/train/',seq,anno_file)
		objects = ET.parse(anno_path).findall("object")
		num_objs = len(objects)
		if num_objs == 0: # discarding images without object
			continue
		else:
			count = count + 1
			filtered_image_list.append(relative_path+'/'+image_id)
	# Write each complete group of 10 kept frames as one comma-separated line;
	# leftover frames at the end of the video are dropped.
	for i in range(count // 10):
		line = ','.join(filtered_image_list[10*i:10*i + 10])
		file_write_obj.write(line + '\n')
file_write_obj.close()
'''
file_write_obj = open('val_VID_seqs_list_small.txt','w')
seq_list = []
with open('val_VID_list.txt') as f:
	for line in f:
		seq_list.append(line.rstrip())
for i in range(0,int(len(seq_list)/10)):
	#image_path = seq_list[10*i].split('/')[0]
	#seqs = image_path+'/'+':'
	seqs = ''
	for j in range(0,10):
		seqs = seqs + seq_list[10*i + j] + ','
	seqs = seqs[:-1] 
	file_write_obj.writelines(seqs)
	file_write_obj.write('\n')
file_write_obj.close()
file_write_obj = open('test_VID_seqs_list_small.txt','w')
for dir in dirs_test:
	seqs = np.sort(os.listdir(dir))
	for seq in seqs:
		seq_path = os.path.join(dir,seq)
		image_list = np.sort(os.listdir(seq_path))
		for image in image_list:
			file_write_obj.writelines(seq+image)
			file_write_obj.write('\n')
file_write_obj.close()'''
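
For reference, each line of the generated train_VID_seqs_list_shuffle.txt holds 10 comma-separated frame paths from one video, so reading the file back could look roughly like the sketch below. The helper name `load_sequences` is hypothetical and not part of the repository.

```python
from typing import List


def load_sequences(list_path: str) -> List[List[str]]:
    """Read a sequence list file: one sequence per line, frame paths separated by commas.

    `load_sequences` is a hypothetical helper, not a function from the repository.
    """
    sequences = []
    with open(list_path) as f:
        for line in f:
            line = line.strip()
            if not line:  # ignore blank lines
                continue
            sequences.append(line.split(','))
    return sequences
```

Each returned item is one ordered list of frame paths, so the frames of a sequence stay together no matter how the sequences themselves are ordered.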

Mindbooom • Dec 02 '19 01:12

This is an improvement, but we need to draw new random sequences at every epoch.

petinhoss7 • May 12 '20 13:05

I think we should use a new sampling strategy that samples temporally adjacent frames, in order, starting at a random timestamp. If the sequence is shuffled, how can the net know the order of the motion? Or, on the contrary, does this lead to a more robust net?
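
A minimal sketch of that sampling idea, assuming each video is available as a list of frame paths already sorted in temporal order. The class name `RandomClipSampler` and its parameters are hypothetical, not taken from the repository.

```python
import random
from typing import Dict, List, Optional


class RandomClipSampler:
    """Hypothetical dataset-like class: one item per video, a new random clip per access."""

    def __init__(self, videos: Dict[str, List[str]], clip_len: int = 10,
                 seed: Optional[int] = None):
        # Keep only videos long enough to provide one full clip.
        self.videos = [frames for frames in videos.values() if len(frames) >= clip_len]
        self.clip_len = clip_len
        self.rng = random.Random(seed)

    def __len__(self) -> int:
        return len(self.videos)

    def __getitem__(self, index: int) -> List[str]:
        frames = self.videos[index]
        # Pick a random start, then take clip_len adjacent frames in their original order.
        start = self.rng.randint(0, len(frames) - self.clip_len)
        return frames[start:start + self.clip_len]
```

Because the start index is drawn on every access, each epoch would see different clips from the same video, while the frames inside a clip always keep their temporal order.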

jszgz • May 12 '20 13:05

He did the right thing, but instead of using a batch size of 10 in the dataloader, I used 1, because each item is already one sequence of 10 frames. After that, you need to modify the train function to convert the sequence (a list of images) into tensors, and that should do it. I am training now and it seems to work fine so far.
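
The conversion step might look something like the sketch below, which stacks the frames of one sequence into a single array. It assumes every frame is a NumPy array of shape (H, W, C) with identical dimensions; the helper name `stack_sequence` and the channel-first output layout are assumptions, not the repository's code.

```python
import numpy as np


def stack_sequence(frames):
    """Stack a list of T images shaped (H, W, C) into one array shaped (T, C, H, W).

    `stack_sequence` is a hypothetical helper; the channel-first layout is an assumption.
    """
    clip = np.stack(frames, axis=0)    # (T, H, W, C)
    return clip.transpose(0, 3, 1, 2)  # (T, C, H, W)
```

The stacked array can then be handed to `torch.from_numpy` to obtain a tensor for the train function.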

petinhoss7 • May 12 '20 15:05

We also have to keep shuffle = True so that it chooses the sequences in random order.

petinhoss7 • May 12 '20 15:05