RNN-for-Human-Activity-Recognition-using-2D-Pose-Input

Is inference dependent on a certain range of positions for the keypoint coordinates?

Open hwlee96 opened this issue 6 years ago • 2 comments

Love this implementation! However, for inference on a custom dataset, is the inference of this LSTM model dependent on the absolute position of the coordinates? Or is it dependent on how the previous coordinates relate to one another?

I'm not sure if the questions above are clear, but the problem I'm facing is that when I run the model with my own data, it doesn't label the actions correctly (however, when I tested with the test data as a sanity check, it labelled with around 98% accuracy). So I'm wondering: do the joint keypoint coordinates also have to fall within a certain range?

Thank you!

hwlee96 commented Sep 24 '19

Sorry for the super late reply. For anyone else facing this problem: the data input has not been normalised, and this is likely the cause of the issue you're describing. Due to the nature of recurrent neural networks, the model's output depends on both the absolute position of the keypoints in the frame and the relative motion of each keypoint between frames.

Best practice would be to normalise the data inputs for position in the frame. This could be done per dimension using batch normalisation during training, or by normalising the entire dataset before training. I didn't do this for this implementation, but it would definitely be needed if you wanted to use this on real-world data.
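
For example, a minimal sketch of dataset-level, per-dimension standardisation (the (samples, timesteps, 36) shape is an assumption based on this repo's 18-keypoint x/y input format; this is illustrative, not code from the repo):

import numpy as np

def standardise(X, mean=None, std=None):
    # X: (samples, timesteps, 36), i.e. 18 keypoints * (x, y) per frame.
    # Compute per-dimension statistics over the training set only,
    # then reuse the same mean/std for test and inference data.
    if mean is None:
        mean = X.mean(axis=(0, 1), keepdims=True)
        std = X.std(axis=(0, 1), keepdims=True) + 1e-8
    return (X - mean) / std, mean, std

X_train, mean, std = standardise(X_train)
X_test, _, _ = standardise(X_test, mean, std)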

stuarteiffert commented Jun 01 '20

I normalized x_train and x_test with sklearn.preprocessing.normalize, using 'max' as the norm parameter; each float value is written with 25 decimal places of precision. The code is:

import os
import numpy as np
from sklearn import preprocessing

np.set_printoptions(suppress=True, formatter={'float_kind': '{:f}'.format})

dataset = "database"

# os.path.join avoids the fragile "\X_train.txt"-style backslash paths
X_train_path = os.path.join(dataset, "X_train.txt")
X_test_path = os.path.join(dataset, "X_test.txt")

y_train_path = os.path.join(dataset, "Y_train.txt")
y_test_path = os.path.join(dataset, "Y_test.txt")

n_steps = 32

def load_X(X_path):
    # each row is one timestep of comma-separated keypoint coordinates;
    # every n_steps consecutive rows form one sequence
    with open(X_path, 'r') as file:
        X_ = np.array(
            [row.split(',') for row in file],
            dtype=np.float32
        )
    blocks = int(len(X_) / n_steps)
    X_ = np.array(np.split(X_, blocks))

    return X_

def normalize(X_):
    # flatten each sequence to a single row so sklearn scales
    # each sample by its own maximum absolute value
    nsamples, nx, ny = X_.shape
    X_ = X_.reshape((nsamples, nx * ny))
    X_ = np.array(preprocessing.normalize(X_, norm='max', copy=False), dtype=np.float32)
    X_ = X_.reshape((nsamples, nx, ny))

    return X_

def write_to_txt(X_, filedir):
    # make sure the output directory exists before writing
    os.makedirs(os.path.dirname(filedir), exist_ok=True)
    with open(filedir, "w") as file:
        for row in X_:
            np.savetxt(file, row, fmt='%.25f')


trainX = load_X(X_train_path)
trainX = normalize(trainX)
write_to_txt(trainX, "database_normalized/X_train.txt")

testX = load_X(X_test_path)
testX = normalize(testX)
write_to_txt(testX, "database_normalized/X_test.txt")
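
For reference, norm='max' in sklearn scales each row (here, one flattened 32-step sequence) by its own maximum absolute value, i.e. per-sample rather than per-dimension scaling. The call above is equivalent to:

X_scaled = X_ / np.abs(X_).max(axis=1, keepdims=True)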

Then I changed the load_X function in the main code to split by space instead of comma, because the normalized files are space-separated (np.savetxt writes space-delimited values).

Then I trained the model, but the accuracy was only around 0.80. I also tried L2 normalization and the results were worse, around 0.60 accuracy. What causes this? How should I normalize the data?

The main code is in TF 2.0:

# unused imports (cv2, imutils, train_test_split, etc.) removed
import matplotlib
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.utils import to_categorical
from tensorflow.keras.models import save_model
from bahadir.LSTM_RNN import LSTM_RNN
import matplotlib.pyplot as plt
import numpy as np
import argparse
import os

# construct the argument parser and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-d", "--dataset", default="database_normalized", help="path to input dataset")
ap.add_argument("-m", "--model", default="wave", help="path to output model")
ap.add_argument("-p", "--plot", type=str, default="plot.png", help="path to output loss/accuracy plot")
args = vars(ap.parse_args())

# initialize the number of epochs to train for, initial learning rate,
# and batch size
EPOCHS = 100
INIT_LR = 1e-3 * 5  # unused; learning_rate below is what gets passed to Adam
BS = 1024
decaying_learning_rate = True  # defined but not used below
learning_rate = 0.0025  # used if decaying_learning_rate is set to False
decay_rate = 0.02  # passed to the Adam optimizer's decay argument below
#decay_steps = 100000  # decay every 100000 steps with a base of 0.96
lambda_loss_amount = 0.0015
# initialize the data and labels
labels = [
    "JUMPING",
    "JUMPING_JACKS",
    "BOXING",
    "WAVING_2HANDS",
    "WAVING_1HAND",
    "CLAPPING_HANDS"
]
X_train_path = os.path.join(args["dataset"], "X_train.txt")
X_test_path = os.path.join(args["dataset"], "X_test.txt")

y_train_path = os.path.join(args["dataset"], "Y_train.txt")
y_test_path = os.path.join(args["dataset"], "Y_test.txt")

n_steps = 32

def load_X(X_path):
    # the normalized files are space-separated (written by np.savetxt)
    with open(X_path, 'r') as file:
        X_ = np.array(
            [row.split(' ') for row in file],
            dtype=np.float32
        )
    # group every n_steps rows (timesteps) into one sequence
    blocks = int(len(X_) / n_steps)
    X_ = np.array(np.split(X_, blocks))

    return X_

def load_y(y_path):
    with open(y_path, 'r') as file:
        y_ = np.array(
            [row.replace('  ', ' ').strip().split(' ') for row in file],
            dtype=np.int32
        )

    # shift class labels to 0-based indexing
    return y_ - 1

print("[INFO] loading data...")
trainX = load_X(X_train_path)
testX = load_X(X_test_path)

trainY = load_y(y_train_path)
testY = load_y(y_test_path)

training_data_count = len(trainX)  # 4519 training series (with 50% overlap between consecutive series)
test_data_count = len(testX)  # 1197 test series
n_input = len(trainX[0][0])  # num input parameters per timestep

n_hidden = 34 # Hidden layer num of features
n_classes = 6

print("(X shape, y shape, every X's mean, every X's standard deviation)")
print(testX.shape, testY.shape, np.mean(testX), np.std(testX))
print("\nThe dataset was normalised beforehand with the script above")

# initialize the model
print("[INFO] compiling model...")
model = LSTM_RNN.build(n_hidden=n_hidden, n_input=n_input, n_steps=n_steps, n_classes=n_classes, batch_size=BS, lambda_loss_amount=lambda_loss_amount)
opt = Adam(learning_rate=learning_rate, decay=decay_rate)
model.compile(loss="categorical_crossentropy", optimizer=opt, metrics=["accuracy"])

# train the network
print("[INFO] training network...")

y_train_one_hot = to_categorical(trainY, 6)
y_test_one_hot = to_categorical(testY, 6)

train_size = trainX.shape[0] - trainX.shape[0] % BS
test_size = testX.shape[0] - testX.shape[0] % BS

H = model.fit(trainX[:train_size,:,:], y_train_one_hot[:train_size,:], batch_size=BS, epochs=EPOCHS, validation_data=(testX[:test_size,:,:], y_test_one_hot[:test_size,:]))
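
As an aside, the decaying_learning_rate flag above is never used. If the exponential decay from the original repo's TF1 code is what was intended, a TF 2.x learning-rate schedule could stand in for the decay argument (a sketch only; the decay_steps value and 0.96 base come from the commented-out parameters above):

from tensorflow.keras.optimizers.schedules import ExponentialDecay

lr_schedule = ExponentialDecay(
    initial_learning_rate=learning_rate,
    decay_steps=100000,
    decay_rate=0.96
)
opt = Adam(learning_rate=lr_schedule)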

The LSTM_RNN model:

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.layers import LSTM
from tensorflow.keras import regularizers

class LSTM_RNN:
    @staticmethod
    def build(n_hidden, n_input, n_steps, n_classes, batch_size, lambda_loss_amount):
        # initialize the model; batch_input_shape fixes the batch
        # dimension, so every batch fed to the model must contain
        # exactly batch_size samples
        model = Sequential()

        model.add(Dense(n_hidden, activation="relu", kernel_initializer="random_normal",
                        bias_initializer="random_normal",
                        batch_input_shape=(batch_size, n_steps, n_input)))
        model.add(LSTM(n_hidden, return_sequences=True, unit_forget_bias=1.0))
        model.add(LSTM(n_hidden, unit_forget_bias=1.0))
        model.add(Dense(n_classes, activation="softmax", kernel_initializer="random_normal",
                        bias_initializer="random_normal",
                        kernel_regularizer=regularizers.l2(lambda_loss_amount),
                        bias_regularizer=regularizers.l2(lambda_loss_amount)))

        return model
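
Because batch_input_shape bakes the batch size into the model, predicting on an arbitrary number of samples requires padding to a multiple of BS, which is also why the training code above trims trainX and testX. A hypothetical sketch (X_new and the padding approach are illustrative, not from the repo):

import numpy as np

# pad with zero-filled sequences up to the next multiple of BS,
# then discard the predictions made for the padding rows
pad = (-len(X_new)) % BS
X_padded = np.concatenate([X_new, np.zeros((pad,) + X_new.shape[1:], dtype=np.float32)])
preds = model.predict(X_padded, batch_size=BS)[:len(X_new)]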

bhdrozgn commented Nov 29 '20