Transformers4Rec Loss Increasing after few epochs


Hi @rnyak, @benfred, @gabrielspmoreira, @oliverholworthy, I'm training the T4Rec model on custom data, but after a few epochs the loss stops decreasing and starts increasing. It started at 13.67, dropped to 6.43 after a few epochs, and then began to rise. I'm not sure what I can do to reduce the loss further.

Here are my params:

params = {
    'batch_size': 512,
    'lr': 0.0005,
    'lr_scheduler': 'cosine',
    'num_train_epochs': 1,
    'using_test': True,
    'using_type': False,
    'bl_shuffle': True,
    'masking': 'mlm',
    'd_model': 256,
    'n_head': 32,
    'n_layer': 3,
    'proj_num': 1,
    'act_mlp': 'None',
    'item_correction': False,
    'neg_factor': 4,
    'label_smoothing': 0.0,
    'temperature': 1.5734215681668653,
    'remove_false_neg': True,
    'item_correction_factor': 0.04152252077012748,
    'transformer_dropout': 0.05096800263401626,
    'mlm_probability': 0.35044384745899415,
    'top20': True,
    'loss_types': True,
    'loss_types_type': 'Simple',
    'multi_task_emb': 0,
    'mt_num_layers': 1,
    'use_tanh': False,
    'seq_len': 20,
    'split': 0
}

Any suggestions would be very helpful. Thanks in advance!

Originally posted by @alan-ai-learner in https://github.com/NVIDIA-Merlin/Transformers4Rec/issues/493#issuecomment-1471570279

Mar 20 '23 06:03 alan-ai-learner

@alan-ai-learner hello. It is hard to tell what these values should be for your custom dataset. All of these params are hyperparameters. Did you do any hyperparameter tuning? If not, you can first experiment with your learning rate and batch size, then reduce n_head and mlm_probability. You can see our paper experiments here for different public datasets, but that does not mean the same hyperparameter values will work for your dataset.

Are you training your model only with item-id-list or with side features?
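
To make the n_head and mlm_probability suggestion concrete, here is a small back-of-the-envelope sketch using the values from the posted params. It assumes standard multi-head attention, where each head works on d_model / n_head dimensions, and it assumes each position is masked independently, which may differ from the exact masking logic in the library.

```python
# Values copied from the params posted above.
d_model = 256
n_head = 32
seq_len = 20
mlm_probability = 0.35044384745899415

# In standard multi-head attention each head operates on d_model / n_head dims.
head_dim = d_model // n_head

# Rough expected number of masked positions in a full-length session,
# assuming every position is masked independently (an assumption).
expected_masked = seq_len * mlm_probability

print(head_dim)                   # dimensions per attention head
print(round(expected_masked, 2))  # approximate masked positions per session
```

With these settings each head sees only 8 dimensions and roughly a third of every full-length session is hidden, which is one reason lowering both values is a reasonable first experiment.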

Mar 20 '23 12:03 rnyak

Thanks for responding @rnyak ,

I didn't do hyperparameter tuning. I took these params from this repo, which uses the same data. The only differences are that they use a batch size of 1024 and made some architectural changes to T4Rec.
I tried their approach end to end and was able to start training, but after a few steps I got this error:

RuntimeError: invalid multinomial distribution (sum of probabilities <= 0)

I couldn't fix it, so I gave up.
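
The message above appears to be the one PyTorch raises when a sampling call such as torch.multinomial receives a row of weights whose sum is not positive. Where that happens in the custom code is not known from this thread, so the following is only a debugging sketch in plain Python: a hypothetical helper, find_invalid_rows, that flags rows of a weight matrix that could not be sampled from.

```python
import math


def find_invalid_rows(weights):
    """Return indices of rows that are not a valid sampling distribution.

    A row is flagged if it contains NaN, infinity, or a negative value,
    or if its sum is not strictly positive. Hypothetical debugging helper.
    """
    bad = []
    for i, row in enumerate(weights):
        has_bad_value = any(
            math.isnan(w) or math.isinf(w) or w < 0 for w in row
        )
        if has_bad_value or sum(row) <= 0:
            bad.append(i)
    return bad


# Illustrative weights only; not taken from the actual training run.
example = [
    [0.2, 0.8, 0.0],           # fine
    [0.0, 0.0, 0.0],           # sums to zero
    [float("nan"), 0.5, 0.5],  # contains NaN
]
print(find_invalid_rows(example))
```

Running a check like this on the tensor passed to the sampling step (converted with .tolist()) might show whether the weights collapse to zero or become NaN, for example after the loss diverges.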

Also @rnyak, they wanted to open a pull request to add features that help the current T4Rec train faster. The repo I shared is the 3rd-place solution for the Otto competition.

For now, I went ahead with the default T4Rec setup. FYI, I'm using the item-id-list with categorical side features. For example, the data looks like this:

[12,34,55,56] , [1,2,3,1]

where the first list contains the item ids and the second list contains the event types (1: clicks, 2: carts, 3: orders).
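
As a minimal illustration of that layout, the two parallel lists can be paired position by position. The helper name pair_session is hypothetical and only meant to show how the item ids and event types line up.

```python
# Event-type codes as described above.
EVENT_TYPES = {1: "clicks", 2: "carts", 3: "orders"}


def pair_session(item_ids, event_types):
    """Pair each item id with the name of its event type (hypothetical helper)."""
    if len(item_ids) != len(event_types):
        raise ValueError("item_ids and event_types must have the same length")
    return [(item, EVENT_TYPES[ev]) for item, ev in zip(item_ids, event_types)]


session = pair_session([12, 34, 55, 56], [1, 2, 3, 1])
print(session)
```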

Please let me know if there is anything I can do to make it work. Is there a direct relationship between batch size and learning rate?
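
On the batch size question, two commonly used heuristics (not something confirmed in this thread, and not a guarantee for any particular dataset) are to scale the learning rate linearly, or by the square root, of the batch-size ratio. The sketch below assumes the learning rate of 0.0005 was chosen for the batch size of 1024 used in the referenced repo; scale_lr is a hypothetical helper.

```python
def scale_lr(base_lr, base_batch_size, new_batch_size, rule="linear"):
    """Rescale a learning rate for a new batch size using a common heuristic."""
    ratio = new_batch_size / base_batch_size
    if rule == "linear":
        return base_lr * ratio
    if rule == "sqrt":
        return base_lr * ratio ** 0.5
    raise ValueError(f"unknown rule: {rule}")


# Assumption: lr=0.0005 was tuned for batch_size=1024.
linear_lr = scale_lr(0.0005, 1024, 512, rule="linear")
sqrt_lr = scale_lr(0.0005, 1024, 512, rule="sqrt")
print(linear_lr, sqrt_lr)
```

Either value is only a starting point for a learning-rate search, not a replacement for tuning.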

Mar 20 '23 14:03 alan-ai-learner

@bschifferer might be able to help you with that. His code is not merged into TF4Rec and it has custom implementations.

Mar 20 '23 21:03 rnyak

I see, @rnyak, but I'm only using his code to preprocess the dataset. After that I'm trying to use the model architecture from one of the examples in this repo.

So there is no straightforward way to overcome the problem I'm facing; I need to experiment with the params.

Mar 23 '23 06:03 alan-ai-learner

@alan-ai-learner how are you generating the schema file if you are not using NVTabular? thanks.

Mar 27 '23 15:03 rnyak

@alan-ai-learner how are you generating the schema file if you are not using NVTabular? thanks.

I'm using this manual schema: https://github.com/bschifferer/Kaggle-Otto-Comp/blob/master/01e_FE_Transformer/test.pb

Mar 27 '23 16:03 alan-ai-learner
