ai-matrix
Bug in DIEN and DIEN_TF2: both produce NaN when training on data prepared with prepare_data.sh
Hi, Ali ai-matrix team
I recently tried this repo and tested DIEN.
I prepared the training data with both prepare_dataset.sh and prepare_data.sh, and it seems the current DIEN code only works with prepare_dataset.sh. If I use prepare_data.sh to prepare the features, training always ends up with NaN.
See the screenshot below:
Is this a known issue? I also tried another repo from Ali, https://github.com/alibaba/x-deeplearning/tree/master/xdl-algorithm-solution/DIEN, which seems to work fine with prepare_data.sh.
Looking forward to your reply. I'll also keep digging to see if I can come up with a quick fix; either way, I think this is an issue that should be reported here.
Best regards, Chendi
Update:
After debugging, I found that the NaN issue is caused by records whose numHistory is 1. After filtering out these lines in local_aggregate.py, the training code now works with prepare_data.sh.
FYI
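
For anyone hitting the same problem, here is a minimal sketch of the kind of filter described in the update. It is not the exact change made to local_aggregate.py. The function name `filter_short_histories` is hypothetical, and the record layout is an assumption: tab-separated fields, with the item history in column 4 and history items joined by the `\x02` character. Adjust these to match your own files.

```python
def filter_short_histories(lines, history_col=4, field_sep="\t",
                           item_sep="\x02", min_history=2):
    """Yield only the lines whose history field has at least min_history items.

    Assumed layout (not confirmed against the repo): tab-separated fields,
    history items in column `history_col`, joined by `item_sep`.
    """
    for line in lines:
        fields = line.rstrip("\n").split(field_sep)
        if len(fields) <= history_col:
            continue  # malformed line without a history column
        # Ignore empty items so a trailing separator is not counted.
        history = [item for item in fields[history_col].split(item_sep) if item]
        if len(history) >= min_history:
            yield line


# Two made-up records: the first has a history of length 1 and is dropped.
sample = [
    "1\tu1\tm9\tc9\tm1\tc1\n",
    "0\tu2\tm8\tc8\tm1\x02m2\tc1\x02c2\n",
]
kept = list(filter_short_histories(sample))
print(len(kept))
```

In practice this would wrap the loop that writes out the aggregated training lines, so that records with a single history entry never reach the training files.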