DNN's parameter sizes not consistent with the log.

Open juyaoliu opened this issue 2 years ago • 3 comments

The DNN_avazu_x1 log shows "2022-02-08 10:25:00,299 P50417 INFO [Metrics] AUC: 0.763019 - logloss: 0.368178", but the result on my machine is "2023-04-25 19:35:46,393 P47366 INFO [Metrics] AUC: 0.755872 - logloss: 0.371449".

The cause may be a mismatch in the network: the DNN_avazu_x1 log reports 13805192 parameters, while my model has 13805191. I can't figure out what the one extra parameter is. The details of my parameters are as follows:

embedding_layer.embedding_layer.embedding_layer.feat_1.weight torch.Size([8, 10])
embedding_layer.embedding_layer.embedding_layer.feat_2.weight torch.Size([8, 10])
embedding_layer.embedding_layer.embedding_layer.feat_3.weight torch.Size([3479, 10])
embedding_layer.embedding_layer.embedding_layer.feat_4.weight torch.Size([4270, 10])
embedding_layer.embedding_layer.embedding_layer.feat_5.weight torch.Size([25, 10])
embedding_layer.embedding_layer.embedding_layer.feat_6.weight torch.Size([4863, 10])
embedding_layer.embedding_layer.embedding_layer.feat_7.weight torch.Size([304, 10])
embedding_layer.embedding_layer.embedding_layer.feat_8.weight torch.Size([32, 10])
embedding_layer.embedding_layer.embedding_layer.feat_9.weight torch.Size([228185, 10])
embedding_layer.embedding_layer.embedding_layer.feat_10.weight torch.Size([1048284, 10])
embedding_layer.embedding_layer.embedding_layer.feat_11.weight torch.Size([6514, 10])
embedding_layer.embedding_layer.embedding_layer.feat_12.weight torch.Size([5, 10])
embedding_layer.embedding_layer.embedding_layer.feat_13.weight torch.Size([5, 10])
embedding_layer.embedding_layer.embedding_layer.feat_14.weight torch.Size([1939, 10])
embedding_layer.embedding_layer.embedding_layer.feat_15.weight torch.Size([9, 10])
embedding_layer.embedding_layer.embedding_layer.feat_16.weight torch.Size([10, 10])
embedding_layer.embedding_layer.embedding_layer.feat_17.weight torch.Size([348, 10])
embedding_layer.embedding_layer.embedding_layer.feat_18.weight torch.Size([5, 10])
embedding_layer.embedding_layer.embedding_layer.feat_19.weight torch.Size([60, 10])
embedding_layer.embedding_layer.embedding_layer.feat_20.weight torch.Size([170, 10])
embedding_layer.embedding_layer.embedding_layer.feat_21.weight torch.Size([51, 10])
embedding_layer.embedding_layer.embedding_layer.feat_22.weight torch.Size([25, 10])
dnn.dnn.0.weight torch.Size([400, 220])
dnn.dnn.0.bias torch.Size([400])
dnn.dnn.3.weight torch.Size([400, 400])
dnn.dnn.3.bias torch.Size([400])
dnn.dnn.6.weight torch.Size([400, 400])
dnn.dnn.6.bias torch.Size([400])
dnn.dnn.9.weight torch.Size([1, 400])
dnn.dnn.9.bias torch.Size([1])
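As a cross-check, the shapes listed above can be summed by hand. The sketch below is plain Python, with the row counts and layer shapes copied from the listing; the variable and function names are illustrative and not part of FuxiCTR. Under that assumption the listing adds up to 13395591, which is the figure printed on the "Total number of parameters" line of the running logs later in this thread, and is 409600 less than the 13805191 quoted in the post. The listing alone does not explain that difference.

```python
# Row counts of the 22 embedding tables, copied from the listing (feat_1 .. feat_22).
embedding_rows = [8, 8, 3479, 4270, 25, 4863, 304, 32, 228185, 1048284, 6514,
                  5, 5, 1939, 9, 10, 348, 5, 60, 170, 51, 25]
embedding_dim = 10  # second dimension of every embedding weight in the listing

# Shapes of the dnn.dnn.{0,3,6,9} weights and biases, copied from the listing.
dnn_shapes = [
    (400, 220), (400,),
    (400, 400), (400,),
    (400, 400), (400,),
    (1, 400), (1,),
]

def numel(shape):
    """Number of elements in a tensor of the given shape."""
    n = 1
    for dim in shape:
        n *= dim
    return n

embedding_params = sum(rows * embedding_dim for rows in embedding_rows)
dnn_params = sum(numel(shape) for shape in dnn_shapes)
total_params = embedding_params + dnn_params

print(embedding_params, dnn_params, total_params)
```

On a live model, the same total would normally be obtained by summing `p.numel()` over `model.parameters()` in PyTorch.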

juyaoliu · Apr 25 '23 14:04

Did you use fuxictr v1.1.0 for DNN_avazu_x1? Our experiment was run with fuxictr v1.1.0.

Could you provide your environment settings and the complete running log?

xpai · Apr 28 '23 23:04

No, I used fuxictr v1.1.1. However, the total number of parameters in the official log (13805192) looks odd, since every parameter size is a multiple of 10 except for the bias of the last linear module.
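The last-digit argument can be made concrete with a little arithmetic. This is only a sketch of the reasoning in the comment: it takes the two totals quoted in this thread, subtracts the single-element final bias, and looks at what remains modulo 10. The names are illustrative. It shows which total is consistent with the "multiples of 10" observation; it does not identify where an extra parameter would come from.

```python
# Totals quoted in this thread.
official_total = 13805192  # from the official DNN_avazu_x1 log
poster_total = 13805191    # the poster's own count

# The only tensor whose size is not a multiple of 10 is the final bias (1 element).
final_bias_size = 1

def leftover_after_final_bias(total):
    """Remainder mod 10 after removing the final bias; 0 means nothing is left unexplained."""
    return (total - final_bias_size) % 10

print(leftover_after_final_bias(poster_total))    # 0
print(leftover_after_final_bias(official_total))  # 1
```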

My environment settings are as follows:

Python 3.6.2
fuxictr 1.1.1
torch 1.0.1
cuda 10.0

Here is my running log:

2023-04-26 21:12:50,124 P44728 INFO { "batch_norm": "False", "batch_size": "4096", "data_block_size": "-1", "data_format": "csv", "data_root": "../data/Avazu/", "dataset_id": "avazu_x1_3fb65689", "debug": "False", "embedding_dim": "10", "embedding_regularizer": "0.01", "epochs": "100", "every_x_epochs": "1", "feature_cols": "[{'active': True, 'dtype': 'float', 'name': ['feat_1', 'feat_2', 'feat_3', 'feat_4', 'feat_5', 'feat_6', 'feat_7', 'feat_8', 'feat_9', 'feat_10', 'feat_11', 'feat_12', 'feat_13', 'feat_14', 'feat_15', 'feat_16', 'feat_17', 'feat_18', 'feat_19', 'feat_20', 'feat_21', 'feat_22'], 'type': 'categorical'}]", "gpu": "0", "hidden_activations": "relu", "hidden_units": "[400, 400, 400]", "label_col": "{'dtype': 'float', 'name': 'label'}", "learning_rate": "0.001", "loss": "binary_crossentropy", "metrics": "['AUC', 'logloss']", "min_categr_count": "1", "model": "DNN", "model_id": "DNN_avazu_x1_001_3da2d674", "model_root": "./Avazu/DNN_avazu_x1/", "monitor": "AUC", "monitor_mode": "max", "net_dropout": "0.1", "net_regularizer": "0", "num_workers": "3", "optimizer": "adam", "patience": "2", "pickle_feature_encoder": "True", "save_best_only": "True", "seed": "2021", "shuffle": "True", "task": "binary_classification", "test_data": "../data/Avazu/Avazu_x1/test.csv", "train_data": "../data/Avazu/Avazu_x1/train.csv", "use_hdf5": "True", "valid_data": "../data/Avazu/Avazu_x1/valid.csv", "verbose": "1", "version": "pytorch" }
2023-04-26 21:12:50,126 P44728 INFO Set up feature encoder...
2023-04-26 21:12:50,126 P44728 INFO Load feature_map from json: ../data/Avazu/avazu_x1_3fb65689/feature_map.json
2023-04-26 21:12:50,126 P44728 INFO Loading data...
2023-04-26 21:12:50,131 P44728 INFO Loading data from h5: ../data/Avazu/avazu_x1_3fb65689/train.h5
2023-04-26 21:13:20,178 P44728 INFO Loading data from h5: ../data/Avazu/avazu_x1_3fb65689/valid.h5
2023-04-26 21:13:23,516 P44728 INFO Train samples: total/28300276, pos/4953382, neg/23346894, ratio/17.50%, blocks/1
2023-04-26 21:13:23,516 P44728 INFO Validation samples: total/4042897, pos/678699, neg/3364198, ratio/16.79%, blocks/1
2023-04-26 21:13:23,516 P44728 INFO Loading train data done.
2023-04-26 21:13:34,906 P44728 INFO Total number of parameters: 13395591.
2023-04-26 21:13:34,909 P44728 INFO Start training: 6910 batches/epoch
2023-04-26 21:13:34,909 P44728 INFO ************ Epoch=1 start ************
2023-04-26 21:25:38,237 P44728 INFO [Metrics] AUC: 0.741129 - logloss: 0.399211
2023-04-26 21:25:38,264 P44728 INFO Save best model: monitor(max): 0.741129
2023-04-26 21:25:38,869 P44728 INFO --- 6910/6910 batches finished ---
2023-04-26 21:25:39,508 P44728 INFO Train loss: 0.427804
2023-04-26 21:25:39,509 P44728 INFO ************ Epoch=1 end ************
2023-04-26 21:38:04,039 P44728 INFO [Metrics] AUC: 0.742017 - logloss: 0.399019
2023-04-26 21:38:04,062 P44728 INFO Save best model: monitor(max): 0.742017
2023-04-26 21:38:04,631 P44728 INFO --- 6910/6910 batches finished ---
2023-04-26 21:38:05,386 P44728 INFO Train loss: 0.428199
2023-04-26 21:38:05,386 P44728 INFO ************ Epoch=2 end ************
2023-04-26 21:50:18,433 P44728 INFO [Metrics] AUC: 0.739063 - logloss: 0.400070
2023-04-26 21:50:18,491 P44728 INFO Monitor(max) STOP: 0.739063 !
2023-04-26 21:50:18,492 P44728 INFO Reduce learning rate on plateau: 0.000100
2023-04-26 21:50:18,492 P44728 INFO --- 6910/6910 batches finished ---
2023-04-26 21:50:18,983 P44728 INFO Train loss: 0.428046
2023-04-26 21:50:18,983 P44728 INFO ************ Epoch=3 end ************
2023-04-26 22:13:27,905 P44728 INFO [Metrics] AUC: 0.741352 - logloss: 0.399251
2023-04-26 22:13:27,979 P44728 INFO Monitor(max) STOP: 0.741352 !
2023-04-26 22:13:27,980 P44728 INFO Reduce learning rate on plateau: 0.000010
2023-04-26 22:13:27,981 P44728 INFO Early stopping at epoch=4
2023-04-26 22:13:27,981 P44728 INFO --- 6910/6910 batches finished ---
2023-04-26 22:13:28,884 P44728 INFO Train loss: 0.404582
2023-04-26 22:13:28,885 P44728 INFO Training finished.
2023-04-26 22:13:28,886 P44728 INFO Load best model: /data/home/liujuyao/BARS/ctr_prediction/benchmarks/DNN/DNN_avazu_x1/Avazu/DNN_avazu_x1/avazu_x1_3fb65689/DNN_avazu_x1_001_3da2d674.model
2023-04-26 22:13:30,951 P44728 INFO ****** Validation evaluation ******
2023-04-26 22:17:56,183 P44728 INFO [Metrics] AUC: 0.742017 - logloss: 0.399019
2023-04-26 22:17:59,891 P44728 INFO ******** Test evaluation ********
2023-04-26 22:17:59,893 P44728 INFO Loading data...
2023-04-26 22:17:59,895 P44728 INFO Loading data from h5: ../data/Avazu/avazu_x1_3fb65689/test.h5
2023-04-26 22:18:14,699 P44728 INFO Test samples: total/8085794, pos/1232985, neg/6852809, ratio/15.25%, blocks/1
2023-04-26 22:18:14,703 P44728 INFO Loading test data done.
2023-04-26 22:25:57,148 P44728 INFO [Metrics] AUC: 0.755872 - logloss: 0.371449

juyaoliu · Apr 29 '23 12:04

I reran the experiment with FuxiCTR v1.1.1 on my local machine (CUDA 10.0, Python 3.6.5, torch 1.0.1.post2). The number of parameters is 13395591, and the AUC is nearly the same as in our earlier log from the cloud service. The parameter count in that earlier log is indeed strange, though. I will check it later.


2023-04-30 23:20:23,767 P30261 INFO { "batch_norm": "False", "batch_size": "4096", "data_block_size": "-1", "data_format": "csv", "data_root": "../data/Avazu/", "dataset_id": "avazu_x1_3fb65689", "debug": "False", "embedding_dim": "10", "embedding_regularizer": "0.01", "epochs": "100", "every_x_epochs": "1", "feature_cols": "[{'active': True, 'dtype': 'float', 'name': ['feat_1', 'feat_2', 'feat_3', 'feat_4', 'feat_5', 'feat_6', 'feat_7', 'feat_8', 'feat_9', 'feat_10', 'feat_11', 'feat_12', 'feat_13', 'feat_14', 'feat_15', 'feat_16', 'feat_17', 'feat_18', 'feat_19', 'feat_20', 'feat_21', 'feat_22'], 'type': 'categorical'}]", "gpu": "0", "hidden_activations": "relu", "hidden_units": "[400, 400, 400]", "label_col": "{'dtype': 'float', 'name': 'label'}", "learning_rate": "0.001", "loss": "binary_crossentropy", "metrics": "['AUC', 'logloss']", "min_categr_count": "1", "model": "DNN", "model_id": "DNN_avazu_x1_001_3da2d674", "model_root": "./Avazu/DNN_avazu_x1/", "monitor": "AUC", "monitor_mode": "max", "net_dropout": "0.1", "net_regularizer": "0", "num_workers": "3", "optimizer": "adam", "patience": "2", "pickle_feature_encoder": "True", "save_best_only": "True", "seed": "2021", "shuffle": "True", "task": "binary_classification", "test_data": "../data/Avazu/Avazu_x1/test.csv", "train_data": "../data/Avazu/Avazu_x1/train.csv", "use_hdf5": "True", "valid_data": "../data/Avazu/Avazu_x1/valid.csv", "verbose": "0", "version": "pytorch" }
2023-04-30 23:20:23,768 P30261 INFO Set up feature encoder...
2023-04-30 23:20:23,768 P30261 INFO Reading file: ../data/Avazu/Avazu_x1/train.csv
2023-04-30 23:21:15,420 P30261 INFO Reading file: ../data/Avazu/Avazu_x1/valid.csv
2023-04-30 23:21:22,077 P30261 INFO Reading file: ../data/Avazu/Avazu_x1/test.csv
2023-04-30 23:21:35,696 P30261 INFO Preprocess feature columns...
2023-04-30 23:21:38,119 P30261 INFO Fit feature encoder...
2023-04-30 23:21:38,120 P30261 INFO Processing column: {'active': True, 'dtype': 'float', 'name': 'feat_1', 'type': 'categorical'}
2023-04-30 23:21:44,942 P30261 INFO Processing column: {'active': True, 'dtype': 'float', 'name': 'feat_2', 'type': 'categorical'}
2023-04-30 23:21:51,746 P30261 INFO Processing column: {'active': True, 'dtype': 'float', 'name': 'feat_3', 'type': 'categorical'}
2023-04-30 23:21:58,842 P30261 INFO Processing column: {'active': True, 'dtype': 'float', 'name': 'feat_4', 'type': 'categorical'}
2023-04-30 23:22:05,848 P30261 INFO Processing column: {'active': True, 'dtype': 'float', 'name': 'feat_5', 'type': 'categorical'}
2023-04-30 23:22:12,458 P30261 INFO Processing column: {'active': True, 'dtype': 'float', 'name': 'feat_6', 'type': 'categorical'}
2023-04-30 23:22:19,465 P30261 INFO Processing column: {'active': True, 'dtype': 'float', 'name': 'feat_7', 'type': 'categorical'}
2023-04-30 23:22:26,184 P30261 INFO Processing column: {'active': True, 'dtype': 'float', 'name': 'feat_8', 'type': 'categorical'}
2023-04-30 23:22:32,758 P30261 INFO Processing column: {'active': True, 'dtype': 'float', 'name': 'feat_9', 'type': 'categorical'}
2023-04-30 23:22:41,580 P30261 INFO Processing column: {'active': True, 'dtype': 'float', 'name': 'feat_10', 'type': 'categorical'}
2023-04-30 23:22:59,517 P30261 INFO Processing column: {'active': True, 'dtype': 'float', 'name': 'feat_11', 'type': 'categorical'}
2023-04-30 23:23:06,705 P30261 INFO Processing column: {'active': True, 'dtype': 'float', 'name': 'feat_12', 'type': 'categorical'}
2023-04-30 23:23:13,205 P30261 INFO Processing column: {'active': True, 'dtype': 'float', 'name': 'feat_13', 'type': 'categorical'}
2023-04-30 23:23:19,779 P30261 INFO Processing column: {'active': True, 'dtype': 'float', 'name': 'feat_14', 'type': 'categorical'}
2023-04-30 23:23:26,673 P30261 INFO Processing column: {'active': True, 'dtype': 'float', 'name': 'feat_15', 'type': 'categorical'}
2023-04-30 23:23:33,126 P30261 INFO Processing column: {'active': True, 'dtype': 'float', 'name': 'feat_16', 'type': 'categorical'}
2023-04-30 23:23:39,717 P30261 INFO Processing column: {'active': True, 'dtype': 'float', 'name': 'feat_17', 'type': 'categorical'}
2023-04-30 23:23:46,454 P30261 INFO Processing column: {'active': True, 'dtype': 'float', 'name': 'feat_18', 'type': 'categorical'}
2023-04-30 23:23:52,948 P30261 INFO Processing column: {'active': True, 'dtype': 'float', 'name': 'feat_19', 'type': 'categorical'}
2023-04-30 23:23:59,535 P30261 INFO Processing column: {'active': True, 'dtype': 'float', 'name': 'feat_20', 'type': 'categorical'}
2023-04-30 23:24:06,289 P30261 INFO Processing column: {'active': True, 'dtype': 'float', 'name': 'feat_21', 'type': 'categorical'}
2023-04-30 23:24:12,951 P30261 INFO Processing column: {'active': True, 'dtype': 'float', 'name': 'feat_22', 'type': 'categorical'}
2023-04-30 23:24:19,458 P30261 INFO Set feature index...
2023-04-30 23:24:19,458 P30261 INFO Pickle feature_encoder: ../data/Avazu/avazu_x1_3fb65689/feature_encoder.pkl
2023-04-30 23:24:24,225 P30261 INFO Save feature_map to json: ../data/Avazu/avazu_x1_3fb65689/feature_map.json
2023-04-30 23:24:24,225 P30261 INFO Set feature encoder done.
2023-04-30 23:24:24,226 P30261 INFO Transform feature columns...
2023-04-30 23:29:24,947 P30261 INFO Saving data to h5: ../data/Avazu/avazu_x1_3fb65689/train.h5
2023-04-30 23:29:27,923 P30261 INFO Preprocess feature columns...
2023-04-30 23:29:28,356 P30261 INFO Transform feature columns...
2023-04-30 23:30:10,745 P30261 INFO Saving data to h5: ../data/Avazu/avazu_x1_3fb65689/valid.h5
2023-04-30 23:30:11,179 P30261 INFO Preprocess feature columns...
2023-04-30 23:30:11,674 P30261 INFO Transform feature columns...
2023-04-30 23:31:36,846 P30261 INFO Saving data to h5: ../data/Avazu/avazu_x1_3fb65689/test.h5
2023-04-30 23:31:37,770 P30261 INFO Transform csv data to h5 done.
2023-04-30 23:31:37,770 P30261 INFO Loading data...
2023-04-30 23:31:37,775 P30261 INFO Loading data from h5: ../data/Avazu/avazu_x1_3fb65689/train.h5
2023-04-30 23:31:40,087 P30261 INFO Loading data from h5: ../data/Avazu/avazu_x1_3fb65689/valid.h5
2023-04-30 23:31:40,437 P30261 INFO Train samples: total/28300276, pos/4953382, neg/23346894, ratio/17.50%, blocks/1
2023-04-30 23:31:40,437 P30261 INFO Validation samples: total/4042897, pos/678699, neg/3364198, ratio/16.79%, blocks/1
2023-04-30 23:31:40,437 P30261 INFO Loading train data done.
2023-04-30 23:31:44,086 P30261 INFO Total number of parameters: 13395591.
2023-04-30 23:31:44,086 P30261 INFO Start training: 6910 batches/epoch
2023-04-30 23:31:44,086 P30261 INFO ************ Epoch=1 start ************
2023-04-30 23:37:23,427 P30261 INFO [Metrics] AUC: 0.740126 - logloss: 0.399131
2023-04-30 23:37:23,428 P30261 INFO Save best model: monitor(max): 0.740126
2023-04-30 23:37:23,474 P30261 INFO --- 6910/6910 batches finished ---
2023-04-30 23:37:23,600 P30261 INFO Train loss: 0.429047
2023-04-30 23:37:23,600 P30261 INFO ************ Epoch=1 end ************
2023-04-30 23:43:02,752 P30261 INFO [Metrics] AUC: 0.741279 - logloss: 0.398655
2023-04-30 23:43:02,753 P30261 INFO Save best model: monitor(max): 0.741279
2023-04-30 23:43:02,831 P30261 INFO --- 6910/6910 batches finished ---
2023-04-30 23:43:02,985 P30261 INFO Train loss: 0.429014
2023-04-30 23:43:02,985 P30261 INFO ************ Epoch=2 end ************
2023-04-30 23:48:42,227 P30261 INFO [Metrics] AUC: 0.741589 - logloss: 0.398752
2023-04-30 23:48:42,228 P30261 INFO Save best model: monitor(max): 0.741589
2023-04-30 23:48:42,292 P30261 INFO --- 6910/6910 batches finished ---
2023-04-30 23:48:42,455 P30261 INFO Train loss: 0.428438
2023-04-30 23:48:42,455 P30261 INFO ************ Epoch=3 end ************
2023-04-30 23:54:23,032 P30261 INFO [Metrics] AUC: 0.741310 - logloss: 0.398844
2023-04-30 23:54:23,033 P30261 INFO Monitor(max) STOP: 0.741310 !
2023-04-30 23:54:23,033 P30261 INFO Reduce learning rate on plateau: 0.000100
2023-04-30 23:54:23,033 P30261 INFO --- 6910/6910 batches finished ---
2023-04-30 23:54:23,208 P30261 INFO Train loss: 0.428715
2023-04-30 23:54:23,208 P30261 INFO ************ Epoch=4 end ************
2023-05-01 00:00:04,044 P30261 INFO [Metrics] AUC: 0.744362 - logloss: 0.397738
2023-05-01 00:00:04,045 P30261 INFO Save best model: monitor(max): 0.744362
2023-05-01 00:00:04,127 P30261 INFO --- 6910/6910 batches finished ---
2023-05-01 00:00:04,282 P30261 INFO Train loss: 0.404087
2023-05-01 00:00:04,282 P30261 INFO ************ Epoch=5 end ************
2023-05-01 00:05:46,535 P30261 INFO [Metrics] AUC: 0.745814 - logloss: 0.396411
2023-05-01 00:05:46,536 P30261 INFO Save best model: monitor(max): 0.745814
2023-05-01 00:05:46,609 P30261 INFO --- 6910/6910 batches finished ---
2023-05-01 00:05:46,792 P30261 INFO Train loss: 0.404804
2023-05-01 00:05:46,792 P30261 INFO ************ Epoch=6 end ************
2023-05-01 00:11:27,408 P30261 INFO [Metrics] AUC: 0.745171 - logloss: 0.397119
2023-05-01 00:11:27,409 P30261 INFO Monitor(max) STOP: 0.745171 !
2023-05-01 00:11:27,409 P30261 INFO Reduce learning rate on plateau: 0.000010
2023-05-01 00:11:27,409 P30261 INFO --- 6910/6910 batches finished ---
2023-05-01 00:11:27,562 P30261 INFO Train loss: 0.404994
2023-05-01 00:11:27,562 P30261 INFO ************ Epoch=7 end ************
2023-05-01 00:17:08,303 P30261 INFO [Metrics] AUC: 0.743046 - logloss: 0.398896
2023-05-01 00:17:08,303 P30261 INFO Monitor(max) STOP: 0.743046 !
2023-05-01 00:17:08,303 P30261 INFO Reduce learning rate on plateau: 0.000001
2023-05-01 00:17:08,304 P30261 INFO Early stopping at epoch=8
2023-05-01 00:17:08,304 P30261 INFO --- 6910/6910 batches finished ---
2023-05-01 00:17:08,468 P30261 INFO Train loss: 0.392950
2023-05-01 00:17:08,469 P30261 INFO Training finished.
2023-05-01 00:17:08,469 P30261 INFO Load best model: /home/xxx/xxx/FINAL/FuxiCTRv1.1/benchmarks_local/Avazu/DNN_avazu_x1/avazu_x1_3fb65689/DNN_avazu_x1_001_3da2d674.model
2023-05-01 00:17:08,511 P30261 INFO ****** Validation evaluation ******
2023-05-01 00:17:21,073 P30261 INFO [Metrics] AUC: 0.745814 - logloss: 0.396411
2023-05-01 00:17:21,133 P30261 INFO ******** Test evaluation ********
2023-05-01 00:17:21,133 P30261 INFO Loading data...
2023-05-01 00:17:21,134 P30261 INFO Loading data from h5: ../data/Avazu/avazu_x1_3fb65689/test.h5
2023-05-01 00:17:22,210 P30261 INFO Test samples: total/8085794, pos/1232985, neg/6852809, ratio/15.25%, blocks/1
2023-05-01 00:17:22,211 P30261 INFO Loading test data done.
2023-05-01 00:17:49,941 P30261 INFO [Metrics] AUC: 0.764551 - logloss: 0.367068
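To compare the three test results quoted in this thread side by side, the `[Metrics]` lines can be parsed with a small regular expression. This is a minimal sketch: the pattern is written against the log lines as they appear in this thread, the helper name `parse_metrics` and the run labels are hypothetical, and nothing here is part of FuxiCTR.

```python
import re

# Matches lines such as "... INFO [Metrics] AUC: 0.763019 - logloss: 0.368178".
METRICS_RE = re.compile(
    r"\[Metrics\] AUC: (?P<auc>\d+\.\d+) - logloss: (?P<logloss>\d+\.\d+)"
)

def parse_metrics(line):
    """Return (auc, logloss) as floats, or None if the line has no metrics."""
    match = METRICS_RE.search(line)
    if match is None:
        return None
    return float(match.group("auc")), float(match.group("logloss"))

# Final test-set lines copied from this thread; the labels are illustrative.
test_lines = {
    "official log": "2022-02-08 10:25:00,299 P50417 INFO [Metrics] AUC: 0.763019 - logloss: 0.368178",
    "poster run": "2023-04-25 19:35:46,393 P47366 INFO [Metrics] AUC: 0.755872 - logloss: 0.371449",
    "maintainer rerun": "2023-05-01 00:17:49,941 P30261 INFO [Metrics] AUC: 0.764551 - logloss: 0.367068",
}

results = {name: parse_metrics(line) for name, line in test_lines.items()}
baseline_auc = results["official log"][0]

for name, (auc, logloss) in results.items():
    # Signed AUC gap relative to the official log.
    print(f"{name}: AUC={auc:.6f} logloss={logloss:.6f} dAUC={auc - baseline_auc:+.6f}")
```

With these inputs the maintainer rerun lands about 0.0015 AUC above the official log, while the poster's run is about 0.0071 below it.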

zhujiem · Apr 29 '23 15:04