shuDaoNan9

Results: 11 issues from shuDaoNan9

If I do not set numBatches, I get a `NegativeArraySizeException` or OOM during training on a big dataset (about 26,320,507 rows), and CPU utilization stays below 90%. **But if...

bug
area/lightgbm
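The numBatches option referred to above splits the training data into chunks that are trained on one at a time, which bounds peak memory. As a minimal plain-Python sketch of that chunking idea (the helper below is my own illustration, not SynapseML's implementation):

```python
def split_into_batches(rows, num_batches):
    """Split rows into num_batches roughly equal, contiguous chunks."""
    batch_size = -(-len(rows) // num_batches)  # ceiling division
    return [rows[i:i + batch_size] for i in range(0, len(rows), batch_size)]

rows = list(range(10))
batches = split_into_batches(rows, 3)
print([len(b) for b in batches])  # [4, 4, 2]
```

Each chunk is small enough to fit in memory, at the cost of more passes over the data; that trade-off is why training with batching can leave the CPU partly idle.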

First, I tried **2 Spark slaves**; it took about 11 minutes to train my model. Submit info: spark-submit --master yarn **--num-executors 2** --executor-memory 19G --executor-cores 16 --conf spark.dynamicAllocation.enabled=false --jars s3://EMR/jars/synapseml-vw_2.12-0.9.4.jar,s3://EMR/jars/synapseml_2.12-0.9.4.jar,s3://EMR/jars/client-sdk-1.14.0.jar...

area/lightgbm

Loading the entire dataset onto the GPU for training is clearly unrealistic for big data, so in TF 2.4.0 I tried preprocessing the data and feeding it to the DeepMatch model in batches, and hit the following problems: **1. After batching the data with tf.data.Dataset.from_tensor_slices, training no longer works once tf.compat.v1.disable_eager_execution() is called; the error is:** D:\Anaconda3\envs\TF2GPU\lib\site-packages\tensorflow\python\keras\backend.py:434: UserWarning: `tf.keras.backend.set_learning_phase` is deprecated and will be removed after 2020-10-11. To update it, simply pass a True/False value to the `training` argument of the `__call__` method...

question
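As a sketch of the manual alternative to tf.data.Dataset.from_tensor_slices being attempted above — feeding preprocessed data to the model in fixed-size batches — a plain-Python generator can produce the slices. The model and training calls are omitted because the reported error is specific to graph mode; this only illustrates the batching itself:

```python
def batch_generator(features, labels, batch_size):
    """Yield (features, labels) slices of at most batch_size rows each."""
    for start in range(0, len(features), batch_size):
        end = start + batch_size
        yield features[start:end], labels[start:end]

X = [[i, i + 1] for i in range(7)]  # 7 toy rows of 2 features
y = [i % 2 for i in range(7)]
shapes = [(len(xb), len(yb)) for xb, yb in batch_generator(X, y, 3)]
print(shapes)  # [(3, 3), (3, 3), (1, 1)]
```

A generator like this can be passed to Keras `fit` in place of a full in-memory array, so only one batch lives in memory at a time.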

Only an h5-format model can be exported; exporting to pb format fails. The export code is: tf.saved_model.save(model, outputDir + 'YouTubeNet_model2') or: from tensorflow.python.keras.models import Model, load_model, save_model; save_model(model, 'YouTubeNet_model.pb', save_format='tf'). The error is: Traceback (most recent call last): File "F:/python/DeepMatch-master/examples/**run_youtubednn**.py", line 70, in tf.saved_model.save(model, outputDir + 'YouTubeNet_model2')...

I just want to add some photos to the training set

I have a few questions about running the DCN model on this dataset: http://labs.criteo.com/2014/02/download-kaggle-display-advertising-challenge-dataset/ (Kaggle Display Advertising Challenge Dataset). The data format description says: "The columns are tab separated with the following schema: ... ..." It does not distinguish user ids from item ids, so how can recommendations be made per user? Also, when get_criteo_feature.py processes the data, many categorical values are simply truncated away, so how can users be told apart? parser.add_argument( "--cutoff", type=int, default=200, help="cutoff long-tailed categorical values" )...
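A plain-Python sketch of what a `--cutoff` option like the one in get_criteo_feature.py typically does: categorical values seen fewer than `cutoff` times are dropped from the feature map, which is why rare ids (including anything resembling a per-user id) disappear. The helper name below is my own illustration, not the script's actual code:

```python
from collections import Counter

def build_feature_map(values, cutoff):
    """Keep only categorical values occurring at least `cutoff` times,
    assigning each surviving value a dense integer id."""
    counts = Counter(values)
    kept = sorted(v for v, c in counts.items() if c >= cutoff)
    return {v: i for i, v in enumerate(kept)}

values = ["a"] * 5 + ["b"] * 2 + ["c"]  # "c" is a long-tailed value
fmap = build_feature_map(values, cutoff=2)
print(fmap)  # {'a': 0, 'b': 1} -- 'c' was cut off
```

With the default cutoff of 200, any value rarer than 200 occurrences is discarded, so near-unique ids cannot survive this preprocessing by design.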

1. Can the DCN code be run on the Criteo dataset as-is, or do certain arguments have to be passed? 2. The code defaults field_size to 0, so it must be passed at runtime, e.g. field_size=2496? Without it, "feat_vals = tf.reshape(feat_vals, shape=[-1, field_size, 1])" reshapes to (-1, 0) and fails with: Reshape cannot infer unless all specified input sizes are non-zero. But even when field_size is passed, feature_size also defaults to 0 in the code and is never recomputed, so Feat_Emb ends up with shape (0, 32), which in turn breaks embeddings = tf.nn.embedding_lookup(Feat_Emb, feat_ids) # None *...
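The reshape failure described in point 2 can be reproduced outside TF: any reshape that combines an inferred dimension (-1) with a zero-sized dimension is ambiguous, because the inferred size would require dividing by zero. A NumPy illustration of the same rule:

```python
import numpy as np

data = np.arange(6, dtype=np.float32)

# field_size > 0: the -1 dimension can be inferred (6 / (3*1) = 2 rows).
ok = data.reshape(-1, 3, 1)
print(ok.shape)  # (2, 3, 1)

# field_size == 0: -1 cannot be inferred, mirroring the TF error
# "Reshape cannot infer unless all specified input sizes are non-zero".
try:
    data.reshape(-1, 0, 1)
    reshape_failed = False
except ValueError as e:
    reshape_failed = True
    print("reshape failed:", e)
```

So both field_size and feature_size must be set to the real values from the preprocessed data before the graph is built.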

I suggest adding a README covering points like these: get_criteo_feature.py assumes every feature in the test set has already appeared in the training set, otherwise feature_map is incomplete; the test data cannot be too small, or the cutoff removes everything; the test-set column index is off by one from the training set: val = dists.gen(i, features[continous_features[i] - 1]) — I changed it to match the training-set indexing, since my test and training files share the same format; perhaps the author's two files really are offset by one column? I also added label = features[0] for the test set, which makes comparing test results much easier; reusing the last label from the training set felt odd. Also, a continuous numeric feature must not have only a single unique value, or normalization breaks; ...........

I see that YouTubeNet's original recall stage uses softmax classification. Is the sigmoid loss below used because the task should be understood as a multi-class problem in which several "next" videos to play can be chosen "at the same time"? https://github.com/yangxudong/deeplearning/blob/master/youtube_match_model/youtube_match_model.py loss = tf.nn.sigmoid_cross_entropy_with_logits( labels=labels_one_hot, logits=logits) Thanks!
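The distinction behind the question can be shown numerically: softmax cross entropy normalizes across all classes, so it models exactly one "next video", while per-class sigmoid cross entropy treats every class independently, so several can be positive at once. A small pure-Python comparison, purely illustrative:

```python
import math

def softmax_cross_entropy(logits, one_hot):
    """Multi-class loss: probabilities compete and sum to 1."""
    z = max(logits)
    exps = [math.exp(l - z) for l in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    return -sum(t * math.log(p) for t, p in zip(one_hot, probs))

def sigmoid_cross_entropy(logits, labels):
    """Multi-label loss: sum of independent binary cross entropies."""
    loss = 0.0
    for l, t in zip(logits, labels):
        p = 1.0 / (1.0 + math.exp(-l))
        loss += -(t * math.log(p) + (1 - t) * math.log(1 - p))
    return loss

logits = [2.0, 0.5, -1.0]
print(softmax_cross_entropy(logits, [1, 0, 0]))   # single positive only
print(sigmoid_cross_entropy(logits, [1, 1, 0]))   # multiple positives allowed
```

With a one-hot target both losses are usable, but only the sigmoid form is well defined when more than one label is 1 — which is presumably why the linked code uses it.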

I found FMClassifier in Spark 3, but I don't know what format my featuresCol should be in. I used my GBDT features, but the resulting AUC is bad. Some of my code:...
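For reference, Spark ML classifiers, FMClassifier included, take featuresCol as a standard Vector column (dense or sparse), like any other estimator in the pipeline API. The factorization-machine score itself is the usual second-order formula; a pure-Python sketch of it with toy weights (my own illustration, not Spark's implementation):

```python
def fm_score(x, w0, w, V):
    """Second-order factorization machine:
    w0 + sum_i w_i x_i + sum_{i<j} <V_i, V_j> x_i x_j,
    using the O(n*k) reformulation 0.5 * sum_f [(sum_i v_if x_i)^2
    - sum_i (v_if x_i)^2]."""
    linear = w0 + sum(wi * xi for wi, xi in zip(w, x))
    k = len(V[0])
    pairwise = 0.0
    for f in range(k):
        s = sum(V[i][f] * x[i] for i in range(len(x)))
        s2 = sum((V[i][f] * x[i]) ** 2 for i in range(len(x)))
        pairwise += 0.5 * (s * s - s2)
    return linear + pairwise

x = [1.0, 2.0]            # one row's feature vector (what featuresCol holds)
w0, w = 0.1, [0.2, -0.3]  # toy global bias and linear weights
V = [[0.5], [0.4]]        # toy factor vectors, factor size k = 1
print(round(fm_score(x, w0, w, V), 6))  # 0.1
```

Because FM learns pairwise interactions from factor vectors, raw GBDT leaf indices fed in as plain numeric values may score poorly; one-hot-encoding the leaf assignments before assembling the Vector is the common treatment.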