Dai Zhuyun (戴竹韵)

20 comments by Dai Zhuyun (戴竹韵)

Should the values in the embedding be set to the range (0,1)? -- No. We normalize the embeddings in the implementation, so the values are automatically converted into the range [0,...
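A minimal sketch of the normalization being described, assuming L2 (unit-length) normalization; the helper name is hypothetical and the repo's actual code may differ:

```python
import numpy as np

# Hypothetical sketch: scale each embedding row to unit length, so dot
# products between rows become cosine similarities and no manual value
# range needs to be enforced on the raw embedding entries.
def l2_normalize(embeddings: np.ndarray, eps: float = 1e-10) -> np.ndarray:
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    return embeddings / (norms + eps)
```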

For example, say you have a query and three documents: doc1, doc2, doc3. query and doc1: very relevant, relevance = 2. query and doc2: somewhat relevant, relevance = 1. query and doc3: not relevant, relevance = 0. (The relevance scores come from human annotation or from clickthrough data.) The generated training samples are then: query doc1 doc2 score_difference=2-1=1...
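A hypothetical sketch of this pair generation (the function name is mine, not from the repo): every pair of documents with different relevance labels yields one training instance of (query, more relevant doc, less relevant doc, label difference).

```python
from itertools import combinations

# For each document pair with unequal labels, emit one pairwise sample.
def make_pairs(query, docs_with_labels):
    pairs = []
    for (d1, r1), (d2, r2) in combinations(docs_with_labels, 2):
        if r1 == r2:
            continue  # equal labels give no preference signal
        hi, lo = ((d1, r1), (d2, r2)) if r1 > r2 else ((d2, r2), (d1, r1))
        pairs.append((query, hi[0], lo[0], hi[1] - lo[1]))
    return pairs

# The example from the comment: relevance 2, 1, 0 for doc1, doc2, doc3.
print(make_pairs("query", [("doc1", 2), ("doc2", 1), ("doc3", 0)]))
# [('query', 'doc1', 'doc2', 1), ('query', 'doc1', 'doc3', 2), ('query', 'doc2', 'doc3', 1)]
```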

Yes, that works! Our paper also uses clicks. On Mon, Jan 8, 2018 at 9:03 PM moluxiaobei wrote: > Thank you for the reply! Since relevance here cannot be judged manually, I plan to use query-doc clicks or click-through rate instead.

Hi, Yes, you are correct about this. Thank you! On Fri, Nov 23, 2018 at 3:14 AM EdwardLorenz wrote: > Hi zhuyun, > Please help me check this problem: >...

Yes! It is possible to use other types of kernels, as long as the kernel describes the 'distance' from a similarity score to a target score.
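For concreteness, here is the RBF kernel that K-NRM uses, alongside a hypothetical alternative (a triangular kernel) that satisfies the same property of decaying with distance from a target score mu; the alternative is my own illustration, not from the paper:

```python
import numpy as np

# RBF kernel as in K-NRM: response decays as the similarity s
# moves away from the kernel's target score mu.
def rbf_kernel(s, mu, sigma):
    return np.exp(-((s - mu) ** 2) / (2 * sigma ** 2))

# Hypothetical alternative with the same shape of behavior:
# falls off linearly around mu, reaching zero at distance `width`.
def triangular_kernel(s, mu, width):
    return np.maximum(0.0, 1.0 - np.abs(s - mu) / width)
```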

1) You could try early stopping or adding dropout; with 100M products it does seem easy to overfit. Have you tried filtering out some products? In text search we remove low-frequency words, which makes the word embedding layer much smaller (a sketch of this filtering follows below). 2) I don't quite understand how your method works in detail. Could you give an example of what q and d look like? For instance, q: 'clicked_item1', d: 'new_item1', or q: 'clicked_item1, clicked_item2, clicked_item3', d: 'new_item1'...? 3) One possible reason is that the final tanh activation squashes all scores into the range -1 to 1; try removing the tanh. Another possibility is that these products do not appear often enough in the training data, so their embeddings were never learned well.
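A minimal sketch of the low-frequency filtering suggested in 1), assuming a simple count-based cutoff (the helper and the min_count threshold are hypothetical):

```python
from collections import Counter

# Keep only items seen at least min_count times; everything else maps
# to a shared <unk> row, so the embedding table shrinks accordingly.
def build_vocab(item_sequences, min_count=5):
    counts = Counter(item for seq in item_sequences for item in seq)
    vocab = {"<unk>": 0}
    for item, c in counts.items():
        if c >= min_count:
            vocab[item] = len(vocab)
    return vocab
```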

Hi! Thank you for your interest in our research! The data is confidential to Sogou, so we cannot release it...

Hi, are your labels binary (relevant / non-relevant)? If so, use a baseline ranker, e.g. BM25, to retrieve the top 100 documents for a query. Then a training instance is (query,...
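A hypothetical sketch of this setup using the rank_bm25 package (an assumption on my part; any BM25 implementation would do): retrieve the top 100 documents per query, then pair each relevant document with a retrieved non-relevant one to form training instances.

```python
from rank_bm25 import BM25Okapi

# relevant_ids: set of corpus indices labeled relevant for this query.
def make_instances(query, corpus, relevant_ids, k=100):
    bm25 = BM25Okapi([doc.split() for doc in corpus])
    scores = bm25.get_scores(query.split())
    top = sorted(range(len(corpus)), key=lambda i: scores[i], reverse=True)[:k]
    pos = [i for i in top if i in relevant_ids]
    neg = [i for i in top if i not in relevant_ids]
    # Each (query, relevant doc, non-relevant doc) triple is one instance.
    return [(query, corpus[p], corpus[n]) for p in pos for n in neg]
```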

Let's say a query is 'Apple', and these are its relevant documents: a very relevant doc (score=2): 'iPhone X - apple.com', a somewhat relevant doc (score=1): 'apple inc - wikipedia', and 10 other non-relevant...
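Plugging this example into the hypothetical make_pairs sketch from above (the non-relevant doc titles are placeholders):

```python
docs = [("iPhone X - apple.com", 2), ("apple inc - wikipedia", 1)]
docs += [(f"non-relevant doc {i}", 0) for i in range(10)]
for pair in make_pairs("Apple", docs)[:3]:
    print(pair)
# ('Apple', 'iPhone X - apple.com', 'apple inc - wikipedia', 1)
# ('Apple', 'iPhone X - apple.com', 'non-relevant doc 0', 2)
# ('Apple', 'iPhone X - apple.com', 'non-relevant doc 1', 2)
```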

Which dataset are you using? Is it your own dataset? I guess you need to run a traditional IR system first and take its results. On Mon, May 20, 2019 at...