Dai Zhuyun (戴竹韵)

20 comments by Dai Zhuyun (戴竹韵)

Should the values in the embedding be set to the range (0,1)? -- No. We normalize the embeddings in the implementation, so the values are automatically converted into the range [0,...
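A minimal sketch of the normalization being described, assuming L2 (unit-length) normalization; the helper name is hypothetical and the repo's actual code may differ:

```python
import numpy as np

# Hypothetical sketch: scale each embedding row to unit length, so dot
# products between rows become cosine similarities and no manual value
# range needs to be enforced on the raw embedding entries.
def l2_normalize(embeddings: np.ndarray, eps: float = 1e-10) -> np.ndarray:
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    return embeddings / (norms + eps)
```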

For example, say you have a query and three documents: doc1, doc2, doc3. query and doc1: very relevant, relevance = 2. query and doc2: somewhat relevant, relevance = 1. query and doc3: not relevant, relevance = 0. (The relevance scores come from human annotation or from clickthrough data.) The generated training samples are then: query doc1 doc2 score_difference=2-1=1...
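A hypothetical sketch of this pair generation (the function name is mine, not from the repo): every pair of documents with different relevance labels yields one training instance of (query, more relevant doc, less relevant doc, label difference).

```python
from itertools import combinations

# For each document pair with unequal labels, emit one pairwise sample.
def make_pairs(query, docs_with_labels):
    pairs = []
    for (d1, r1), (d2, r2) in combinations(docs_with_labels, 2):
        if r1 == r2:
            continue  # equal labels give no preference signal
        hi, lo = ((d1, r1), (d2, r2)) if r1 > r2 else ((d2, r2), (d1, r1))
        pairs.append((query, hi[0], lo[0], hi[1] - lo[1]))
    return pairs

# The example from the comment: relevance 2, 1, 0 for doc1, doc2, doc3.
print(make_pairs("query", [("doc1", 2), ("doc2", 1), ("doc3", 0)]))
# [('query', 'doc1', 'doc2', 1), ('query', 'doc1', 'doc3', 2), ('query', 'doc2', 'doc3', 1)]
```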

Yes, that works! Our paper also uses clicks. On Mon, Jan 8, 2018 at 9:03 PM moluxiaobei wrote: > Thank you for the reply! Since relevance here cannot be judged manually, I plan to use query-doc clicks or click-through rate instead.

Hi, Yes, you are correct about this. Thank you! On Fri, Nov 23, 2018 at 3:14 AM EdwardLorenz wrote: > Hi zhuyun, > Please help me check this problem: >...

Yes! It is possible to use other types of kernels, as long as the kernel describes the 'distance' from a similarity score to a target score.
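For concreteness, here is the RBF kernel that K-NRM uses, alongside a hypothetical alternative (a triangular kernel) that satisfies the same property of decaying with distance from a target score mu; the alternative is my own illustration, not from the paper:

```python
import numpy as np

# RBF kernel as in K-NRM: response decays as the similarity s
# moves away from the kernel's target score mu.
def rbf_kernel(s, mu, sigma):
    return np.exp(-((s - mu) ** 2) / (2 * sigma ** 2))

# Hypothetical alternative with the same shape of behavior:
# falls off linearly around mu, reaching zero at distance `width`.
def triangular_kernel(s, mu, width):
    return np.maximum(0.0, 1.0 - np.abs(s - mu) / width)
```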

1) You could try early stopping or adding dropout; with 100M products it does seem easy to overfit. Have you tried filtering out some products? In text search we remove low-frequency words, which makes the word embedding layer much smaller (a sketch of this filtering follows below). 2) I don't quite understand how your method works in detail. Could you give an example of what q and d look like? For instance, q: 'clicked_item1', d: 'new_item1', or q: 'clicked_item1, clicked_item2, clicked_item3', d: 'new_item1'...? 3) One possible reason is that the final tanh activation squashes all scores into the range -1 to 1; try removing the tanh. Another possibility is that these products do not appear often enough in the training data, so their embeddings were never learned well.
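A minimal sketch of the low-frequency filtering suggested in 1), assuming a simple count-based cutoff (the helper and the min_count threshold are hypothetical):

```python
from collections import Counter

# Keep only items seen at least min_count times; everything else maps
# to a shared <unk> row, so the embedding table shrinks accordingly.
def build_vocab(item_sequences, min_count=5):
    counts = Counter(item for seq in item_sequences for item in seq)
    vocab = {"<unk>": 0}
    for item, c in counts.items():
        if c >= min_count:
            vocab[item] = len(vocab)
    return vocab
```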

Hi! Thank you for your interest in our research! The data is confidential to Sogou, so we cannot release it...

Hi, are your labels binary (relevant / non-relevant)? If so, use a baseline ranker, e.g. BM25, to retrieve the top 100 documents for a query. Then a training instance is (query,...
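A hypothetical sketch of this setup using the rank_bm25 package (an assumption on my part; any BM25 implementation would do): retrieve the top 100 documents per query, then pair each relevant document with a retrieved non-relevant one to form training instances.

```python
from rank_bm25 import BM25Okapi

# relevant_ids: set of corpus indices labeled relevant for this query.
def make_instances(query, corpus, relevant_ids, k=100):
    bm25 = BM25Okapi([doc.split() for doc in corpus])
    scores = bm25.get_scores(query.split())
    top = sorted(range(len(corpus)), key=lambda i: scores[i], reverse=True)[:k]
    pos = [i for i in top if i in relevant_ids]
    neg = [i for i in top if i not in relevant_ids]
    # Each (query, relevant doc, non-relevant doc) triple is one instance.
    return [(query, corpus[p], corpus[n]) for p in pos for n in neg]
```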

Let's say a query is 'Apple', and these are its relevant documents: a very relevant doc (score=2): 'iPhone X - apple.com', a somewhat relevant doc (score=1): 'apple inc - wikipedia', and 10 other non-relevant...
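Plugging this example into the hypothetical make_pairs sketch from above (the non-relevant doc titles are placeholders):

```python
docs = [("iPhone X - apple.com", 2), ("apple inc - wikipedia", 1)]
docs += [(f"non-relevant doc {i}", 0) for i in range(10)]
for pair in make_pairs("Apple", docs)[:3]:
    print(pair)
# ('Apple', 'iPhone X - apple.com', 'apple inc - wikipedia', 1)
# ('Apple', 'iPhone X - apple.com', 'non-relevant doc 0', 2)
# ('Apple', 'iPhone X - apple.com', 'non-relevant doc 1', 2)
```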

Which dataset are you using? Is it your own dataset? I guess you need to run a traditional IR system first and take its results. On Mon, May 20, 2019 at...