zhangjf issues

Results: 4 issues of zhangjf

10~1000 times speedup of Kmeans with pytorch+cuda

I found that KMeans in mauve.compute_mauve takes quite a long time, so I implemented KMeans (with the same API as KMeans in sklearn) in PyTorch with CUDA, and replaced the faiss.Kmeans...

A question about how model.layers.AttentionMerge works

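Hello, I saw your outstanding results on the csqa-leaderboard and, impressed, came to study your code. I ran into some confusion when reading AttentionMerge. In forward, after the operation "**attention_probs = keys @ self.query_ / math.sqrt(self.attention_size * query_var)**", the resulting attention_probs has shape (B, L, 1), which I understand to be the attention distribution of this layer's query over the input sequence. The next step, "**attention_probs = F.softmax(attention_probs * mask, dim=1)**", is what puzzles me: 1. If no mask argument is passed, i.e. **mask = torch.zeros_like(values)**, then (**attention_probs * mask**) is also all zeros, so the computed attention_probs no longer has any effect on the output. 2. If a mask argument is passed, i.e. **mask = (1 - mask.unsqueeze(-1).type(torch.float))...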
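Point 1 of the question can be checked with a small sketch. This is not the repository's code: it is a pure-Python stand-in for `F.softmax(attention_probs * mask, dim=1)` on a single sequence, and the score values and the helper name `softmax` are made up for illustration. It only shows that multiplying the scores by an all-zero mask yields a uniform distribution, whatever the scores were.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a flat list of floats (hypothetical helper)."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


# Two different sets of made-up attention scores for one sequence of length L = 3.
scores_a = [2.0, 0.5, -1.0]
scores_b = [-3.0, 4.0, 0.0]

# The default mask quoted in the issue is all zeros.
mask = [0.0, 0.0, 0.0]

# Multiplying by the zero mask erases the scores before the softmax is applied.
probs_a = softmax([s * m for s, m in zip(scores_a, mask)])
probs_b = softmax([s * m for s, m in zip(scores_b, mask)])

# Both results are the same uniform distribution over the 3 positions.
print(probs_a)
print(probs_b)
```

Under these assumptions the two outputs are identical, which matches the observation in the question that the preceding computation of attention_probs does not influence the result when the mask is all zeros.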

Question about the classifier used for IntentAccuracyDailyDialog.

According to the source code of [class IntentAccuracyDailyDialog(BaseMetric)](https://github.com/allenai/RL4LMs/blob/97df0bd2f7406a906206c9610aea795fbf52884c/rl4lms/envs/text_generation/metric.py#L663), the intent likelihood of utterances on DailyDialog is computed by `rajkumarrrk/roberta-daily-dialog-intent-classifier`. However, according to the `config.json` of this classifier, it is used...

Add KMeans on GPU