hankcs comments

Results: 123 comments of hankcs

Finder crashes on macOS 10.15.6

Chrome crashes too, even when SIP is disabled.

Are some triples in Freebase wrong?

@kelvin-jiang Could you provide a mapping from MIDs to their names? `m.03_bjc` seems to be an invalid ID. Many IDs in `FreebaseQA_fb_extract.txt` cannot be found in [Freebase/Wikidata Mappings](https://developers.google.com/freebase/#freebase-wikidata-mappings) or [Freebase Easy](http://freebase-easy.cs.uni-freiburg.de/dump/).

On the Java version using static variables to store dictionary paths

PRs are welcome. You can follow the approach used to refactor CustomDictionary to support multiple instances: https://github.com/hankcs/HanLP/issues/1339
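The general idea behind a multi-instance refactor can be sketched in Python. All names here are hypothetical and this is not HanLP's actual API: a class-level attribute stands in for a Java static field, and the alternative keeps the dictionary path and entries on each object, so differently configured dictionaries no longer interfere with each other.

```python
class StaticStyleDictionary:
    # Analogous to a Java static field: every instance shares this one path
    path = "dictionary/custom.txt"  # hypothetical default path


class DictionaryInstance:
    """Hypothetical per-instance dictionary: path and entries live on the object."""

    def __init__(self, path):
        self.path = path
        self.entries = {}

    def insert(self, word, nature="n"):
        self.entries[word] = nature

    def contains(self, word):
        return word in self.entries


# Changing the shared path affects every instance...
a, b = StaticStyleDictionary(), StaticStyleDictionary()
StaticStyleDictionary.path = "dictionary/medical.txt"
print(a.path == b.path)  # True: both see the new path

# ...while separate instances stay independent.
general = DictionaryInstance("dictionary/custom.txt")
medical = DictionaryInstance("dictionary/medical.txt")
medical.insert("aspirin")
print(general.contains("aspirin"), medical.contains("aspirin"))  # False True
```

In Java the same change amounts to turning the static path field into an instance field set through a constructor.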

Datasets' cache not re-used

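A similar issue occurs with `BartTokenizer`. You can work around the bug by loading a fresh tokenizer every time:

```python
from transformers import BartTokenizer

# Load a fresh tokenizer inside the mapped function instead of reusing one captured instance
dataset = dataset.map(
    lambda x: tokenize_func(x, BartTokenizer.from_pretrained(xxx)),
    num_proc=num_proc,
    desc='Tokenize',
)
```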

Is there a problem with the line `output_deltas[k] = error * dsigmoid(self.ao[k])`?

My understanding is the latter; please refer to the relevant theory. The original author of the code also sees it this way.
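A quick numerical check, assuming `dsigmoid` is written in terms of the activated output `y` rather than the pre-activation `x` (the widely shared `bpnn.py` example does this, though with `tanh` and `dsigmoid(y) = 1 - y**2`). The logistic sigmoid and the sample value below are assumptions for illustration. Under that convention, passing the activation `ao` (playing the role of `self.ao[k]`) matches the true derivative, while passing `x` does not.

```python
import math


def sigmoid(x):
    # Logistic sigmoid (assumption: the original code may use tanh instead)
    return 1.0 / (1.0 + math.exp(-x))


def dsigmoid(y):
    # Derivative of the sigmoid expressed in terms of its OUTPUT y = sigmoid(x)
    return y * (1.0 - y)


def numeric_derivative(f, x, h=1e-6):
    # Central finite difference, used only to check dsigmoid
    return (f(x + h) - f(x - h)) / (2 * h)


x = 0.7                # hypothetical pre-activation value
ao = sigmoid(x)        # activation, in the role of self.ao[k]
analytic = dsigmoid(ao)            # derivative computed from the activation
wrong = dsigmoid(x)                # same formula fed the pre-activation by mistake
numeric = numeric_derivative(sigmoid, x)
print(analytic, numeric, wrong)
```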

Batch parameter updates with matrices in the `train` method

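Yes, this is usually implemented with NumPy matrix operations. This code is meant as an introductory tutorial, so it probably favors simplicity.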
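A minimal sketch of what a matrix-based batch update could look like with NumPy, assuming one hidden layer, logistic sigmoid activations, a mean squared-error loss, and no bias terms. The function name, shapes, and learning rate are illustrative and not taken from the tutorial code.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def train_step(X, Y, W1, W2, lr=0.5):
    """One batch gradient-descent step; updates W1 and W2 in place.

    X: (batch, n_in), Y: (batch, n_out), W1: (n_in, n_hidden), W2: (n_hidden, n_out).
    Returns the loss 0.5 * sum((Y - out)**2) / batch measured before the update.
    """
    h = sigmoid(X @ W1)                              # hidden activations
    out = sigmoid(h @ W2)                            # output activations
    err = Y - out
    delta_out = err * out * (1 - out)                # output deltas for the whole batch
    delta_hid = (delta_out @ W2.T) * h * (1 - h)     # hidden deltas (uses W2 before update)
    n = X.shape[0]
    # err = Y - out, so adding these terms moves the weights downhill
    W2 += lr * h.T @ delta_out / n
    W1 += lr * X.T @ delta_hid / n
    return 0.5 * np.sum(err ** 2) / n


rng = np.random.default_rng(0)
X = rng.random((8, 3))            # 8 samples, 3 features (toy data)
Y = rng.random((8, 2))            # 8 targets in (0, 1), 2 outputs
W1 = rng.normal(size=(3, 4))
W2 = rng.normal(size=(4, 2))
losses = [train_step(X, Y, W1, W2) for _ in range(500)]
print(losses[0], losses[-1])
```

Each `@` replaces the nested per-neuron loops of the tutorial version with a single matrix product over the whole batch.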

When will the coreference resolution algorithm/model be open-sourced?

Please see https://bbs.hankcs.com/t/topic/4186. I'm busy with other research and don't have time to write this paper for now.

Unseen words in new documents during inference

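I haven't looked at this code in a long time, so I'm answering from memory. The Vocabulary in a topic model is a fixed structure once training is done. If you apply the phi matrix trained on corpus A to corpus B, you must use A's Vocabulary to look up the IDs of the words in corpus B.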
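A minimal sketch of that lookup, using a hypothetical `Vocabulary` class rather than the project's own: IDs are assigned only while building from training corpus A, and a document from corpus B is mapped through the same frozen vocabulary. Words A never saw are dropped here, on the assumption that phi has no column for them.

```python
class Vocabulary:
    """Hypothetical word <-> id map; IDs are assigned only during training."""

    def __init__(self):
        self.word2id = {}
        self.id2word = []

    def get_or_add(self, word):
        # Used only while building the vocabulary from the training corpus
        if word not in self.word2id:
            self.word2id[word] = len(self.id2word)
            self.id2word.append(word)
        return self.word2id[word]

    def get_id(self, word):
        # Lookup without insertion; returns None for words unseen in training
        return self.word2id.get(word)

    def size(self):
        return len(self.id2word)


def build_vocabulary(corpus):
    vocab = Vocabulary()
    for doc in corpus:
        for word in doc:
            vocab.get_or_add(word)
    return vocab


def to_ids(doc, vocab):
    # Map a new document through the TRAINING vocabulary and skip
    # out-of-vocabulary words (phi has vocab.size() columns only).
    return [i for i in (vocab.get_id(w) for w in doc) if i is not None]


corpus_a = [["apple", "banana"], ["banana", "cherry"]]
vocab = build_vocabulary(corpus_a)
print(to_ids(["cherry", "durian", "apple"], vocab))  # [2, 0]
```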

Is there any related API documentation? How do I get the topic distribution of a new document?

This feature is still being explored.
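As the feature was still being explored, the following is only a sketch of one common approach, not this project's API: fold-in Gibbs sampling, which keeps the trained phi matrix fixed and resamples only the new document's topic assignments. The function name, the symmetric prior `alpha`, and the iteration count are assumptions; word IDs must come from the training vocabulary, as in the previous answer.

```python
import random


def infer_topic_distribution(doc_ids, phi, alpha=0.1, iterations=200, seed=0):
    """Estimate theta for a new document by fold-in Gibbs sampling (hypothetical helper).

    doc_ids: word IDs from the training vocabulary
    phi:     phi[k][w] = P(word w | topic k), trained and held fixed; entries assumed > 0
    """
    rng = random.Random(seed)
    num_topics = len(phi)
    z = [rng.randrange(num_topics) for _ in doc_ids]  # random initial assignments
    nk = [0] * num_topics                              # topic counts in this document
    for k in z:
        nk[k] += 1
    for _ in range(iterations):
        for i, w in enumerate(doc_ids):
            nk[z[i]] -= 1                              # remove the current assignment
            # P(z_i = k | rest) is proportional to phi[k][w] * (n_k + alpha)
            weights = [phi[k][w] * (nk[k] + alpha) for k in range(num_topics)]
            z[i] = rng.choices(range(num_topics), weights=weights)[0]
            nk[z[i]] += 1
    total = len(doc_ids) + num_topics * alpha
    return [(nk[k] + alpha) / total for k in range(num_topics)]


phi = [[0.8, 0.1, 0.1],   # topic 0 mostly emits word 0
       [0.1, 0.1, 0.8]]   # topic 1 mostly emits word 2
theta = infer_topic_distribution([0, 0, 0, 1], phi)
print(theta)
```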

Classifier parameter setting

This is a function that searches for the regularization factor.

```java
/**
 *
 * Liblinear automatic parameter search
 * @author hankcs
 */
public class grid {
    public static double find_parameters(final Problem prob, double from, double end, double step) {
        if (from...
```