苏剑林(Jianlin Su)

390 comments by 苏剑林 (Jianlin Su)

I have never run into such a strange problem either~

This implementation does not support model.save; you can only use model.save_weights.

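Why has this question come up again after half a year... Embedding-Mapping exists to align dimensions. If hidden_size==embedding_size, the operation is redundant and only wastes compute, which is plainly true. So when hidden_size==embedding_size, no Embedding-Mapping layer is created, and it is automatically excluded when loading weights, so no error is raised. Also, only the latest version of bert4keras is ever recommended. A version as old as 0.6.4 should be upgraded to the latest as soon as possible, since the old version may well contain hidden "landmines".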
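A minimal sketch of the logic described above, for illustration only. The function names and layer-name lists here are hypothetical and do not reproduce bert4keras internals. The sketch shows two things. First, the mapping layer is created only when the two sizes differ. Second, checkpoint entries for layers the model does not have are skipped during loading.

```python
# Hypothetical sketch: names are illustrative, not bert4keras's actual code.

def build_layer_names(embedding_size, hidden_size):
    """Return the embedding-side layer names a model would create."""
    layers = ["Embedding-Token", "Embedding-Position", "Embedding-Norm"]
    if embedding_size != hidden_size:
        # A projection is only needed to map embedding_size -> hidden_size.
        layers.append("Embedding-Mapping")
    return layers


def weights_to_load(checkpoint_layers, model_layers):
    """Keep only checkpoint entries whose layer actually exists in the model."""
    present = set(model_layers)
    return [name for name in checkpoint_layers if name in present]


def mapping_params(embedding_size, hidden_size):
    """Parameters a dense projection would add (kernel plus bias)."""
    return embedding_size * hidden_size + hidden_size


if __name__ == "__main__":
    print(build_layer_names(128, 768))  # ALBERT-like sizes: mapping is created
    print(build_layer_names(768, 768))  # equal sizes: no mapping layer
    print(mapping_params(768, 768))     # parameters a useless square projection would cost
```

With equal sizes, a projection would be a square matrix that adds 590,592 parameters and matrix multiplies without changing the dimension, which is why it is omitted.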

I know what MINE and DeepInfoMax do in their own papers. The problem is, if your transform H(X) = I(X,Z) is right, then I(X,Z) will be bounded, and so will KL(p(x,z)||p(x)p(z))...

Let us compare your new paper with DeepInfoMax. In DeepInfoMax, the main goal is to extract good features by maximizing MI. And we know MI = KL(p(x,z)||p(x)p(z)) is actually a...
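To make the two quantities above concrete, here is a small worked example for discrete variables. It computes MI as KL(p(x,z)||p(x)p(z)) directly from a joint probability table. It also checks the standard bound I(X,Z) ≤ H(X). The toy joint tables are made up for illustration only.

```python
import math


def mutual_information(joint):
    """I(X;Z) = KL(p(x,z) || p(x)p(z)) for a discrete joint table, in nats."""
    px = [sum(row) for row in joint]        # marginal p(x): row sums
    pz = [sum(col) for col in zip(*joint)]  # marginal p(z): column sums
    mi = 0.0
    for i, row in enumerate(joint):
        for j, pxz in enumerate(row):
            if pxz > 0:  # convention: 0 * log 0 = 0
                mi += pxz * math.log(pxz / (px[i] * pz[j]))
    return mi


def entropy(p):
    """Shannon entropy H(p) in nats."""
    return -sum(q * math.log(q) for q in p if q > 0)


copy = [[0.5, 0.0], [0.0, 0.5]]           # Z is an exact copy of X
independent = [[0.25, 0.25], [0.25, 0.25]]
noisy = [[0.4, 0.1], [0.1, 0.4]]

for name, joint in [("copy", copy), ("independent", independent), ("noisy", noisy)]:
    hx = entropy([sum(row) for row in joint])
    print(name, mutual_information(joint), "<=", hx)
```

In the "copy" case the KL reaches its ceiling H(X) = log 2. This is the sense in which MI, and hence the KL, is bounded by the entropy of X.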

In a word, I am really interested in your model, but I want to make the whole derivation more natural.

Still, I really appreciate your interpretation (energy minima) of the gradient penalty. However, the gradient penalty encodes our prior knowledge of this problem. If there is a replaceable approach with...

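Your logic is backwards. The tokenizer comes first: you tokenize the input, and then build the labels from the tokenization result. Are you expecting the tokenizer to align itself to the labels you give it?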
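A minimal sketch of the order described above. First tokenize, then derive token-level labels from the tokens. This assumes labels start out as character-level spans. `toy_tokenize` and `labels_from_spans` are hypothetical stand-ins, not the bert4keras tokenizer. A real WordPiece tokenizer splits text differently, but the direction of the dependency is the same.

```python
import re


def toy_tokenize(text):
    """Hypothetical tokenizer: word chunks and single punctuation marks,
    each returned as (token, start, end) character offsets."""
    return [(m.group(), m.start(), m.end()) for m in re.finditer(r"\w+|[^\w\s]", text)]


def labels_from_spans(tokens, spans):
    """Build BIO labels for already-produced tokens.
    spans: list of (start, end, tag) character spans, end exclusive."""
    labels = ["O"] * len(tokens)
    for start, end, tag in spans:
        # Every token that overlaps the span gets the tag.
        inside = [i for i, (_, s, e) in enumerate(tokens) if s < end and e > start]
        for k, i in enumerate(inside):
            labels[i] = ("B-" if k == 0 else "I-") + tag
    return labels


text = "Jianlin Su wrote bert4keras."
tokens = toy_tokenize(text)  # step 1: tokenize
labels = labels_from_spans(tokens, [(0, 10, "PER"), (17, 27, "LIB")])  # step 2: labels
print(list(zip([t for t, _, _ in tokens], labels)))
```

The labels follow the tokens, never the other way around. If the tokenizer splits a word, the label-building step must adapt to that split.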

If by "explicit" you mean fixed, you can pass `sequence_length=xxx` when calling `build_transformer_model`.
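Once the length is fixed, every input batch must match it. Below is a minimal sketch of padding or truncating token-id sequences to one fixed length. `pad_to_length` and the default `pad_id=0` are hypothetical choices for illustration, not a bert4keras API.

```python
def pad_to_length(batch, length, pad_id=0):
    """Truncate or right-pad every token-id sequence to exactly `length`."""
    padded = []
    for seq in batch:
        seq = seq[:length]  # drop anything beyond the fixed length
        padded.append(seq + [pad_id] * (length - len(seq)))
    return padded


print(pad_to_length([[101, 2769, 102], [101, 102]], 4))
```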