苏剑林(Jianlin Su)
I haven't run into such a strange problem either~
This implementation can't be saved with model.save; you can only use model.save_weights.
When embedding_size equals hidden_size, why bother adding a mapping layer? Is there GPU memory to spare?
How did this question resurface after half a year... Embedding-Mapping exists to align dimensions. If hidden_size == embedding_size, the operation is redundant and wastes compute; that is plainly true. So when hidden_size == embedding_size there is no Embedding-Mapping layer at all, and it is automatically excluded when loading weights, so no error occurs. Also, at any time only the latest version of bert4keras is recommended; 0.6.4 is quite old, so please sync to the latest version as soon as possible. It's entirely possible that old versions do have bugs.
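The dimension-alignment logic described above can be sketched as follows. This is a hypothetical minimal illustration in NumPy, not bert4keras's actual code: the mapping is simply a linear projection that is created only when the two sizes differ.

```python
import numpy as np

def embed(token_ids, embedding_table, mapping=None):
    """Look up token embeddings; apply a mapping layer only when present."""
    h = embedding_table[token_ids]  # shape: (seq_len, embedding_size)
    if mapping is not None:
        h = h @ mapping  # project embedding_size -> hidden_size
    return h

embedding_size, hidden_size = 128, 768
table = np.random.randn(100, embedding_size)
# The mapping exists only when the dimensions actually differ.
mapping = (None if embedding_size == hidden_size
           else np.random.randn(embedding_size, hidden_size))
out = embed(np.array([1, 2, 3]), table, mapping)
print(out.shape)  # (3, 768)
```

When embedding_size == hidden_size, `mapping` is `None` and the lookup result is used directly, which is why no Embedding-Mapping weights exist to load in that case.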
I know what MINE and Deep InfoMax do in their own papers. The problem is: if your transform H(X) = I(X,Z) is right, then I(X,Z) will be bounded, and so will KL(p(x,z)||p(x)p(z))...
Let us compare your new paper with Deep InfoMax. In Deep InfoMax, the main goal is to extract good features by maximizing MI. And we know MI = KL(p(x,z)||p(x)p(z)) is actually a...
In a word, I am really interested in your model, but I want to make the whole derivation more natural.
Still, I very much appreciate your interpretation (energy minima) of the gradient penalty. However, the gradient penalty is our prior knowledge of this problem. If there is an alternative approach with...
Your logic is backwards: first there is a tokenizer, then the input is tokenized, and then the labels are built from the tokenization result. Are you expecting the tokenizer to align itself to the labels you provide?
If by "explicit" you mean fixed, you can pass `sequence_length=xxx` when calling `build_transformer_model`.
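Fixing the sequence length implies that every input batch must be padded or truncated to that exact length. A minimal sketch of that preprocessing step (a hypothetical helper, not part of bert4keras's API):

```python
def fix_length(token_ids, sequence_length, pad_id=0):
    """Pad or truncate a list of token ids to one fixed length."""
    ids = token_ids[:sequence_length]          # truncate if too long
    return ids + [pad_id] * (sequence_length - len(ids))  # pad if short

print(fix_length([5, 6, 7], 5))   # [5, 6, 7, 0, 0]
print(fix_length([1] * 8, 5))     # [1, 1, 1, 1, 1]
```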