苏剑林(Jianlin Su)
I haven't run into such a strange problem either~
This implementation can't be saved with model.save; you can only use model.save_weights.
When embedding_size equals hidden_size, why bother adding a mapping layer? Is there GPU memory to spare?
How did this question resurface after half a year... Embedding-Mapping exists to align dimensions. If hidden_size == embedding_size, the operation is redundant and wastes compute; that is plainly true. So when hidden_size == embedding_size there is no Embedding-Mapping layer at all, and it is automatically excluded when loading weights, so no error occurs. Also, at any time only the latest version of bert4keras is recommended; 0.6.4 is quite old, so please sync to the latest version as soon as possible. It's entirely possible that old versions do have bugs.
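The dimension-alignment logic described above can be sketched as follows. This is a hypothetical minimal illustration in NumPy, not bert4keras's actual code: the mapping is simply a linear projection that is created only when the two sizes differ.

```python
import numpy as np

def embed(token_ids, embedding_table, mapping=None):
    """Look up token embeddings; apply a mapping layer only when present."""
    h = embedding_table[token_ids]  # shape: (seq_len, embedding_size)
    if mapping is not None:
        h = h @ mapping  # project embedding_size -> hidden_size
    return h

embedding_size, hidden_size = 128, 768
table = np.random.randn(100, embedding_size)
# The mapping exists only when the dimensions actually differ.
mapping = (None if embedding_size == hidden_size
           else np.random.randn(embedding_size, hidden_size))
out = embed(np.array([1, 2, 3]), table, mapping)
print(out.shape)  # (3, 768)
```

When embedding_size == hidden_size, `mapping` is `None` and the lookup result is used directly, which is why no Embedding-Mapping weights exist to load in that case.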
I know what MINE and Deep InfoMax do in their own papers. The problem is: if your transform H(X) = I(X,Z) is right, then I(X,Z) will be bounded, and so will KL(p(x,z)||p(x)p(z))...
Let us compare your new paper with Deep InfoMax. In Deep InfoMax, the main goal is to extract good features by maximizing MI. And we know MI = KL(p(x,z)||p(x)p(z)) is actually a...
In a word, I am really interested in your model, but I want to make the whole derivation more natural.
Still, I very much appreciate your interpretation (energy minima) of the gradient penalty. However, the gradient penalty is our prior knowledge of this problem. If there is an alternative approach with...
Your logic is backwards: first there is a tokenizer, then the input is tokenized, and then the labels are built from the tokenization result. Are you expecting the tokenizer to align itself to the labels you provide?
If by "explicit" you mean fixed, you can pass `sequence_length=xxx` when calling `build_transformer_model`.
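Fixing the sequence length implies that every input batch must be padded or truncated to that exact length. A minimal sketch of that preprocessing step (a hypothetical helper, not part of bert4keras's API):

```python
def fix_length(token_ids, sequence_length, pad_id=0):
    """Pad or truncate a list of token ids to one fixed length."""
    ids = token_ids[:sequence_length]          # truncate if too long
    return ids + [pad_id] * (sequence_length - len(ids))  # pad if short

print(fix_length([5, 6, 7], 5))   # [5, 6, 7, 0, 0]
print(fix_length([1] * 8, 5))     # [1, 1, 1, 1, 1]
```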