Wang Yi comments

Results: 42 comments by Wang Yi

Usage

No, you have to extract embeddings of those unlabeled faces using a CNN or another extractor.

add t5 model

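One part needs to change: `T5ExtendedAttnMask` was reproduced from the libai bert implementation, but while aligning with Megatron I found that Megatron simply does `.unsqueeze(1)`, which differs from the `T5ExtendedAttnMask` implementation, so it ended up unused. Should this part be deleted, or rewritten as an operation equivalent to `.unsqueeze(1)` and placed inside the model?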

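> > `T5ExtendedAttnMask` was reproduced from the libai bert implementation, but while aligning with Megatron I found that Megatron simply does `.unsqueeze(1)`, which differs from the `T5ExtendedAttnMask` implementation, so it ended up unused. Should this part be deleted, or rewritten as an operation equivalent to `.unsqueeze(1)` and placed inside the model?
>
> This is because when Megatron builds the t5 data, `encoder_padding_mask` and the related masks have shape [bsz, seqlen, seqlen], whereas in bert they have shape [bsz, seqlen], so the handling differs from bert. In https://github.com/Oneflow-Inc/libai/blob/add_models/libai/layers/mask_helpers.py I wrote all the attention mask operations as modules, which can replace the ExtendedAttnMask in both bert and t5.

This has been changed.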
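The shape difference discussed above can be illustrated with a minimal sketch. Plain nested lists stand in for tensors so it runs without any framework. The helper names `extend_padding_mask` and `unsqueeze_1` are hypothetical, and the outer-product form of the bert-style mask is an assumption based on the common extended-mask pattern, not a copy of the libai or Megatron code.

```python
def shape(x):
    """Return the shape of a regular, non-empty nested list."""
    dims = []
    while isinstance(x, list):
        dims.append(len(x))
        x = x[0]
    return dims


def extend_padding_mask(mask_2d):
    """bert-style (assumed): [bsz, seqlen] -> [bsz, 1, seqlen, seqlen].

    Each sequence's padding mask is expanded with an outer product, so
    position (q, k) is 1 only when both tokens are real (non-padding).
    """
    return [[[[q * k for k in row] for q in row]] for row in mask_2d]


def unsqueeze_1(mask_3d):
    """t5-style: [bsz, seqlen, seqlen] -> [bsz, 1, seqlen, seqlen].

    The data pipeline already supplies a square mask, so only a
    broadcastable head dimension has to be inserted.
    """
    return [[m] for m in mask_3d]


# One sequence of three tokens whose last token is padding.
bert_mask = extend_padding_mask([[1, 1, 0]])

# The same information, already expanded to [bsz, seqlen, seqlen].
t5_mask = unsqueeze_1([[[1, 1, 0], [1, 1, 0], [0, 0, 0]]])

print(shape(bert_mask), shape(t5_mask))
```

Under these assumptions both paths end at the same [bsz, 1, seqlen, seqlen] mask; they differ only in how much work the data pipeline has already done.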

Add idea t5 project

> Can this actually be trained?

Yes, it trains. Are you running into problems on your side?

So how to assess the image quality?

> Specifically, we use harmonic mean of the predicted variance σ ∈ R^512 as the approximated measurement of the estimated uncertainty. The same below.

Quote from the paper. In practice, you...
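The quoted measure can be sketched in a few lines. This is a minimal illustration, assuming the model outputs one positive variance per embedding dimension and that a lower score is read as lower uncertainty (and so, loosely, higher image quality). The function name `uncertainty_score` and the sample values are hypothetical; short vectors replace the 512-dimensional one for readability.

```python
from statistics import harmonic_mean


def uncertainty_score(variance):
    """Harmonic mean of the per-dimension predicted variance.

    `variance` is a sequence of positive floats, one per embedding
    dimension. The harmonic mean is dominated by the smallest entries.
    """
    return harmonic_mean(variance)


# Hypothetical predicted variances for two face images.
sharp_face = [0.01, 0.01, 0.01, 0.01]
blurry_face = [0.5, 0.4, 0.6, 0.5]

print(uncertainty_score(sharp_face) < uncertainty_score(blurry_face))
```

Ranking images by this score is one way to turn the predicted variance into a single quality-like number; any threshold on it would have to be chosen empirically.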

Poor results when training with webface

Which webface? webface260M or casia-webface? Lower accuracy with the latter is expected.

'Tensor' object has no attribute 'bool'

I'll add it.

Models with BatchNorm2d raise an error when amp and grad acc are enabled

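> This bug has probably existed all along. The main cause is that the repeat op is in the clear list, so the amp algorithm may infer the repeat op as half. As a result, the normalization op's inputs, moving mean and moving var, are converted to half before the repeat. The amp algorithm only marks moving mean and moving var as no cast (that is, it does not insert a cast op to convert them to half), but it does not consider how to handle inputs that are already half.

Yes. Libai has no models with BN, so this bug was never exposed.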

Wrapping data in flow.Tensor introduces numerical error

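This happens because the `DoubleTensor` constructor first goes through `flow.Tensor` and then calls `.to(flow.double)`. The data passes through fp32 in between, which causes the precision loss. I'll fix it.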
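The effect of an fp32 intermediate can be reproduced without any tensor library. The sketch below uses only the Python standard library: `struct` packs a Python float (fp64) into 4 bytes and reads it back, which mimics the fp32 step in the middle. The helper name `via_float32` is hypothetical, and this illustrates the rounding only, not the OneFlow code path itself.

```python
import struct


def via_float32(x):
    """Round-trip a Python float (fp64) through a 4-byte fp32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


value = 0.1                  # not exactly representable in binary
lossy = via_float32(value)   # fp64 -> fp32 -> fp64

print(value == lossy)        # the round trip changed the value
print(abs(value - lossy))    # size of the introduced error
```

Converting back to double afterwards cannot restore the lost bits, which is why the fix has to avoid the fp32 step rather than cast at the end.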

Add MaxUnpool op

> A global test could also be added.

Added.