ThisisBillhe

40 comments by ThisisBillhe

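> Not supported at the moment. Is this a CNN? You can switch to an equivalent formulation using a 1x1 conv; note that the number of output channels must be a multiple of 32. It is a linear layer. I mainly want to explore whether transformers can be binarized; of course, replacing everything with 1x1 convs would also be workable.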
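A minimal sketch of the equivalence mentioned above, written in plain NumPy rather than the library under discussion: a linear layer applied per token gives the same result as a 1x1 convolution over a feature map whose spatial positions are the tokens. The helper names `linear` and `conv1x1` and all shapes are illustrative assumptions; the output channel count is set to 32 only to match the multiple-of-32 note.

```python
import numpy as np

def linear(x, w, b):
    # x: (tokens, c_in), w: (c_out, c_in), b: (c_out,)
    return x @ w.T + b

def conv1x1(x, w, b):
    # x: (batch, c_in, height, width), w: (c_out, c_in, 1, 1), b: (c_out,)
    # A 1x1 convolution is a per-position matrix multiply over channels.
    return np.einsum("bchw,oc->bohw", x, w[:, :, 0, 0]) + b[None, :, None, None]

rng = np.random.default_rng(0)
tokens, c_in, c_out = 5, 8, 32  # hypothetical sizes; c_out is a multiple of 32

x = rng.standard_normal((tokens, c_in))
w = rng.standard_normal((c_out, c_in))
b = rng.standard_normal(c_out)

y_linear = linear(x, w, b)                        # (tokens, c_out)

# Lay the tokens out as a 1 x tokens "image": (1, c_in, 1, tokens).
x_img = x.T[None, :, None, :]
y_conv = conv1x1(x_img, w[:, :, None, None], b)   # (1, c_out, 1, tokens)
y_conv_tokens = y_conv[0, :, 0, :].T              # back to (tokens, c_out)
```

The two outputs should agree up to floating-point error, which is why swapping linear layers for 1x1 convs is a layout change rather than a change to the model.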

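Indeed, this problem occurs whenever the Conv is not followed by BN, as in EfficientNet. (This repo belongs to a UCB professor, so he probably can't read Chinese, haha.)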

Maybe my phone isn't rooted, I think.

Same problem here, and I don't have a clue. Does multi-GPU generation matter?

> 1. Yes > 2. I successfully reproduced the result by uniformly generating 50 images from each class. Results are shown below (IS by torch-fidelity, others by the [guided_diffusion evaluation code](https://github.com/openai/guided-diffusion/tree/main/evaluations)). The...

Hi everyone! I have successfully quantized a diffusion model to 2-bit and manually packed the weights into uint8 format (storing four 2-bit weights in one uint8 variable) in PyTorch. During inference,...
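As an illustration of the packing described above, here is a rough sketch that stores four 2-bit codes per uint8 byte and unpacks them again. It uses NumPy as a stand-in for PyTorch tensors, and the function names `pack_2bit` and `unpack_2bit` as well as the bit order (first code in the lowest two bits) are assumptions, not the commenter's actual implementation.

```python
import numpy as np

def pack_2bit(codes):
    """Pack a 1-D array of 2-bit codes (values 0..3) into uint8, four per byte."""
    codes = np.asarray(codes, dtype=np.uint8)
    if codes.size % 4 != 0:
        raise ValueError("number of codes must be a multiple of 4")
    if codes.max(initial=0) > 3:
        raise ValueError("each code must fit in 2 bits")
    c = codes.reshape(-1, 4)
    # Assumed layout: code 0 in bits 0-1, code 1 in bits 2-3, and so on.
    packed = c[:, 0] | (c[:, 1] << 2) | (c[:, 2] << 4) | (c[:, 3] << 6)
    return packed.astype(np.uint8)

def unpack_2bit(packed):
    """Recover the 2-bit codes from a 1-D uint8 array produced by pack_2bit."""
    packed = np.asarray(packed, dtype=np.uint8)
    shifts = np.array([0, 2, 4, 6], dtype=np.uint8)
    # Shift each byte by 0, 2, 4, 6 bits and keep the lowest two bits.
    return ((packed[:, None] >> shifts) & 3).reshape(-1)

codes = np.array([0, 1, 2, 3, 3, 2, 1, 0], dtype=np.uint8)
packed = pack_2bit(codes)      # 2 bytes instead of 8
restored = unpack_2bit(packed)
```

The packed array takes a quarter of the storage of one-code-per-byte, at the cost of an unpack step (shift and mask) before the weights can be dequantized and used.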

> > Hi everyone! I have successfully quantized a diffusion model to 2-bit and manually packed the weights into uint8 format (storing four 2-bit weights in one uint8 variable) in PyTorch....

Any ideas on how to do that? > This looks similar; it says to upgrade Cython and regenerate the file: [mcfletch/pyopengl#74](https://github.com/mcfletch/pyopengl/issues/74)

Same OOM problem here when finetuning 7B models on a single A100-80G. The error log is exactly the same as @Williamsunsir's. Theoretically, finetuning a 7B model takes 14G (model...

Hi everyone, I have another question about reproducing the XSUM results. h2o_hf/scripts/summarization/eval.sh sets a fixed HH_SIZE and RECENT_SIZE, but the x-axis of Figure 4 represents KV Cache Budget (%),...