Junity
> You'd also need the batch size, the number of texts, and the total token count, otherwise the numbers can't be compared.

Since the 1500-token limit is in place, it won't actually differ much: decoding from token 1499 to 1500 takes about as long as from 1 to 2. Just standardize on batch size 1, though in practice batch sizes 1 through 8 measure about the same (on fairly recent cards).
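To make such numbers comparable across machines, a minimal timing harness could look like the sketch below. `generate_fn` is a placeholder, not GSV2's actual API; it stands in for whatever produces the semantic tokens, and batch size and input text must be held fixed across runs:

```python
import time
import torch

def measure_tokens_per_second(generate_fn, n_runs=3):
    """Average tokens/s over several runs of a generation callable.

    `generate_fn` must return the number of new tokens it emitted.
    Keep batch size and input text fixed across runs so the resulting
    numbers are comparable between machines.
    """
    generate_fn()  # warm-up so compilation/caching doesn't skew timing
    if torch.cuda.is_available():
        torch.cuda.synchronize()
    rates = []
    for _ in range(n_runs):
        start = time.perf_counter()
        n_tokens = generate_fn()
        if torch.cuda.is_available():
            torch.cuda.synchronize()
        rates.append(n_tokens / (time.perf_counter() - start))
    return sum(rates) / len(rates)
```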
> I've been comparing speeds recently too, with all parameters at default and the v2 model. I noticed that during inference, AMD keeps all CPU cores busy, while Intel leaves many cores idle. Setting torch.set_num_threads() had no effect either. Is there any way to optimize this?

Do you have concrete numbers, e.g. how fast AMD is versus Intel?
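One thing that might be worth trying (a sketch, not a guaranteed fix): PyTorch reads the OpenMP/MKL thread settings at startup, so caps set after import, or after the first parallel op, can silently have no effect. The core count of 16 below is just an example:

```python
import os

# These must be set before the first parallel op runs (ideally before
# importing torch at all), or they can silently have no effect.
os.environ["OMP_NUM_THREADS"] = "16"   # 16 is just an example core count
os.environ["MKL_NUM_THREADS"] = "16"

import torch

torch.set_num_threads(16)           # intra-op parallelism
torch.set_num_interop_threads(16)   # inter-op; raises if set too late

print(torch.get_num_threads(), torch.get_num_interop_threads())
```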
> Windows 11 + AMD 7945HX + 4060 Laptop + no CUDA + GSV2 = 185 it/s

The PyTorch build you download typically bundles the CUDA runtime, so CUDA is almost certainly in play here.
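To settle this either way, a few standard PyTorch checks show whether the installed build bundles CUDA and where the model actually runs (`model` below stands in for the loaded GSV2 model):

```python
import torch

print(torch.__version__)           # e.g. "2.4.0+cu121" vs "2.4.0+cpu"
print(torch.version.cuda)          # None on a CPU-only build
print(torch.cuda.is_available())   # True means inference can use the GPU

# For a loaded model, the parameter device shows where compute happens:
# print(next(model.parameters()).device)
```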
> I'd like to ask whether anyone has an AMD CPU + Linux combination to test the speed. I want to compare the CPU's impact: AMD chips are mostly all performance cores, while Intel mixes performance and efficiency cores, so CPU scheduling might affect inference speed.

Don't most GPU rental platforms in China use AMD CPUs? You could check out AutoDL.
One more data point: Ubuntu 24.10, NVIDIA driver version: latest, CUDA 12.1, 4090D, i7-14700KF, GSV2: 530 tokens/s
> @JunityZhan are you sure? researchers are telling me it doesn't perform well at all with low precision. In training, I think it is better to train it with fp32....
btw, it fixes #145
@lucidrains I am not sure I understand what you said about my fix breaking in my own setup. If I don't make that change, I cannot run inference with...
> @JunityZhan do you want to see if setting [this](https://github.com/lucidrains/vector-quantize-pytorch/commit/be92f7908b938cf00c60d643c5ecffb01c7fa9fc#diff-3488501716ef0c4a1b84127592be7e1015e0faf0d7d3b630b69d88e55a134701R117) `False` works for you?

I think you only made the modification in lookup quantization, but not in VQ and FSQ.
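For anyone following along, a sketch of what toggling that option might look like. The flag name `force_quantization_f32` is my reading of the linked commit, not a confirmed API, and may differ between versions of vector-quantize-pytorch; the rest follows the library's documented LFQ usage:

```python
import torch
from vector_quantize_pytorch import LFQ

# Assumption: per the linked commit, the quantization step is forced to
# fp32 under autocast by default; passing False keeps the incoming
# precision instead. Flag name and default may differ by version.
quantizer = LFQ(
    codebook_size=2 ** 16,   # must be a power of two
    dim=16,                  # defaults to log2(codebook_size) if omitted
    force_quantization_f32=False,
)

image_feats = torch.randn(1, 16, 32, 32)
quantized, indices, entropy_aux_loss = quantizer(image_feats)
print(quantized.shape, indices.shape)
```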