huang wei
# Added dalle2; still a work in progress, recording the problems encountered along the way. Environment: python 3.6.9, oneflow 0.7.0+cu112. The code is largely based on [dalle2_pytorch](https://github.com/lucidrains/DALLE2-pytorch); there are still parts of it I haven't fully understood :( . The port is basically `import torch` -> `import oneflow as flow`, then replacing every 'torch' in the code with 'flow' and adjusting the argument format of a few interfaces; with that it mostly runs :) .
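The mechanical part of the port described above can be sketched as a plain textual substitution (a minimal illustration only; real porting still requires the per-interface argument fixes mentioned):

```python
# Hedged sketch: the bulk of the torch -> oneflow port is a text substitution
# of the import line plus every 'torch.' call prefix.
src = (
    "import torch\n"
    "x = torch.randn(2, 3)\n"
    "y = torch.matmul(x, x.t())\n"
)

ported = (
    src.replace("import torch", "import oneflow as flow")
       .replace("torch.", "flow.")
)
print(ported)
```

After the substitution, the snippet calls `flow.randn` and `flow.matmul` with the same signatures, which is why most of the code runs unchanged.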
When oneflow performs matrix multiplication on tensors containing a size-0 dimension, it raises an error:

```
>>> import torch
>>> import oneflow as flow
loaded library: /lib/x86_64-linux-gnu/libibverbs.so.1
>>> torch.__version__
'1.10.2'
>>> flow.__version__
'0.8.1.dev20220903+cu112'
>>> a = torch.randn(0, 5)
>>> b = torch.randn(5, 6)
...
```
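For reference, the expected behavior (as in torch) is that a size-0 dimension denotes a valid empty tensor and matmul simply produces an empty result. A NumPy analogue of that expectation:

```python
import numpy as np

# A matmul whose left operand has 0 rows is well-defined:
# the result is an empty (0, 6) array rather than an error.
a = np.zeros((0, 5))
b = np.zeros((5, 6))
c = a @ b
print(c.shape)  # (0, 6)
```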
fused kernels in alphafold
## Summary

The result dtype of fp16 division differs from torch.

## Code to reproduce bug

```python
>>> import oneflow as flow
>>> a = flow.randn(3, 3, dtype=flow.float16).cuda()
>>> b = flow.randn(3, 3, dtype=flow.float16).cuda()
>>> a/b
tensor([[-2.1495e-03,  1.5983e+00, -5.2973e-01],
        [-1.7968e-01, -4.0361e+00,  5.4459e-01],
...
```
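The torch behavior the issue expects is that dividing two float16 tensors stays in float16. NumPy follows the same type-promotion rule, which serves as a framework-free illustration:

```python
import numpy as np

# Elementwise division of two float16 arrays keeps the float16 dtype;
# this is the promotion behavior the issue expects oneflow to match.
a = np.random.randn(3, 3).astype(np.float16)
b = np.random.randn(3, 3).astype(np.float16)
c = a / b
print(c.dtype)  # float16
```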
Hello, I noticed that the sliding window size may differ between the prefill stage and the decode stage, since in the prefill stage the current token is visible along...
Optimize llama model-parallel inference by putting all the CUDA kernels of each LlamaDecoderLayer into one large op, minimizing the latency of issuing instructions from the Python layer.
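A hedged, framework-free sketch of the motivation: issuing many small ops from Python pays a per-call dispatch cost, while one fused call does the same work with a single Python-level dispatch. Plain Python functions stand in for CUDA kernel launches here; the names are illustrative only:

```python
import time

def small_op(x):
    # stands in for one small CUDA kernel launched from Python
    return x + 1

def fused_op(x, n):
    # stands in for one large fused op: same total work, one Python-level call
    for _ in range(n):
        x += 1
    return x

N = 100_000

t0 = time.perf_counter()
x = 0
for _ in range(N):
    x = small_op(x)          # N separate Python-level dispatches
t_many = time.perf_counter() - t0

t0 = time.perf_counter()
y = fused_op(0, N)           # one dispatch, work "fused" inside
t_fused = time.perf_counter() - t0

print(x == y)                # same result either way
print(f"many calls: {t_many:.4f}s  fused: {t_fused:.4f}s")
```

The fused path produces an identical result while avoiding the per-call overhead, which is the same trade the one-big-op-per-decoder-layer design makes.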
Hello, I wonder if the position id of the query is the same as that of the key, or is it the actual generated context length ([this comment is confusing me](https://github.com/mit-han-lab/streaming-llm/blob/d729b3ffc947caca63fc0f7644b7468ca2d50881/streaming_llm/pos_shift/modify_llama.py#L89))? For example, as mentioned...