Cheng Li

Results: 26 comments of Cheng Li

There are multiple ways to address OOM: use GPUs with larger memory (e.g. 80 GB); use more GPUs and apply tensor parallelism or expert parallelism (if it's a MoE model); use...
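
For example, a minimal sketch of the tensor-parallel option, assuming vLLM as the serving stack and a placeholder model name (neither is stated in the comment):

```python
from vllm import LLM

# Tensor parallelism shards each weight matrix across GPUs, so no single
# device has to hold the full model. Model name is only an example.
llm = LLM(
    model="mistralai/Mixtral-8x7B-Instruct-v0.1",  # example MoE checkpoint
    tensor_parallel_size=4,  # shard weights across 4 GPUs
)
outputs = llm.generate(["Hello, world"])
```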

This double init does not seem to affect memory usage. I printed the memory allocation before and after https://github.com/stanford-futuredata/megablocks/blob/main/megablocks/layers/dmoe_test.py#L41; although the MLP init and mlp_impl init are both called, the...
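
A minimal sketch of that before/after measurement, using `torch.cuda.memory_allocated()` around the layer construction (`build_dmoe` is a hypothetical stand-in for the init at the linked line):

```python
import torch

def print_mem(tag: str) -> None:
    # memory_allocated() reports bytes currently held by tensors;
    # max_memory_allocated() reports the high-water mark so far.
    print(f"{tag}: allocated={torch.cuda.memory_allocated() / 2**20:.1f} MiB, "
          f"peak={torch.cuda.max_memory_allocated() / 2**20:.1f} MiB")

print_mem("before init")
layer = build_dmoe()  # hypothetical stand-in for the dMoE construction in the test
print_mem("after init")
```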

Hi @LucasWilkinson, I ran the w4a8 benchmark in the PR; the GEMM perf from Machete is 15-20% slower than the Marlin kernels. Is that expected? Thanks.

> Hi @cli99, I want to follow up on this PR, thanks for contributing! I haven't hit this issue during my training; do you know under what circumstances the...

Removing the comment at `output = self.model(**batch)` gives:

```
File "/usr/lib/python3/dist-packages/torch/jit/frontend.py", line 407, in __call__
    return method(ctx, node)
           ^^^^^^^^^^^^^^^^^
File "/usr/lib/python3/dist-packages/torch/jit/frontend.py", line 1245, in build_DictComp
    raise NotSupportedError(r, "Comprehension ifs are not...
```
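
A minimal repro sketch of that error, assuming the scripted code contains a dict comprehension with an `if` clause (the key name `"labels"` is illustrative):

```python
from typing import Dict
import torch

# TorchScript's frontend (build_DictComp) rejects comprehensions that
# carry an `if` clause; scripting this function raises at definition time:
# NotSupportedError: "Comprehension ifs are not supported yet".
@torch.jit.script
def drop_labels(batch: Dict[str, torch.Tensor]) -> Dict[str, torch.Tensor]:
    return {k: v for k, v in batch.items() if k != "labels"}
```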

@callmekris can you also check the transformers version in both envs?