Cheng Li
There are multiple ways to address OOM: use GPUs with larger memory (e.g. 80GB); use more GPUs and apply tensor parallelism or expert parallelism (if it's a MoE model); use...
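As a minimal sketch of the tensor-parallel option, assuming a vLLM-style serving setup (the model id below is a placeholder, not from this thread):

```python
# Hypothetical sketch: shard a large model across 4 GPUs with tensor parallelism
# to avoid OOM. Assumes vLLM; the model id is a placeholder.
from vllm import LLM

llm = LLM(
    model="some-org/large-moe-model",  # placeholder model id
    tensor_parallel_size=4,            # split weights across 4 GPUs
)
outputs = llm.generate(["Hello, world"])
```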
This double init does not seem to affect memory usage. I printed the memory allocation before and after https://github.com/stanford-futuredata/megablocks/blob/main/megablocks/layers/dmoe_test.py#L41; although the MLP init and the mlp_impl init are both called, the...
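A rough sketch of the kind of check described above (measuring allocated GPU memory around an init; the layer construction here is a placeholder, not the actual megablocks test code):

```python
import torch

def report(tag: str) -> None:
    # torch.cuda.memory_allocated() returns bytes currently held by tensors on the device
    print(f"{tag}: {torch.cuda.memory_allocated() / 1e6:.1f} MB allocated")

report("before init")
layer = torch.nn.Linear(4096, 4096, device="cuda")  # placeholder for the MLP / mlp_impl init
report("after init")
```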
Hi @LucasWilkinson, I ran the w4a8 benchmark in the PR; the GEMM perf from Machete is 15-20% slower than the Marlin kernels. Is that expected? Thanks.
> Hi @cli99, I want to follow up on this PR, thanks for contributing! I haven't encountered this issue during my training; do you know under what circumstances the...
Removing the comment at `output = self.model(**batch)` gives

```
  File "/usr/lib/python3/dist-packages/torch/jit/frontend.py", line 407, in __call__
    return method(ctx, node)
           ^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/torch/jit/frontend.py", line 1245, in build_DictComp
    raise NotSupportedError(r, "Comprehension ifs are not...
```
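The error comes from TorchScript's frontend rejecting dict comprehensions that contain an `if` filter. A common workaround, sketched here on a made-up function rather than the actual model code, is to rewrite the comprehension as an explicit loop:

```python
import torch
from typing import Dict

@torch.jit.script
def keep_nonempty(batch: Dict[str, torch.Tensor]) -> Dict[str, torch.Tensor]:
    # TorchScript rejects {k: v for k, v in batch.items() if v.numel() > 0},
    # so build the dict with an explicit loop instead.
    out: Dict[str, torch.Tensor] = {}
    for k, v in batch.items():
        if v.numel() > 0:
            out[k] = v
    return out
```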
@callmekris can you also check the transformers version in both envs?
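For reference, a quick way to print it in each environment:

```python
import transformers
print(transformers.__version__)
```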