Wing Lian

Showing 103 comments by Wing Lian

Did you try upgrading or downgrading bitsandbytes?
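For example (version numbers here are illustrative, not known-good pins; pick whatever matches your CUDA setup):

```
pip install -U bitsandbytes          # try the latest release
pip install bitsandbytes==0.41.3     # or pin an older one to bisect the regression
```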

Thanks @younesbelkada! I'll open up another PR with just the validation and training args pieces and wait for the upstream integration. Much appreciated!

Superseded by #1409. Thanks for getting this rolling @maximegmd. Props to @younesbelkada for getting this working upstream in transformers.

@dctanner you somehow had some already-merged PRs in your branch, so I re-pushed your commit onto a rebased main.

Do you have a folder in your working directory called datasets?
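A quick way to check is to see where Python resolves the `datasets` import from; a local `datasets/` folder in the working directory can shadow the Hugging Face package (a minimal sketch, assuming that shadowing is the issue):

```
import datasets

# A local folder without __init__.py imports as a namespace package,
# in which case __file__ is None; __path__ shows where it came from.
# If these point into your working directory rather than site-packages,
# rename the folder or run from a different directory.
print(getattr(datasets, "__file__", None))
print(list(getattr(datasets, "__path__", [])))
```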

> @winglian I suggest you put a targeted speedup on what qualifies as "optimized". Who knows, maybe `torch.compile` used the right way can generate your definition of "optimized" :) and...
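For context, the baseline `torch.compile` usage under discussion is a one-liner (a minimal sketch, assuming PyTorch 2.x; the model here is a stand-in):

```
import torch
import torch.nn as nn

model = nn.Linear(512, 512)          # stand-in for the real model
compiled = torch.compile(model)      # PyTorch 2.x; actual speedup is workload-dependent
out = compiled(torch.randn(8, 512))
```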

Are you using a model from a checkpoint folder or the output folder?

> Using `transformers @ git+https://github.com/huggingface/transformers.git@3cefac1d974db5e2825a0cb2b842883a628be7a0` seems to work.

@mgoulao is this a transformers regression then? Does that particular commit work with zero3?
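To reproduce with that pin, the install is just (commit hash taken verbatim from the report above):

```
pip install "transformers @ git+https://github.com/huggingface/transformers.git@3cefac1d974db5e2825a0cb2b842883a628be7a0"
```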

I believe the problem is that the model's modules are all frozen and have `requires_grad` set to `False`. You can verify this with:

```
for name, param in model.named_parameters(recurse=True):
    print(f"{name}: {param.requires_grad}")
```
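If the goal is to train some of those modules, the counterpart is to flip the flag back on; a hedged sketch (the name filter is illustrative, adjust it to the modules you actually train):

```
for name, param in model.named_parameters():
    if "lora_" in name:              # illustrative filter, e.g. PEFT LoRA adapter weights
        param.requires_grad_(True)
```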