
Add Erebus and GALACTICA support

Open Sumanai opened this issue 2 years ago • 11 comments

Hello! I propose adding support for the Erebus family of models, which are fine-tunes of the original OPT. I looked at the code; the support is not too difficult to add, and I was able to run a couple of the models without major code modifications. I can open a PR if needed. Here is a link to one of the models; the rest are in the same place: https://huggingface.co/KoboldAI/OPT-2.7B-Erebus
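
For reference, a minimal sketch of what such an addition could look like, assuming OPT-2.7B-Erebus keeps OPT-2.7B's published shapes (32 layers, 32 heads, hidden size 2560). The OptConfig stand-in below only illustrates the idea; its field names are assumptions, not copied from the FlexGen source:

    import dataclasses

    @dataclasses.dataclass(frozen=True)
    class OptConfig:  # stand-in; the real class lives in flexgen/opt_config.py
        name: str
        max_seq_len: int = 2048
        num_hidden_layers: int = 32
        n_head: int = 32
        hidden_size: int = 2560
        ffn_embed_dim: int = 4 * 2560

    def get_opt_config(name):
        # An Erebus fine-tune reuses the vanilla opt-2.7b shapes; only the
        # name matching (and the weight download location) differs.
        key = name.lower().split("/")[-1]
        if key in ("opt-2.7b", "opt-2.7b-erebus"):
            return OptConfig(name=name)
        raise ValueError(f"Invalid model name: {name}")

    print(get_opt_config("KoboldAI/OPT-2.7B-Erebus"))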

Sumanai avatar Feb 22 '23 10:02 Sumanai

GALACTICA support would be nice as well. Can FlexGen be generalized to all OPTForCausalLM models?
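
If the goal is all OPTForCausalLM models rather than a longer name list, one hedged sketch is to derive the config from the checkpoint's own config.json, which every such model ships. OptConfig here is the same illustrative stand-in as in the sketch above, and opt_config_from_hub is a hypothetical helper, not FlexGen API:

    from transformers import AutoConfig

    def opt_config_from_hub(name):
        # OPT, Erebus, and GALACTICA checkpoints all publish these fields
        # in their Hugging Face config.json.
        hf = AutoConfig.from_pretrained(name)
        if "OPTForCausalLM" not in (hf.architectures or []):
            raise ValueError(f"{name} is not an OPTForCausalLM checkpoint")
        return OptConfig(
            name=name,
            max_seq_len=hf.max_position_embeddings,
            num_hidden_layers=hf.num_hidden_layers,
            n_head=hf.num_attention_heads,
            hidden_size=hf.hidden_size,
            ffn_embed_dim=hf.ffn_dim,
        )

    print(opt_config_from_hub("KoboldAI/OPT-2.7B-Erebus"))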

oobabooga avatar Feb 22 '23 21:02 oobabooga

Unfortunately, my attempt to add GALACTICA in the same way failed. The problem seems to be missing handling for parameters like attention_dropout, but that is purely a guess. The model loads, but on the first generation an error appears in the logs (I removed the repeating parts):

C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\Indexing.cu:1141: block: [223,0,0], thread: [29,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\Indexing.cu:1141: block: [223,0,0], thread: [31,0,0] Assertion `srcIndex < srcSelectDimSize` failed.

  File "c:\users\username\flexgen\flexgen\flex_opt.py", line 873, in generate
    self.generation_loop_overlap_single_batch()
  File "c:\users\username\flexgen\flexgen\flex_opt.py", line 1013, in generation_loop_overlap_single_batch
    self.sync()
  File "c:\users\username\flexgen\flexgen\flex_opt.py", line 782, in sync
    torch.cuda.synchronize()
  File "C:\Users\username\AppData\Roaming\Python\Python310\site-packages\torch\cuda\__init__.py", line 566, in synchronize
    return torch._C._cuda_synchronize()
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.

If we can solve this problem, we can remove some of the hardcoded values and allow loading any model based on OPTForCausalLM.
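
For later readers: that assertion (`srcIndex < srcSelectDimSize` in Indexing.cu) is PyTorch's bounds check for index_select/embedding lookups, so one plausible, unconfirmed cause is a token or position id landing outside an embedding table sized for OPT. A small check along those lines, using only the public transformers API:

    from transformers import AutoConfig, AutoTokenizer

    # GALACTICA's vocab is 50000 while OPT's is 50272, so an OPT-sized
    # hardcoded table or a tokenizer mixup can push a lookup out of range
    # and trigger exactly this device-side assert.
    name = "facebook/galactica-1.3b"  # a small variant for a quick test
    cfg = AutoConfig.from_pretrained(name)
    tok = AutoTokenizer.from_pretrained(name)

    ids = tok("The attention mechanism")["input_ids"]
    print("max token id:", max(ids),
          "| vocab_size:", cfg.vocab_size,
          "| max positions:", cfg.max_position_embeddings)
    # Re-running the failing generation with CUDA_LAUNCH_BLOCKING=1 (as the
    # log suggests) pinpoints the exact lookup that goes out of bounds.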

Sumanai avatar Feb 23 '23 02:02 Sumanai

GALACTICA support would be cool! I think FlexGen can be generalized to OPTForCausalLM very easily. The error reported by @Sumanai looks weird to me; it needs more investigation.

Ying1123 avatar Feb 26 '23 02:02 Ying1123

Is this just partial support? https://github.com/FMInference/FlexGen/pull/83

Ph0rk0z avatar Mar 03 '23 19:03 Ph0rk0z

I tried loading galactica-30b and got this error:

    opt_config.py", line 118, in get_opt_config
        raise ValueError(f"Invalid model name: {name}")
        
ValueError: Invalid model name: galactica-30b

Not sure if that commit has already made it into flexgen==0.1.7, or if it is enough to load GALACTICA.
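
One way to tell without downloading any weights: the traceback shows the rejection happens inside get_opt_config, so calling it directly against the installed package answers whether your flexgen knows the name (this assumes the module path flexgen.opt_config, based on the file names in the tracebacks above):

    from flexgen.opt_config import get_opt_config

    for name in ("opt-30b", "galactica-30b"):
        try:
            get_opt_config(name)
            print(name, "-> recognized by this flexgen install")
        except ValueError as err:
            print(name, "->", err)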

oobabooga avatar Mar 03 '23 19:03 oobabooga

I got an error similar to @Sumanai's when using Erebus-13b on a 3080, once the text gets long:

../aten/src/ATen/native/cuda/Indexing.cu:922: indexSelectSmallIndex: block: [6,0,0], thread: [91,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:922: indexSelectSmallIndex: block: [6,0,0], thread: [92,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:922: indexSelectSmallIndex: block: [6,0,0], thread: [93,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:922: indexSelectSmallIndex: block: [6,0,0], thread: [94,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:922: indexSelectSmallIndex: block: [6,0,0], thread: [95,0,0] Assertion `srcIndex < srcSelectDimSize` failed.

I tried changing the policy parameters, but nothing seems to work.
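
"When the text gets long" is a hint worth chasing: OPT-style checkpoints use a learned position-embedding table of fixed size (2048 positions), and running past it fails with exactly this indexSelectSmallIndex assert. A rough length check before generating, as a sketch (the Hub id KoboldAI/OPT-13B-Erebus is an assumption for the 13B variant):

    from transformers import AutoTokenizer

    MAX_SEQ_LEN = 2048  # size of the OPT/Erebus learned position table

    tok = AutoTokenizer.from_pretrained("KoboldAI/OPT-13B-Erebus")
    prompt = "..."  # substitute the long prompt that triggers the error
    n_prompt = len(tok(prompt)["input_ids"])
    budget = MAX_SEQ_LEN - n_prompt
    print(f"{n_prompt} prompt tokens; room for {max(budget, 0)} new tokens")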

apenugon avatar Mar 06 '23 21:03 apenugon

I managed to make FlexGen work with the Galactica-1.3b model by changing opt_config.py, flex_opt.py, and tokenizer_config.json. @oobabooga's web UI can successfully load the model and generate text with it, and VRAM usage decreased as expected. However, all the generated text comes out as gibberish (it's not due to the parameter preset). Maybe someone would be interested in taking a closer look? I can upload the files I modified; I am not really a programming or ML expert. [Screenshots: 2023-03-30 20-35-02, 2023-03-30 20-34-37]
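
A useful way to narrow this down (a sanity check, not a fix): generate with plain Transformers first. If that produces coherent text while the FlexGen port produces gibberish, the weights and tokenizer are fine and the bug is in the porting changes, e.g. weights mapped to the wrong tensors:

    from transformers import AutoTokenizer, OPTForCausalLM

    # GALACTICA checkpoints declare the OPTForCausalLM architecture, so the
    # stock OPT implementation can serve as the reference output.
    name = "facebook/galactica-1.3b"
    tok = AutoTokenizer.from_pretrained(name)
    model = OPTForCausalLM.from_pretrained(name)

    ids = tok("The Transformer architecture", return_tensors="pt").input_ids
    out = model.generate(ids, max_new_tokens=30)
    print(tok.decode(out[0]))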

fgdfgfthgr-fox avatar Mar 30 '23 07:03 fgdfgfthgr-fox

@fgdfgfthgr-fox can you create a fork of https://github.com/FMInference/FlexGen with your changes?

oobabooga avatar Mar 30 '23 14:03 oobabooga

> @fgdfgfthgr-fox can you create a fork of https://github.com/FMInference/FlexGen with your changes?

@oobabooga Is this what you want? https://github.com/fgdfgfthgr-fox/FlexGen---galactica-support

fgdfgfthgr-fox avatar Mar 30 '23 23:03 fgdfgfthgr-fox

@Sumanai How did you get Erebus working?

Mar2ck avatar Mar 31 '23 14:03 Mar2ck

> @Sumanai How did you get Erebus working?

You can see my dirty edits in my repository: https://github.com/Sumanai/FlexGen/tree/erebus. I hope this code will help explorers add Galactica support.

Sumanai avatar Apr 05 '23 13:04 Sumanai