Results: 113 comments by LaaZa

And here I am getting around 4.0s using facebook/opt-125m on an RTX 3080. Something is really weird.

It can't be the package versions or CUDA versions (there may be differences, but not in the major version); they have not correlated in TheBloke's testing. I have similarly bad performance on Windows...

No, that is for opt-125m. The exact script from [above](https://github.com/PanQiWei/AutoGPTQ/issues/49#issuecomment-1539093822).
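
For reference, a minimal timing sketch along these lines (not the exact script linked above; the quantized model path is a placeholder):

```python
import time
import torch
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM

# Placeholder path to a locally quantized opt-125m checkpoint (an assumption,
# not the exact model used in the thread).
model_path = "opt-125m-4bit"

tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoGPTQForCausalLM.from_quantized(model_path, device="cuda:0")

inputs = tokenizer("auto-gptq is", return_tensors="pt").to("cuda:0")

# Synchronize around generate() so the wall-clock time reflects GPU work.
torch.cuda.synchronize()
start = time.time()
output = model.generate(**inputs, max_new_tokens=128)
torch.cuda.synchronize()

new_tokens = output.shape[-1] - inputs["input_ids"].shape[-1]
print(f"{time.time() - start:.2f}s for {new_tokens} new tokens")
print(tokenizer.decode(output[0]))
```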

I mean, the WSL setup is pretty much a clean install, though the CUDA comes from the Windows drivers. I have not clean-installed those, but I think I did specifically...

I tested implementing this but ran into some issues, and I'm not really sure why they are happening. I can perhaps make a draft PR.

> > I tested implementing this but ran into some issues, and I'm not really sure why they are happening. I can perhaps make a draft PR.
>
> Did...

> @PanQiWei Old issue, but I would love to see some updated support for models like MPT and Falcon.
>
> MPT keeps pumping out new models, we got [30B-8k](https://huggingface.co/mosaicml/mpt-30b) and...
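
For context, adding a model to AutoGPTQ mostly amounts to declaring its module layout. A rough sketch for MPT; the module names are taken from mosaicml's MPT modeling code and should be treated as assumptions to verify against the checkpoint:

```python
from auto_gptq.modeling import BaseGPTQForCausalLM

class MPTGPTQForCausalLM(BaseGPTQForCausalLM):
    # Names assumed from the MPT modeling code, not confirmed here.
    layer_type = "MPTBlock"
    layers_block_name = "transformer.blocks"
    # Modules outside the repeated blocks (embeddings, final norm).
    outside_layer_modules = ["transformer.wte", "transformer.norm_f"]
    # Linear modules inside each block, grouped by quantization order.
    inside_layer_modules = [
        ["attn.Wqkv"],
        ["attn.out_proj"],
        ["ffn.up_proj"],
        ["ffn.down_proj"],
    ]
```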

I don't think norm_1 and norm_2 or the router help here. The MoE architecture used is very unusual. https://huggingface.co/databricks/dbrx-instruct/discussions/10

You need to have an index for the experts if you use the modified model. For now, you need to duplicate the mlp lines for each expert from 0-15. Also, the correct...
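
To illustrate the duplication, a sketch of what the expert entries could look like, assuming the converted model exposes the experts as ffn.experts.mlp.0 through ffn.experts.mlp.15 with w1/v1/w2 linears (all of these names are assumptions about the modified checkpoint):

```python
# Hypothetical per-block module layout for the converted DBRX model.
inside_layer_modules = [
    ["norm_attn_norm.attn.Wqkv"],
    ["norm_attn_norm.attn.out_proj"],
    # One entry per expert, 0-15, grouped like gate/up projections.
    [f"ffn.experts.mlp.{i}.w1" for i in range(16)]
    + [f"ffn.experts.mlp.{i}.v1" for i in range(16)],
    # Down projections, one per expert.
    [f"ffn.experts.mlp.{i}.w2" for i in range(16)],
]

print(inside_layer_modules[2][:2])  # ['ffn.experts.mlp.0.w1', 'ffn.experts.mlp.1.w1']
```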

Well, in general I would skip anything with the wrong shape. Normalization modules are usually skipped, and in this model they have the shape [6144], so they are one-dimensional and we...
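
As a sketch of that shape check (the helper name is made up): skip anything whose weight is not 2-D, since GPTQ only quantizes matrix-shaped linear weights:

```python
import torch.nn as nn

def is_quantizable(module: nn.Module) -> bool:
    # Hypothetical helper: normalization weights like LayerNorm are
    # one-dimensional (e.g. shape [6144]) and get skipped; only 2-D
    # (linear-style) weights are candidates for GPTQ quantization.
    weight = getattr(module, "weight", None)
    return weight is not None and weight.dim() == 2

print(is_quantizable(nn.Linear(6144, 6144)))  # True
print(is_quantizable(nn.LayerNorm(6144)))     # False: weight shape is [6144]
```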