Iman Hosseini comments

Results: 13 comments of Iman Hosseini

INT8 Support for GPT models

[GLM-130B: An Open Bilingual Pre-trained Model](https://arxiv.org/abs/2210.02414): new results on applying quantization to GPT models. Do you have any suggestions (or, say, a blueprint) on how to adapt the current GPT-J...
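For context, the simplest INT8 scheme for GPT-style weights is symmetric absmax quantization: each weight row is scaled so its largest magnitude maps to 127, the matrix product is computed on integers with a wide accumulator, and the result is rescaled by the two scales. The plain-Python sketch below only illustrates that arithmetic under those assumptions; it is not the scheme from the GLM-130B paper or any GPT-J code, and the function names are hypothetical.

```python
def quantize_row(row):
    """Symmetric absmax INT8 quantization of one vector; returns (int values, scale)."""
    amax = max(abs(v) for v in row)
    scale = amax / 127.0 if amax > 0 else 1.0
    q = [max(-127, min(127, round(v / scale))) for v in row]
    return q, scale

def quantize_matrix(w):
    """Quantize each output row separately (one scale per output channel)."""
    pairs = [quantize_row(r) for r in w]
    return [p[0] for p in pairs], [p[1] for p in pairs]

def int8_matvec(qw, w_scales, x):
    """Approximate w @ x: quantize x per-tensor, accumulate integer products, rescale."""
    qx, x_scale = quantize_row(x)
    out = []
    for q_row, w_scale in zip(qw, w_scales):
        acc = sum(a * b for a, b in zip(q_row, qx))  # an int32 accumulator on a GPU
        out.append(acc * w_scale * x_scale)
    return out

w = [[0.5, -1.0, 0.25], [2.0, 0.1, -0.3]]
x = [1.0, 2.0, -1.0]
qw, scales = quantize_matrix(w)
exact = [sum(a * b for a, b in zip(row, x)) for row in w]
approx = int8_matvec(qw, scales, x)
print("exact:", exact, "int8:", approx)
```

Per-row weight scales keep one large outlier row from crushing the precision of the others, which is why per-channel scales are a common default.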

NOT FOUND errors

The internet connection was OK (I had just cloned the repo), but since then I unplugged it _unsafely_, and now it no longer shows up in `mdt devices`. The light goes green,...

NOT FOUND errors

I reflashed it, and that fixed the problem.

add tuning

OK, I hate git. How can I squash the commits? I'm following this: https://stackoverflow.com/questions/5189560/how-do-i-squash-my-last-n-commits-together
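One common approach from that Stack Overflow thread is a soft reset: `git reset --soft HEAD~N` moves the branch back N commits while keeping their changes staged, and a single `git commit` then records them as one commit. To keep one language across these notes, here it is as a small Python wrapper around the git CLI; the helper names are hypothetical, and N must not exceed the number of commits on the branch.

```python
import subprocess

def squash_commands(n, message):
    """Return the git invocations that squash the last n commits into one."""
    if n < 2:
        raise ValueError("need at least 2 commits to squash")
    return [
        ["git", "reset", "--soft", f"HEAD~{n}"],  # move HEAD back, keep changes staged
        ["git", "commit", "-m", message],         # record the staged changes as one commit
    ]

def squash_last_n(n, message, repo="."):
    """Run the squash in the repository at `repo`; raises CalledProcessError on failure."""
    for cmd in squash_commands(n, message):
        subprocess.run(cmd, cwd=repo, check=True)

if __name__ == "__main__":
    # Example (inside a repo): squash_last_n(3, "add tuning")
    print(squash_commands(3, "add tuning"))
```

If the commits were already pushed (for example to a PR branch), the rewritten branch has to be force-pushed, e.g. with `git push --force-with-lease`.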

Gazebo on Windows: conda-forge installation

I use the conda command prompt, and it gets stuck on "[Msg] Waiting for master." Weirdly, it sometimes works (everything runs), but most of the time it just gets stuck...

Gazebo on Windows: conda-forge installation

win64. `conda info`:

```
active environment : r2
active env location : C:\Users\salva\miniconda3\envs\r2
shell level : 2
user config file : C:\Users\salva\.condarc
populated config files : C:\Users\salva\.condarc
conda version :...
```

INT8 Support for GPT models

A question regarding INT8 support:

Q1. Almost all templated classes (e.g. https://github.com/NVIDIA/FasterTransformer/blob/6fddeac5f59ce4df380002aa945da57a0c8e878c/src/fastertransformer/models/gpt/GptDecoderLayerWeight.cc#L201) only support float or half:

```
template struct GptDecoderLayerWeight<float>;
template struct GptDecoderLayerWeight<half>;
```

Assuming one wants to implement...

Bloom code

Not sure how they did it, but this commit: https://github.com/Guangxuan-Xiao/torch-int/pull/1/commits/2163a169748edff67586c2bf0158f4c7f0718fc6 includes an implementation of the GELU unit.
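For reference, GELU is defined as GELU(x) = x·Φ(x), where Φ is the standard normal CDF, and GPT-style models commonly use the tanh approximation 0.5·x·(1 + tanh(√(2/π)·(x + 0.044715·x³))). The plain-Python sketch below shows both forms; it is the textbook formula, not the code from that commit.

```python
import math

def gelu_exact(x: float) -> float:
    """GELU(x) = x * Phi(x), with the standard normal CDF written via erf."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))

def gelu_tanh(x: float) -> float:
    """Tanh approximation of GELU."""
    c = math.sqrt(2.0 / math.pi)
    return 0.5 * x * (1.0 + math.tanh(c * (x + 0.044715 * x ** 3)))

for x in (-3.0, -1.0, 0.0, 1.0, 3.0):
    print(f"{x:+.1f}  exact={gelu_exact(x):+.6f}  tanh={gelu_tanh(x):+.6f}")
```

The two forms agree closely over the usual activation range, so which one a kernel implements mainly matters when matching a reference model's outputs bit for bit.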

Can WinPmem dump gpu memory?

Thank you. Do you know of any open-source tool that does that, or have any pointers on where to start building such a tool?

Benchmark blockgather (method 4)

For M = 2000000 (on an RTX 3060 Ti):

```
# for different n1/n2/n3 and m \in {1,2,4}
./cuperftest --N1=n1 --N2=n2 --N3=n3 --method=m
```

![plot_2M](https://github.com/flatironinstitute/finufft/assets/12172889/57267571-6a5b-4460-be05-3af388854f30)

m=4 is much slower. `spread_3d_block_gather` achieves...
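The sweep described in that snippet (each grid size combined with each of methods 1, 2 and 4) can be scripted so the timings are collected consistently. A sketch in Python, assuming `cuperftest` sits in the current directory; the grid sizes are hypothetical placeholders, and since the original command does not show how M is passed, no M flag is added here.

```python
import itertools
import subprocess

# Hypothetical grid sizes; substitute the (n1, n2, n3) values actually benchmarked.
GRIDS = [(64, 64, 64), (128, 128, 128), (256, 256, 256)]
METHODS = (1, 2, 4)  # m in {1, 2, 4}, as in the original command

def build_commands(grids=GRIDS, methods=METHODS, binary="./cuperftest"):
    """One invocation per (grid, method) pair, using only the flags from the original snippet."""
    return [
        [binary, f"--N1={n1}", f"--N2={n2}", f"--N3={n3}", f"--method={m}"]
        for (n1, n2, n3), m in itertools.product(grids, methods)
    ]

def run_all(commands):
    """Run each command and keep its stdout for later parsing and plotting."""
    results = {}
    for cmd in commands:
        proc = subprocess.run(cmd, capture_output=True, text=True, check=True)
        results[" ".join(cmd)] = proc.stdout
    return results

if __name__ == "__main__":
    for cmd in build_commands():
        print(" ".join(cmd))
    # results = run_all(build_commands())  # needs a CUDA GPU and a built cuperftest
```

Keeping command construction separate from execution makes it easy to check the grid before spending GPU time on it.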