Sinan issues

Results: 100 issues of Sinan

main.py http proxy refuses connection

Connected by ('192.168.178.21', 51970) refused. When I export http_proxy=http://127.0.0.1:8889/ I cannot connect to the proxy; it refuses the connection...

Only local network?

Hi, thanks for the cool project. Does the reverse connection also forward internet traffic? I mean, when connected to the server, can I also access the internet connection of the...

Fuyu-8B qLora

Hi, what is necessary to implement Fuyu-8B support? https://huggingface.co/adept/fuyu-8b Thank you

Quantization aware finetuning?

Hi! Is it possible to finetune with quantization in mind? https://www.tensorflow.org/model_optimization/guide/quantization/training This way one could hopefully reduce quantization errors even further.

Training on logits rather than tokens?

Hey, I would like to train a student model from my teacher model (knowledge distillation for speculative decoding). Commonly, the student model is trained on the teacher's logits (soft...
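
For readers unfamiliar with the idea in this question, here is a minimal sketch of training on teacher logits instead of hard tokens. It is a generic illustration, not code from the issue or from any particular library. The function names `softmax` and `distillation_loss`, the temperature value, and the example logits are all assumptions chosen for illustration. The loss is the KL divergence between the temperature-softened teacher and student distributions at a single token position, scaled by the squared temperature as is common in distillation setups.

```python
import math


def softmax(logits, temperature=1.0):
    """Turn raw logits into probabilities, softened by a temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions for one token position.

    Hypothetical helper: a real training loop would average this over all
    positions in a batch and backpropagate through the student only.
    """
    p = softmax(teacher_logits, temperature)  # soft targets from the teacher
    q = softmax(student_logits, temperature)  # student predictions
    kl = sum(pi * (math.log(pi) - math.log(qi)) for pi, qi in zip(p, q) if pi > 0)
    return kl * temperature ** 2  # conventional T^2 scaling


teacher = [2.0, 1.0, 0.1]  # made-up logits over a 3-token vocabulary
print(distillation_loss(teacher, teacher))          # identical logits give zero loss
print(distillation_loss([0.1, 1.0, 2.0], teacher))  # a mismatched student gives a positive loss
```

Unlike cross-entropy against the sampled token, this objective uses the teacher's full distribution, so the student also learns how probability mass is spread over the alternatives.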

How to pretrain "raw" text?

Hi! I would like to use QLoRA to "pretrain" a model and wanted to ask if that is possible. Around the time QLoRA was released, I heard something about a...

Data stuck in loop

Hi

```py
from socketengine import host

h = host()
h.start()

while True:
    data = h.get_ALL("test")
    if data is not None:
        for item in data:
            print(item)
            if(item == "Hello there!"):
                print("Sent")...
```

cublasLt runs into an error on 8 bit quantized

Hello! I wanted to test the int8 performance benefit, but ran into this error (CUDA and PyTorch 12.1): `python3 generate.py --quantize llm.int8 --prompt "Hello, my name is"` -> ```sh Loading...

H100 Transformer Engine implementation

Hello! As I asked on the Discord, here is the issue on implementing NVIDIA's Transformer Engine with compute capability 9 (H100 GPU). I would really love to see and help...

Llama 2 Chat implementation

Hey! This adds the correct Llama 2 Chat prompt formatting to `example_llama2chat.py`. This PR uses https://github.com/turboderp/exllama/pull/195 to copy the exact implementation of the [original Llama repo](https://github.com/facebookresearch/llama/). The format for...