Sinan
Connected by ('192.168.178.21', 51970) refused. When I export `http_proxy=http://127.0.0.1:8889/`, I cannot connect to the proxy; it refuses connection...
Hi, thanks for the cool project. Does the reverse connection also forward internet traffic? I mean when connected to the server can I also access the internet connection of the...
Hi, what is necessary to implement Fuyu-8B support? https://huggingface.co/adept/fuyu-8b Thank you
Hi! Is it possible to fine-tune with quantization in mind? https://www.tensorflow.org/model_optimization/guide/quantization/training This way one could hopefully reduce quantization errors even further.
Hey, I would like to train a student model from my teacher model (knowledge distillation for speculative decoding). Commonly, the student model is trained on the teacher's logits (soft...
Hi! I would like to use QLoRA to "pretrain" a model and wanted to ask if that is possible. Around the time QLoRA was released, I heard something about a...
Hi ```py from socketengine import host h = host() h.start() while True: data = h.get_ALL("test") if data is not None: for item in data: print(item) if(item == "Hello there!"): print("Sent")...
Hello! I wanted to test the int8 performance benefit, but ran into this error (CUDA and pytorch 12.1): `python3 generate.py --quantize llm.int8 --prompt "Hello, my name is"` -> ```sh Loading...
Hello! As I asked on the Discord, here is the issue on implementing NVIDIA's Transformer Engine with compute capability 9 (H100 GPU). I would really love to see and help...
Hey! This PR implements the correct Llama 2 Chat prompt formatting in `example_llama2chat.py`. It uses https://github.com/turboderp/exllama/pull/195 to copy the exact implementation of the [original Llama repo](https://github.com/facebookresearch/llama/). The format for...