Chansung Park
Sorry about this, guys. I was granted a free GPU for 7B and 13B by the HuggingFace team, but that was withdrawn today.
Looks like deploying LLaMA in any form is not allowed: https://huggingface.co/spaces/chansung/LLaMA-7B/discussions/5
Yeah, but I didn't expose a hard link to the weights. So it looks like the weights should be used solely for personal purposes.
Any tips to speed up inference?
It runs the model shared by @tloen on a 7-core CPU / 32GB RAM machine with a single RTX 5000. I am hosting it on jarvislabs.ai.
`padding_side="left"` does the trick.
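For example, something like this (the checkpoint id here is just a placeholder; point it at your own converted weights):

```python
from transformers import LlamaTokenizer

# placeholder checkpoint id; substitute your own converted LLaMA weights
tokenizer = LlamaTokenizer.from_pretrained("decapoda-research/llama-7b-hf")

# decoder-only models continue from the last token, so batched prompts
# must be padded on the left to keep each prompt flush with its generation
tokenizer.padding_side = "left"
tokenizer.pad_token_id = 0  # LLaMA ships without a pad token; 0 (unk) is a common choice
```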
@benob You could do something like below:

```python
def evaluate(instructions, input=None):
    # build one prompt per instruction and tokenize them as a single padded batch
    prompts = [generate_prompt(instruction) for instruction in instructions]
    encodings = tokenizer(prompts, return_tensors="pt", padding=True).to("cuda")
    # the generation arguments are a sketch; tune them for your setup
    generation_outputs = model.generate(**encodings, max_new_tokens=256)
    return tokenizer.batch_decode(generation_outputs, skip_special_tokens=True)
```
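Then you can call it with a batch of instructions (these two are just illustrative):

```python
outputs = evaluate([
    "Tell me about alpacas.",
    "Write a Python program that prints the first 10 Fibonacci numbers.",
])
for text in outputs:
    print(text)
```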
Created a Gradio app for this: https://github.com/deep-diver/Alpaca-LoRA-Serve
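At its core it just wraps the batched `evaluate` above in a Gradio interface; a minimal sketch (the actual app in the repo does more than this):

```python
import gradio as gr

def infer(instruction: str) -> str:
    # reuse the batched evaluate() for a single instruction
    return evaluate([instruction])[0]

demo = gr.Interface(fn=infer, inputs="text", outputs="text")
demo.launch()
```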
A bigger model, and more data with much better quality than now.
More examples, with 13B this time.