Xuan-Phi Nguyen

25 comments by Xuan-Phi Nguyen

I added this quick solution below for the llama-hf model. The steps are: 1. Load the original llama into vLLM with `llm = LLM("llama-7b")` ... 2. Load the LoRA state dict `lora_state_dict =...
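The excerpt above is truncated; here is a minimal sketch of those two steps, assuming a locally available `llama-7b` checkpoint and a PEFT-style adapter file `adapter_model.bin` (both paths are placeholders, not the exact names from the original solution):

```python
# Minimal sketch of steps 1-2 only, not the full original solution.
# "llama-7b" and "adapter_model.bin" are illustrative paths.
import torch
from vllm import LLM

# 1. Load the original llama into vLLM.
llm = LLM("llama-7b")

# 2. Load the LoRA state dict. PEFT-style adapters store pairs of
#    lora_A / lora_B matrices keyed by module name.
lora_state_dict = torch.load("adapter_model.bin", map_location="cpu")
for name, weight in lora_state_dict.items():
    print(name, tuple(weight.shape))
```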

@nivibilla I don't use Ray, so I'm not sure. But you need to locate the [vLLM LlamaForCausalLM model](https://github.com/vllm-project/vllm/blob/aa39e42c5a8a2359363529571cb553cc30e26d58/vllm/model_executor/models/llama.py#L189) and apply the `reassign_weights` function to it, wherever it lives inside Ray; a sketch of how one might reach that module is below.
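For reference, on early vLLM builds (around the commit linked above) the underlying model object was typically reached through engine internals roughly like this. The attribute path is version-specific and may not exist in newer releases, and `reassign_weights` is the user-defined merge function from the earlier comment, not a vLLM API:

```python
# Hypothetical access path for an early vLLM build; attribute names
# changed across versions, so treat this as a sketch only.
model = llm.llm_engine.workers[0].model   # vLLM's LlamaForCausalLM
reassign_weights(model, lora_state_dict)  # user-defined merge function
```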

@zuxinqi Sorry, I forgot. Here is `transpose`:

```python
def transpose(weight, fan_in_fan_out):
    return weight.T if fan_in_fan_out else weight
```
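In context, `transpose` undoes the fan_in_fan_out weight layout before the LoRA delta is added. A self-contained example of how it enters the standard LoRA merge, with placeholder tensors and an illustrative `scaling` value:

```python
import torch

rank, fan_in, fan_out = 8, 4096, 4096
base_weight = torch.randn(fan_out, fan_in)  # W
lora_A = torch.randn(rank, fan_in)          # A
lora_B = torch.randn(fan_out, rank)         # B
scaling = 2.0                               # lora_alpha / rank

# Standard LoRA merge: W' = W + scaling * (B @ A). The transpose is
# applied only for fan_in_fan_out layers (e.g. GPT-2-style Conv1D);
# llama's projections are ordinary nn.Linear, so it is False here.
merged = base_weight + transpose(lora_B @ lora_A, False) * scaling
```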

> > Hey, I tried to do this, but when the model is loaded using Ray it doesn't work. I get this error
> >
> > ```
> > ---------------------------------------------------------------------------
> ...

Hi, thank you for your interest in the paper. There are a few possible reasons. 1. Many dependencies, such as the BLEU calculation (which is not sacrebleu but a BLEU with special tokenization...

Hi, sorry, I haven't had time to revise the code. I will check it later. In the meantime, can you try using fairseq 0.8.0 with `--user-dir` and parse the...

Hi, very sorry we did not have time to clean up the code. As shown in the instructions, please follow the configuration `dwnstack_merge2seq_node_iwslt_onvalue_base_upmean_mean_mlesubenc_allcross_hier` to find its implementation in the...

@NielsRogge Thanks. Let me check it out. I thought batched generation requires left-padding, unless the two samples have exactly the same number of tokens, because otherwise pad tokens will...

@NielsRogge I have added batched generation with left padding in the latest commit. Try it here:

```python
import torch
from huggingface_hub import hf_hub_download
import requests
from PIL import Image
from ...
```
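The snippet above is cut off. As a generic illustration of the left-padding setup it relies on, here is a minimal sketch with transformers, using GPT-2 purely as a stand-in model rather than the one from the commit:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Left padding keeps the real tokens adjacent to the newly generated
# ones: shorter sequences in the batch are padded on the left.
tokenizer = AutoTokenizer.from_pretrained("gpt2", padding_side="left")
tokenizer.pad_token = tokenizer.eos_token  # gpt2 has no pad token

model = AutoModelForCausalLM.from_pretrained("gpt2")

prompts = ["Hello, my name is", "The capital of France"]
inputs = tokenizer(prompts, return_tensors="pt", padding=True)

with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=20,
                         pad_token_id=tokenizer.pad_token_id)
print(tokenizer.batch_decode(out, skip_special_tokens=True))
```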