Georgii Angeni
@hatwearingdumb I believe you need to check your pages for any leftover links to the items you recently deleted and remove those links; all the non-existent objects should then disappear.
@juney-nvidia Thank you for your reply! As far as I know, ReDrafter is the only implementation that focuses not only on latency but on throughput as well (at least according...
Another point is that EAGLE does not currently support FP8 quantization, and running the model in FP16/BF16 severely increases latency under high load, and therefore is an inviable...
Also, could you please tell me whether the support matrix for ReDrafter is accurate? It states that, just like Medusa, it supports FP8 weights for the base model, but it is...
I have also run inference on a proprietary model with the Llama architecture, built using the same commands ``` python3 examples/redrafter/convert_checkpoint.py --model_dir $HF_MODEL_DIR --drafter_model_dir $DRAFTER_DIR --tp_size 1 --dtype float16 --redrafter_num_beams 1 --redrafter_draft_len_per_beam...
@1ytic Yes, indeed it is! I will try running inference on the model with your modification promptly. UPD: It works!
@yubofredwang Could you please point out which PR you are referring to, if there is one?