[WIP] Include GPT Fast in torch.compile nightly benchmark workflow

Open • sachanub opened this issue 1 year ago • 1 comment

Description

This PR adds the GPT Fast model to the torch.compile nightly benchmark workflow, using Llama 7B weights with int4 quantization.

Steps to download the Llama 7B weights on the benchmark host:

Ran a temporary workflow in commit 1e6088e that downloads the weights using HUGGING_FACE_HUB_TOKEN.

Results of the successful run: https://github.com/pytorch/serve/actions/runs/7271384883/job/19811851851?pr=2857
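For readers who want to reproduce the download step locally, here is a minimal sketch. It is not the temporary workflow from this PR: it assumes the `huggingface_hub` package is installed, and the helper names `resolve_hf_token` and `download_weights`, as well as the repository id passed in, are hypothetical placeholders.

```python
import os


def resolve_hf_token(env=None):
    """Return the Hugging Face token from the environment, or raise if it is unset."""
    env = os.environ if env is None else env
    token = env.get("HUGGING_FACE_HUB_TOKEN")
    if not token:
        raise RuntimeError("HUGGING_FACE_HUB_TOKEN is not set")
    return token


def download_weights(repo_id, local_dir, env=None):
    """Download a model snapshot into local_dir.

    Assumption: `huggingface_hub` is installed. `repo_id` is supplied by the
    caller; this PR does not state which repository was used.
    """
    # Imported lazily so the token check can be used without the package.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id=repo_id,
        local_dir=local_dir,
        token=resolve_hf_token(env),
    )
```

Checking for the token before starting the download makes a missing secret fail fast, rather than partway through fetching a gated model.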

Testing:

Ran the benchmark workflow in commit 31936c9.

Results of the successful run: https://github.com/pytorch/serve/actions/runs/7272224847/job/19813999840?pr=2857

Benchmark report file: report.md

Updates in benchmark-ab.py script:

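The benchmark-ab.py script now passes -l in its ab commands. Without this flag, ab reports a response as an error when its length differs from the first response, which happens routinely with generated text. With -l, variable response lengths are not counted as errors (https://httpd.apache.org/docs/2.4/programs/ab.html).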
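As an illustration of where the flag goes, here is a minimal sketch of assembling an ab command line with -l. It is not the code in benchmark-ab.py: the function name `build_ab_command`, its parameters, and the example URL and file path are hypothetical. The ab options used (-n, -c, -k, -l, -p, -T) are standard.

```python
def build_ab_command(url, requests=100, concurrency=10, input_file=None,
                     content_type="application/json", variable_length=True):
    """Build an ApacheBench argument list (hypothetical helper, for illustration)."""
    # -n: total requests, -c: concurrent requests, -k: HTTP keep-alive
    cmd = ["ab", "-n", str(requests), "-c", str(concurrency), "-k"]
    if variable_length:
        # -l: do not report errors when response lengths differ
        cmd.append("-l")
    if input_file is not None:
        # -p: file with the POST body, -T: Content-Type header for it
        cmd += ["-p", input_file, "-T", content_type]
    cmd.append(url)
    return cmd


if __name__ == "__main__":
    # Hypothetical endpoint and payload path
    print(" ".join(build_ab_command(
        "http://127.0.0.1:8080/predictions/gpt_fast",
        requests=1000,
        concurrency=4,
        input_file="prompt.json",
    )))
```

Returning a list instead of a single string lets the command be passed to `subprocess.run` without shell quoting concerns.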

sachanub • Dec 20 '23 01:12

@namannandan @lxning What is the work remaining for this PR?

chauhang • Apr 14 '24 09:04