very slow inference

Open DavidHarar opened this issue 1 year ago • 0 comments

Hello, and thank you very much for providing the h2oGPT models. I am currently using h2oai/h2ogpt-gm-oasst1-en-2048-falcon-7b-v3, but inference takes so long that the model is impractical to use.

Code for reproducibility:

import torch
from transformers import AutoTokenizer, pipeline


tokenizer = AutoTokenizer.from_pretrained(
    "h2oai/h2ogpt-gm-oasst1-en-2048-falcon-7b-v3",
    use_fast=False,
    padding_side="left",
    trust_remote_code=True,
)

generate_text = pipeline(
    model="h2oai/h2ogpt-gm-oasst1-en-2048-falcon-7b-v3",
    tokenizer=tokenizer,
    torch_dtype=torch.float16,
    trust_remote_code=True,
    use_fast=False,
    device_map={"": "cuda:0"},
)

# Inference
# ----------------------
from time import time

start = time()
res = generate_text(
    "Why is drinking water so healthy?",
    min_new_tokens=2,
    max_new_tokens=1024,
    do_sample=False,
    num_beams=1,
    temperature=float(0.3),
    repetition_penalty=float(1.2),
    renormalize_logits=True
)
end = time()
print('Computing time:', end-start)
print(res[0]["generated_text"])

The output of the above code is:

Setting `pad_token_id` to `eos_token_id`:11 for open-end generation.
Computing time: 50.18443560600281

Drinking water is essential for life and health. It helps regulate body temperature, lubricates joints, flushes toxins from the body, and keeps skin hydrated.

Water also plays a role in digestion, helping to break down food and move it through the digestive system. Drinking enough water can help prevent constipation and other digestive issues.

In addition, water helps maintain proper blood volume and pressure, which are important for cardiovascular health.

Overall, drinking water is an easy way to improve overall health and well-being.
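
To make the roughly 50-second figure easier to compare across setups, the elapsed time can be converted into a throughput number. The sketch below is only an illustration: `tokens_per_second` and `rough_token_count` are hypothetical helpers (not part of transformers), and the whitespace word count is a crude proxy for the real token count, which the tokenizer loaded above would give exactly.

```python
def tokens_per_second(num_new_tokens: int, elapsed_seconds: float) -> float:
    """Return generation throughput in tokens per second."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return num_new_tokens / elapsed_seconds


def rough_token_count(text: str) -> int:
    """Crude stand-in for a tokenizer: count whitespace-separated words."""
    return len(text.split())


if __name__ == "__main__":
    # Placeholder values for illustration only. In the script above,
    # use res[0]["generated_text"] and (end - start) instead.
    sample_text = "Drinking water is essential for life and health."
    sample_elapsed = 2.0  # seconds, hypothetical

    words = rough_token_count(sample_text)
    rate = tokens_per_second(words, sample_elapsed)
    print(f"{words} words in {sample_elapsed:.1f}s = {rate:.2f} words/s")
```

Reporting throughput rather than total time separates "the answer was long" from "each token was slow", which is the distinction that matters when asking whether the setup is misconfigured.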

My GPU:

!nvidia-smi

Tue Jul 11 10:34:28 2023       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.182.03   Driver Version: 470.182.03   CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla P100-PCIE...  Off  | 00000000:00:04.0 Off |                    0 |
| N/A   43C    P0    28W / 250W |      0MiB / 16280MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

Am I misusing the model? Thank you.

DavidHarar, Jul 11 '23 10:07