h2ogpt
very slow inference
Hello,
First, thank you very much for providing the h2oGPT models. I am currently using h2oai/h2ogpt-gm-oasst1-en-2048-falcon-7b-v3, but inference takes so long that the model is impractical to use.
Code to reproduce:
import torch
from transformers import AutoTokenizer, pipeline
tokenizer = AutoTokenizer.from_pretrained(
    "h2oai/h2ogpt-gm-oasst1-en-2048-falcon-7b-v3",
    use_fast=False,
    padding_side="left",
    trust_remote_code=True,
)
generate_text = pipeline(
    model="h2oai/h2ogpt-gm-oasst1-en-2048-falcon-7b-v3",
    tokenizer=tokenizer,
    torch_dtype=torch.float16,
    trust_remote_code=True,
    use_fast=False,
    device_map={"": "cuda:0"},
)
# Inference
# ----------------------
from time import time
start = time()
res = generate_text(
    "Why is drinking water so healthy?",
    min_new_tokens=2,
    max_new_tokens=1024,
    do_sample=False,
    num_beams=1,
    temperature=float(0.3),
    repetition_penalty=float(1.2),
    renormalize_logits=True
)
end = time()
print('Cumputing time:', end-start)
print(res[0]["generated_text"])
The output of the above code is:
Setting `pad_token_id` to `eos_token_id`:11 for open-end generation.
Cumputing time: 50.18443560600281
Drinking water is essential for life and health. It helps regulate body temperature, lubricates joints, flushes toxins from the body, and keeps skin hydrated.
Water also plays a role in digestion, helping to break down food and move it through the digestive system. Drinking enough water can help prevent constipation and other digestive issues.
In addition, water helps maintain proper blood volume and pressure, which are important for cardiovascular health.
Overall, drinking water is an easy way to improve overall health and well-being.
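To put the 50-second figure in context, it may help to report throughput in generated tokens per second rather than wall-clock time alone. Below is a minimal sketch, not part of my original script: `measure_throughput` is a hypothetical helper, and it assumes the generation call returns only the completion (as in the output above, where the prompt is not echoed). The whitespace-based token counter in the demo is a stand-in so the snippet runs without a GPU.

```python
from time import perf_counter
from typing import Callable, Dict


def measure_throughput(
    generate: Callable[[str], str],
    count_tokens: Callable[[str], int],
    prompt: str,
) -> Dict[str, float]:
    """Time a single generation call and report tokens per second.

    Assumption: `generate` returns only the newly generated text,
    not the prompt followed by the completion.
    """
    start = perf_counter()
    completion = generate(prompt)
    elapsed = perf_counter() - start

    new_tokens = count_tokens(completion)
    # Guard against a zero-length interval on very fast stand-in functions.
    rate = new_tokens / elapsed if elapsed > 0 else float("inf")
    return {
        "seconds": elapsed,
        "new_tokens": float(new_tokens),
        "tokens_per_second": rate,
    }


if __name__ == "__main__":
    # Stand-ins for illustration only: a canned reply and a whitespace "tokenizer".
    stats = measure_throughput(
        generate=lambda p: "Drinking water is essential for life and health.",
        count_tokens=lambda text: len(text.split()),
        prompt="Why is drinking water so healthy?",
    )
    print(stats)
```

With the script above, the two callables could plausibly be `lambda p: generate_text(p, max_new_tokens=1024)[0]["generated_text"]` and `lambda t: len(tokenizer(t)["input_ids"])`, which would give a tokens-per-second number that is easier to compare across GPUs than the raw elapsed time.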
My GPU:
!nvidia-smi
Tue Jul 11 10:34:28 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.182.03   Driver Version: 470.182.03   CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla P100-PCIE...  Off  | 00000000:00:04.0 Off |                    0 |
| N/A   43C    P0    28W / 250W |      0MiB / 16280MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
Am I misusing the model? Thank you.