text-generation-inference
seed not working
System Info
Target: x86_64-unknown-linux-gnu
Cargo version: 1.69.0
Commit sha: 22c4fd07abe9d499cd8eda807f389084773124bd
Docker label: sha-22c4fd0
nvidia-smi:
Mon May 15 09:02:10 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 520.61.05    Driver Version: 520.61.05    CUDA Version: 11.8     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...   On  | 00000000:03:00.0  On |                  N/A |
| 47%   44C    P8    35W / 350W |   3252MiB / 12288MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                   |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
+-----------------------------------------------------------------------------+

2023-05-15T09:02:10.563882Z INFO text_generation_launcher: Args { model_id: "bigscience/bloom-560m", revision: None, sharded: None, num_shard: None, quantize: None, max_concurrent_requests: 128, max_best_of: 2, max_stop_sequences: 4, max_input_length: 1000, max_total_tokens: 1512, max_batch_size: None, waiting_served_ratio: 1.2, max_batch_total_tokens: 32000, max_waiting_tokens: 20, port: 80, shard_uds_path: "/tmp/text-generation-server", master_addr: "localhost", master_port: 29500, huggingface_hub_cache: Some("/data"), weights_cache_override: None, disable_custom_kernels: false, json_output: false, otlp_endpoint: None, cors_allow_origin: [], watermark_gamma: None, watermark_delta: None, env: true }
Information
- [X] Docker
- [ ] The CLI directly
Tasks
- [X] An officially supported command
- [ ] My own modifications
Reproduction
server
model=bigscience/bloom-560m
num_shard=1
volume=$PWD/data # share a volume with the Docker container to avoid downloading weights every run
docker run --gpus all --shm-size 1g -p 8080:80 -v $volume:/data ghcr.io/huggingface/text-generation-inference:latest --model-id $model --num-shard $num_shard
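Once the container is up, a single seeded request can be sent straight to the server as a sanity check. This is a minimal sketch using requests; the /generate route and parameter names follow the TGI REST API, and the port matches the docker command above.

# Quick single-request check against the running container.
import requests

payload = {
    "inputs": "What is Deep Learning? Answer in 1 sentence.",
    "parameters": {"seed": 42, "do_sample": True, "max_new_tokens": 30},
}
resp = requests.post("http://localhost:8080/generate", json=payload, timeout=60)
print(resp.json()["generated_text"])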
Client
from text_generation import Client
from multiprocessing import Pool
from collections import Counter
uri = 'http://localhost:8080'
client = Client(uri)
prompt = "What is Deep Learning? Answer in 1 sentence."
params = {'seed':42, 'max_new_tokens': 30, 'do_sample': True}
def test_fn(test):
    ret = client.generate(prompt, **params)
    ret_generated_text = ret.generated_text
    return ret_generated_text

with Pool(3) as p:
    result = list(p.imap_unordered(test_fn, list(range(10))))

for context, count in Counter(result).items():
    print(count, context)
6 Deep Learning is a specialized class of computing that is used in areas such as remote sensing, archiving, and data mining. Deep learning can be used
4 Deep Learning is a specialized class of computing that is used in areas such as remote sensing, geological survey, artificial intelligence, speech synthesis, and artificial
Expected behavior
Since I provided a seed value and set the do_sample flag, all 10 requests should return the same result, but I received several different results instead.
When I use version 0.6.0 with --max-batch-size 1, the result is:
10 Deep Learning is defined as a theory that explores the same idea in different ways, so various difficulties exist in it because some can be solved, others
@OlivierDehaene Hi Olivier, I tested it on a CPU batch and the result was the same, so I changed the dtype to float32 when using the GPU and the batch worked.
How did you change the dtype to float32?
Hmm, what about TGI version 1.0.0+? It seems some of these flags have already been sunsetted.
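For context on the float32 remark above: the thread does not say how the dtype was actually changed inside TGI, so the sketch below only shows the equivalent experiment with plain transformers (torch_dtype and torch.manual_seed are standard APIs; the rest is purely illustrative and not the poster's actual change).

# Illustrative only: load bloom-560m in float32 on the GPU and sample with a
# fixed seed, mirroring the "dtype to float32" experiment described above.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "bigscience/bloom-560m"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float32).to("cuda")

inputs = tokenizer("What is Deep Learning? Answer in 1 sentence.", return_tensors="pt").to("cuda")
torch.manual_seed(42)
out = model.generate(**inputs, do_sample=True, max_new_tokens=30)
print(tokenizer.decode(out[0], skip_special_tokens=True))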