seed not working
System Info
Target: x86_64-unknown-linux-gnu
Cargo version: 1.69.0
Commit sha: 22c4fd07abe9d499cd8eda807f389084773124bd
Docker label: sha-22c4fd0
nvidia-smi:
```
Mon May 15 09:02:10 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 520.61.05    Driver Version: 520.61.05    CUDA Version: 11.8     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  On   | 00000000:03:00.0  On |                  N/A |
| 47%   44C    P8    35W / 350W |   3252MiB / 12288MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
+-----------------------------------------------------------------------------+
```

```
2023-05-15T09:02:10.563882Z  INFO text_generation_launcher: Args { model_id: "bigscience/bloom-560m", revision: None, sharded: None, num_shard: None, quantize: None, max_concurrent_requests: 128, max_best_of: 2, max_stop_sequences: 4, max_input_length: 1000, max_total_tokens: 1512, max_batch_size: None, waiting_served_ratio: 1.2, max_batch_total_tokens: 32000, max_waiting_tokens: 20, port: 80, shard_uds_path: "/tmp/text-generation-server", master_addr: "localhost", master_port: 29500, huggingface_hub_cache: Some("/data"), weights_cache_override: None, disable_custom_kernels: false, json_output: false, otlp_endpoint: None, cors_allow_origin: [], watermark_gamma: None, watermark_delta: None, env: true }
```
Information
- [X] Docker
- [ ] The CLI directly
Tasks
- [X] An officially supported command
- [ ] My own modifications
Reproduction
Server

```shell
model=bigscience/bloom-560m
num_shard=1
volume=$PWD/data # share a volume with the Docker container to avoid downloading weights every run

docker run --gpus all --shm-size 1g -p 8080:80 -v $volume:/data ghcr.io/huggingface/text-generation-inference:latest --model-id $model --num-shard $num_shard
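```

For reference, the same sampling parameters can also be sent straight to the server's `/generate` HTTP endpoint; this is a minimal sketch (the prompt text is the one from the client snippet below):

```shell
# Query /generate directly; seed, do_sample and max_new_tokens
# go in the "parameters" object of the JSON body.
curl http://localhost:8080/generate \
    -X POST \
    -H 'Content-Type: application/json' \
    -d '{"inputs": "What is Deep Learning? Answer in 1 sentence.", "parameters": {"seed": 42, "do_sample": true, "max_new_tokens": 30}}'
```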
Client
from text_generation import Client
from multiprocessing import Pool
from collections import Counter
uri = 'http://localhost:8080'
client = Client(uri)
prompt = "What is Deep Learning? Answer in 1 sentence."
params = {'seed':42, 'max_new_tokens': 30, 'do_sample': True}
def test_fn(test):
    ret = client.generate(prompt, **params)
    ret_generated_text = ret.generated_text
    return ret_generated_text
with Pool(3) as p:
    result = list(p.imap_unordered(test_fn, list(range(10))))
for context, count in Counter(result).items():
    print(count, context)
Output:

```
6  Deep Learning is a specialized class of computing that is used in areas such as remote sensing, archiving, and data mining. Deep learning can be used
4  Deep Learning is a specialized class of computing that is used in areas such as remote sensing, geological survey, artificial intelligence, speech synthesis, and artificial
```
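To isolate whether server-side batching is the culprit, one can send the same requests one at a time so they are never batched together; a minimal sketch (not part of the original report):

```python
from text_generation import Client

client = Client("http://localhost:8080")
prompt = "What is Deep Learning? Answer in 1 sentence."
params = {"seed": 42, "max_new_tokens": 30, "do_sample": True}

# One request at a time: the server never batches these together,
# so any remaining divergence cannot be blamed on batching.
outputs = [client.generate(prompt, **params).generated_text for _ in range(10)]
print(len(set(outputs)), "distinct output(s)")
```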
Expected behavior
Since I set a seed value and used the do_sample flag, all 10 requests should return the same result, but I received several different results.
When I use version 0.6.0 with --max-batch-size 1, the result is:

```
10 Deep Learning is defined as a theory that explores the same idea in different ways, so various difficulties exist in it because some can be solved, others
```
@OlivierDehaene Hi Olivier, I tested it on CPU with batching and the results were the same, so I changed the dtype to float32 when using the GPU, and batching then worked.
How did you change the dtype to float32?
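(For illustration only, not the commenter's actual change: one common way to force float32 is at model-load time via transformers; where exactly TGI calls from_pretrained is an assumption here.)

```python
import torch
from transformers import AutoModelForCausalLM

# Hypothetical sketch: load the weights in float32 instead of the
# half-precision dtype typically used on GPU. In TGI this would
# correspond to wherever the server loads the model.
model = AutoModelForCausalLM.from_pretrained(
    "bigscience/bloom-560m",
    torch_dtype=torch.float32,
)
```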
Hmm, what about TGI version 1.0.0+? It seems some flags have already been sunsetted.