
seed not working

Open dongs0104 opened this issue 1 year ago • 2 comments

System Info

Target: x86_64-unknown-linux-gnu
Cargo version: 1.69.0
Commit sha: 22c4fd07abe9d499cd8eda807f389084773124bd
Docker label: sha-22c4fd0
nvidia-smi:
Mon May 15 09:02:10 2023       
   +-----------------------------------------------------------------------------+
   | NVIDIA-SMI 520.61.05    Driver Version: 520.61.05    CUDA Version: 11.8     |
   |-------------------------------+----------------------+----------------------+
   | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
   | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
   |                               |                      |               MIG M. |
   |===============================+======================+======================|
   |   0  NVIDIA GeForce ...  On   | 00000000:03:00.0  On |                  N/A |
   | 47%   44C    P8    35W / 350W |   3252MiB / 12288MiB |      0%      Default |
   |                               |                      |                  N/A |
   +-------------------------------+----------------------+----------------------+
                                                                                  
   +-----------------------------------------------------------------------------+
   | Processes:                                                                  |
   |  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
   |        ID   ID                                                   Usage      |
   |=============================================================================|
   +-----------------------------------------------------------------------------+
2023-05-15T09:02:10.563882Z  INFO text_generation_launcher: Args { model_id: "bigscience/bloom-560m", revision: None, sharded: None, num_shard: None, quantize: None, max_concurrent_requests: 128, max_best_of: 2, max_stop_sequences: 4, max_input_length: 1000, max_total_tokens: 1512, max_batch_size: None, waiting_served_ratio: 1.2, max_batch_total_tokens: 32000, max_waiting_tokens: 20, port: 80, shard_uds_path: "/tmp/text-generation-server", master_addr: "localhost", master_port: 29500, huggingface_hub_cache: Some("/data"), weights_cache_override: None, disable_custom_kernels: false, json_output: false, otlp_endpoint: None, cors_allow_origin: [], watermark_gamma: None, watermark_delta: None, env: true }

Information

  • [X] Docker
  • [ ] The CLI directly

Tasks

  • [X] An officially supported command
  • [ ] My own modifications

Reproduction

Server

model=bigscience/bloom-560m
num_shard=1
volume=$PWD/data # share a volume with the Docker container to avoid downloading weights every run

docker run --gpus all --shm-size 1g -p 8080:80 -v $volume:/data ghcr.io/huggingface/text-generation-inference:latest --model-id $model --num-shard $num_shard

Client

from text_generation import Client
from multiprocessing import Pool
from collections import Counter
uri = 'http://localhost:8080'
client = Client(uri)
prompt = "What is Deep Learning? Answer in 1 sentence."
params = {'seed':42, 'max_new_tokens': 30, 'do_sample': True}

def test_fn(_):
    # The pool index is unused; every call sends the same seeded request.
    ret = client.generate(prompt, **params)
    return ret.generated_text

with Pool(3) as p:
    result = list(p.imap_unordered(test_fn, list(range(10))))

for context, count in Counter(result).items():
    print(count, context)
6  Deep Learning is a specialized class of computing that is used in areas such as remote sensing, archiving, and data mining. Deep learning can be used
4  Deep Learning is a specialized class of computing that is used in areas such as remote sensing, geological survey, artificial intelligence, speech synthesis, and artificial

Expected behavior

Since I set the seed value and used the do_sample flag, all 10 requests should return the same result, but I received several different results.
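For reference, seeding a sampler is expected to pin its output exactly. A minimal sketch of that expectation, with Python's stdlib random standing in for TGI's token sampler (this is illustrative only, not TGI's actual code):

```python
import random

def sample_tokens(seed, n=5):
    """Draw n pseudo-random values from a generator seeded per request."""
    rng = random.Random(seed)  # fresh generator per request, like a per-request seed
    return [rng.random() for _ in range(n)]

# Ten requests with the same seed should yield ten identical results.
runs = [sample_tokens(42) for _ in range(10)]
assert len({tuple(r) for r in runs}) == 1
```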

When I use 0.6.0 with --max-batch-size 1, the result is:

10 Deep Learning is defined as a theory that explores the same idea in different ways, so various difficulties exist in it because some can be solved, others

dongs0104 avatar May 15 '23 09:05 dongs0104

@OlivierDehaene Hi Olivier, I tested it on CPU and the results were all the same, so I changed the dtype to float32 when using the GPU, and then batched generation worked.
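A plausible reason dtype matters here (my assumption; not confirmed in this thread): floating-point addition is not associative, so lower-precision kernels whose reduction order changes with batch size can produce slightly different logits, which is enough to flip a sampled token near a probability boundary. Even at Python's double precision the order effect is visible:

```python
# Floating-point addition is order-sensitive: the same three values
# summed in a different order give different results.
a = (0.1 + 0.2) + 0.3
b = 0.1 + (0.2 + 0.3)
assert a != b  # 0.6000000000000001 vs 0.6
```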

dongs0104 avatar May 17 '23 15:05 dongs0104

How did you change the dtype to float32?

jshin49 avatar Jun 14 '23 08:06 jshin49

Hmm, what about TGI version 1.0.0+? It seems some flags have already been sunset.

muhammad-asn avatar Sep 19 '23 16:09 muhammad-asn