worker-vllm There's no "seed" option in the text completions (not chat) which is very important

Hello! I'd like to start by saying we don't use the chat template (one of the main reasons we don't just use mainstream models and need Runpod in the first place; we don't care about chat). Unfortunately, it seems Runpod's endpoints don't support sending a seed parameter for the RNG, which would help make every generation request unique while keeping the same hyperparameters, such as temperature.

Currently the only way to set the random number generator's seed is through an environment variable (SEED), but that's useless for the purpose mentioned here: ensuring unique generation across requests.

I'm simply asking for support for passing a seed for the random number generator in the request, maybe as input->sampling_parameters->seed. It is very important for us, and surely for a lot of other people, not only for reproducible, deterministic generation but also to keep the model from repeating itself for end users in production workloads, by giving devs control over which seed is used. Thank you.
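To make the request concrete, here is a rough sketch of what such a payload could look like. The key names (`input`, `sampling_params`, `seed`) and the `build_request` helper are assumptions for illustration only, not a confirmed part of the worker's current schema.

```python
import json
import secrets


def build_request(prompt, temperature=0.8, max_tokens=128, seed=None):
    """Build a hypothetical raw-text completion payload with a per-request seed.

    The payload layout is an assumption based on the shape proposed in this
    issue (input -> sampling parameters -> seed).
    """
    if seed is None:
        # Draw a fresh 32-bit seed so identical prompts can still differ.
        seed = secrets.randbits(32)
    return {
        "input": {
            "prompt": prompt,
            "sampling_params": {
                "temperature": temperature,
                "max_tokens": max_tokens,
                "seed": seed,  # hypothetical per-request field
            },
        }
    }


payload = build_request("Once upon a time", seed=1234)
print(json.dumps(payload, indent=2))
```

Passing an explicit seed would allow a generation to be reproduced later; omitting it would give each request its own seed while temperature and the other settings stay fixed.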

Jan 03 '25 05:01 varkarrus0

@michaelinva Do you mean in the OpenAI completion API? We set it via ENV because it can be set during engine initialisation.

Jan 03 '25 05:01 pandyamarut

@pandyamarut No. We don't use the OpenAI-compatible API, but the one that takes a prompt (a stream of raw text to be completed by the model). Basically, we need to be able to set the seed inside the request itself (and I currently see no way of doing that), so that we can send a different seed for every request even if the prompt is the same, thus producing outputs that are different for each user and/or every regeneration.
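As a toy illustration of why a per-request seed matters, the sketch below samples token indices from fixed logits using a seeded RNG. This is not vLLM's sampler; `sample_tokens` and the logits are made up purely to show the principle: same seed and settings give the same draw, while a different seed will typically give a different one.

```python
import math
import random


def sample_tokens(logits, temperature, seed, n=5):
    """Toy temperature sampling with a per-request RNG (illustrative only)."""
    rng = random.Random(seed)  # isolated RNG, independent of global state
    scaled = [value / temperature for value in logits]
    peak = max(scaled)
    # Unnormalised softmax weights; subtracting the peak avoids overflow.
    weights = [math.exp(value - peak) for value in scaled]
    return rng.choices(range(len(logits)), weights=weights, k=n)


logits = [2.0, 1.5, 1.0, 0.5]  # made-up scores for four tokens
first = sample_tokens(logits, temperature=0.8, seed=1)
repeat = sample_tokens(logits, temperature=0.8, seed=1)
other = sample_tokens(logits, temperature=0.8, seed=2)

print(first == repeat)  # same seed, same hyperparameters
print(first, other)     # different seed, same hyperparameters
```

With a single engine-wide seed from the environment, there is no equivalent of the `seed` argument here that the caller can vary per request.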

A workaround would be to randomly change hyperparameters (temperature, frequency_penalty, and so on), but that could heavily ruin performance and make the experience very unstable. Setting the seed of the RNG is much cleaner, and many API providers already support it, so I don't see a problem with supporting it in Runpod Serverless too.

I hope it is now clear that setting the seed in the engine's environment for all requests is not ideal at all; we need an actual way of setting it per request. Thanks again for the quick response! :)

Jan 03 '25 07:01 varkarrus0

Makes sense. Let me work on that. Thanks for the feedback. @michaelinva

Jan 03 '25 07:01 pandyamarut