Mattt
@charliemday Replicate does support creating up to [6000 concurrent predictions per minute](https://replicate.com/docs/how-does-replicate-work#rate-limits). Depending on how much the model is scaled out, you could process all of them more quickly using...
@charliemday Apologies, yes — the rate limit is 600 / minute. That was a typo on my part.
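For anyone batching many predictions under that limit, here's a minimal sketch that fans requests out concurrently with the Python client's `replicate.async_run` (the model and prompts are illustrative):

```python
import asyncio
import replicate

# Illustrative prompts; any model and inputs would work the same way.
prompts = [f"Write a haiku about {topic}" for topic in ("rain", "dusk", "tea")]

async def main():
    # Fan out the predictions concurrently; Replicate queues them
    # server-side, subject to the 600-per-minute rate limit.
    return await asyncio.gather(
        *(
            replicate.async_run(
                "meta/meta-llama-3-8b-instruct",
                input={"prompt": prompt},
            )
            for prompt in prompts
        )
    )

outputs = asyncio.run(main())
```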
Hi, @vishnubob. You're correct that Replicate doesn't currently expose any APIs for managing [deployments](https://replicate.com/docs/deployments). However, you can configure your deployment with a min / max number of concurrent predictions to...
Closing the loop on this — Replicate's API now supports creating and modifying deployments. Support for these endpoints was added to the Python client by https://github.com/replicate/replicate-python/pull/258, and is available in...
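A minimal sketch of creating a deployment from the Python client, assuming the method mirrors the HTTP deployments API (the name, version ID, and hardware SKU below are placeholders):

```python
import replicate

# Create a deployment with explicit instance bounds; the name,
# model version ID, and hardware SKU here are placeholder values.
deployment = replicate.deployments.create(
    name="my-deployment",
    model="meta/meta-llama-3-8b-instruct",
    version="<model-version-id>",
    hardware="gpu-a40-large",
    min_instances=0,
    max_instances=2,
)
```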
`replicate.models.predictions.create` has been deprecated in favor of an overload of `replicate.predictions.create` that accepts a `model` kwarg. This is documented in this section of the README: https://github.com/replicate/replicate-python?tab=readme-ov-file#run-a-model-and-stream-its-output
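In practice the migration is a one-line change; a sketch (the model name and input are illustrative):

```python
import replicate

# Before (deprecated):
# prediction = replicate.models.predictions.create(
#     "meta/meta-llama-3-8b-instruct",
#     input={"prompt": "Hello!"},
# )

# After: pass the model as a kwarg to replicate.predictions.create
prediction = replicate.predictions.create(
    model="meta/meta-llama-3-8b-instruct",
    input={"prompt": "Hello!"},
)
```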
Hey @921kiyo. Thanks for opening up a new issue for this. The way progress is determined is by scanning the prediction logs for output matching a given pattern...
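As a rough illustration of that approach (the tqdm-style `NN%|` pattern here is an assumption, not the exact pattern Replicate scans for):

```python
import re

# Hypothetical sketch: derive a progress value by scanning prediction
# logs for tqdm-style "NN%|" markers and taking the most recent one.
PROGRESS = re.compile(r"(\d{1,3})%\|")

def parse_progress(logs: str) -> float | None:
    matches = PROGRESS.findall(logs)
    return int(matches[-1]) / 100 if matches else None

# Example: a tqdm progress bar line from a model's logs
print(parse_progress("generating: 42%|████      | 42/100"))  # 0.42
```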
Following up on this — I believe the issue of logs populating for in-flight predictions should now be resolved. Please let me know if you're still having issues with prediction...
Hi @wernerulbts. What you're trying to do can be described as "constrained generation" or "function calling". We don't currently support these features with our official deployments of llama 3, but...
@wernerulbts Thanks for sharing that context. One thing you might try to resolve the immediate `'\n\"Name\"'` error you're seeing is removing the newlines from your system prompt. If the model...
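By way of illustration, here's a hypothetical system prompt with its newlines collapsed before sending (the prompt text is made up):

```python
# Hypothetical system prompt; collapsing the newlines into spaces
# keeps stray '\n' tokens out of the model's output.
system_prompt = """You are a helpful assistant.
Respond with a JSON object containing a "Name" field."""

flattened = " ".join(system_prompt.split())
# 'You are a helpful assistant. Respond with a JSON object containing a "Name" field.'
```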
@wernerulbts So glad to see you got that working! I think what you have there is better than what's described by that blog post, so I'd recommend rolling with that....