Predictions often fail on meta/llama-2-70b
Calls to meta/llama-2-70b succeed some of the time and fail the rest, with no change to the request in between. The model is very unreliable right now.
Here is the code:
```python
import replicate

output = replicate.run(
    "meta/llama-2-70b",
    input={
        "prompt": "Q: Would a pear sink in water? A: Let's think step by step. ",
        "max_new_tokens": 10000,
        "temperature": 0.01,
    },
)
```
Example failures:
- https://replicate.com/p/x3brrjtbwq4ky6zm2z2ay27amy
- https://replicate.com/p/72pdpvtby7l7wgdzrpzzldqzne
- https://replicate.com/p/ucbimbtbhyjrw5udypzf6srsm4

Example successes:
- https://replicate.com/p/n6hg2cdbym5ksifmlm6yfahjzm
- https://replicate.com/p/j2jkwn3bfl6w2wc4q53mwju2o4
- https://replicate.com/p/mtb6jcrbzjh57s2wjzfycegxoa
Hi @jdkanu. Thank you for reporting this. Looking at our telemetry, it does seem like predictions on GPUs in certain regions are failing more often due to read timeouts. We're investigating the cause, and working on a remediation.
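Since the failures appear to be transient read timeouts rather than bad requests, a client-side workaround in the meantime is to retry failed predictions with exponential backoff. Below is a minimal sketch; the helper name `run_with_retries` is hypothetical, and it assumes the client raises an exception when a prediction fails (it takes a plain callable so the retry logic is independent of the Replicate client).

```python
import time

def run_with_retries(predict, max_attempts=3, base_delay=2.0):
    """Call `predict()` and retry on failure with exponential backoff.

    Transient failures (e.g. read timeouts) often succeed on a later
    attempt; a persistent error is re-raised after max_attempts tries.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return predict()
        except Exception:  # assumption: the client raises on timeout/failure
            if attempt == max_attempts:
                raise
            # back off 2s, 4s, 8s, ... before the next attempt
            time.sleep(base_delay * 2 ** (attempt - 1))
```

Usage would then look something like `run_with_retries(lambda: replicate.run("meta/llama-2-70b", input={...}))`. This is only a stopgap until the regional GPU issue is remediated.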
Thank you