Predictions often fail on meta/llama-2-70b
Calls to meta/llama-2-70b succeed some of the time and fail the rest, with no change to the request in between. The model is very unreliable right now.
Here is the code:
```python
import replicate

output = replicate.run(
    "meta/llama-2-70b",
    input={
        "prompt": "Q: Would a pear sink in water? A: Let's think step by step. ",
        "max_new_tokens": 10000,
        "temperature": 0.01,
    },
)
```
Example failures:
- https://replicate.com/p/x3brrjtbwq4ky6zm2z2ay27amy
- https://replicate.com/p/72pdpvtby7l7wgdzrpzzldqzne
- https://replicate.com/p/ucbimbtbhyjrw5udypzf6srsm4

Example successes:
- https://replicate.com/p/n6hg2cdbym5ksifmlm6yfahjzm
- https://replicate.com/p/j2jkwn3bfl6w2wc4q53mwju2o4
- https://replicate.com/p/mtb6jcrbzjh57s2wjzfycegxoa
Hi @jdkanu. Thank you for reporting this. Looking at our telemetry, it does seem like predictions on GPUs in certain regions are failing more often due to read timeouts. We're investigating the cause, and working on a remediation.
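Since the failures appear to be transient read timeouts rather than bad requests, a client-side workaround in the meantime is to retry failed predictions with exponential backoff. Below is a minimal sketch; the helper name `run_with_retries` is hypothetical, and it assumes the client raises an exception when a prediction fails (it takes a plain callable so the retry logic is independent of the Replicate client).

```python
import time

def run_with_retries(predict, max_attempts=3, base_delay=2.0):
    """Call `predict()` and retry on failure with exponential backoff.

    Transient failures (e.g. read timeouts) often succeed on a later
    attempt; a persistent error is re-raised after max_attempts tries.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return predict()
        except Exception:  # assumption: the client raises on timeout/failure
            if attempt == max_attempts:
                raise
            # back off 2s, 4s, 8s, ... before the next attempt
            time.sleep(base_delay * 2 ** (attempt - 1))
```

Usage would then look something like `run_with_retries(lambda: replicate.run("meta/llama-2-70b", input={...}))`. This is only a stopgap until the regional GPU issue is remediated.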
Thank you