
[QUESTION] Creating a Prediction Service on Vertex AI (chapter 19: training_and_deploying_at_scale.ipynb)

Open michabuehlmann opened this issue 1 year ago • 1 comment

I am running the code from the section "Creating a Prediction Service on Vertex AI" in chapter 19. The first cells run normally, but then the following code fails:

endpoint = aiplatform.Endpoint.create(display_name="michael-mnist-endpoint")

endpoint.deploy(
    mnist_model,
    min_replica_count=1,
    max_replica_count=5,
    machine_type="n1-standard-32",
    #accelerator_type='NVIDIA_TESLA_K80',
    accelerator_type="NVIDIA_TESLA_T4",
    accelerator_count=4
)

I get the following stacktrace:

INFO:google.cloud.aiplatform.models:Creating Endpoint
INFO:google.cloud.aiplatform.models:Create Endpoint backing LRO: projects/980799330137/locations/us-central1/endpoints/2848656506683916288/operations/8057826248975450112
INFO:google.cloud.aiplatform.models:Endpoint created. Resource name: projects/980799330137/locations/us-central1/endpoints/2848656506683916288
INFO:google.cloud.aiplatform.models:To use this Endpoint in another session:
INFO:google.cloud.aiplatform.models:endpoint = aiplatform.Endpoint('projects/980799330137/locations/us-central1/endpoints/2848656506683916288')
INFO:google.cloud.aiplatform.models:Deploying Model projects/980799330137/locations/us-central1/models/8350220166423904256 to Endpoint : projects/980799330137/locations/us-central1/endpoints/2848656506683916288
INFO:google.cloud.aiplatform.models:Deploy Endpoint model backing LRO: projects/980799330137/locations/us-central1/endpoints/2848656506683916288/operations/2642810647015849984
---------------------------------------------------------------------------
ResourceExhausted                         Traceback (most recent call last)
<ipython-input-12-edad841ee8d1> in <cell line: 3>()
      1 endpoint = aiplatform.Endpoint.create(display_name="michael-mnist-endpoint")
      2 
----> 3 endpoint.deploy(
      4     mnist_model,
      5     min_replica_count=1,

4 frames
/usr/local/lib/python3.10/dist-packages/google/api_core/future/polling.py in result(self, timeout, retry, polling)
    259             # pylint: disable=raising-bad-type
    260             # Pylint doesn't recognize that this is valid in this case.
--> 261             raise self._exception
    262 
    263         return self._result

ResourceExhausted: 429 The following quotas are exceeded: CustomModelServingCPUsPerProjectPerRegion,CustomModelServingT4GPUsPerProjectPerRegion 8: The following quotas are exceeded: CustomModelServingCPUsPerProjectPerRegion,CustomModelServingT4GPUsPerProjectPerRegion

I'm not sure how to configure Google Cloud to get past this quota error.

Versions

  • OS: [MacOSX 14.1.2]
  • Python: [3.11.8]
  • TensorFlow: [2.15.0]
  • Scikit-Learn: [1.4.2]

michabuehlmann, Jun 04 '24 15:06

Ah yes, the UI isn't very clear. I tried to explain it as best I could in chapter 19, but Google has since changed the way it's done. I recommend checking out the Google Cloud documentation page on viewing and managing quotas.
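
To see roughly why the request above trips both quotas at once, here is a minimal sketch that estimates the peak CPU and GPU demand of a deployment. It rests on two assumptions: that quota is counted against max_replica_count rather than the current replica count, and that an n1-standard-N machine has N vCPUs. The helpers n1_standard_vcpus and peak_demand are hypothetical names for illustration only; they are not part of the Vertex AI SDK.

```python
def n1_standard_vcpus(machine_type: str) -> int:
    """Return the vCPU count encoded in an n1-standard-N machine type name."""
    prefix = "n1-standard-"
    if not machine_type.startswith(prefix):
        raise ValueError(f"unsupported machine type: {machine_type}")
    return int(machine_type[len(prefix):])


def peak_demand(machine_type: str, accelerator_count: int,
                max_replica_count: int) -> dict:
    """Estimate peak CPUs and GPUs, assuming every replica up to
    max_replica_count counts against the per-region serving quotas."""
    return {
        "cpus": n1_standard_vcpus(machine_type) * max_replica_count,
        "gpus": accelerator_count * max_replica_count,
    }


# The configuration from the question: 5 replicas of n1-standard-32 with 4 T4s.
requested = peak_demand("n1-standard-32", accelerator_count=4,
                        max_replica_count=5)
print(requested)  # {'cpus': 160, 'gpus': 20}

# A much smaller configuration, for comparison.
smaller = peak_demand("n1-standard-4", accelerator_count=1,
                      max_replica_count=1)
print(smaller)  # {'cpus': 4, 'gpus': 1}
```

If the estimate exceeds the limits shown on the quotas page for CustomModelServingCPUsPerProjectPerRegion and CustomModelServingT4GPUsPerProjectPerRegion, the options are to request a quota increase or to deploy with a smaller machine type, fewer accelerators, or a lower max_replica_count.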

ageron, Oct 14 '25 22:10