python-docs-samples
Add Gemma Flex Template example
Description
Adds a Gemma Flex Template example and an e2e test running on Dataflow. This code example is similar to #11284, but uses a PyTorch model and deploys as a flex template. The e2e test will need model weights staged to GCS, like the streaming Gemma example.
Note: Before submitting a pull request, please open an issue for discussion if you are not associated with Google.
Checklist
- [ ] I have followed Sample Guidelines from AUTHORING_GUIDE.MD
- [ ] README is updated to include all relevant information
- [ ] Tests pass: `nox -s py-3.9` (see Test Environment Setup)
- [ ] Lint pass: `nox -s lint` (see Test Environment Setup)
- [ ] These samples need a new API enabled in testing projects to pass (let us know which ones)
- [ ] These samples need new/updated env vars in testing projects set to pass (let us know which ones)
- [ ] This sample adds a new sample directory, and I updated the CODEOWNERS file with the codeowners for this sample
- [ ] This sample adds a new Product API, and I updated the Blunderbuss issue/PR auto-assigner with the codeowners for this sample
- [ ] Please merge this PR for me once it is approved
Not sure what happened on the kokoro test run: the test target passed, but the test execution as a whole was killed right after.
Not sure what's happening here; the test passes, but the test session gets killed consistently. @engelke can you take a look?
@kweinmeister Still need an approving review from someone if you could take a look
If the complaint is with the model handler code I don't think it's too much of a change to cut that code in favor of linking to the source instead.
Debugging the tests: the output shows it's a timeout, but the tests are successful(?)
collecting ... collected 1 item
e2e_test.py::test_pipeline_dataflow PASSED [100%]
-- generated xml file: /workspace/dataflow/gemma-flex-template/sponge_log.xml --
======================== 1 passed in 3642.73s (1:00:42) ========================
nox > Session py-3.10 was successful.
err: signal: killed
The kokoro config is set to a max of 60 min (config), and you've configured the test to have a 5400s (90 minute) timeout.
A similar issue reported in https://github.com/GoogleCloudPlatform/python-docs-samples/issues/4609.
At a guess: the image is created each time (~20 mins) and it takes time for the job to start (~20 mins), the success message isn't being received in that window, and thus the system has only ~20 minutes of waiting left before it times out.
How long is this entire e2e test expected to take, and is the 90-minute wait there intentional? Something else will need to be updated for that decorator to be respected.
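For context, a per-test cap along these lines is usually expressed with a timeout marker. A minimal sketch, assuming the pytest-timeout plugin (the sample may use its own conftest helpers instead); the 5400s value is the one quoted above:

```python
import pytest


# Sketch only: caps this single e2e test at 5400 s via pytest-timeout.
# The outer kokoro limit still applies and wins if it is shorter.
@pytest.mark.timeout(5400)
def test_pipeline_dataflow() -> None:
    ...  # launch the flex template job and wait for the result messages
```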
As for the e2e test timeout: in early testing, runs were hovering around the hour mark (some slightly under, some slightly over), so it definitely needs to be over an hour. Building the container + running the job as an invocation from a flex template takes substantial time, so we may need a little more room on the kokoro timeout.
> Building the container + running the job as an invocation from a flex template takes substantial time.
You can bring this down significantly by not including the model and the GPU software in the flex template image. This is a scenario where having two separate images, one for the flex template launcher and one for the custom worker container, would be better. Care should be taken to build the images with the same set of dependencies, which can be accomplished with requirements files and/or constraints files.
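A minimal sketch of the split-image idea (project, region, and image URI below are placeholders, not values from this sample): the flex template spec points at a small launcher image, while the heavy GPU/model worker image is passed separately via the `--sdk_container_image` pipeline option, so the two can be built independently from shared requirements/constraints files.

```python
from apache_beam.options.pipeline_options import PipelineOptions

# Placeholder project, region, and image URI, for illustration only.
# The launcher image is referenced from the flex template spec; the worker
# image with the GPU stack and model dependencies is set here.
options = PipelineOptions(
    [
        "--runner=DataflowRunner",
        "--project=my-project",
        "--region=us-central1",
        "--sdk_container_image=us-docker.pkg.dev/my-project/repo/gemma-worker:latest",
    ]
)
```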
Also, we can speed up launch time by downloading the model into the SDK worker container from GCS during container startup, instead of shipping it inside the container. Currently, this can be done by using a custom entrypoint like https://github.com/liferoad/beamllm/blob/main/containers/ollama/entrypoint.sh; eventually we will have a Beam API for that.
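A minimal sketch of the download-at-startup approach in Python, assuming the weights are staged under a GCS prefix (bucket, prefix, and destination paths below are hypothetical):

```python
import os

from google.cloud import storage


def download_model_weights(bucket: str, prefix: str, dest_dir: str) -> None:
    """Copy staged model weights from GCS to local disk before model load.

    Sketch only: no retries or checksum verification, and the arguments are
    placeholders for wherever the weights are actually staged.
    """
    client = storage.Client()
    for blob in client.list_blobs(bucket, prefix=prefix):
        if blob.name.endswith("/"):  # skip "directory" placeholder objects
            continue
        local_path = os.path.join(dest_dir, os.path.relpath(blob.name, prefix))
        os.makedirs(os.path.dirname(local_path), exist_ok=True)
        blob.download_to_filename(local_path)


if __name__ == "__main__":
    download_model_weights("my-bucket", "gemma/weights", "/models/gemma")
```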
Including the model in the container is less prone to runtime errors, but slower for the short term.
What are the next steps on this PR?
@jrmccluskey, are you still working on changes, or are you waiting for a review? If the latter, please say explicitly that you have addressed previous comments, since the current PR status is "Changes requested".
I see that tests are still failing -- did we figure out how to increase the time limits?
To be clear, comments have been addressed and I am waiting for a review. As far as the timeout, the kokoro config linked above can be updated to run longer; however, I was holding off on updating that since it's a repo-wide timeout. I suppose I should go ahead and update that just for the sake of having a green run on the PR.
@glasnt it appears you might have to submit a CLA for the github user id you've been using here, see: https://github.com/GoogleCloudPlatform/python-docs-samples/pull/11881/checks?check_run_id=27283317137
Resolved.
The Python 3.10 CI checks might have succeeded, apart from cleanup:
Job has terminated in state FAILED: Workflow job: 2024-07-10_11_09_14-9934820475476190981 failed. Please ensure you have permission to access the job and the `--region` flag, us-central1, matches the job's region.
https://btx.cloud.google.com/invocations/0be77b16-c664-4214-b5bf-b70c2588330c/log
Same problem we've seen with some runs here:
Not too much of a surprise; if every test ran at around the same time, quota would be tight.
After looking at the system logs for the Dataflow workers in one of the earlier runs, it looks like the workers don't have enough disk space to load the container image and model.
Handler for GET /v1.41/images/get returned error: write /var/lib/docker/tmp/docker-export-3281602760/0e6537f85f3ebad7a4b5af8385d234950c2861657142f4f53123b65c153127fe/layer.tar: no space left on device
GPU images are large, and the model is several GB as well. Everything needs to fit on disk. Also, from previous experiments with LLMs on Dataflow, each worker might need space for an additional copy of the model weights on disk, depending on how the model is loaded.
It keeps crashing and retrying indefinitely due to "no space left on device" until we reach the test timeout.
Try increasing the worker machines' disk size with `--disk_size_gb`.
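For example, a sketch of the option as it would appear in the pipeline flags (100 GB is an illustrative value, sized so the GPU container image and the several-GB model weights both fit):

```python
from apache_beam.options.pipeline_options import PipelineOptions

# Sketch: request larger worker boot disks. 100 GB is illustrative only.
options = PipelineOptions(
    [
        "--runner=DataflowRunner",
        "--disk_size_gb=100",
    ]
)
```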
Startup of the worker pool in zone us-central1-a failed to bring up any of the desired 1 workers. Please refer to https://cloud.google.com/dataflow/docs/guides/common-errors#worker-pool-failure for help troubleshooting. ZONE_RESOURCE_POOL_EXHAUSTED_WITH_DETAILS: Instance 'dataflow-gemma-flex-templ-07171029-mk9a-harness-7qn5' creation failed: The zone 'projects/python-docs-samples-tests/zones/us-central1-a' does not have enough resources available to fulfill the request. '(resource type:compute)'.
Looks like a quota issue?
What are next steps here? This PR has been hanging for a while
We need the tests to pass in order to merge.
Is it still quota issues or something else?
=================================== FAILURES ===================================
____________________________ test_pipeline_dataflow ____________________________
Traceback (most recent call last):
File "/workspace/dataflow/gemma-flex-template/.nox/py-3-10/lib/python3.10/site-packages/google/api_core/retry/retry_unary.py", line 144, in retry_target
result = target()
File "/workspace/dataflow/conftest.py", line 167, in pubsub_wait_for_messages
messages = [m.message.data.decode("utf-8") for m in response.received_messages]
File "/workspace/dataflow/conftest.py", line 167, in
messages = [m.message.data.decode("utf-8") for m in response.received_messages]
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb5 in position 3: invalid start byte
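One way the comprehension at conftest.py:167 could be made tolerant of non-UTF-8 payloads while debugging (a sketch only; per the later comment, the actual fix ended up on the publishing side):

```python
# Debugging sketch: avoid crashing on non-UTF-8 bytes by replacing them.
messages = [
    m.message.data.decode("utf-8", errors="replace")
    for m in response.received_messages
]
```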
I've tried some experiments with changing the region and with the decode error. These may need to be updated or reverted.
This should now run green with the encoding issues resolved. Since I worked so much on this one, I'd want someone other than me to approve this PR.
Sure enough, actually green! Thank you for the debugging work @glasnt !
I made some changes to simplify the code a bit; hopefully the tests stay green. I think the issue with the base64 was that Pub/Sub expects bytes and it was being passed a string. The local runner failed on this, but I suspect the Dataflow runner was implicitly converting it to base64.
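A sketch of the publish-side shape described here, using the Pub/Sub client library directly (project and topic IDs are placeholders; the pipeline itself presumably writes via Beam's WriteToPubSub, which likewise expects bytes):

```python
from google.cloud import pubsub_v1

publisher = pubsub_v1.PublisherClient()
topic_path = publisher.topic_path("my-project", "my-topic")  # placeholders

# Message data must be bytes; encode explicitly rather than passing a str
# and relying on the runner to coerce or base64-encode it.
future = publisher.publish(topic_path, data="gemma response".encode("utf-8"))
print(future.result())  # message ID once the publish succeeds
```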
Looks like the workflow was running during the GitHub outage last night and failed on the git clone step.
LGTM, tests are passing but take 50+ minutes to run. We can merge, but it would be nice to optimize this further.
A majority of this is the image build, which is blocked on network; but as long as it's <60m, we're OK.