feat: Update files with respect to common ReplicaSpec refactor
Tracking issue
Resolves: flyteorg/flyte#4408
Why are the changes needed?
https://github.com/flyteorg/flyte/pull/5355 changes protobuf files, so we need to update the corresponding files in flytekit.
What changes were proposed in this pull request?
Update files with respect to common ReplicaSpec refactor.
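As a rough illustration of what a common ReplicaSpec refactor can look like on the Python side, the sketch below nests the fields that every replica role shares inside one reusable object. The class and field names (`CommonReplicaSpec`, `TfReplicaSpec`, `replicas`, `image`) are assumptions made for this example and are not taken from flyteidl or flytekit.

```python
from dataclasses import asdict, dataclass, field
from typing import Optional


@dataclass
class CommonReplicaSpec:
    """Fields shared by every replica role (hypothetical names)."""

    replicas: int = 1
    image: Optional[str] = None


@dataclass
class TfReplicaSpec:
    """A role-specific spec that wraps the shared fields instead of repeating them."""

    common: CommonReplicaSpec = field(default_factory=CommonReplicaSpec)


# Build a worker spec with two replicas and show its nested, serializable shape.
worker = TfReplicaSpec(common=CommonReplicaSpec(replicas=2))
print(asdict(worker))
```

Grouping the shared fields this way means a new shared setting only has to be added in one place rather than once per role.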
How was this patch tested?
Setup process
In flyte repo
- Checkout https://github.com/flyteorg/flyte/pull/5355
make compile
flytectl demo start --dev
kubectl apply -k "github.com/kubeflow/training-operator/manifests/overlays/standalone?ref=v1.7.0"
POD_NAMESPACE=flyte ./flyte start --config kubeflow.yaml
where kubeflow.yaml is:
# This is a sample configuration file for running single-binary Flyte locally against
# a sandbox.
admin:
  # This endpoint is used by flytepropeller to talk to admin
  # and artifacts to talk to admin,
  # and _also_, admin to talk to artifacts
  endpoint: localhost:30080
  insecure: true
catalog-cache:
  endpoint: localhost:8081
  insecure: true
  type: datacatalog
cluster_resources:
  standaloneDeployment: false
  templatePath: $HOME/.flyte/sandbox/cluster-resource-templates
logger:
  show-source: true
  level: 5
propeller:
  create-flyteworkflow-crd: true
  kube-config: $HOME/.flyte/sandbox/kubeconfig
  rawoutput-prefix: s3://my-s3-bucket/data
server:
  kube-config: $HOME/.flyte/sandbox/kubeconfig
webhook:
  certDir: $HOME/.flyte/webhook-certs
  localCert: true
  secretName: flyte-sandbox-webhook-secret
  serviceName: flyte-sandbox-local
  servicePort: 9443
tasks:
  task-plugins:
    enabled-plugins:
      - container
      - sidecar
      - K8S-ARRAY
      #- pytorch
      - tensorflow
      #- mpi
    default-for-task-types:
      - container: container
      - container_array: K8S-ARRAY
      - sidecar: sidecar
      #- pytorch: pytorch
      - tensorflow: tensorflow
      #- mpi: mpi
fallback-to-container-handler: false
plugins:
  logs:
    kubernetes-enabled: true
    kubernetes-template-uri: http://localhost:30080/kubernetes-dashboard/#/log/{{.namespace }}/{{ .podName }}/pod?namespace={{ .namespace }}
    cloudwatch-enabled: false
    stackdriver-enabled: false
  k8s:
    image-pull-policy: Always
    default-env-vars:
      - FLYTE_AWS_ENDPOINT: http://flyte-sandbox-minio.flyte:9000
      - FLYTE_AWS_ACCESS_KEY_ID: minio
      - FLYTE_AWS_SECRET_ACCESS_KEY: miniostorage
  k8s-array:
    logs:
      config:
        kubernetes-enabled: true
        kubernetes-template-uri: http://localhost:30080/kubernetes-dashboard/#/log/{{.namespace }}/{{ .podName }}/pod?namespace={{ .namespace }}
        cloudwatch-enabled: false
        stackdriver-enabled: false
database:
  postgres:
    username: postgres
    password: postgres
    host: 127.0.0.1
    port: 30001
    dbname: flyte
    options: "sslmode=disable"
storage:
  type: stow
  stow:
    kind: s3
    config:
      region: us-east-1
      disable_ssl: true
      v2_signing: true
      endpoint: http://localhost:30002
      auth_type: accesskey
      access_key_id: minio
      secret_key: miniostorage
  container: my-s3-bucket
task_resources:
  defaults:
    cpu: 2
    memory: 1Gi
  limits:
    cpu: 4
    memory: 4Gi
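One easy mistake in this config is mapping a task type to a plugin that is commented out of `enabled-plugins`. The sketch below checks for that on an in-memory copy of the `task-plugins` section; the dict is typed in by hand for this example and the helper name `missing_plugins` is hypothetical, so it does not read or validate the real file.

```python
# Hand-written mirror of the task-plugins section of kubeflow.yaml (illustrative only).
task_plugins = {
    "enabled-plugins": ["container", "sidecar", "K8S-ARRAY", "tensorflow"],
    "default-for-task-types": [
        {"container": "container"},
        {"container_array": "K8S-ARRAY"},
        {"sidecar": "sidecar"},
        {"tensorflow": "tensorflow"},
    ],
}


def missing_plugins(section: dict) -> list:
    """Return (task_type, plugin) pairs whose plugin is not in the enabled list."""
    enabled = set(section["enabled-plugins"])
    missing = []
    for mapping in section["default-for-task-types"]:
        for task_type, plugin in mapping.items():
            if plugin not in enabled:
                missing.append((task_type, plugin))
    return missing


# An empty list means every default plugin is also enabled.
print(missing_plugins(task_plugins))
```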
In the parent folder of the flyte and flytekit repos
- Create Dockerfile:
FROM python:3.11-slim-bookworm AS builder
WORKDIR /root
ENV PYTHONPATH=/root
# Install build dependencies
RUN apt-get update \
    && apt-get install build-essential git wget -y \
    && apt-get clean \
    && rm -rf /var/lib/apt/lists/*
# Copy necessary directories
COPY flyte /flyte
COPY flytekit /flytekit
# Install Python packages (Order is important!)
RUN pip install --no-cache-dir /flytekit/plugins/flytekit-kf-tensorflow \
    && pip install --no-cache-dir /flytekit \
    && pip install --no-cache-dir /flyte/flyteidl
- Run
docker buildx build -t localhost:30000/flytekit:dev --file Dockerfile --push .
In an arbitrary folder
- Create kubeflow_tf_evaluator.py:
from flytekitplugins.kftensorflow import PS, Chief, Evaluator, TfJob, Worker

from flytekit import Resources, task

task_config = TfJob(
    worker=Worker(replicas=2),
    chief=Chief(replicas=1),
    ps=PS(replicas=1),
    evaluator=Evaluator(replicas=1),
)


@task(
    task_config=task_config,
    requests=Resources(cpu="1"),
)
def my_tensorflow_task(x: int, y: str) -> str:
    return f"{x=}, {y=}"
- Run
pyflyte run --remote --image localhost:30000/flytekit:dev kubeflow_tf_evaluator.py my_tensorflow_task --x 100 --y acc
Test backward compatibility
- Checkout to the master branch in the flytekit repo.
- Rebuild the docker image (in the parent folder of the flyte and flytekit folders) with
docker buildx build -t localhost:30000/flytekit:dev --file Dockerfile --push .
- Run
pyflyte run --remote --image localhost:30000/flytekit:dev kubeflow_tf_evaluator.py my_tensorflow_task --x 100 --y acc
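The backward-compatibility run above exercises an older flytekit against the updated backend, so presumably both the old and the new replica layout have to be understood. A minimal sketch of a reader that tolerates both follows; the dict shapes and the `common` key are assumptions for illustration, not the actual flyteidl messages, and `replica_count` is a hypothetical helper.

```python
def replica_count(spec: dict) -> int:
    """Read the replica count from either layout (hypothetical dict shapes).

    New layout: {"common": {"replicas": N}}
    Old layout: {"replicas": N}
    """
    common = spec.get("common")
    if common is not None and "replicas" in common:
        return common["replicas"]
    # Fall back to the pre-refactor top-level field.
    return spec.get("replicas", 0)


old_style = {"replicas": 2}
new_style = {"common": {"replicas": 2}}
print(replica_count(old_style), replica_count(new_style))
```

Preferring the nested field and falling back to the top-level one keeps older payloads readable without special-casing them at every call site.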
Screenshots
Note that the worker replica count is 2.
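For reference when reading the screenshots, the replica counts requested by the `TfJob` config in kubeflow_tf_evaluator.py can be tallied as below. The dict is copied by hand from that config for illustration; it is not read from flytekit.

```python
# Replica counts copied from the TfJob task_config in kubeflow_tf_evaluator.py.
replicas = {"worker": 2, "chief": 1, "ps": 1, "evaluator": 1}

# Sum across roles to get the total number of replicas the job requests.
total_replicas = sum(replicas.values())
print(f"worker replicas: {replicas['worker']}, total replicas: {total_replicas}")
```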
Check all the applicable boxes
- [x] I updated the documentation accordingly.
- [x] All new and existing tests passed.
- [x] All commits are signed-off.
Related PRs
https://github.com/flyteorg/flyte/pull/5355
Docs link
This is a PR you can use to test the evaluator: https://github.com/flyteorg/flytekit/pull/1870
Codecov Report
All modified and coverable lines are covered by tests :white_check_mark:
Project coverage is 58.31%. Comparing base (69445ff) to head (cd6232a). Report is 4 commits behind head on master.
Additional details and impacted files
@@             Coverage Diff             @@
##           master    #2424       +/-   ##
===========================================
- Coverage   79.24%   58.31%   -20.94%
===========================================
  Files         196      250       +54
  Lines       19785    22092     +2307
  Branches     4008     4006        -2
===========================================
- Hits        15678    12882     -2796
- Misses       3407     8700     +5293
+ Partials      700      510      -190
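The two coverage percentages in the table can be reproduced from the raw counts. The sketch below assumes coverage is hits divided by total lines, where total lines is hits plus misses plus partials; this matches the Lines row above, but it is not presented as Codecov's exact formula.

```python
def coverage_percent(hits: int, misses: int, partials: int) -> float:
    """Percentage of lines counted as hits, rounded to two decimals."""
    lines = hits + misses + partials
    return round(100 * hits / lines, 2)


# Counts taken from the base (master) and head (#2424) columns of the table.
base = coverage_percent(15678, 3407, 700)
head = coverage_percent(12882, 8700, 510)
print(base, head)
```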
@MortalHappiness, can you merge master to get rid of the dbt failures?
In order to fix the CI failures for kf-mpi and kf-tensorflow you'll need to add a line similar to this to https://github.com/flyteorg/flytekit/blob/master/plugins/flytekit-kf-pytorch/dev-requirements.in and create the corresponding dev-requirements.in in https://github.com/flyteorg/flytekit/tree/master/plugins/flytekit-kf-tensorflow.