feat: Update files with respect to common ReplicaSpec refactor

Open MortalHappiness opened this issue 1 year ago • 3 comments

Tracking issue

Resolves: flyteorg/flyte#4408

Why are the changes needed?

https://github.com/flyteorg/flyte/pull/5355 changes protobuf files, so we need to update the corresponding files in flytekit.

What changes were proposed in this pull request?

Update files with respect to common ReplicaSpec refactor.
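
As a rough illustration of what a "common ReplicaSpec" means for the plugin code, the sketch below models a shared replica spec that every role-specific spec embeds. The class and field names (`CommonReplicaSpec`, `RoleReplicaSpec`, `replicas`, `image`, `restart_policy`) and the helper `build_specs` are assumptions made for illustration. They are not copied from the flyteidl protobuf definitions; see https://github.com/flyteorg/flyte/pull/5355 for the actual messages.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class CommonReplicaSpec:
    """Settings shared by every replica group (hypothetical field names)."""

    replicas: int = 1
    image: Optional[str] = None
    restart_policy: str = "never"


@dataclass
class RoleReplicaSpec:
    """A role-specific spec (worker, chief, ps, evaluator) embedding the shared part."""

    role: str
    common: CommonReplicaSpec = field(default_factory=CommonReplicaSpec)


def build_specs(replica_counts: Dict[str, int]) -> Dict[str, RoleReplicaSpec]:
    """Build one spec per role from a mapping of role name to replica count."""
    return {
        role: RoleReplicaSpec(role=role, common=CommonReplicaSpec(replicas=count))
        for role, count in replica_counts.items()
    }


# Same replica counts as the TfJob used in the test further down.
specs = build_specs({"worker": 2, "chief": 1, "ps": 1, "evaluator": 1})
```

Keeping the shared settings in one embedded object means a new common option only has to be added in one place rather than once per role.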

How was this patch tested?

Setup process

In flyte repo

  1. Check out https://github.com/flyteorg/flyte/pull/5355
  2. make compile
  3. flytectl demo start --dev
  4. kubectl apply -k "github.com/kubeflow/training-operator/manifests/overlays/standalone?ref=v1.7.0"
  5. POD_NAMESPACE=flyte ./flyte start --config kubeflow.yaml

where kubeflow.yaml is

# This is a sample configuration file for running single-binary Flyte locally against
# a sandbox.
admin:
  # This endpoint is used by flytepropeller to talk to admin
  # and artifacts to talk to admin,
  # and _also_, admin to talk to artifacts
  endpoint: localhost:30080
  insecure: true

catalog-cache:
  endpoint: localhost:8081
  insecure: true
  type: datacatalog

cluster_resources:
  standaloneDeployment: false
  templatePath: $HOME/.flyte/sandbox/cluster-resource-templates

logger:
  show-source: true
  level: 5

propeller:
  create-flyteworkflow-crd: true
  kube-config: $HOME/.flyte/sandbox/kubeconfig
  rawoutput-prefix: s3://my-s3-bucket/data

server:
  kube-config: $HOME/.flyte/sandbox/kubeconfig

webhook:
  certDir: $HOME/.flyte/webhook-certs
  localCert: true
  secretName: flyte-sandbox-webhook-secret
  serviceName: flyte-sandbox-local
  servicePort: 9443

tasks:
  task-plugins:
    enabled-plugins:
      - container
      - sidecar
      - K8S-ARRAY
      #- pytorch
      - tensorflow
      #- mpi
    default-for-task-types:
      - container: container
      - container_array: K8S-ARRAY
      - sidecar: sidecar
      #- pytorch: pytorch
      - tensorflow: tensorflow
      #- mpi: mpi
    fallback-to-container-handler: false

plugins:
  logs:
    kubernetes-enabled: true
    kubernetes-template-uri: http://localhost:30080/kubernetes-dashboard/#/log/{{.namespace }}/{{ .podName }}/pod?namespace={{ .namespace }}
    cloudwatch-enabled: false
    stackdriver-enabled: false
  k8s:
    image-pull-policy: Always
    default-env-vars:
      - FLYTE_AWS_ENDPOINT: http://flyte-sandbox-minio.flyte:9000
      - FLYTE_AWS_ACCESS_KEY_ID: minio
      - FLYTE_AWS_SECRET_ACCESS_KEY: miniostorage
  k8s-array:
    logs:
      config:
        kubernetes-enabled: true
        kubernetes-template-uri: http://localhost:30080/kubernetes-dashboard/#/log/{{.namespace }}/{{ .podName }}/pod?namespace={{ .namespace }}
        cloudwatch-enabled: false
        stackdriver-enabled: false

database:
  postgres:
    username: postgres
    password: postgres
    host: 127.0.0.1
    port: 30001
    dbname: flyte
    options: "sslmode=disable"
storage:
  type: stow
  stow:
    kind: s3
    config:
      region: us-east-1
      disable_ssl: true
      v2_signing: true
      endpoint: http://localhost:30002
      auth_type: accesskey
      access_key_id: minio
      secret_key: miniostorage
  container: my-s3-bucket

task_resources:
  defaults:
    cpu: 2
    memory: 1Gi
  limits:
    cpu: 4
    memory: 4Gi
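
When plugins are commented in and out of `enabled-plugins`, it is easy to leave a `default-for-task-types` entry pointing at a plugin that is no longer enabled. The sketch below is one way to sanity-check that before starting the binary. It works on a plain dict that mirrors the `tasks.task-plugins` section above; the function name `disabled_defaults` is hypothetical and is not part of Flyte.

```python
from typing import Dict, List, Tuple


def disabled_defaults(task_plugins: Dict) -> List[Tuple[str, str]]:
    """Return (task_type, plugin) pairs whose default plugin is not enabled."""
    enabled = set(task_plugins.get("enabled-plugins", []))
    missing = []
    for entry in task_plugins.get("default-for-task-types", []):
        # Each list item is a single-key mapping such as {"tensorflow": "tensorflow"}.
        for task_type, plugin in entry.items():
            if plugin not in enabled:
                missing.append((task_type, plugin))
    return missing


# Mirrors the uncommented entries of the config above.
task_plugins = {
    "enabled-plugins": ["container", "sidecar", "K8S-ARRAY", "tensorflow"],
    "default-for-task-types": [
        {"container": "container"},
        {"container_array": "K8S-ARRAY"},
        {"sidecar": "sidecar"},
        {"tensorflow": "tensorflow"},
    ],
}

problems = disabled_defaults(task_plugins)
```

For the configuration shown here the list is empty, since every default refers to an enabled plugin.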

In the parent folder of the flyte and flytekit repos

  1. Create a Dockerfile
FROM python:3.11-slim-bookworm AS builder

WORKDIR /root
ENV PYTHONPATH=/root

# Install build dependencies
RUN apt-get update \
    && apt-get install build-essential git wget -y \
    && apt-get clean \
    && rm -rf /var/lib/apt/lists/*

# Copy necessary directories
COPY flyte /flyte
COPY flytekit /flytekit

# Install Python packages (Order is important!)
RUN pip install --no-cache-dir /flytekit/plugins/flytekit-kf-tensorflow \
    && pip install --no-cache-dir /flytekit \
    && pip install --no-cache-dir /flyte/flyteidl
  2. Run docker buildx build -t localhost:30000/flytekit:dev --file Dockerfile --push .

In an arbitrary folder

  1. Create kubeflow_tf_evaluator.py
from flytekitplugins.kftensorflow import PS, Chief, Evaluator, TfJob, Worker

from flytekit import Resources, task

task_config = TfJob(
    worker=Worker(replicas=2),
    chief=Chief(replicas=1),
    ps=PS(replicas=1),
    evaluator=Evaluator(replicas=1),
)


@task(
    task_config=task_config,
    requests=Resources(cpu="1"),
)
def my_tensorflow_task(x: int, y: str) -> str:
    return f"{x=}, {y=}"
  2. Run pyflyte run --remote --image localhost:30000/flytekit:dev kubeflow_tf_evaluator.py my_tensorflow_task --x 100 --y acc
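
To know what a successful execution should return, the task body can be evaluated on its own, without flytekit or a cluster. This is a minimal sketch that assumes the CLI passes `--x 100 --y acc` through as the `int` and `str` that the type hints declare; `task_body` is a hypothetical stand-in for the undecorated function.

```python
def task_body(x: int, y: str) -> str:
    # Same expression as my_tensorflow_task; the "=" specifier (Python 3.8+)
    # prints the variable name followed by its repr().
    return f"{x=}, {y=}"


result = task_body(100, "acc")
print(result)  # prints: x=100, y='acc'
```

The string value is quoted in the output because `{y=}` formats with `repr()` by default.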

Test backward compatibility

  1. Check out the master branch in the flytekit repo.
  2. Rebuild the Docker image with docker buildx build -t localhost:30000/flytekit:dev --file Dockerfile --push . (in the parent folder of the flyte and flytekit folders)
  3. Run pyflyte run --remote --image localhost:30000/flytekit:dev kubeflow_tf_evaluator.py my_tensorflow_task --x 100 --y acc
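
The point of this check is presumably that a task serialized by the older flytekit, which does not yet populate the refactored spec, still runs against the new backend. A minimal sketch of that kind of fallback is shown here, with plain dicts standing in for the protobuf messages. The key names (`common`, `replicas`) and the function `effective_replicas` are assumptions for illustration only, not the actual backend code.

```python
from typing import Dict


def effective_replicas(spec: Dict) -> int:
    """Prefer the nested common spec; fall back to the legacy top-level field."""
    common = spec.get("common") or {}
    if "replicas" in common:
        return common["replicas"]
    # Older clients only set the top-level field (hypothetical layout).
    return spec.get("replicas", 0)


new_style = {"common": {"replicas": 2}}  # written by the updated client
old_style = {"replicas": 2}              # written by the older client

new_count = effective_replicas(new_style)
old_count = effective_replicas(old_style)
```

Both layouts resolve to the same replica count, which is the behavior the manual test above is checking end to end.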

Screenshots

Note that the worker replica count is 2.

image

image

Check all the applicable boxes

  • [x] I updated the documentation accordingly.
  • [x] All new and existing tests passed.
  • [x] All commits are signed-off.

Related PRs

https://github.com/flyteorg/flyte/pull/5355

Docs link

MortalHappiness, May 16 '24 15:05

This is a PR for you to test evaluator. https://github.com/flyteorg/flytekit/pull/1870

Future-Outlier, May 17 '24 04:05

Codecov Report

All modified and coverable lines are covered by tests.

Project coverage is 58.31%. Comparing base (69445ff) to head (cd6232a). Report is 4 commits behind head on master.

Additional details and impacted files
@@             Coverage Diff             @@
##           master    #2424       +/-   ##
===========================================
- Coverage   79.24%   58.31%   -20.94%     
===========================================
  Files         196      250       +54     
  Lines       19785    22092     +2307     
  Branches     4008     4006        -2     
===========================================
- Hits        15678    12882     -2796     
- Misses       3407     8700     +5293     
+ Partials      700      510      -190     
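
The two coverage percentages can be reproduced from the counts in the table, assuming coverage is computed as hits divided by total lines, where total lines are hits plus misses plus partials. That formula is an assumption about how the report is derived, and `coverage_percent` is a hypothetical helper.

```python
def coverage_percent(hits: int, misses: int, partials: int) -> float:
    """Coverage as a percentage of all tracked lines, rounded to two decimals."""
    lines = hits + misses + partials
    return round(100 * hits / lines, 2)


# Counts taken from the Hits / Misses / Partials rows above.
base = coverage_percent(hits=15678, misses=3407, partials=700)
head = coverage_percent(hits=12882, misses=8700, partials=510)
```

Under that assumption `base` matches the 79.24% reported for master and `head` matches the 58.31% reported for this PR, and the line totals agree with the Lines row (19785 and 22092).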


codecov[bot], May 26 '24 15:05

@MortalHappiness, can you merge master to get rid of the dbt failures?

In order to fix the CI failures for kf-mpi and kf-tensorflow you'll need to add a line similar to this to https://github.com/flyteorg/flytekit/blob/master/plugins/flytekit-kf-pytorch/dev-requirements.in and create the corresponding dev-requirements.in in https://github.com/flyteorg/flytekit/tree/master/plugins/flytekit-kf-tensorflow.

eapolinario, Jun 12 '24 01:06