flytekit icon indicating copy to clipboard operation
flytekit copied to clipboard

feat: Add timeout to container task

Open Narwhal-fish opened this issue 4 months ago • 5 comments

Tracking issue

Related to flyteorg/flyte#6351

Why are the changes needed?

Currently, ContainerTask lacks built-in timeout functionality, which is available for regular Python tasks via the @task decorator.

What changes were proposed in this pull request?

This PR adds a timeout parameter to ContainerTask that accepts datetime.timedelta objects:

How was this patch tested?

  1. test_container_task_timeout(): Tests local execution timeout with timedelta
  2. test_container_task_timeout_k8s_serialization(): Verifies K8s activeDeadlineSeconds serialization
  3. test_container_task_no_timeout(): Ensures normal execution works with sufficient timeout

Setup process

Screenshots

Check all the applicable boxes

  • [ ] I updated the documentation accordingly.
  • [ ] All new and existing tests passed.
  • [x] All commits are signed-off.

Related PRs

Docs link

Summary by Bito

This pull request enhances the ContainerTask functionality by adding a timeout parameter for maximum execution duration, enabling it to accept a timeout parameter via a datetime.timedelta object. Modifications include updates to the ContainerTask class and Kubernetes pod specification, along with new unit tests to verify the feature's functionality during local execution and Kubernetes serialization, enhancing the task's capabilities to match those of standard Python tasks.

Narwhal-fish avatar Aug 14 '25 16:08 Narwhal-fish

Thank you for opening this pull request! 🙌

These tips will help get your PR across the finish line:

  • Most of the repos have a PR template; if not, fill it out to the best of your knowledge.
  • Sign off your commits (Reference: DCO Guide).

welcome[bot] avatar Aug 14 '25 16:08 welcome[bot]

Codecov Report

:white_check_mark: All modified and coverable lines are covered by tests. :white_check_mark: Project coverage is 83.58%. Comparing base (eb5a67f) to head (dccb4ee). :warning: Report is 5 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #3314      +/-   ##
==========================================
- Coverage   85.26%   83.58%   -1.68%     
==========================================
  Files         386        3     -383     
  Lines       30276      195   -30081     
  Branches     2969        0    -2969     
==========================================
- Hits        25814      163   -25651     
+ Misses       3615       32    -3583     
+ Partials      847        0     -847     

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

:rocket: New features to boost your workflow:
  • :snowflake: Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

codecov[bot] avatar Aug 18 '25 06:08 codecov[bot]

I'm not sure if this is the correct approach. For regular Python tasks the timeout is passed through the task metadata and queueing time is counted towards the overall task timeout. It looks like this implementation only sets the active deadline seconds on the pod which will only affect the underlying container when it eventually runs.

So I am a bit worried about the difference in semantics. Is it possible to instead or also plumb the timeout through the task metadata?

Sovietaced avatar Aug 31 '25 06:08 Sovietaced

I'm not sure if this is the correct approach. For regular Python tasks the timeout is passed through the task metadata and queueing time is counted towards the overall task timeout. It looks like this implementation only sets the active deadline seconds on the pod which will only affect the underlying container when it eventually runs.

So I am a bit worried about the difference in semantics. Is it possible to instead or also plumb the timeout through the task metadata?

Sorry for the late reply. I hadn't considered the importance of using the standard task metadata timeout mechanism to ensure consistent behavior with regular Python tasks.

I'll modify the implementation to plumb the timeout through the task metadata so that it includes queueing time and follows the same execution semantics as regular Python tasks.

Narwhal-fish avatar Sep 01 '25 15:09 Narwhal-fish

Bito Automatic Review Failed - Technical Failure

Bito encountered technical difficulties while generating code feedback . To retry, type /review in a comment and save. If the issue persists, contact [email protected] and provide the following details:

Agent Run ID: 7ebdf8bb-bf4c-4f1b-83ef-8752ac9ef761

flyte-bot avatar Sep 01 '25 17:09 flyte-bot