
Added indexed option for k8s jobs

Open sjawhar opened this pull request 2 years ago • 27 comments

Purpose

Parallel K8s jobs have two completion modes: Indexed and NonIndexed (the default). Indexed jobs have some nice abilities; for example, each pod gets a JOB_COMPLETION_INDEX environment variable, so it can assign itself work without needing a separate task queue. There's currently no way to specify the k8s completion mode.
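
For context, the relevant Kubernetes field is .spec.completionMode. A minimal sketch of the Job fields involved (raw Kubernetes, not TPI syntax; values are illustrative):

apiVersion: batch/v1
kind: Job
spec:
  completionMode: Indexed  # NonIndexed is the default
  completions: 3
  parallelism: 3
  # in Indexed mode, each pod gets JOB_COMPLETION_INDEX set to 0, 1, or 2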

Approach

Added an optional boolean `indexed` attribute.

Example when set to false or omitted: (screenshot)

Example when set to true: (screenshot)

TODO

  • [ ] Add tests
  • [ ] Update docs

sjawhar avatar Jun 01 '22 23:06 sjawhar

Can you provide a main.tf that you tested with, or an example that can be tested?

Also thanks for the contribution!

dacbd avatar Jun 01 '22 23:06 dacbd

Can you provide a main.tf that you tested with, or an example that can be tested?

First, create a persistent volume with ReadWriteMany. For example:

kind: PersistentVolume
apiVersion: v1
metadata:
  name: pv-storage
  labels:
    type: local
spec:
  capacity:
    storage: 4Gi
  accessModes:
    - ReadWriteMany
  hostPath:
    path: /tpi-data

Then use this main.tf:

terraform {
  required_providers {
    iterative = {
      source = "github.com/iterative/iterative"
    }
  }
}
provider "iterative" {}

resource "iterative_task" "example" {
  cloud     = "k8s"
  machine   = "s"
  disk_size = 1

  parallelism = 3
  indexed     = false

  storage {
    workdir = "."
    output  = "results"
  }
  script = <<-END
    #!/bin/bash

    mkdir -p results
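    # JOB_COMPLETION_INDEX is only set by k8s when indexed = true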
    printenv > results/out-$JOB_COMPLETION_INDEX
  END
}

sjawhar avatar Jun 01 '22 23:06 sjawhar

related #585 /CC @0x2b3bfa0

casperdcl avatar Jun 02 '22 10:06 casperdcl

Awesome feature, @sjawhar! 😍 It would be nice to enable IndexedCompletion automatically when parallelism is greater than 1, instead of exposing a separate option that has no equivalent on other providers.

Thanks for the quick review! I also didn't like having an option that only applies to k8s, but I don't quite agree with your suggestion. Let me try to explain, though bear with me because I'm coming from a very cloud-heavy background and am relatively new to k8s.

K8s has two ways of running parallel jobs: indexed and nonindexed. Nonindexed means the pods are identical: start up N, they do their thing. Indexed means that k8s expects a success for at least one pod of each index 0 to N-1; in other words, if index 3 fails, it'll keep retrying index 3 specifically. They're useful for different things: you might use nonindexed to spawn N workers to accomplish M jobs by reading from a task queue, whereas you might use indexed to accomplish the same thing without the queue (e.g. by giving the full task list to each pod, which then assigns itself work based on its index).

tl;dr: both indexed and nonindexed are useful ways of running parallel jobs in k8s, and I don't think the user should be forced to use one or the other.
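
To illustrate the queue-free pattern, here's a raw Kubernetes sketch (not TPI syntax; the image and task names are made up):

apiVersion: batch/v1
kind: Job
metadata:
  name: indexed-workers
spec:
  completionMode: Indexed
  completions: 3
  parallelism: 3
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: worker
          image: bash
          command:
            - bash
            - -c
            - |
              # every pod receives the full task list...
              TASKS=(preprocess train evaluate)
              # ...and self-assigns work based on its injected index
              echo "running ${TASKS[$JOB_COMPLETION_INDEX]}"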

Now here's an extra twist: to accomplish the indexed example I described above (N workers for M jobs, where M > N), we need to be able to set .spec.completions and .spec.parallelism separately. So we actually need to add another k8s-specific option 😅
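
Concretely, those two fields sketch out like this for N=2 workers and M=5 tasks (illustrative numbers):

completionMode: Indexed
completions: 5   # M work items, indices 0-4
parallelism: 2   # at most N pods running at once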

Thoughts on all this?

UPDATE: What if we removed indexed and added completions, and set CompletionMode to Indexed whenever completions is set? That way there's only one k8s-specific option 😺

sjawhar avatar Jun 03 '22 18:06 sjawhar

If we could make indexing less k8s-specific, like somehow populating an env var TASK_COMPLETION_INDEX = [1-3], basically reproducing the k8s JOB_COMPLETION_INDEX behavior for the cloud instances 🤔

dacbd avatar Jun 03 '22 20:06 dacbd

I added the completions functionality as described in my last comment. This just reflects what works for my use case; I'm happy to tear it apart and refactor to make it more generic, as you have both described. Also, please forgive my Go, this might be precisely the second time I've ever used it 😅

sjawhar avatar Jun 04 '22 17:06 sjawhar

For the use cases we[^1] have in mind, the following should be enough (see the sketch after the list):

  • CompletionMode equal to NonIndexed if parallelism is 1, else Indexed
  • Completions equal to parallelism
  • Parallelism equal to parallelism
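
For example, parallelism = 3 would translate to the following Job fields (a sketch of the inferred values):

completionMode: Indexed  # because parallelism > 1
completions: 3
parallelism: 3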

The following scenarios are still not supported:

  1. Run $m$ different tasks on $n$ nodes, where $m \neq n$
  2. Run $n$ identical workers taking tasks from an external queue

Out of curiosity, can you share a bit more information about your use case? I guess that you may want to restrict the parallelism so that running a massively parallel task doesn't exhaust your cluster's resources. 🤔

[^1]: 🔔 @iterative/cml for sanity

0x2b3bfa0 avatar Jun 04 '22 18:06 0x2b3bfa0

CompletionMode equal to NonIndexed if parallelism is 1, else Indexed

I think you still might not be grokking the purpose of non/indexed. They're both valid modes of parallel operation; it's not about whether parallelism is 1 or not.

Out of curiosity, can you share a bit more of information on your use case? I guess that you may want to restrict the parallelism so running a massively parallel task doesn't exhaust your cluster's resources. 🤔

You got it. If I need to run 100 things, I don't want to spawn 100 pods. This seems to me like a necessary consideration for supporting on-prem/k8s.

As you can see from the diff, this is a very minor change that opens up a lot of functionality in k8s. I'd be grateful if you'd reconsider the pros/cons here. 🙏

sjawhar avatar Jun 04 '22 19:06 sjawhar

If I need to run 100 things, I don't want to spawn 100 pods

What you propose is similar to jobs.<job_id>.strategy.max-parallel on GitHub Actions and could be useful, although supporting it outside k8s is significantly more complex.

As per your suggestion, the current parallelism argument should be replaced by two different arguments:

  1. Number of times to run a task (i.e. completions in the Kubernetes parlance)
  2. Maximum number of concurrent workers (i.e. parallelism in the Kubernetes parlance)

By default, argument 2 should equal argument 1 if not specified by the user.

0x2b3bfa0 avatar Jun 05 '22 16:06 0x2b3bfa0

I think you still might not be grokking the purpose of non/indexed. They're both valid modes of parallel operation; it's not about whether parallelism is 1 or not.

Sorry, I explained myself badly. 🙈

Traditionally, Kubernetes Jobs didn't have an Indexed mode, and were meant to run exactly the same code one or more times. Using an external task queue was the officially recommended way of running different tasks.

This project tries to simplify the user's workflow as much as possible: the Indexed mode eliminates the need for an external task queue in those cases when the total number of tasks is known in advance, but still allows users to differentiate every index.

My insistence on using the NonIndexed mode when parallelism is 1 is just to support Kubernetes versions prior to 1.24 for users who don't need this feature. Requiring a single completion renders this feature useless, so we can safely disable it; when more completions are required, it will always be enabled.

0x2b3bfa0 avatar Jun 05 '22 20:06 0x2b3bfa0

@iterative/cml So, assuming CompletionMode can be inferred implicitly, and going with the latest suggestion:

  1. How do we want to handle something like (user provides k8s specific field when not using k8s):
resource "iterative_task" "example" {
  cloud     = "aws"
  ...
  parallelism = 3
  completions = 5
 }

Error: "completions not supported for non-k8s run-time"? (I am allergic to referring to k8s as a cloud provider 😓 ) We definitely don't have to support a task-queue like logic at this point IMO. We're stepping dangerously into "implementing scheduler" territory.

  2. Considering the above, I have to point out the very likely fact that more k8s-specific switches will be requested and added in the future, and maybe some cloud-provider-specific ones as well; some of them won't be natural to abstract away or support neatly across providers. My personal opinion is that we need to keep those as true as possible to the original behavior (same name, in some provider/run-time-specific block, maybe nested under "advanced"?) so users can easily infer their behavior from the original docs, without us inventing ad-hoc abstractions that aren't natural. Thoughts on this?

omesser avatar Jun 09 '22 13:06 omesser

introducing a k8s-specific config block makes sense, esp. since it's not-a-cloud:

resource "iterative_task" "example" {
  cloud       = "k8s"
  parallelism = ... # warn iff defined on `cloud = "k8s"`?

  # k8s-specific block
  k8s {
    parallelism = 3 # in the k8s sense, not in the TPI sense
    completions = 5
  }
}

casperdcl avatar Jun 10 '22 17:06 casperdcl

I'd avoid introducing backend-specific details, especially when the outer parallelism is functionally equivalent to the inner completions and the inner parallelism has no equivalent in other backends.

0x2b3bfa0 avatar Jun 13 '22 12:06 0x2b3bfa0

@sjawhar, would it be possible to keep the original scope of this pull request (enable indexed mode when Kubernetes completions > 1) and open a separate pull request for the addition of a separate parallelism limit?

0x2b3bfa0 avatar Jun 13 '22 12:06 0x2b3bfa0

@sjawhar, would it be possible to keep the original scope of this pull request (enable indexed mode when Kubernetes completions > 1) and open a separate pull request for the addition of a separate parallelism limit?

I really don't think that's a good idea. Users should not be forced to use indexed mode. I went that route initially just to make it work for my use case as quickly as possible, but this would be both a change from the current behavior and unnecessarily limiting. Are you sure you want to do that?

sjawhar avatar Jun 13 '22 15:06 sjawhar

this would be both a change from the current behavior and unnecessarily limiting

I thought that enabling the indexed mode doesn't have any significant effect beyond exposing a couple of additional environment variables. What kind of unnecessary limitations do you foresee?

0x2b3bfa0 avatar Jun 13 '22 22:06 0x2b3bfa0

I thought that enabling the indexed mode doesn't have any significant effect beyond exposing a couple of additional environment variables. What kind of unnecessary limitations do you foresee?

It's not just about environment variables. From the docs:

  • NonIndexed (default): the Job is considered complete when there have been .spec.completions successfully completed Pods. In other words, each Pod completion is homologous to each other.
  • Indexed: the Pods of a Job get an associated completion index from 0 to .spec.completions-1. The Job is considered complete when there is one successfully completed Pod for each index.

These are two very different behaviors. Your suggestion works perfectly well for my use case (Indexed is what I need), but it seems unnecessarily limiting. The spec already has provider-specific flags (e.g. region does not apply to k8s), so we're not breaking new ground by adding another one.

Nonetheless, if you're sure then I'll make the change.

sjawhar avatar Jun 14 '22 01:06 sjawhar

Your suggestion works perfectly well for my use case (Indexed is what I need), but it seems unnecessarily limiting.

Indexed mode is precisely the kind of behavior this tool is meant to have; see #585. Unless there is any practical use case that requires the NonIndexed mode, it looks like there is no pressing need to support both.

The spec already has provider-specific flags (e.g. region does not apply to k8s), so we're not breaking new ground by adding another one.

Ouch! 😅 Indeed, although there were plans to “rename region to location so it's valid for zones and node selectors” as per https://github.com/iterative/terraform-provider-iterative/issues/412#issue-1152159459

0x2b3bfa0 avatar Jun 15 '22 19:06 0x2b3bfa0

@0x2b3bfa0

Indexed mode is precisely the kind of behavior this tool is meant to have; see https://github.com/iterative/terraform-provider-iterative/issues/585. Unless there is any practical use case that requires the NonIndexed mode, it looks like there is no pressing need to support both.

I'm with @sjawhar on this one. It makes no sense to take away this choice for the perceived "simplicity" of merely saving a field here (it's 1:1 with the k8s field, with explainability and docs built in). Even if this only gives us the power of NonIndexed in the k8s case, that's still valuable IMO. No need to abuse terminology or over-generalize: this should be in a k8s-specific block IMO (yes, exposing backend-specific details), and there's no need to try to provide the same functionality/choice across all other clouds/engines; that would be a broken abstraction. The key to keeping our code powerful and still simple, meaningful, and maintainable, IMO, is this: once something is not easily/directly generalizable, keep it provider-specific and help the user transition to k8s terminology.

"rename region to location so it's valid for zones and node selectors"

I'd advise against 😨 (commented on #412 )

omesser avatar Jun 19 '22 13:06 omesser

the power of NonIndexed

Citation needed. 😅 If we provide “powerful” options it should be for a reason, methinks.

0x2b3bfa0 avatar Jun 19 '22 13:06 0x2b3bfa0

this should be in a k8s-specific block IMO (yes, exposing backend-specific details), and there's no need to try to provide the same functionality/choice across all other clouds/engines; that would be a broken abstraction.

We may end up exposing backend-specific details when shoehorning them under a common specification proves difficult or impractical. 👍🏼

Although I wonder if this is the case.

0x2b3bfa0 avatar Jun 19 '22 13:06 0x2b3bfa0

The key to keeping our code powerful and still simple, meaningful, and maintainable, IMO, is this: once something is not easily/directly generalizable, keep it provider-specific and help the user transition to k8s terminology.

When terminology is generalizable but different backends overload the same names with different meanings, I believe that it makes sense to go the extra mile and "generalize" on our side a bit more.

I'd argue that this particular functionality should be generalized, although in some other cases (e.g. placement) we end up resorting to specific options.

0x2b3bfa0 avatar Jun 19 '22 14:06 0x2b3bfa0

taking away this choice for the perceived "simplicity" of merely saving a field here

Taking away some choices isn't inherently negative; quite the opposite. Unless we find any supported use case that benefits particularly from the NonIndexed mode, I'd consider not exposing that choice to users.

0x2b3bfa0 avatar Jun 19 '22 14:06 0x2b3bfa0

We discussed this offline a bit, and specifically, I find it hard to defend the choice of NonIndexed except for the religious reason of "give me ma k8s options!", so I would say we can live without it for now.

omesser avatar Jun 23 '22 15:06 omesser

Ok, I'll make the changes sometime in the next few days

sjawhar avatar Jun 23 '22 19:06 sjawhar

I came across this PR while looking for this feature with AWS EC2. I think the ability to operate parallel instances on regular cloud providers, with some sort of indexing (or any mechanism) to dispatch work to the different instances, could greatly help small teams and individual developers who don't have the resources to manage k8s.

redabuspatrol avatar Jul 13 '22 18:07 redabuspatrol

@redabuspatrol, can you please comment on #585? In the meantime, you may also want to try having a separate task for every chunk of work.

0x2b3bfa0 avatar Jul 14 '22 09:07 0x2b3bfa0

Sorry for the long delay. I've reverted most of the changes and implemented @0x2b3bfa0's preferred solution.

sjawhar avatar Aug 16 '22 20:08 sjawhar

Thank you very much, @sjawhar! 🙏🏼

0x2b3bfa0 avatar Aug 17 '22 23:08 0x2b3bfa0

Thanks so much @sjawhar! Sorry for the delay and (perhaps excessive) hesitancy from our end too.

The reverted changes still look worth pursuing to us, just in separate PRs. We'd probably also be much quicker at understanding & merging said PRs if you could share an example of their use too... I assume related to things like https://github.com/iterative/terraform-provider-iterative/compare/master...sjawhar:terraform-provider-iterative:feature/nfs-volume and https://github.com/iterative/dvc/compare/main...sjawhar:dvc:feature/parallel-repro?

casperdcl avatar Aug 19 '22 17:08 casperdcl