
Cycling over a list of strings

Open hjoliver opened this issue 11 months ago • 2 comments

... or some generalization of that.

This would make it more obvious (than integer cycling) that Cylc can be used to process (e.g.) a list of many datasets.

Potentially the "next cycle" could even be determined externally at run time, so you don't have to define the list up front.

Probably easy to implement.

https://cylc.discourse.group/t/how-to-process-a-large-number-of-datasets-fast-with-cylc/912/3

hjoliver · Mar 05 '24 22:03

To achieve this we would require a further abstraction of the cycling interface to remove the requirement for cycle point arithmetic and reduce it down to a simple non-rewindable generator.

The cycling / task_pool code would have to manage the generators, count cycles for runahead limiting and handle start/stop logic.
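As a rough illustration of that split of responsibilities, here is a minimal sketch in plain Python. It is not cylc-flow code: the `StringCycler` class and its `release`, `complete`, `runahead_limit` and `stop_after` names are all hypothetical, and the runahead limit is assumed to be a simple count of active cycles. The cycle source is any forward-only iterator, so no cycle point arithmetic is needed, and the manager does the counting and stop handling.

```python
from collections import deque
from typing import Deque, Iterable, Iterator, List, Optional


class StringCycler:
    """Hypothetical manager for a forward-only stream of string cycle points."""

    def __init__(
        self,
        points: Iterable[str],
        runahead_limit: int = 2,
        stop_after: Optional[str] = None,
    ) -> None:
        # iter() gives a non-rewindable source: points can only be consumed once.
        self._points: Iterator[str] = iter(points)
        self._runahead_limit = runahead_limit  # max number of active cycles
        self._stop_after = stop_after          # optional final cycle point
        self._active: Deque[str] = deque()
        self._exhausted = False

    def release(self) -> List[str]:
        """Release new cycle points until the runahead limit is reached."""
        released: List[str] = []
        while not self._exhausted and len(self._active) < self._runahead_limit:
            try:
                point = next(self._points)
            except StopIteration:
                self._exhausted = True
                break
            self._active.append(point)
            released.append(point)
            if point == self._stop_after:
                # Stop logic: never pull anything past the stop point.
                self._exhausted = True
        return released

    def complete(self, point: str) -> None:
        """Mark a cycle as finished, freeing a runahead slot."""
        self._active.remove(point)

    @property
    def finished(self) -> bool:
        """True once the source is used up and no cycles remain active."""
        return self._exhausted and not self._active
```

Because the source is only ever advanced with `next()`, it could equally be a generator that asks an external system for the next dataset at run time, so the list would not have to be defined up front.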

To be useful, this would need to be combinable (i.e. multi-dimensional cycling) or at least interoperable (e.g. a more powerful solution to https://github.com/cylc/cylc-flow/issues/4912) with conventional cycling. Otherwise it would only be catering to a small number of integer cycling use cases.

Perfectly possible, but a substantial task. This was previously considered under the name "cycle drivers".

oliver-sanders · Mar 06 '24 10:03

> To be useful, this would need to be combinable (i.e. multi-dimensional cycling) or at least interoperable (e.g. a more powerful solution to https://github.com/cylc/cylc-flow/issues/4912) with conventional cycling. Otherwise it would only be catering to a small number of integer cycling use cases.

That would be more powerful, of course, but it will probably be a long time coming. Short of that, the basic cycling that I'm suggesting would be (a) much easier to implement, and (b) useful as a more intuitive alternative to integer cycling when you need to cycle over a list of named datasets (or whatever) - which I don't think is only a small number of integer cycling use cases.

As the linked forum post shows, integer cycling can seem "awkward" to users for these use cases. The cycle point is a task runtime concept, so you have to know to convert it to a dataset ID (or whatever) in job scripting. Cycling over a list of strings, by contrast, could be fully determined during config file parsing.
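For illustration, this is roughly the conversion a job currently has to do under integer cycling. It is a minimal sketch, assuming the workflow starts at cycle point 1 with a step of 1; the `DATASETS` names and the `dataset_for_cycle` helper are hypothetical, while `CYLC_TASK_CYCLE_POINT` is the variable Cylc sets in the job environment.

```python
import os
from typing import Mapping, Sequence

# Hypothetical dataset names; in practice this list lives outside the workflow graph.
DATASETS = ["dataset_a", "dataset_b", "dataset_c"]


def dataset_for_cycle(
    datasets: Sequence[str],
    environ: Mapping[str, str] = os.environ,
) -> str:
    """Map an integer cycle point to a dataset ID (assumes cycling starts at 1)."""
    point = int(environ["CYLC_TASK_CYCLE_POINT"])
    if not 1 <= point <= len(datasets):
        raise ValueError(f"no dataset for cycle point {point}")
    # Cycle points are 1-based under this assumption, list indices are 0-based.
    return datasets[point - 1]
```

With string cycling the cycle point would already be the dataset ID, so this lookup (and the off-by-one it invites) would disappear from job scripting.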

(And we did have low-key plans to do this, rather a long time back).

hjoliver · Mar 06 '24 20:03