kopf
kopf copied to clipboard
An alternative way to use indexes without propagating them through the call stack
Problem
I'm using in-memory indexes and overall I think they work really well. One thing that's been nagging me though is that you can only get to them through the kwargs injected in handlers. My pain point with this approach is that while practical, it gets ugly when working with nested indices.
Take this slightly changed example from the docs:
@kopf.index("pods")
def primary(namespace, name, spec, **_):
container_names = {container["name"] for container in spec["containers"]}
return {(namespace, name): container_names}
@kopf.index("pods")
def secondary(namespace, name, **_):
return {namespace: name}
def get_value(
primary: kopf.Index,
secondary: kopf.Index,
namespace: str
):
...
@kopf.timer(...)
async def handler(
namespace,
primary: kopf.Index,
secondary: kopf.Index,
# other args ..
):
value = get_value(
primary,
secondary,
# some other lookup arguments
)
...
In practice you tend to have descriptive names so, for example:
-
primary
might turn intocontainers_by_namespace_and_pod_name
. -
secondary
might turn intopod_name_by_namespace
. -
get_value
might turn intoget_monitored_containers
.
.. which makes everything repetitive, verbose and arguably harder to read.
Proposal
Provide an alternative way of accessing indexes while running in the context of a handler without having to propagate all needed indexes through the call stack. One thing that comes to mind could be to access the index similarly to how you would access a contextvar
that is set in the context of the handler. With this in place the above could be rewritten as:
@kopf.index("pods")
def containers_by_namespace_and_pod_name(namespace, name, spec, **_):
container_names = {container["name"] for container in spec["containers"]}
return {(namespace, name): container_names}
@kopf.index("pods")
def pods_by_namespace(namespace, name, **_):
return {namespace: name}
def get_monitored_containers(
namespace: str
):
primary = kopf.indexes.get("containers_by_namespace_and_pod_name")
secondary = kopf.indexes.get("pods_by_namespace")
# or maybe:
# primary = containers_by_namespace_and_pod_name.get_index()
# secondary = pods_by_namespace.get_index()
# use primary and secondary
@kopf.timer(...)
async def handler(
namespace,
# other args ..
):
value = get_monitored_containers(namespace)
...
With this approach:
- The verbosity is hidden away in the function that makes use of the index (
get_monitored_containers
in this case). - Repetition is decreased because you don't have to pass the indexes through the call stack.
- The handler is easier to read because of decreased verbosity and repetition.
What are your thoughts on this?
Checklist
- [x] Many users can benefit from this feature, it is not a one-time case
- [x] The proposal is related to the K8s operator framework, not to the K8s client libraries
Thanks for suggesting this! Indeed, I was not so happy with the injected kwargs too, but that seemed like a lesser evil at the moment.
First of all, I am not sure which length is problematic here: the name of the function or the name of the index itself. If it is the latter, that can be overridden with id="proper_name"
on the decorator; the function name is only a convenient default for the id.
When this was developed, another option was to inject a single kwarg named indices
(or indexes
?), which is a dict of all available indices. The name is debatable though: indices-vs-indexes. Naming (and aligning with Go's "cache" terminology) was one of the hardest parts of this task.
In that case, you would only need to pass one arg/kwarg with all indices at once. However, accessing indices would be wordier: indices['containers_by_namespace_and_pod_name']
instead of shorter containers_by_namespace_and_pod_name
. But if both approaches are used, that should be fine.
This option would be easy to implement (it is already implemented internally, but not exposed).
Yet another option in consideration was the multi-level indices:
@kopf.index("pods")
def containers_by_namespace_and_pod_name(namespace, name, **_):
container_names = {container["name"] for container in spec["containers"]}
return {namespace: {name: container_names}}
But it seemed to be an unrealistic overcomplication, so I dropped the whole idea and made the indices flat. That might be reconsidered, but would require some extra effort to distinguish it from the usual indexed dicts (e.g. by nested=True
on the decorator or extra words in results: return {namespace: kopf.Subindex({name: container_names})}
).
However, I would prefer to avoid hidden global variables and stay with the more explicit way of accepting kwargs directly — for the sake of explicitness. Besides, the global-var approach would bring all other data also available that way, not only indices.
(On a side note, the same logic is true for the absence of handlers' inspections in Kopf: I prefer to pass all kwargs explicitly regardless of what is actually used, letting the developers see them all and decide what to use (and ignore the rest via **_
) — rather than hiding what is not explicitly accepted and leaving the developers clueless of what is available.)
Using the functions as index accessors (pods_by_namespace.get_index()
) was also considered. That, however, was rejected in the drafting stage: it creates quite weird semantics (functions as data-storing objects), is troublesome or impossible for strict typing, and is incompatible with some use-cases (e.g., multiple indices with the same indexing function but different resources and/or filters).
Yet another option I thought of recently, is redirecting the functions into user-controlled indexes:
import kopf
containers_by_namespace_and_pod_name = kopf.Index()
pods_by_namespace = kopf.Index()
@kopf.index('pods', to=containers_by_namespace_and_pod_name)
def index_containers(**_):
return ...
@kopf.index('pods', to=pods_by_namespace)
def index_names(**_):
return ...
def get_monitored_containers(namespace: str):
for name in pods_by_namespace[namespace]:
containers = containers_by_namespace_and_pod_name[(namespace, name)]
...
In that case, all lengthy variable names are available directly, are explicit, and do not break anything in the original usage scenario.
What do you think about these options?
Thanks for the quick follow-up!
Regarding the length of a name, I would say it doesn't matter if it's on the function or on the index. In this situation I tend to favor descriptive names over short names and in some cases, coming up with a short descriptive name is can be challenging. Regardless, I would say the length itself is not a problem as long as it's contained somehow.
Speaking of which, the first suggestion in which you pass all indexes into a handler through a single argument is a good option that, subjectively, fits well with how kopf does things right now. On the other hand, I would see myself preferring your last suggestion where you 'redirect' data into user controlled indexes in some situations.
The advantage I see with the later approach is that you can, for the lack of a better word, neatly "hide" some "lower level complexity" behind a well named high level function and thus keep working at that "higher level of detail". If used well, I think it can lead to very readable and easy to understand code.
On multi-level indexes, right now I share the same opinion as you. I admit I haven't given it a lot of thought, but it does feel complicated to handle it properly.
So to conclude, I think that option #1 is okay and feels consistent with how things are done right now whereas option #2 would be something nice to have.
I second the proposal for something along the lines of kopf.indexes.get()
. We have several nested layers of functionality, and would be nice to be able to access the indexes directly at a lower level, without having to pass them down through multiple def layers.