bazel
bazel copied to clipboard
Create a frozen dictionary type.
This type is hashable, and thus can be used in a depset.
I'm trying to solve the following use case, which is very common when creating tree-like structures.
child = FooInfo(
something = {"foo": "bar"},
children = depset([])
)
parent = FooInfo(
something = {},
children = depset([child])
)
Note: I haven't added any test cases yet, both because I first want approval that it's the right thing to do before going to the effort of making test cases, and because I don't know where the right place is to put my test cases.
For your use case, can you instead use struct?
child = FooInfo(
something = struct(foo = "bar"),
children = depset([])
)
...
Struct does indeed work for this specific use case. However:
- Struct does not work when the keys are not strings
- it makes it really awkward to use (especially until
--incompatible_struct_has_no_methodsis removed):- Iteration through all key-value pairs requires
structs.to_dict - I'm unsure if struct iteration order is preserved - I would suspect not. Dict has a well-defined iteration order (insertion order)
- Iteration through all keys is very awkward (
[k for k in dir(struct) if k != "to_json" and k != "to_proto"]) - It's very confusing for a user of such a type if they need to use
getattr(d, key)instead of justd[key]. You have to explain to them the mental hoops they have to jump through, because it's very unintuitive for a user. This is generally ok if the type is hidden away in an abstraction that only you need to know about, but makes it very hard when you need to make a public API.
- Iteration through all key-value pairs requires
Interesting point. Then I guess the design dimensions here are:
- do we need the
frozenset()Starlark builtin method? (You provide a very good argument for yes) - is "frozenset" a good name for it? (I would say yes, given prior art: https://pypi.org/project/frozendict)
- should this be a Bazel-only thing, or do we propose it as a language spec change? (I think if we agree to do this, it should be in the language spec)
- should the value returned by
frozenset()be deeply or shallowly frozen? (The name suggests deeply frozen, which may be confusing; shallowly frozen is much cheaper to construct, but much more expensive to check for hashability.) - should it be behind an incompatible flag?
- should the type name or stringification of the object returned by
frozenset()be distinguishable fromdictin Starlark code? - in java, should it be implemented as a new subclass of Dict or Map, or a special way of constructing Dict?
I'm a little confused by your use of the word frozenset here, while I used the word frozendict. I'm going to just assume that every time you wrote frozenset, you're referring to my frozendict type, since bazel doesn't have a set type.
Interesting point. Then I guess the design dimensions here are:
- do we need the
frozenset()Starlark builtin method? (You provide a very good argument for yes)- is "frozenset" a good name for it? (I would say yes, given prior art: https://pypi.org/project/frozendict)
I think frozendict is the right name for it. If you did intentionally write frozenset, then I think that just adds confusion, because a set is its own seperate type.
- should this be a Bazel-only thing, or do we propose it as a language spec change? (I think if we agree to do this, it should be in the language spec)
I certainly wouldn't be opposed to a language spec change. I filed https://github.com/bazelbuild/starlark/issues/272 a while back.
- should the value returned by
frozenset()be deeply or shallowly frozen? (The name suggests deeply frozen, which may be confusing; shallowly frozen is much cheaper to construct, but much more expensive to check for hashability.)
I'd personally be a fan of deeply frozen (moving the checkHashable check to the constructor), I just thought that might be contentious.
- should it be behind an incompatible flag?
I don't think so. The only difference here is that we add a new builtin method, and that shouldn't be incompatible with anything. If I write frozendict = {"a": "b"}, that just works fine.
- should the type name or stringification of the object returned by
frozenset()be distinguishable fromdictin Starlark code?
I'd personally go with no. I think that for the most part, a user doesn't really need to distinguish between a dict and a frozendict. This is, practically speaking, equivalent to what we currently have for a dict constructed by a different rule.
- in java, should it be implemented as a new subclass of Dict or Map, or a special way of constructing Dict?
I originally added a field to the regular Dict type initially_frozen of type bool. I don't have a strong preference, but I figured that this would add extra memory consumption to every dict, while this current method would only change the vtables.
I'm a little confused by your use of the word frozenset here, while I used the word frozendict. I'm going to just assume that every time you wrote frozenset, you're referring to my frozendict type, since bazel doesn't have a set type.
Yes, typo, sorry! I was considering this feature in the context of a native set type which users have also been asking for for a while, and whether we'd need a frozenset in addition to a mutable set - and mistyped.
Hey @matts1,
do you care to explain more what you're doing, to avoid AB problem?
I've seen tree-like structures in the rules before and they were often an anti-pattern. That's because the targets already form a dag and often it's possible to use that without copying that structure in the providers. Once you copy the structure into providers, it's really painful to process since Starlark doesn't have recursion and it's hard to visit them. aspects can for example do much better job visiting dags. But there are some cases that are not well covered and complex providers seem necessary atm.
I ran into this problem for rules_directory while trying to generate a tree for a directory within a single rule.
I had discussions in the starlark team chat last week. The tldr is that it seemed like this isn't especially high priority, and there would be better ways to do it (by resolving the todo to use checkhashable instead of isimmutable and fixing an existing bug with cyclic hashing crashing bazel)