go icon indicating copy to clipboard operation
go copied to clipboard

proposal: sync: new type `MapHash`

Open mknyszek opened this issue 1 month ago • 12 comments

proposal: sync: new type MapHash

Background

The sync package's Map type treats keys and values as type any, and requires that the keys are comparable. This both leaves performance on the table by frequently forcing simple value types to be boxed into an any and eschews type-safety, leading to lots of code that simply states the types of the keys and values in a comment, such as this (one of many) example from the standard library:

var listCache sync.Map // map[string]listImports, keyed by contextName

Today we have generics, so we can make uses of sync.Map (which are the common ones) much more type-safe. The current situation is especially silly since the underlying implementation of a Map has actually been type-parameterized since Go 1.23!

One reason we haven't been able to move forward with a type-parameterized sync.Map in the sync package itself is that we had no idea what to call it. Creating a new sync/v2 package was proposed as an alternative, but because only two types (Pool and Map) would actually benefit from generics at this time, we would be forced to incur all the costs of a v2 package for just those types.

Also, @bradfitz has since exported the internal implementation of sync.Map, internal/sync.HashTrieMap.

Proposal

With the standardization of the hash function, there's an opportunity to modernize the Map type and give it a new, clear name: MapHash. The existing Map can be defined in terms of it.

I propose the following API changes for the sync package.

type MapHash[K, V any, H maphash.Hasher[K]] struct{ ... }

func (*MapHash[K, V, H]) All() iter.Seq2[K, V]
func (*MapHash[K, V, H]) Clear()
func (*MapHash[K, V, H]) CompareAndDelete(key K, old V) (deleted bool)
func (*MapHash[K, V, H]) CompareAndSwap(key K, old, new V) (swapped bool)
func (*MapHash[K, V, H]) Delete(key K)
func (*MapHash[K, V, H]) Load(key K) (value V, ok bool)
func (*MapHash[K, V, H]) LoadAndDelete(key K) (value V, loaded bool)
func (*MapHash[K, V, H]) LoadOrStore(key K, value V) (result V, loaded bool)
func (*MapHash[K, V, H]) Range(yield func(key K, value V) bool) // for sync.Map
func (*MapHash[K, V, H]) Store(key K, old V)
func (*MapHash[K, V, H]) Swap(key K, new V) (previous V, loaded bool)

type Map = MapHash[any, any, maphash.ComparableHasher[any]]

This API is identical to sync.Map except that it makes use of the type parameters, and adds an idiomatic iterator with an All method that returns an iter.Seq2. This method has the same synchronization guarantees as Range.

NOTE: Do not pay too close attention to the type signature. It will change to follow whatever the broader consensus is on container type signatures. For now, this proposal simply matches #69559. The core part of the proposal is the addition of a new type with a custom maphash.Hasher in the existing sync package as a way to move forward on having at type-safe concurrent map. Changes to this part may affect some of the minor details in this proposal, such as whether we replace Map with a type alias.

Rationale: Why a custom maphash.Hasher?

The use-case for a custom maphash.Hasher is admittedly niche. Most of the time a comparable key type is good enough. The downside of requiring a maphash.Hasher is that it complicates the API.

There are two reasons I propose supporting a custom hash function:

  1. Consistency across the standard library.
  2. The underlying implementation trivially supports it.

For (1), this API is intended to exist alongside https://github.com/golang/go/issues/69559, which proposes a map implementation with a custom hash function.

For (2), my concern is that eliding support for custom hash functions would just encourage more unnecessary exports of the internal map implementation, precisely because it can trivially support it. This makes the choice of concurrent map even more messy and complicated in the broader ecosystem. A good quality implementation already exists in the standard library.

Alternative: Don't redefine Map

Alternatively, we don't define Map as a type alias, and instead just use the MapHash implementation under the hood. This way we don't need to add the Range method to MapHash.

One benefit of the type alias approach is that it may minimize the visibility of Map in the documentation when users go looking for a concurrent map.

Optional: Shorthand for comparable keys

A downside of MapHash is that common cases are very verbose, requiring the developer to write out maphash.ComparableHasher everywhere the type is referenced. We can make this common case much less verbose with a generic type alias, too:

type MapOf[K comparable, V any] = MapHash[K, V, maphash.ComparableHasher[K]]

The MapOf name is not intended to set a precedent or a new pattern that others should follow. It's a compromise that may or may not be a better alternative to creating a new sync/v2 package.

Implementation

We would need to rework the underlying map implementation of internal/sync.HashTrieMap to use the new maphash.Hasher, but this is fairly trivial to do, since the equality and hash functions are already invoked indirectly. Then we would be able to refactor internal/sync.HashTrieMap into the implementation of sync.MapHash in the sync package and remove the internal/sync package, which would be a nice internal cleanup.

Go 1 Compatibility

This proposal only adds a new API, so it's backward compatible. Although sync.Map is redefined, it still exposes an identical API to today, with the sole addition of the All method.

Alternatives

As mentioned earlier, we could choose to pursue a sync/v2 package wholesale instead, for which a proposal already exists.

I don't feel strongly about either, but at this point the sync/v2 proposal has fallen behind a little bit (namely around the addition of maphash.Hasher), though it could be trivially updated to accommodate it.

Acknowledgements

Thank you to Cherry, David, and Russ for your initial feedback on this proposal.

mknyszek avatar Dec 09 '25 01:12 mknyszek

Overall, I'm very excited to get access to a type-safe concurrent map in the stdlib. Thanks for sending this proposal.

I'd prefer that we do sync/v2 and put it in there. The sooner we can leave the any-based Pool and Map behind, the better. In v2, this type can just be called Map.

To bikeshed a bit, I don't understand why the name is MapHash. That sounds like it is a hash, but it is actually a map. Should it be called HashMap?

I don't see the benefit to redefining Map as a type alias of MapHash. A downside is the otherwise-unnecessary Range method which will essentially be deprecated as soon as it is written. Again, I'd prefer to have a nice, new, clean API and try to forget that the old Map type exists.

I think this proposal could use more motivation of the choice to expose the hasher. I'm somewhat ambivalent about it. It's a fairly niche bit of functionality which is probably quite nice to have very occasionally but isn't needed 99% of the time that sync.Map is used. I wonder if the simplicity tradeoff of only exposing a Map[K comparable, V any] would make more sense. (Another way of looking at it is: it would be surprising that Go gives you a concurrent hash table with a customizable hasher but doesn't give you a normal hash table with customizable hasher. Would people occasionally use sync.MapHash only because they want the customizable hasher?)

cespare avatar Dec 09 '25 08:12 cespare

To bikeshed a bit, I don't understand why the name is MapHash. That sounds like it is a hash, but it is actually a map. Should it be called HashMap?

An earlier version of this proposal was HashMap. I got feedback that the word for a hash map in Go is map, so this is really just signaling a map with a custom hash function.

I'd prefer that we do sync/v2 and put it in there. The sooner we can leave the any-based Pool and Map behind, the better. In v2, this type can just be called Map. ... Again, I'd prefer to have a nice, new, clean API and try to forget that the old Map type exists.

Fair enough. I don't think this is obviously better than a v2, but a v2 may have a steeper cost than adding a new API. It may be steeper than having an less desirable name. Frankly, I don't think I have the perspective to make that call.

My primary goal is to move this forward one way or another. The purpose of this proposal is two-fold:

  1. Try to make progress.
  2. Suggest that we push the envelope a little bit on the generality of the API (see my response below for why).

I don't see the benefit to redefining Map as a type alias of MapHash. A downside is the otherwise-unnecessary Range method which will essentially be deprecated as soon as it is written.

This is a downside, yes. We don't need to define it as a type alias. We can just implement one in terms of the other -- that's fine with me too. I'll update the proposal.

I think this proposal could use more motivation of the choice to expose the hasher. I'm somewhat ambivalent about it. It's a fairly niche bit of functionality which is probably quite nice to have very occasionally but isn't needed 99% of the time that sync.Map is used. I wonder if the simplicity tradeoff of only exposing a Map[K comparable, V any] would make more sense.

I agree that it is niche, and adds some complexity to the API. My argument for the full generality is consistency.

Other standard library discussions are moving in this direction and this is intended to try to be consistent with those (for example #69559, which follows from #70471) to the extent that it can. (Concurrent maps have a different set of operations, but type parameters can still be the same.)

There are also no implementation concerns with doing so, and not exposing a custom hash function (which is already how it works under the hood, roughly) would just encourage more "exports" of the underlying implementation, which feels unnecessarily messy and noisy for the ecosystem.

(Another way of looking at it is: it would be surprising that Go gives you a concurrent hash table with a customizable hasher but doesn't give you a normal hash table with customizable hasher. Would people occasionally use sync.MapHash only because they want the customizable hasher?)

I agree with this. I don't think sync.MapHash should be the only map implementation that accepts a custom hasher.

This proposal is not intended to replace #69559, which is what fills that niche.

mknyszek avatar Dec 09 '25 15:12 mknyszek

@cespare Thanks for your feedback, I've updated the proposal.

mknyszek avatar Dec 09 '25 15:12 mknyszek

An earlier version of this proposal was HashMap. I got feedback that the word for a hash map in Go is map, so this is really just signaling a map with a custom hash function.

My preferred solution for this would be to move to sync/v2 and keep the simple Map name. having the new type be available in the same package will create some confusion when to use one vs the other and will delay the switch to the better map. Also this does not solve the same problem for the Pool.

That said, if this is to be the chosen route, please don't deviate from the standard naming used also in other languages.

In most other context and programming languages the Implementation detail (Hash) comes before the What (Map) see in java/kotlin/rust (HashMap) c# (HashTable) etc. Also the Library exported by bradfritz (HashTrieMap). So precisely because the word for hash map in go is "occupied" let's go with the full name instead of reordering it around.

doggedOwl avatar Dec 09 '25 16:12 doggedOwl

Not to try to bikeshed too much here, but the key thing with the name is not that it has a hash, but that it as a custom hasher, so why not HasherMap?

@doggedOwl, I don't think the idea was to just change the words around, but to name it after the other package, hash/maphash.

DeedleFake avatar Dec 09 '25 16:12 DeedleFake

I just want to say that at least to me it's not clear that we want to use the complex type signature of #69559, so I'm wary of using that as an argument for using a complex type signature in this proposal.

I am also mildly in favor of sync/v2, or a different package entirely with a name like ConcurrentMap.

ianlancetaylor avatar Dec 09 '25 18:12 ianlancetaylor

... will delay the switch to the better map.

I don't see the argument here, but perhaps I'm missing something. From my perspective, it's going to require people to change their code either way.

I am not sure modernizers can do any heavy lifting here, because the modernizer would need to guarantee that only certain types are used in the map. This may end up being a global analysis. But if modernizers can help, then that also doesn't really help the argument either way: the translation would be mostly automatic anyway. The only point of contention is the end result (that is, naming).

Also this does not solve the same problem for the Pool.

No, but I think there are procedural and technical advantages to considering the changes in isolation, just as there are advantages to looking at the bigger picture.

My first (weakly held) argument against a sync/v2 is that updating Pool and Map is just simply not enough to warrant a v2 package, which typically represents a much larger reimagining of the package, and by its very nature, encourages doing so. My impression is that we have a general agreement that we shouldn't touch the rest of sync.

My second (weakly held) argument against a sync/v2 is that we gain very little from considering Pool and Map together. These types both benefit from a similar kind of change, but they're very different types with very different purposes.

I just want to say that at least to me it's not clear that we want to use the complex type signature of https://github.com/golang/go/issues/69559, so I'm wary of using that as an argument for using a complex type signature in this proposal.

I'd love to hear your perspective in more detail. I'm somewhat surprised to see this point of view on the subject come up as much as it has in the time the proposal has been public.

To put my thoughts on the table:

My understanding of the point of #70471 was exactly to enable these kinds of type signatures. Are you suggesting that maphash.Hasher is really just intended for non-standard-library maps? Or that there isn't a consensus and the proposal will be retracted? Or something else?

I am not sure what we gain by not supporting custom hashes. Yes, the signature is less complex, but it's trivial to define an additional alias that covers the common case while still supporting the general case (as I proposed as "optional" above).

I am also not sure what we lose by supporting the general case. The implementation is trivial (especially in this case), and the only API difference is an additional type parameter.

I am also mildly in favor of sync/v2, or a different package entirely with a name like ConcurrentMap.

I hadn't considered a third package, thanks!

sync is a very central location for many concurrent-safe primitives today, so one downside I see with a completely different package is discoverability, especially if modernizers can't help. If modernizers can help, then that point is moot.

mknyszek avatar Dec 09 '25 18:12 mknyszek

@mknyszek

Regarding making a hash type a type parameter to the map, my concern is simply the complexity of the type signature. I see that I may have been the first person to suggest it (https://github.com/golang/go/issues/70471#issuecomment-2489841363) but when I see it in practice I can only think of C++. And not in the best way.

For the specific case of sync.MapHash or whatever we call it, I think it will be rare to use a custom hasher (as you also say). So I think it would be entirely reasonable to use a different type sync.MapHashFunc that takes a custom hasher (which includes a custom equality function). And that hasher can be a value passed to New or something like that, it doesn't need to be a type parameter.

My concern boils down to: keep the code simple and easy to understand and to use. Don't walk down the path of programming with types by adding type parameters. Obviously that option is there for people who want it, but we don't need to do it in the standard library.

But this is just a style preference. I just want to make sure that we don't simply decide on that style preference without a broader discussion.

Separately, I don't see discoverability of a new package as a major concern, as the sync.Map docs can point to it. Further, it might make sense to have a package, or package tree, of concurrent-safe data structures.

ianlancetaylor avatar Dec 09 '25 22:12 ianlancetaylor

Regarding making a hash type a type parameter to the map, my concern is simply the complexity of the type signature.

Got it, thanks. I hear your concern on this point. My intent is just to stay in line with what other container APIs are going to do. If that's the way the winds are blowing, then sure, I can update the proposal to reflect that.

I had confused your point with having a custom hasher as an option at all.

But this is just a style preference. I just want to make sure that we don't simply decide on that style preference without a broader discussion.

I agree. I'm not trying to set the trend here, just follow it.

mknyszek avatar Dec 09 '25 23:12 mknyszek

I agree. I'm not trying to set the trend here, just follow it.

I updated the proposal with a note to this effect.

mknyszek avatar Dec 09 '25 23:12 mknyszek

This proposal was perhaps premature in this respect. I had misunderstood the conclusion of #70471: I'd thought a consensus was reached beyond just that library's API as to how hash-based containers would be expressed.

However, there's clearly no consensus on how maphash.Hasher should be used. And I guess I found that out very quickly.

#33502 doesn't really go into any detail. Perhaps this proposal was at least useful for surfacing these opinions. 😅

mknyszek avatar Dec 09 '25 23:12 mknyszek