smithy4s Expensive hashCode on Schema prevents fast cache lookups on deeply nested schemas

This was discussed briefly on Discord, but currently the usage of things like auto-derived Document.Decoder can be a footgun - in order to use the cache, we need to compute the hashCode of the schema being used. This is a problem, because for deeply nested schemas it can take a bit of time to traverse the entire thing, and it's not memoized in any way.

https://discord.com/channels/1045676621761347615/1045676622491164724/1336044966337843301

Here's one of my flame graphs:

Some of the ideas for improvement:

somehow use object identity as the cache key before resorting to hashCode (find a concurrent version of IdentityHashMap?)
use lazy val to store the hashcode in schemas
additionally, maybe use ConcurrentHashMap instead of TrieMap in the JVM case. The code is already platform-specific. This change would require benchmarking against the current state.

Feb 21 '25 17:02 kubukoz

additionally, maybe use ConcurrentHashMap instead of TrieMap in the JVM case. The code is already platform-specific. This change would require benchmarking against the current state.

Huh, I thought we already were. Well let's try that then ^^

Feb 24 '25 09:02 Baccata

That doesn't really help us if the hash computation itself is heavy :/

JVM - TrieMap https://github.com/disneystreaming/smithy4s/blob/series/0.18/modules/core/src-jvm/smithy4s/internals/maps.scala#L23

JS/Native - mutable.Map https://github.com/disneystreaming/smithy4s/blob/series/0.18/modules/core/src-js-native/smithy4s/internals/maps.scala#L23. I guess we'll have to change the Native one soon because multi-threading is now a thing.

Feb 24 '25 13:02 kubukoz