Expensive hashCode on Schema prevents fast cache lookups on deeply nested schemas
This was discussed briefly on Discord, but currently the usage of auto-derived instances like `Document.Decoder` can be a footgun: in order to use the cache, we need to compute the hashCode of the schema being used. This is a problem because, for deeply nested schemas, traversing the entire tree takes a noticeable amount of time, and the result is not memoized in any way.
https://discord.com/channels/1045676621761347615/1045676622491164724/1336044966337843301
Here's one of my flame graphs:
Some of the ideas for improvement:
- somehow use object identity as the cache key before resorting to hashCode (find a concurrent version of `IdentityHashMap`?)
- use `lazy val` to store the hashCode in schemas
- additionally, maybe use `ConcurrentHashMap` instead of `TrieMap` in the JVM case. The code is already platform-specific. This change would require benchmarking against the current state.
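The `lazy val` idea could look something like the following. This is a hypothetical sketch, not the actual smithy4s `Schema` hierarchy; `HashStats`, `PrimitiveSchema` and `StructSchema` are made up here purely to demonstrate that the tree traversal runs at most once per node:

```scala
// Counter only exists to observe how many times a hash is computed.
object HashStats { var computations = 0 }

sealed trait Schema {
  // Walks the whole subtree; expensive for deeply nested schemas.
  protected def computeHash: Int
  // lazy val memoizes the result, so repeated cache lookups on the
  // same schema instance pay the traversal only once.
  override lazy val hashCode: Int = {
    HashStats.computations += 1
    computeHash
  }
}

final class PrimitiveSchema(name: String) extends Schema {
  protected def computeHash: Int = name.hashCode
}

final class StructSchema(val fields: List[Schema]) extends Schema {
  protected def computeHash: Int =
    fields.foldLeft(17)((acc, f) => 31 * acc + f.hashCode)
}
```

The second `hashCode` call on any node is then a field read rather than a traversal, at the cost of one extra `Int` (plus the lazy-val bitmap) per schema node.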
> additionally, maybe use `ConcurrentHashMap` instead of `TrieMap` in the JVM case. The code is already platform-specific. This change would require benchmarking against the current state.
Huh, I thought we already were. Well let's try that then ^^
That doesn't really help us if the hash computation itself is heavy :/
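For reference, the identity-key idea from the list above sidesteps the heavy hash entirely. The JDK has no concurrent `IdentityHashMap`, but wrapping keys in an identity wrapper over a `ConcurrentHashMap` gets the same effect. A sketch with hypothetical names (not smithy4s code):

```scala
import java.util.concurrent.ConcurrentHashMap

// Key wrapper that ignores the wrapped value's own hashCode/equals and
// uses reference identity instead, so the schema tree is never traversed.
final class IdentityKey[A <: AnyRef](val value: A) {
  override def hashCode: Int = System.identityHashCode(value)
  override def equals(other: Any): Boolean = other match {
    case that: IdentityKey[_] => that.value eq value
    case _                    => false
  }
}

final class IdentityCache[K <: AnyRef, V] {
  private val underlying = new ConcurrentHashMap[IdentityKey[K], V]()
  def getOrElseUpdate(key: K)(compute: => V): V =
    underlying.computeIfAbsent(new IdentityKey(key), _ => compute)
}
```

The trade-off is that structurally equal but distinct schema instances get separate cache entries, which is fine when schemas are cached singletons (as derived schemas usually are) but wasteful otherwise.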
JVM - `TrieMap`: https://github.com/disneystreaming/smithy4s/blob/series/0.18/modules/core/src-jvm/smithy4s/internals/maps.scala#L23
JS/Native - `mutable.Map`: https://github.com/disneystreaming/smithy4s/blob/series/0.18/modules/core/src-js-native/smithy4s/internals/maps.scala#L23. I guess we'll have to change the Native one soon, because multi-threading is now a thing.
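A `ConcurrentHashMap`-backed version of the JVM `makeMap` could look roughly like this. It's a sketch of the idea rather than a drop-in replacement for the linked file, and as noted above it would still need benchmarking against `TrieMap`:

```scala
import java.util.concurrent.ConcurrentHashMap
import scala.jdk.CollectionConverters._

object maps {
  // Same scala.collection.concurrent.Map interface as TrieMap exposes,
  // but backed by java.util.concurrent.ConcurrentHashMap, which tends
  // to do better on read-heavy workloads like schema caches.
  type MMap[K, V] = scala.collection.concurrent.Map[K, V]
  def makeMap[K, V]: MMap[K, V] = new ConcurrentHashMap[K, V]().asScala
}
```

Since the `concurrent.Map` wrapper preserves atomic operations like `putIfAbsent`, callers of the current `makeMap` shouldn't need any changes.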