Binary `RedisCache` keys encoded to `String` can cause encoding issues
In `org.springframework.data.redis.cache.RedisCache#clear`, we generate the pattern like this:

```java
byte[] pattern = conversionService.convert(createCacheKey("*"), byte[].class);
```

However, `conversionService` is for converting arbitrary key objects to a `String`, not for converting the resulting `String` to a `byte[]` (that is what `cacheConfig.getKeySerializationPair()` is for).
Instead, we should probably call `createAndConvertCacheKey`, like so:

```java
byte[] pattern = createAndConvertCacheKey("*");
```
This is intentional. Consider a key serializer that uses JDK or JSON serialization. In that case, the key used for clearing the cache would map to a JDK or JSON representation that cannot perform a prefix or postfix match in Redis, since the message formats are not compatible.
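The prefix-match problem can be demonstrated without a Redis server. The sketch below (class and method names are mine, not Spring's) JDK-serializes two strings and shows that the would-be pattern prefix is not a byte prefix of a serialized key, so a glob match against the serialized form cannot work:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;
import java.util.Arrays;

public class PatternMismatchDemo {

    // JDK-serialize a String: a stand-in for a hypothetical key serializer
    // that uses JDK serialization rather than plain charset encoding.
    static byte[] jdkSerialize(String s) {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
                oos.writeObject(s);
            }
            return bos.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] pattern = jdkSerialize("cache::");    // intended prefix for KEYS cache::*
        byte[] key = jdkSerialize("cache::someKey"); // an actual cache key

        // The serialized pattern is not a byte prefix of the serialized key,
        // because the object stream embeds the string length ahead of the
        // characters, and that length differs between the two strings.
        boolean isPrefix = Arrays.equals(pattern, Arrays.copyOf(key, pattern.length));
        System.out.println(isPrefix); // false
    }
}
```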
How did you find out about the difference and did you encounter any issues with the current arrangement?
I am trying to use the Spring Data Redis cache implementation with binary keys. Specifically:
- The application calls `org.springframework.cache.Cache.put(key, ...)`,
- the key is serialized to binary data,
- the key which ultimately appears in Redis is a raw binary string, neither representing nor interpretable as text in any encoding.
Redis has no problem with binary keys, but `org.springframework.data.redis.cache.RedisCache` makes this difficult because it internally uses an `org.springframework.core.convert.ConversionService` to go from an arbitrary Java key object to a `String`, followed by an `org.springframework.data.redis.serializer.RedisSerializationContext.SerializationPair<String>` to go from the `String` to a `byte[]`. This forces all keys to go through `String`, by being converted to text first, then to bytes second. This contrasts with the handling of values, which are dealt with by a single `RedisSerializationContext.SerializationPair<Object>` that takes arbitrary Java object graphs directly to and from `byte[]`, without intermediate conversion to `String`.
To work around this limitation in key handling, I came up with the following scheme:
- Write a `ConversionService` to serialize key objects to `byte[]` (in my case using Kryo, but it could be any serialization such as JDK, Protobuf, etc.), then encode them (in the sense of binary-as-text encoding like hex or Base64) to `String`.
- Write a `RedisSerializer<String>` to recover the `byte[]` by doing the reverse decoding.
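Stripped of the Spring plumbing, the two halves of this scheme can be sketched roughly as follows (class and method names are mine; the real implementations would plug into Spring's `Converter` and `RedisSerializer<String>` interfaces, and the key object would first be serialized with Kryo, JDK serialization, Protobuf, etc.):

```java
import java.util.Arrays;
import java.util.Base64;

// Hypothetical sketch of the two-step workaround: the ConversionService half
// turns already-serialized key bytes into a Base64 String, and the
// RedisSerializer<String> half decodes that String back into the raw bytes
// that actually reach Redis.
public class BinaryKeyWorkaround {

    // ConversionService side: serialized key bytes -> Base64 text.
    static String encodeKey(byte[] serializedKey) {
        return Base64.getEncoder().encodeToString(serializedKey);
    }

    // RedisSerializer<String> side: Base64 text -> original key bytes.
    static byte[] decodeKey(String encodedKey) {
        return Base64.getDecoder().decode(encodedKey);
    }

    public static void main(String[] args) {
        byte[] binaryKey = {(byte) 0xFF, 0x00, 0x2A}; // not valid text in any charset
        String intermediate = encodeKey(binaryKey);   // safe to pass through String
        byte[] roundTripped = decodeKey(intermediate);
        System.out.println(Arrays.equals(binaryKey, roundTripped)); // true
    }
}
```

The Base64 detour is what makes arbitrary bytes survive the mandatory `String` stage without charset corruption, at the cost of an extra encode/decode on every cache access.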
This is inefficient and inelegant but does seem to work. My observation comes from having implemented my ConversionService to handle conversions to a String only, and being surprised when a conversion from String to byte[] was called for by RedisCache#clear. This did not match my understanding of how the key conversion service and key serializer work together.
I am confused by your examples. I don't believe that a key serializer that uses JDK or JSON serialization is possible, since the key serializer is a `RedisSerializationContext.SerializationPair<String>` that only converts character data to binary data and only performs serialization in a very restricted sense -- the sense of character set encoding of `java.nio.charset.CharsetEncoder`. As such:
- In the JSON example, the input to the key serializer is a `String` containing (presumably) JSON text. The serialization from a POJO or DOM object graph must already have happened before the serializer (it must have been performed by the `ConversionService`). The serializer can only be responsible for encoding the text to `byte[]`, again in the sense of `CharsetEncoder`.
- In the JDK serialization example, it seems possible to serialize `String` to `byte[]` by passing the Java `String` object through JDK serialization. This, however, would be an unusual thing to do. In practice, a string would be serialized to bytes using `getBytes(StandardCharsets.UTF_8)` or similar, which is again just character set encoding.
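The difference between the two ways of turning a `String` into bytes can be made concrete (the class and helper below are mine, for illustration only): charset encoding yields exactly the characters' bytes, while JDK serialization prepends a stream header, a type tag, and a length field.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class StringBytesDemo {

    // Pass a String through JDK object serialization.
    static byte[] jdkSerialize(String s) {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
                oos.writeObject(s);
            }
            return bos.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String key = "user:42";

        // Charset encoding: exactly the 7 character bytes, readable as-is in Redis.
        byte[] charsetBytes = key.getBytes(StandardCharsets.UTF_8);

        // JDK serialization: stream header, type tag, and length precede the characters.
        byte[] jdkBytes = jdkSerialize(key);

        System.out.println(charsetBytes.length);                    // 7
        System.out.println(jdkBytes.length > charsetBytes.length);  // true
    }
}
```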
In summary:
- I would like for it to be easier to use binary keys with the Spring Data Redis cache implementation. This might be accomplished by using a `RedisSerializationContext.SerializationPair<Object>` (rather than `<String>`) for keys, the same as is done for values. This might then make the `ConversionService` unnecessary.
- If, on the other hand, the current two-step arrangement is to be kept, then perhaps `SerializationPair<String> keySerializationPair` could be replaced with a `CharsetEncoder` for greater clarity and without loss of functionality. This is possible because any actual key serialization must in practice be handled by the `ConversionService`.
> This forces all keys to go through `String`, by being converted to text first, then to bytes second.
Thanks a lot for the detail. You're right, forcing binary keys into `String` introduces encoding issues, and we should reconsider whether we can lift this limitation. During the rewrite of `RedisCache` between versions 1 and 2, we decided to use `String` keys because keys are in many cases represented as strings, and the API that allows filtering keys uses `*` as a placeholder character. With binary keys, `*` can appear in arbitrary places and lead to unwanted matches.
Using an intermediate representation (base64, hex) is a good workaround.
After revisiting our arrangement, we've decided to keep the `String`-based approach. The key serializer allows customization of cache keys, and the workaround of hex/Base64-encoding the key (in case it is binary) allows the use of binary keys without tampering with the string encoding.