CacheTower icon indicating copy to clipboard operation
CacheTower copied to clipboard

Upgrading from v0.11.x to Custom Serializers with RedisCacheLayer breaks existing caches

Open mgoodfellow opened this issue 3 years ago • 30 comments

What does the bug do?

The new Protobuf serializer breaks upgrade path from 0.11.x to latest when used in conjunction with the Redis layer

ProtoBuf.ProtoException: Invalid wire-type (Varint); this usually means you have over-written a file without truncating or setting the length; see https://stackoverflow.com/q/2152978/23354
   at ProtoBuf.ProtoReader.State.ThrowProtoException(String message) in /_/src/protobuf-net.Core/ProtoReader.State.ReadMethods.cs:line 770
   at ProtoBuf.ProtoReader.State.ThrowWireTypeException() in /_/src/protobuf-net.Core/ProtoReader.State.ReadMethods.cs:line 763
   at proto_8(State& , CachedFeature )
   at ProtoBuf.ProtoReader.State.ReadMessage[TSerializer,T](SerializerFeatures features, T value, TSerializer& serializer)
   at proto_6(State& , CacheEntry`1 )
   at ProtoBuf.ProtoReader.State.ReadAsRoot[T](T value, ISerializer`1 serializer)
   at ProtoBuf.ProtoReader.State.DeserializeRoot[T](T value, ISerializer`1 serializer)
   at ProtoBuf.Serializer.Deserialize[T](Stream source)
   at CacheTower.Providers.Redis.RedisCacheLayer.GetAsync[T](String cacheKey)
   at CacheTower.CacheStack.GetWithLayerIndexAsync[T](String cacheKey)
   at CacheTower.CacheStack.GetOrSetAsync[T](String cacheKey, Func`2 valueFactory, CacheSettings settings)
   at CacheTower.CacheStack`1.GetOrSetAsync[T](String cacheKey, Func`3 getter, CacheSettings settings)

How can it be reproduced?

Generate some caches in a redis layer on 0.11.3 Update CacheTower to latest Read caches

Diagnostic

I believe the issue stems from the fact that historically this wrapper was used when persisting items to the redis cache layer

https://github.com/TurnerSoftware/CacheTower/pull/172/files#diff-972f6f8307717fcd20b6599842d83ab73931e2aa4b483e5076ab39d5f5109e0dL13

Now, CacheEntry'T is used directly.

I'm not quite sure how this type serializes without the ProtoXXX attributes on the class, but magically it does. Unfortunately, it isn't binary compatible with RedisCacheEntry and things go bang.

Obviously we could consider purging the cache to solve the issue, but the problem is more significant for us as we have a wide range of services, and we would have to take everything offline, update them all, purge old caches, then bring it all back up again in order to update.

Any ideas would be appreciated!

mgoodfellow avatar Oct 12 '22 10:10 mgoodfellow

I deserialized the protobuf direct from Redis:

0.11.3:

[
    {
        "1": [
            {
                "1": 3331145234
            },
            {
                "2": 3
            }
        ]
    },
    {
        "2": [
            {
                "1": "Some data"
            },
        ]
    }
]

Latest:

[
    {
        "1": [
            {
                "1": "Some data"
            },
        ]
    },
    {
        "2": [
            {
                "1": 3331144154
            },
            {
                "2": 3
            }
        ]
    }
]

Sadly the cache expiry data and the payload have swapped positions :(

mgoodfellow avatar Oct 12 '22 11:10 mgoodfellow

Thanks for raising this and wow, that is a subtle but annoying bug. Is it throwing an exception when deserializing?

If I can't match up the protos, it should be failing gracefully as a cache miss.

Outside of that, if I can change the proto back to the structure before, it will be broken the other way now. If I can find a solution where we can gracefully fail, that would be best.

Turnerj avatar Oct 12 '22 12:10 Turnerj

Yes, gracefully failing would be ideal.

Stack trace for the deserialise flow is above - this is the only exception I saw - a write/serialize would be fine as it is overwriting any existing data. I guess try/catch in the serializer itself is best, otherwise it could be part of the cache layer itself. The latter is better if we want to purge the value on failed deserialize.

Unfortunately from a performance point of view this will have an impact, although mostly only when it fails. The main issue is you will keep failing until the cache value is removed/overwritten, so the cache fetch / deserialize flow needs to be able to wipe the broken value from the cache to avoid a high rate of exceptions being thrown until the value expires.

Unfortunately that couples the read/write flows and makes everything a bit messy if we aren't careful.

mgoodfellow avatar Oct 12 '22 12:10 mgoodfellow

Just checked my PR, I got the proto values right so this is even more strange.

Turnerj avatar Oct 12 '22 12:10 Turnerj

Latest:

https://github.com/TurnerSoftware/CacheTower/blob/main/src/CacheTower.Serializers.Protobuf/ProtobufCacheSerializer.cs#L21

This is saying 1 - file data, 2 - expiry data?

Pre PR it was:

https://github.com/TurnerSoftware/CacheTower/pull/172/files#diff-972f6f8307717fcd20b6599842d83ab73931e2aa4b483e5076ab39d5f5109e0dL18

1 - Expiry, 2 - data

It was swapped :(

mgoodfellow avatar Oct 12 '22 12:10 mgoodfellow

I'm not quite sure how this type serializes without the ProtoXXX attributes on the class, but magically it does

Aha, now I know how that bit of magic works - that's cool, nice way to get around needing to provide the attributes in the class - really clean!

mgoodfellow avatar Oct 12 '22 12:10 mgoodfellow

Yeah - so that bit of magic, the numeric values match up for the attributes I had defined so it shouldn't be switch but yeah, you're seeing it so it is.

Before: https://github.com/TurnerSoftware/CacheTower/blob/082e9cd2b327c7b42d34068081f9105d31337703/src/CacheTower.Providers.Redis/Entities/RedisCacheEntry.cs

After: https://github.com/TurnerSoftware/CacheTower/blob/a4ec253e485fd25a6de08d7ed74049e52b71657a/src/CacheTower.Serializers.Protobuf/ProtobufCacheSerializer.cs#L39-L41

Turnerj avatar Oct 12 '22 12:10 Turnerj

Oh no!

image

It's ... both?

mgoodfellow avatar Oct 12 '22 12:10 mgoodfellow

Sorry, ignore me, that's a different type 🤦

mgoodfellow avatar Oct 12 '22 12:10 mgoodfellow

ManifestEntry is used for the file cache layer - that shouldn't be impacting what we're seeing here.

Turnerj avatar Oct 12 '22 12:10 Turnerj

I guess in the short term I could swap in my own version of the ProtobufSerializer which tries to deserialize one way, then tries the other if the first fails. This could remain in place until all the keys have migrated over and then the normal one swapped back in.

More worrying is why they have swapped? I have read it a few times, and I cannot see why this has been swapped. When I have a moment later I will try to setup a test to prove it. In theory the test is simple - serialize using your ProtobufSerializer, then deserialize using the old RedisCacheValue class using attributes. This test should fail as it stands now, although I am unsure as to the cause, but at least there is something to prove it and it's not something daft in my setup.

mgoodfellow avatar Oct 12 '22 12:10 mgoodfellow

Yeah, I'm puzzled too. What you described as a workaround sounds plausible but it sucks that you need to do that.

Turnerj avatar Oct 12 '22 12:10 Turnerj

There is a bump in protobuf versions between 0.11.x to latest - may have been a bug introduced there? Previous: https://www.nuget.org/packages/CacheTower.Providers.Redis/0.11.3#dependencies-body-tab Latest: https://www.nuget.org/packages/CacheTower.Serializers.Protobuf/#dependencies-body-tab

Turnerj avatar Oct 12 '22 12:10 Turnerj

image :(

Turnerj avatar Oct 12 '22 13:10 Turnerj

I had only just got around to pulling the repo by the time you had already written the test - thanks for that ❤️ - bittersweet to see the result though, at least it isn't me breaking something, but now something totally weird is happening with the protobuf library.

Next thing I would quickly check is if you define the RuntimeTypeConfig specifically with CacheEntry<string> rather than the open-generic workaround. Then we have reduced it to it's simplest form?

mgoodfellow avatar Oct 12 '22 13:10 mgoodfellow

This is with a slightly newer version of the Protobuf library (so I haven't isolated whether it is a bug there or in my code) but this works:

image

Turnerj avatar Oct 12 '22 13:10 Turnerj

Next thing I would quickly check is if you define the RuntimeTypeConfig specifically with CacheEntry<string> rather than the open-generic workaround. Then we have reduced it to it's simplest form?

Technically it isn't an open generic, that's what the magic is actually trying to fix. The T type you see that we pass into the RuntimeTypeModel is the CacheEntry<YourType>.

Turnerj avatar Oct 12 '22 13:10 Turnerj

image

Turnerj avatar Oct 12 '22 13:10 Turnerj

Oh, one thing with the test that I through together is wrong - CacheEntry<T> will truncate an expiry down to the second so it may have given a false positive. Putting another test together now...

Turnerj avatar Oct 12 '22 13:10 Turnerj

It is the missing set on the property that is breaking it...

Turnerj avatar Oct 12 '22 13:10 Turnerj

image image

Turnerj avatar Oct 12 '22 13:10 Turnerj

Wow great spot - really good find!

Strange it serializes as surely it wouldn't match the RuntimeConfig rather than just ignoring it entirely and serializing in the wrong positions. I would have expected it to get upset similar to if you don't have the ProtoXXX attributes present at all.

Difficult position to get out of now 😬

mgoodfellow avatar Oct 12 '22 13:10 mgoodfellow

More curious... image image

Turnerj avatar Oct 12 '22 13:10 Turnerj

🤯

So the setter is required for the RuntimeTypeModel configuration approach, but not required for the attribute approach it seems?

mgoodfellow avatar Oct 12 '22 13:10 mgoodfellow

Potentially, yeah... I'll need to dig into this more tomorrow but that is a crazy bug if that is the case.

Turnerj avatar Oct 12 '22 13:10 Turnerj

image

An even more simplified example causing the problem - it definitely seems to be an upstream bug that I inadvertently triggered.

Turnerj avatar Oct 13 '22 07:10 Turnerj

It gets weirder, both properties don't need to have set either - just one "fixes" this. Guessing it is definitely impacting the ordering then.

image image image

Turnerj avatar Oct 13 '22 07:10 Turnerj

Opened up a bug report for this on protobuf-net: https://github.com/protobuf-net/protobuf-net/issues/964

Turnerj avatar Oct 13 '22 07:10 Turnerj

From Marc Gravell:

The immediate problem here is that it is inferring the member order as a tuple-like type - meaning: when it finds no attributes (in RuntimeTypeModel.Add), it is looking for a constructor that sets everything, finding the TestClass(string value, DateTime expiry) constructor, and deciding "value", "expiry". An immediate workaround here is to pass the applyDefaultBehaviour: false parameter to Add, which disables all type inspections; with this change alone: everything works as intended.

So there is a default behaviour being applied which is taking precedence to the configuration I'm setting. He's put together a PR to make the way I'm doing it throw an exception to prevent falling into this problem in the future.

I've got two changes for Cache Tower then:

  • Disable the default behaviour that protobuf is using (pretty simple)
  • Allow deserialization failures to be handled gracefully (a little more complex)

With just the first part alone, it would fix the situation for you however anyone that started on newer versions would now have the reverse happen.

I can't promise getting these changes done in the next few weeks (got a few important commitments I need to handle) but at least we know what is going on.

Turnerj avatar Oct 13 '22 12:10 Turnerj

Very interesting! Thanks for the investigation, glad we have an answer for the strangeness now.

There isn't any rush for the changes, I was only just getting around to upgrading (finally!) from 0.11.3! Thanks for taking the time to look at the issue!

mgoodfellow avatar Oct 13 '22 14:10 mgoodfellow