folsom icon indicating copy to clipboard operation
folsom copied to clipboard

Binary client doesn't obey TTL in some cases

Open flozano opened this issue 6 years ago • 14 comments

I have a wrapper for the cache layer that enforces a TTL on sets. Under some conditions which we haven't identified (Folsom 1.2.1, Oracle JDK1.8 in all cases, memcached 1.4.15 in Linux and memcached 1.5.13 in Mac), we have identified that TTL doesn't work properly in binary protocol, but works perfectly as expected for same test if I change the connection line from connectBinary to connectAscii. With connectBinary, it works well in some cases (a Ubuntu laptop with memcached 1.5.6) but not in others (Mac with memcached 1.5.13). This is the equivalent pseudo-code:

	@Test
	public void testExpiredKey() throws Exception {
		int ttl = 5;
		CacheClient<CachedValue> client = instanceClient(CachedValue.class, ttl);
		String key = randomKey();
		CachedValue value = randomValue();

		client.put(key, value).join();
		await().atMost(ttl * 2, TimeUnit.SECONDS).untilAsserted(() -> assertNull(client.get(key).join()));

	}

flozano avatar Feb 06 '19 08:02 flozano

Thanks for the report! I will look into it.

spkrka avatar Feb 06 '19 08:02 spkrka

To clarify - what kind of error do you see? Are entries expiring too late or too early?

spkrka avatar Feb 06 '19 09:02 spkrka

Entries not expiring at all - or maybe way too late for my tests to identify. My test just sets a key with a TTL, and expects a GET to that key to return null after the TTL passes (plus a substantial margin).

BTW I'm reproducing with pure folsom code also:

public class FolsomExpirationTest {

	private MemcacheClient<byte[]> memcacheClient;

	@Before
	public void setUp() throws Exception {
		memcacheClient = MemcacheClientBuilder.newByteArrayClient().withAddress("127.0.0.1").connectBinary();
		ConnectFuture.connectFuture(memcacheClient).toCompletableFuture().get(30, TimeUnit.SECONDS);
	}

	@After
	public void tearDown() {
		memcacheClient.shutdown();
	}

	@Test
	public void textExpiration() {
		int ttl = 5;
		String key = randomKey();
		byte[] value = randomValue();

		memcacheClient.set(key, value, ttl).toCompletableFuture().join();

		await().atMost(ttl * 2, TimeUnit.SECONDS)
				.untilAsserted(() -> assertNull(memcacheClient.get(key).toCompletableFuture().join()));
	}

	private static byte[] randomValue() {
		byte[] b = new byte[5];
		new Random().nextBytes(b);
		return b;
	}

	private static String randomKey() {
		return UUID.randomUUID().toString();
	}
}

changing connection to connectAscii makes test pass, whereas connectBinary fails consistently here.

flozano avatar Feb 06 '19 09:02 flozano

Passing consistently on:

  • Ubuntu 18.04, Oracle JDK 1.8.0_201, Memcached 1.5.6
  • OS X 10.14.2, Oracle JDK 1.8.0_66 (Laptop 2), Memcached 1.5.8 and 1.5.12

Failing consistently on:

  • OS X 10.14.2, Oracle JDK 1.8.0_181 (Laptop 1), Memcached 1.5.12

Randomly failing/passing on:

  • AWS Linux 4.14.77-70.82.amzn1.x86_64, Oracle JDK 1.8.0_162, Memcached 1.4.15

flozano avatar Feb 06 '19 09:02 flozano

I could reproduce it by changing the memcached version, see: https://github.com/spotify/folsom/pull/113

It looks more like a regression in memcached itself than a problem in folsom.

spkrka avatar Feb 06 '19 09:02 spkrka

I see. It's possible, but it seems to fail too in 1.4.15 randomly... that's why we initially discarded memcache version (1.4.15+Linux failing sometimes, 1.5.6+Linux passing, 1.5.12+Mac failing).

flozano avatar Feb 06 '19 09:02 flozano

Testing with more versions of memcached, it seems to depend more on type of image than version. Fails on ol-7 but works on debian / regular builds.

According to specification:

Some commands involve a client sending some kind of expiration time
(relative to an item or to an operation requested by the client) to
the server. In all such cases, the actual value sent may either be
Unix time (number of seconds since January 1, 1970, as a 32-bit
value), or a number of seconds starting from current time. In the
latter case, this number of seconds may not exceed 60*60*24*30 (number
of seconds in 30 days); if the number sent by a client is larger than
that, the server will consider it to be real Unix time value rather
than an offset from current time.

A workaround from our side could be to always send it as a unix timestamp if possible (carefully trying to avoid the year 2038 problem)

spkrka avatar Feb 06 '19 09:02 spkrka

I see. Thanks for the input. I'll move all my code to use timestamps instead.

flozano avatar Feb 06 '19 09:02 flozano

Actually in the binary version of the SetRequest command, this is done:

  public ByteBuf writeRequest(final ByteBufAllocator alloc, final ByteBuffer dst) {
    final int expiration = Utils.ttlToExpiration(ttl);

so timestamps are already in-use...

BTW I'm not sure this code is correct either:

  public static int ttlToExpiration(final int ttl) {
    return (ttl == 0) ? 0 : (int) (System.currentTimeMillis() / 1000) + ttl;
  }

as it doesn't take into account if the provided "ttl" is already a timestamp.

(ASCII version uses TTL as-is, whereas binary enforces passing of TTL. I think the ttlToExpiration should check if the provided ttl is already above 606024*30 before doing that operation)

flozano avatar Feb 06 '19 09:02 flozano

With ASCII and timestamps obtained thru Utils.tottlToExpiration it also fails...

flozano avatar Feb 06 '19 09:02 flozano

Good catch! I'll take a look at that.

spkrka avatar Feb 06 '19 09:02 spkrka

The provided ttl should not be a timestamp - we could extend the API to be more explicit, like adding .set(key, value, instant) or .set(key, value, duration)

spkrka avatar Feb 06 '19 10:02 spkrka

Does this change make sense to start with? Do you think it would solve your issues? https://github.com/spotify/folsom/pull/114

spkrka avatar Feb 06 '19 10:02 spkrka

Partially solved in release 1.3.0.

spkrka avatar Feb 06 '19 12:02 spkrka