
TAP mechanism

Open bigbes opened this issue 9 years ago • 0 comments

Overview

TAP provides a mechanism for observing, from the outside, data changes taking place within a memcached server.

Use cases

TAP is a building block for new kinds of tools that need to react to changes within a memcached server without modifying memcached itself.

Replication

One simple use case is replication. Upon initial connect, a client can ask for all existing data within a server and also ask to be notified as values change. The client receives all data related to each item that is being set, so replaying that data on another node makes replication an easy exercise.

Observation

Requesting a TAP stream of only future changes makes it very easy to see the types of things that are changing within your memcached instance.

Secondary Layer Cache Invalidation

If you have frontends that maintain their own cache, requesting a TAP stream of future changes is useful for invalidating items stored within this cache. Ideally, such a stream would not include the actual data that had changed. Today, all TAP streams include full bodies, but it is straightforward to specify new features for engines to implement, such as a request to omit values.

External Indexing

A TAP stream pointed at an index server (e.g. sphinx or solr) will send all data changes to the index allowing for an always-up-to-date full-text search index of your data.

VBucket Transition

For the purposes of VBucket transfer between nodes, a new type of TAP request can be created that streams every item stored in a VBucket (or set of VBuckets), both existing items and ongoing changes, with the ability to terminate the stream and cut over ownership of the VBucket once the last item is enqueued.

Protocol

A TAP session begins with a command from the client that tells the server what the client is interested in receiving. The server then sends client commands back across the connection until the connection is terminated.

Initial Base Message from Client

A TAP stream begins with a binary protocol message with the opcode 0x40. The packet's key may specify a unique client identifier that allows reconnects (resumable at the server's discretion). A simple base message from a client referring to itself as "node1" would appear as follows:

  Byte            0            1            2            3  
 ------+------------+------------+------------+------------
     0         0x80         0x40         0x00         0x05  
     4         0x00         0x00         0x00         0x00  
     8         0x00         0x00         0x00         0x05  
    12         0x00         0x00         0x00         0x00  
    16         0x00         0x00         0x00         0x00  
    20         0x00         0x00         0x00         0x00  
    24   0x6e ('n')   0x6f ('o')   0x64 ('d')   0x65 ('e')  
    28   0x31 ('1')                                         
Field        (offset) (value)
Magic        (0)    : 0x80
Opcode       (1)    : 0x40
Key length   (2,3)  : 0x0005
Extra length (4)    : 0x00
Data type    (5)    : 0x00
Reserved     (6,7)  : 0x0000
Total body   (8-11) : 0x00000005
Opaque       (12-15): 0x00000000
CAS          (16-23): 0x0000000000000000
Extras              : None
Key          (24-28): The textual string: "node1"
Value               : None
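
The base message above can be reproduced with a few lines of Python. This is a minimal sketch, not a reference implementation: the function name `tap_connect` and its parameters are hypothetical, and only the header layout and the values 0x80 and 0x40 come from the listing above.

```python
import struct

REQUEST_MAGIC = 0x80  # magic byte from the listing above
TAP_CONNECT = 0x40    # opcode from the listing above

# 24-byte header, big-endian: magic, opcode, key length, extra length,
# data type, reserved, total body length, opaque, CAS.
HEADER = struct.Struct(">BBHBBHIIQ")


def tap_connect(name: bytes, extras: bytes = b"", value: bytes = b"",
                opaque: int = 0) -> bytes:
    """Build a TAP request: header, then extras, key and value in that order."""
    body = extras + name + value
    header = HEADER.pack(REQUEST_MAGIC, TAP_CONNECT, len(name), len(extras),
                         0, 0, len(body), opaque, 0)
    return header + body


packet = tap_connect(b"node1")
print(packet.hex())
```

Computing the key length and total body length from the arguments, rather than hard-coding them, keeps the header consistent when extras or a value are added later.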

Options

Additional TAP options may be specified as a 32-bit flags field. The flags appear in the "extras" section of the request packet. If omitted, all flags are assumed to be 0.

Options may or may not have values. For options that do, the values will appear in the body in the order they're defined (LSB -> MSB).

Backfill

BACKFILL (0x01) contains a single 64-bit body that represents the oldest entry (from epoch) you're interested in. Specifying a time in the future (for the server you are connecting to), will cause it to start streaming only current changes. An example TAP stream request that specifies a backfill of -1 (meaning future only) would look like this:

  Byte            0            1            2            3  
 ------+------------+------------+------------+------------
     0         0x80         0x40         0x00         0x05  
     4         0x08         0x00         0x00         0x00  
     8         0x00         0x00         0x00         0x15  
    12         0x00         0x00         0x00         0x00  
    16         0x00         0x00         0x00         0x00  
    20         0x00         0x00         0x00         0x00  
    24         0x00         0x00         0x00         0x00  
    28         0x00         0x00         0x00         0x01  
    32   0x6e ('n')   0x6f ('o')   0x64 ('d')   0x65 ('e')  
    36   0x31 ('1')         0x00         0x00         0x00  
    40         0x00         0xff         0xff         0xff  
    44         0xff                                         
Field        (offset) (value)
Magic        (0)    : 0x80
Opcode       (1)    : 0x40
Key length   (2,3)  : 0x0005
Extra length (4)    : 0x08
Data type    (5)    : 0x00
Reserved     (6,7)  : 0x0000
Total body   (8-11) : 0x00000015
Opaque       (12-15): 0x00000000
CAS          (16-23): 0x0000000000000000
Extras       (24-31): 0x0000000000000001
Key          (32-36): The textual string: "node1"
Value               : 0x00000000ffffffff
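
A backfill request can be assembled the same way. The sketch below follows the field listing above: 8 bytes of extras carrying the flag, the key, then a 64-bit value. The name `backfill_request` and the `extras_fmt` parameter are hypothetical. The Options text describes the flags as 32-bit, so the extras width is left adjustable as an assumption; `">I"` would produce 4 bytes of extras instead of 8.

```python
import struct

HEADER = struct.Struct(">BBHBBHIIQ")  # 24-byte binary protocol header
TAP_CONNECT = 0x40
BACKFILL = 0x01


def backfill_request(name: bytes, oldest: int, extras_fmt: str = ">Q") -> bytes:
    """TAP request with the BACKFILL flag and its 64-bit value."""
    extras = struct.pack(extras_fmt, BACKFILL)  # width is an assumption
    value = struct.pack(">Q", oldest)           # single 64-bit body
    body = extras + name + value
    header = HEADER.pack(0x80, TAP_CONNECT, len(name), len(extras),
                         0, 0, len(body), 0, 0)
    return header + body


# Same value bytes as in the example above.
packet = backfill_request(b"node1", 0x00000000FFFFFFFF)
print(len(packet), packet.hex())
```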

Dump

DUMP (0x02) contains no extra body and will cause the server to transmit only existing items and disconnect after all of the items have been transmitted. An example TAP stream request that specifies only dumping existing records would look like this:

  Byte            0            1            2            3  
 ------+------------+------------+------------+------------
     0         0x80         0x40         0x00         0x05  
     4         0x08         0x00         0x00         0x00  
     8         0x00         0x00         0x00         0x0d  
    12         0x00         0x00         0x00         0x00  
    16         0x00         0x00         0x00         0x00  
    20         0x00         0x00         0x00         0x00  
    24         0x00         0x00         0x00         0x00  
    28         0x00         0x00         0x00         0x02  
    32   0x6e ('n')   0x6f ('o')   0x64 ('d')   0x65 ('e')  
    36   0x31 ('1')                                         
Field        (offset) (value)
Magic        (0)    : 0x80
Opcode       (1)    : 0x40
Key length   (2,3)  : 0x0005
Extra length (4)    : 0x08
Data type    (5)    : 0x00
Reserved     (6,7)  : 0x0000
Total body   (8-11) : 0x0000000d
Opaque       (12-15): 0x00000000
CAS          (16-23): 0x0000000000000000
Extras       (24-31): 0x0000000000000002
Key          (32-36): The textual string: "node1"
Value               : None
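
Because a DUMP stream ends with the server disconnecting, a client can simply read packets until end of stream. The following is a rough sketch under stated assumptions: `dump_request` and `read_packets` are hypothetical names, the extras are 8 bytes wide as in the field listing above, and the stream is any file-like object (for a socket, `sock.makefile("rb")` would provide one).

```python
import io
import struct

HEADER = struct.Struct(">BBHBBHIIQ")  # 24-byte binary protocol header
DUMP = 0x02


def dump_request(name: bytes) -> bytes:
    """TAP request with only the DUMP flag set (no value)."""
    extras = struct.pack(">Q", DUMP)  # 8-byte extras, as in the listing above
    body = extras + name
    return HEADER.pack(0x80, 0x40, len(name), len(extras),
                       0, 0, len(body), 0, 0) + body


def read_exact(stream, n):
    """Read exactly n bytes, or return None if the stream ends first."""
    data = b""
    while len(data) < n:
        chunk = stream.read(n - len(data))
        if not chunk:
            return None
        data += chunk
    return data


def read_packets(stream):
    """Yield (opcode, opaque, extras, key, value) until the stream ends."""
    while True:
        raw = read_exact(stream, HEADER.size)
        if raw is None:
            return  # server disconnected (a truncated header also ends here)
        (_magic, opcode, keylen, extlen, _dtype, _reserved,
         bodylen, opaque, _cas) = HEADER.unpack(raw)
        body = read_exact(stream, bodylen) if bodylen else b""
        if body is None:
            raise EOFError("stream ended inside a packet body")
        yield (opcode, opaque, body[:extlen],
               body[extlen:extlen + keylen], body[extlen + keylen:])


# Offline demonstration with a fabricated two-packet stream.
fake = HEADER.pack(0x80, 0x41, 1, 0, 0, 0, 2, 9, 0) + b"kv"
for item in read_packets(io.BytesIO(fake + fake)):
    print(item)
```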

VBucket List

LIST_BUCKETS (0x04) is used to limit a request to a specific set of VBuckets.

The list is encoded as 16-bit values: first a 16-bit count of the VBuckets in the list, then one 16-bit ID per VBucket.

An example TAP stream request that specifies VBuckets 1, 2, and 5 would look like this:

  Byte            0            1            2            3  
 ------+------------+------------+------------+------------
     0         0x80         0x40         0x00         0x05  
     4         0x08         0x00         0x00         0x00  
     8         0x00         0x00         0x00         0x15  
    12         0x00         0x00         0x00         0x00  
    16         0x00         0x00         0x00         0x00  
    20         0x00         0x00         0x00         0x00  
    24         0x00         0x00         0x00         0x00  
    28         0x00         0x00         0x00         0x04  
    32   0x6e ('n')   0x6f ('o')   0x64 ('d')   0x65 ('e')  
    36   0x31 ('1')         0x00         0x03         0x00  
    40         0x01         0x00         0x02         0x00  
    44         0x05                                         
Field        (offset) (value)
Magic        (0)    : 0x80
Opcode       (1)    : 0x40
Key length   (2,3)  : 0x0005
Extra length (4)    : 0x08
Data type    (5)    : 0x00
Reserved     (6,7)  : 0x0000
Total body   (8-11) : 0x00000015
Opaque       (12-15): 0x00000000
CAS          (16-23): 0x0000000000000000
Extras       (24-31): 0x0000000000000004
Key          (32-36): The textual string: "node1"
Value               : {3, 1, 2, 5} (16-bits each)
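
The VBucket list value is a count followed by the IDs, each a 16-bit big-endian integer. A small sketch of encoding and decoding it (both function names are hypothetical):

```python
import struct


def encode_vbucket_list(vbuckets):
    """Encode a 16-bit count followed by one 16-bit ID per VBucket."""
    return struct.pack(">H%dH" % len(vbuckets), len(vbuckets), *vbuckets)


def decode_vbucket_list(data):
    """Inverse of encode_vbucket_list; ignores any bytes after the list."""
    (count,) = struct.unpack_from(">H", data, 0)
    return list(struct.unpack_from(">%dH" % count, data, 2))


encoded = encode_vbucket_list([1, 2, 5])
print(encoded.hex())
print(decode_vbucket_list(encoded))
```

For VBuckets 1, 2, and 5 this yields the eight value bytes {3, 1, 2, 5} from the example above.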

Takeover VBuckets

TAKEOVER_VBUCKETS (0x08) is used to indicate that the client wishes to completely take the given VBuckets away from the server.

Support ACK

SUPPORT_ACK (0x10) indicates the client supports explicit ACKing. See ACK support for more information on this mechanism.

Request Keys Only

REQUEST_KEYS_ONLY (0x20) requests that the server not send values along with the keys. The server is not required to understand this option, so the client should not assume that it will never receive values.
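
The option bits above combine with bitwise OR, and option values are concatenated in flag order from least to most significant bit. A sketch of that rule, covering only the two options this page documents as carrying values (BACKFILL and LIST_BUCKETS); the class and function names are hypothetical:

```python
import enum
import struct


class TapOption(enum.IntFlag):
    """Option bits as listed above."""
    BACKFILL = 0x01
    DUMP = 0x02
    LIST_BUCKETS = 0x04
    TAKEOVER_VBUCKETS = 0x08
    SUPPORT_ACK = 0x10
    REQUEST_KEYS_ONLY = 0x20


def build_options(backfill=None, vbuckets=None, extra_flags=TapOption(0)):
    """Return (flags, value) with option values ordered LSB to MSB."""
    flags = TapOption(extra_flags)
    value = b""
    if backfill is not None:      # 0x01 comes first
        flags |= TapOption.BACKFILL
        value += struct.pack(">Q", backfill)
    if vbuckets is not None:      # 0x04 follows
        flags |= TapOption.LIST_BUCKETS
        value += struct.pack(">H%dH" % len(vbuckets), len(vbuckets), *vbuckets)
    return flags, value


flags, value = build_options(backfill=0xFFFFFFFF, vbuckets=[1, 2, 5],
                             extra_flags=TapOption.SUPPORT_ACK)
print(hex(int(flags)), value.hex())
```

The resulting flags go into the extras section and the value follows the key in the request body.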

Response Commands

After TAP is initiated, the server begins streaming a series of commands back to the caller. These commands are similar to, but not necessarily the same as, existing commands.

In particular, each command includes a section of engine-specific data as well as a TTL to avoid replication loops.

General Response Flags

The flag section of the packet has two defined flags that may be present for any given packet:

ACK

0x01 indicates the packet requests an ACK. The client must reply to this packet (i.e. send a packet back upstream with the same opaque) to acknowledge receipt of this packet.

NO_VALUE

0x02 indicates the item may have a value, but it is not included in the packet.
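
Testing the two general flags is a matter of bit masks. The sketch below takes the flags as an already-extracted integer, since this page does not specify where the flag section sits inside the packet; the function name is hypothetical.

```python
ACK = 0x01       # packet requests an ACK
NO_VALUE = 0x02  # item may have a value, but it was not sent


def describe_flags(flags: int) -> dict:
    """Interpret the two general response flags; other bits are ignored."""
    return {
        "needs_ack": bool(flags & ACK),
        "value_omitted": bool(flags & NO_VALUE),
    }


print(describe_flags(0x03))
```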

Mutation

All mutation events arrive as TAP_MUTATION (0x41) events. These are conceptually similar to set commands.

Delete - 0x42

Flush - 0x43

Opaque

0x44 - Engine-specific extensions are sent as TAP opaque messages. A client may ignore any of these it doesn't understand (though it must still honor the ACK if the message requests one).

VBucket Set

0x45 - When doing a takeover, this message indicates a VBucket state transition is ready.
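
A client typically dispatches on the opcode of each streamed command. The sketch below maps the opcodes listed above to handler callbacks and, as an illustration of the replication use case, replays mutations, deletes, and flushes onto a plain dictionary. The lowercase event names, `dispatch`, and `replay_onto` are hypothetical, and the key and value are assumed to have been extracted from the packet already.

```python
OPCODE_NAMES = {
    0x41: "mutation",
    0x42: "delete",
    0x43: "flush",
    0x44: "opaque",
    0x45: "vbucket_set",
}


def dispatch(opcode, key, value, handlers):
    """Call the handler registered for this opcode, if there is one."""
    name = OPCODE_NAMES.get(opcode)
    if name is None:
        raise ValueError(f"not a TAP response opcode: {opcode:#04x}")
    handler = handlers.get(name)
    if handler is None:
        return None  # e.g. an opaque message this client does not understand
    return handler(key, value)


def replay_onto(store):
    """Handlers that replay a TAP stream onto a dict-like store."""
    return {
        "mutation": lambda key, value: store.__setitem__(key, value),
        "delete": lambda key, value: store.pop(key, None),
        "flush": lambda key, value: store.clear(),
    }


store = {}
handlers = replay_onto(store)
dispatch(0x41, b"a", b"1", handlers)
dispatch(0x41, b"b", b"2", handlers)
dispatch(0x42, b"a", b"", handlers)
dispatch(0x44, b"", b"engine data", handlers)  # ignored
print(store)
```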

ACK Support

In default mode, all events are streamed out of the server effectively blindly and as fast as possible. With ACKs, a client can more reliably receive messages by letting the server know at a higher protocol level that it has successfully processed TAP messages.

TCP, of course, guarantees message delivery, but a message can be delivered to a remote system that can crash before processing it. With ACKs enabled (and if supported by the server), the server can retransmit any messages whose ACKs were pending at the time of a connection drop if the same named connection reattaches to a TAP session that is still live.

The server chooses how frequently to request ACKs, and may dynamically adjust the interval at which it sends ACK requests to achieve maximum throughput. Clients need not make any assumptions about message ACKs other than that any message requesting an ACK needs a response.
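
On the client side, ACK handling reduces to answering every packet that carries the ACK flag. This page says only that the reply goes back upstream with the same opaque, so the reply format below is an assumption: a bare 24-byte binary protocol response header (magic 0x81) echoing the opcode and opaque, with status 0 and no body. The function names are hypothetical.

```python
import struct

HEADER = struct.Struct(">BBHBBHIIQ")  # 24-byte binary protocol header
RESPONSE_MAGIC = 0x81                 # binary protocol response magic
ACK_FLAG = 0x01                       # general response flag described above


def ack_reply(opcode: int, opaque: int, status: int = 0) -> bytes:
    """Assumed ACK format: empty response echoing opcode and opaque."""
    return HEADER.pack(RESPONSE_MAGIC, opcode, 0, 0, 0, status, 0, opaque, 0)


def maybe_ack(opcode: int, opaque: int, flags: int):
    """Return the reply to send upstream, or None if no ACK was requested."""
    if flags & ACK_FLAG:
        return ack_reply(opcode, opaque)
    return None


reply = maybe_ack(0x41, 7, ACK_FLAG)
print(reply.hex())
```

A client would send the reply only after it has finished processing the message, since the point of the ACK is to confirm processing rather than delivery.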

Implementations

  • spymemcached - supports a pretty rich API in tap in recent versions
  • ep-engine - has a python implementation that's used in a lot of utilities
  • gotap - is a working go implementation of tap, though has no real-world deployments.
  • Trond's libcouchbase - has support for tap.

P.S.

Original: https://code.google.com/p/memcached/wiki/Tap

Motive to duplicate: Google's source code hosting (with project wikis) is closing.

Plan

No plan yet

bigbes, Sep 25 '15 16:09