Problem Getting "Big" Values
Migrated from rt.cpan.org #93140 (status was 'open')
From [email protected] on 2014-02-18 16:59:35 :
Hi there,
I have been playing around with a few Memcached client libraries recently and, ideally, would like to stick with Perl, but with decent performance. Memcached::Client is simple enough to use, but its performance isn't close to what the Python or C# libraries offer.
Thinking it was probably due to its pure Perl nature, I then came across your library.
During the CPAN installation I ran into the following unit test issue: "Failed test 'Fetch' # at t/big_value.t line 40."
With the Memcached server I am running, it's possible to raise the value-size limit from the default 1MB to 128MB. Doing so and setting the "max_size" parameter in your library, I am able to set very large values without any problem. The problem comes when I try to get them out: it just doesn't return anything. The key/value most definitely has been set, as I can retrieve it with a get call from Memcached::Client.
I've placed a code sample here; it's rather simple but it might help: http://pastebin.com/QtBxqMhB
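In essence the sample does the following (a rough reconstruction for reference, in case the pastebin goes away; the server address and exact sizes here are assumptions):

#!/usr/bin/perl
use strict;
use warnings;
use Cache::Memcached::Fast;
use Memcached::Client;

my $server = 'localhost:11211';    # assumed address

# C::M::F with max_size raised so values above the 1MB default go through.
my $memd_fast = Cache::Memcached::Fast->new({
    servers  => [$server],
    max_size => 10000 * 1024,
});
my $memd_client = Memcached::Client->new({ servers => [$server] });

my $data_short = 'x' x 1024;
my $data_long  = 'x' x (10000 * 1024);

print "Setting keys with Fast.. ";
$memd_fast->set(data_short => $data_short);
$memd_fast->set(data_long  => $data_long);
print "done.\n";

# This is where it goes wrong: get('data_long') returns nothing.
print "Getting keys with Fast.. ";
printf "data_short=%d data_long=%d\n",
    length($memd_fast->get('data_short') // ''),
    length($memd_fast->get('data_long')  // '');

# Memcached::Client retrieves both keys without a problem.
print "Getting keys with Client.. ";
printf "data_short=%d data_long=%d\n",
    length($memd_client->get('data_short') // ''),
    length($memd_client->get('data_long')  // '');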
My setup:
- Windows 2012
- Strawberry Perl 5.16 64bits
- Cache::Memcached::Fast v0.21
- Memcached v1.4.5
I've had no problem using my Memcached server build with:
- python-memcached
- EnyimMemcached (C#)
- Memcached::Client, but it's worth noting that Cache::Memcached throws up quite a few unit test failures and therefore hasn't been tested.
Let me know if this makes any sense and/or if you need more information.
Thanks, Damien
From [email protected] on 2014-02-18 18:00:46 :
On Tue Feb 18 11:59:35 2014, [email protected] wrote:
During the CPAN installation I ran into the following unit test issue: "Failed test 'Fetch' # at t/big_value.t line 40."
This test was written before memcached acquired the -I option to raise the 1MB limit. The test assumes that values slightly below 1MB will be accepted by the server, but values slightly above will be rejected. My guess is that you ran the tests against memcached -I 128m and the value >1MB wasn't rejected by the server, hence the test failure.
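The assumption boils down to something like this (a sketch only, not the actual test code; the exact THRESHOLD and the max_size override are assumptions):

use strict;
use warnings;
use Test::More tests => 2;
use Cache::Memcached::Fast;

use constant THRESHOLD => 1024 * 1024;   # memcached's default item size limit

# Raise the client-side limit so the server's limit is what gets exercised.
my $memd = Cache::Memcached::Fast->new({
    servers  => ['localhost:11211'],
    max_size => 2 * THRESHOLD,
});

ok($memd->set(below => 'x' x (THRESHOLD - 1024)), 'Store below 1MB succeeds');
ok(!$memd->set(above => 'x' x (THRESHOLD + 1024)), 'Store above 1MB fails');
# With memcached -I 128m the second set also succeeds, so the test fails.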
With the Memcached server I am running, it's possible to raise the value-size limit from the default 1MB to 128MB. Doing so and setting the "max_size" parameter in your library, I am able to set very large values without any problem. The problem comes when I try to get them out: it just doesn't return anything.
Can't reproduce. I ran memcached -I 128m -vv (version 1.4.17), adjusted your code to use a 100000 * 1024 value, and everything worked on Linux (I don't have access to any Windows host so can't test there):
$ perl /tmp/a.pl
Setting keys with Fast.. done.
Getting keys with Fast.. data_short=1024 data_long=102400000
Getting keys with Client.. data_short=1024 data_long=102400000
Output from memcached:
...
slab class 64: chunk size 134217728 perslab 1
...
<30 new auto-negotiating client connection
30: Client using the ascii protocol
<30 set data_short 0 0 1024
>30 STORED
<30 set data_long 0 0 102400000
>30 STORED
<30 get data_short
>30 sending key data_short
>30 END
<30 get data_long
>30 sending key data_long
>30 END
<31 new auto-negotiating client connection
31: Client using the ascii protocol
<31 get data_short
>31 sending key data_short
>31 END
<31 get data_long
>31 sending key data_long
>31 END
<30 connection closed.
<31 connection closed.
Could you please provide the output of memcached -vv when C::M::F fails?
From [email protected] on 2014-02-19 09:50:58 :
On 18 February 2014 18:00, Tomash Brechko via RT <[email protected]> wrote:
<URL: https://rt.cpan.org/Ticket/Display.html?id=93140 >
On Tue Feb 18 11:59:35 2014, [email protected] wrote:
During the CPAN installation I ran into the following unit test issue: "Failed test 'Fetch' # at t/big_value.t line 40."
This test was written before memcached acquired the -I option to raise the 1MB limit. The test assumes that values slightly below 1MB will be accepted by the server, but values slightly above will be rejected. My guess is that you ran the tests against memcached -I 128m and the value >1MB wasn't rejected by the server, hence the test failure.
As per the big_value.t file, it seems that line 40 isn't about setting something larger than 1MB (that's on line 43) but about getting the regular test value ($value = 'x' x THRESHOLD), which seems rather similar to the problem I'm experiencing.
With the Memcached server I am running, it's possible to raise the value-size limit from the default 1MB to 128MB. Doing so and setting the "max_size" parameter in your library, I am able to set very large values without any problem. The problem comes when I try to get them out: it just doesn't return anything.
Can't reproduce. I ran memcached -I 128m -vv (version 1.4.17), adjusted your code to use a 100000 * 1024 value, and everything worked on Linux (I don't have access to any Windows host so can't test there). [..]
Could you please provide the output of memcached -vv when C::M::F fails?
Sure, please find below the output:
...
slab class 64: chunk size 134217728 perslab 1
...
<604 new auto-negotiating client connection
604: Client using the ascii protocol
<604 set data_short 0 0 1024
>604 STORED
<604 set data_long 0 0 10240000
>604 STORED
<604 get data_short
>604 sending key data_short
>604 END
<604 get data_long
>604 sending key data_long
>604 END
<612 new auto-negotiating client connection
612: Client using the ascii protocol
<612 get data_short
>612 sending key data_short
>612 END
<612 get data_long
>612 sending key data_long
>612 END
<612 connection closed.
>608 END
<608 connection closed.
Nothing seems out of the ordinary, except that the C::M::F get for "data_long" didn't actually return anything.
Thanks, Damien
From [email protected] on 2014-02-19 12:05:41 :
On Wed Feb 19 04:50:58 2014, [email protected] wrote:
As per the big_value.t file, it seems that line 40 isn't about setting something larger than 1MB (that's on line 43) but about getting the regular test value ($value = 'x' x THRESHOLD), which seems rather similar to the problem I'm experiencing.
Yup, you are right, I didn't check against the code. My final wild guess is that your server is too busy and memcached doesn't respond quickly enough. Please set the io_timeout parameter to zero to disable the I/O timeout (the default is 1 second), i.e.
my $memd_fast = Cache::Memcached::Fast->new({
    servers    => ['localhost:11211'],
    max_size   => 10000 * 1024,
    io_timeout => 0,
});
If this doesn't solve the problem, it would be interesting to know the minimum item size that reveals it, and also whether this size is always the same or varies slightly. Beyond that I don't have any further ideas.
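(To find that minimum size one could bisect, e.g. with a sketch like this, reusing the $memd_fast handle from above and assuming the failure threshold is stable:)

# Bisect for the smallest value size whose get comes back empty.
my ($good, $bad) = (1024, 10000 * 1024);   # known-good and known-bad sizes
while ($bad - $good > 1) {
    my $size = int(($good + $bad) / 2);
    $memd_fast->set(probe => 'x' x $size);
    my $got = $memd_fast->get('probe');
    if (defined $got && length($got) == $size) {
        $good = $size;    # this size still works
    } else {
        $bad = $size;     # this size already fails
    }
}
print "smallest failing size: $bad\n";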
From [email protected] on 2014-02-19 12:11:29 :
On Wed Feb 19 07:05:41 2014, KROKI wrote:
My final wild guess is that your server is too busy and memcached doesn't respond quickly enough. Please set the io_timeout parameter to zero to disable the I/O timeout
Actually, by default close_on_error is enabled, so if it were a timeout issue the connection would be closed. However, in your memcached output I don't see a close event for 604 at all, which is puzzling.
From [email protected] on 2014-02-19 12:31:05 :
On 19 February 2014 12:11, Tomash Brechko via RT <[email protected]> wrote:
<URL: https://rt.cpan.org/Ticket/Display.html?id=93140 >
On Wed Feb 19 07:05:41 2014, KROKI wrote:
My final wild guess is that your server is too busy and memcached doesn't respond quickly enough. Please set the io_timeout parameter to zero to disable the I/O timeout
Actually, by default close_on_error is enabled, so if it were a timeout issue the connection would be closed. However, in your memcached output I don't see a close event for 604 at all, which is puzzling.
The server has beefy specs and is otherwise idling, and the same behaviour was witnessed when the client and server were run on my local machine, which again is beefy and not overloaded at all.
I've re-run the same script, but with only the data_short set and get calls, and the C::M::F get connection does close, but later in time:
<604 new auto-negotiating client connection
604: Client using the ascii protocol
<604 set data_short 0 0 1024
>604 STORED
<604 get data_short
>604 sending key data_short
>604 END
<612 new auto-negotiating client connection
612: Client using the ascii protocol
<612 get data_short
>612 sending key data_short
>612 END
<612 connection closed.
<604 connection closed.
Running the Memcached server with -vvv, I can see rows and rows of:
604: going from conn_read to conn_waiting
604: going from conn_waiting to conn_read
604: going from conn_read to conn_waiting
604: going from conn_waiting to conn_read
604: going from conn_read to conn_waiting
604: going from conn_waiting to conn_read
604: going from conn_read to conn_waiting
604: going from conn_waiting to conn_read
604: going from conn_read to conn_waiting
Which probably explains why the data isn't returned and the C::M::F connection isn't closed in a timely fashion?
Thanks, Damien
From [email protected] on 2014-02-19 13:35:18 :
On Wed Feb 19 07:31:05 2014, [email protected] wrote:
I've re-run the same script, but with only the data_short set and get calls, and the C::M::F get connection does close, but later in time: ... <612 connection closed. <604 connection closed.
I wonder how much later the close of 604 happens; simply being after 612 is OK, but any delay makes me think of io_timeout.
604: going from conn_waiting to conn_read
604: going from conn_read to conn_waiting
Which probably explains why the data isn't returned and the C::M::F connection isn't closed in a timely fashion?
Those lines mean the memcached server is reading (or trying to read) from the client. If they are output during the STORE then this is expected; if after the GET then this is puzzling.
Please do one more try before giving up: set io_timeout to zero, do the gets from Memcached::Client before the gets from C::M::F, and with C::M::F do get(data_long) before get(data_short); finally, paste the output of memcached -vv here. Upgrading memcached to the latest version may also help (yes, I know that only C::M::F triggers the problem, but still ;)).
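I.e., roughly this (a sketch; constructor parameters as in your script):

use Cache::Memcached::Fast;
use Memcached::Client;

my $memd_fast = Cache::Memcached::Fast->new({
    servers    => ['localhost:11211'],
    max_size   => 10000 * 1024,
    io_timeout => 0,                  # disable the I/O timeout
});
my $memd_client = Memcached::Client->new({ servers => ['localhost:11211'] });

$memd_fast->set(data_short => 'x' x 1024);
$memd_fast->set(data_long  => 'x' x (10000 * 1024));

# Gets with Memcached::Client first...
$memd_client->get($_) for qw(data_long data_short);

# ...then with C::M::F, data_long before data_short.
$memd_fast->get('data_long');
$memd_fast->get('data_short');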
From [email protected] on 2014-02-19 14:14:02 :
On 19 February 2014 13:35, Tomash Brechko via RT <[email protected]> wrote:
<URL: https://rt.cpan.org/Ticket/Display.html?id=93140 >
On Wed Feb 19 07:31:05 2014, [email protected] wrote:
604: going from conn_waiting to conn_read
604: going from conn_read to conn_waiting
Which probably explains why the data isn't returned and the C::M::F connection isn't closed in a timely fashion?
Those lines mean the memcached server is reading (or trying to read) from the client. If they are output during the STORE then this is expected; if after the GET then this is puzzling.
They are definitely happening at the GET stage, not at the STORE. On the data_long C::M::F run it'll go on forever; I let it run for a few minutes and it didn't stop. To be fair, if it is trying to communicate with the client it's much too late at that point, as the client exited long ago.
Please do one more try before giving up: set io_timeout to zero, do the gets from Memcached::Client before the gets from C::M::F, and with C::M::F do get(data_long) before get(data_short); finally, paste the output of memcached -vv here. Upgrading memcached to the latest version may also help (yes, I know that only C::M::F triggers the problem, but still ;)).
I've done as requested (first two gets are C::M::F), behaviour is the same as per below:
<608 new auto-negotiating client connection
608: Client using the ascii protocol
<608 set data_short 0 0 1024
>608 STORED
<608 set data_long 0 0 10240000
>608 STORED
<616 new auto-negotiating client connection
616: Client using the ascii protocol
<616 get data_long
>616 sending key data_long
>616 END
<616 get data_short
>616 sending key data_short
>616 END
<608 get data_long
>608 sending key data_long
>608 END
<620 new auto-negotiating client connection
620: Client using the ascii protocol
<620 get data_short
>620 sending key data_short
>620 END
<616 connection closed.
<620 connection closed.
There don't seem to be official Windows builds of Memcached any more, and it's difficult to justify spending too much time trying to build from source when there's no known issue with this revision and other clients don't have the same problem.
I understand if you can't devote more time to find this Windows-specific problem, but could you perhaps point me to the places in the C code I should look at first, should I want to troubleshoot myself?
Thanks, Damien
From [email protected] on 2014-02-19 14:19:43 :
On 19 February 2014 14:13, Damien Chaumette <[email protected]> wrote:
I've done as requested (first two gets are C::M::F)[..]
Sorry I meant the first two gets are with M::C and the second batch with C::M::F, as requested.
Thanks, Damien
From [email protected] on 2014-02-19 15:32:28 :
Apologies for the spam; I've tried with a Couchbase Memcached bucket (they seem to provide an up-to-date Windows Memcached) and get the same behaviour.
Thanks, Damien
From [email protected] on 2014-02-19 16:17:14 :
On Wed Feb 19 09:14:02 2014, [email protected] wrote:
They are definitely happening at the GET stage, not at the STORE. On the data_long C::M::F run it'll go on forever; I let it run for a few minutes and it didn't stop. To be fair, if it is trying to communicate with the client it's much too late at that point, as the client exited long ago.
The C::M::F client sends a get request, reads the reply, and sends nothing else after that. If I got you correctly, the memcached server is trying to read something from the client, and I don't understand why.
I've done as requested (first two gets are C::M::F), behaviour is the same as per below:
What I see from the trace:
<608 new auto-negotiating client connection
608: Client using the ascii protocol
<608 set data_short 0 0 1024
>608 STORED
<608 set data_long 0 0 10240000
>608 STORED
C::M::F stored two keys in memcached.
<616 new auto-negotiating client connection
616: Client using the ascii protocol
<616 get data_long
>616 sending key data_long
>616 END
<616 get data_short
>616 sending key data_short
>616 END
The other client got two keys.
<608 get data_long
>608 sending key data_long
>608 END
C::M::F requested data_long and got some error at this point, closed the connection, and requested data_short via another connection:
<620 new auto-negotiating client connection
620: Client using the ascii protocol
<620 get data_short
>620 sending key data_short
>620 END
<616 connection closed.
<620 connection closed.
Given that you disabled io_timeout, I can only guess what error could happen, but C::M::F definitely got one; otherwise it wouldn't open another connection. So the question is why C::M::F thinks it got an error, and whether its perception is valid.
There don't seem to be official Windows builds of Memcached any more, and it's difficult to justify spending too much time trying to build from source when there's no known issue with this revision and other clients don't have the same problem.
There's a tiny possibility that C::M::F specifically triggers some race in the memcached server that other clients don't, and race fixes happen in almost every memcached release (without a detailed description of what they could affect). But per your last reply, the problem is reproducible with the latest memcached.
I understand if you can't devote more time to find this Windows-specific problem, but could you perhaps point me to the places in the C code I should look at first, should I want to troubleshoot myself?
First of all, an easier way would be to use another module like Cache::Memcached::libmemcached (provided that it builds on Windows and works); it should be comparably fast. Debugging Perl modules written in C is a pain even on Linux, and simply looking into the code won't reveal much, I think, because the problem somehow relates to your setup (I built memcached 1.4.5 here but couldn't reproduce). Value reading happens in src/client.c:read_value(). But if you really want to devote time to it, the first thing I would try is to capture network traffic to see the actual packet contents (data_long is big, but protocol commands will always be at packet beginnings) and to trace system calls (on Linux we have the strace utility, which shows system calls and their results; I don't know what Windows has).
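For the network side, even before a full capture, a raw-socket probe like this (a sketch) would show whether the server's reply itself is well-formed:

use IO::Socket::INET;

# Speak the ascii protocol by hand, bypassing C::M::F entirely.
my $sock = IO::Socket::INET->new(PeerAddr => 'localhost:11211')
    or die "connect: $!";
print $sock "get data_short\r\n";
# A healthy reply is:
#   VALUE data_short <flags> 1024
#   ...1024 bytes of data...
#   END
while (my $line = <$sock>) {
    print $line;
    last if $line =~ /^END\r?$/;
}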
From [email protected] on 2014-02-20 09:26:47 :
I understand if you can't devote more time to find this Windows-specific problem, but could you perhaps point me to the places in the C code I should look at first, should I want to troubleshoot myself?
First of all, an easier way would be to use another module like Cache::Memcached::libmemcached (provided that it builds on Windows and works); it should be comparably fast. [..]
I've just had a look at Cache::Memcached::libmemcached on CPAN, although sadly it doesn't compile out of the box on my Strawberry Perl. There also seem to be a few Couchbase Perl modules on CPAN as well; with a bit of luck one of them will be in C and compile on Windows. I'll try to get C::M::libmemcached to work first, if not then a fast Couchbase module, and failing that I may do some network sniffing and/or look into the Windows system calls.
Thanks, Damien
From [email protected] on 2022-06-18 03:32:19 :
Hello,
I just noticed that src/client.c:read_value() returns 4294967295 where it should return -1. 4294967295 is 2^32 - 1.
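That value looks like a 32-bit -1 widened to 64 bits without sign extension; the arithmetic (an illustration only, not the actual C code):

# 0xFFFFFFFF is what a 32-bit -1 becomes when zero-extended;
# reinterpreting the same bits as signed 32-bit gives -1 back.
my $ret = 0xFFFFFFFF;
printf "zero-extended: %u\n", $ret;                          # 4294967295
printf "sign-extended: %d\n", unpack('l', pack('L', $ret));  # -1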
My setup:
- Windows 8.1
- Strawberry Perl 5.32.1 64bits
- Cache::Memcached::Fast v0.27
- Memcached v1.6.9
Thank you,
twata
From [email protected] on 2022-06-18 03:44:29 :
Sorry. It was src/client.c:readv_restart(), not src/client.c:read_value().