
Caching doesn't improve performance

Open bf4 opened this issue 9 years ago • 35 comments

Expected behavior vs actual behavior

Expected: Configuring a cache and using the AMS serializer cache method should improve rendering performance.

Actual: performance decreases AND more objects are allocated.

Steps to reproduce

Current master: git checkout fa0bc95.

 bin/bench
caching on: caching serializers: gc off 606.0970710386515/ips; 1853 objects
caching off: caching serializers: gc off 526.5338285238549/ips; 1853 objects
caching on: non-caching serializers: gc off 709.8031139840541/ips; 1390 objects
caching off: non-caching serializers: gc off 746.4513428127035/ips; 1390 objects
Benchmark results:
{
  "commit_hash": "fa0bc95",
  "version": "0.10.0.rc4",
  "benchmark_run[environment]": "2.2.3p173",
  "runs": [
    {
      "benchmark_type[category]": "caching on: caching serializers: gc off",
      "benchmark_run[result][iterations_per_second]": 606.097,
      "benchmark_run[result][total_allocated_objects_per_iteration]": 1853
    },
    {
      "benchmark_type[category]": "caching off: caching serializers: gc off",
      "benchmark_run[result][iterations_per_second]": 526.534,
      "benchmark_run[result][total_allocated_objects_per_iteration]": 1853
    },
    {
      "benchmark_type[category]": "caching on: non-caching serializers: gc off",
      "benchmark_run[result][iterations_per_second]": 709.803,
      "benchmark_run[result][total_allocated_objects_per_iteration]": 1390
    },
    {
      "benchmark_type[category]": "caching off: non-caching serializers: gc off",
      "benchmark_run[result][iterations_per_second]": 746.451,
      "benchmark_run[result][total_allocated_objects_per_iteration]": 1390
    }
  ]
}
CACHE_ON=false bin/bench
caching on: caching serializers: gc off 664.8712562099971/ips; 1853 objects
caching off: caching serializers: gc off 613.6203762167032/ips; 1853 objects
caching on: non-caching serializers: gc off 752.267454951568/ips; 1390 objects
caching off: non-caching serializers: gc off 692.4981276214933/ips; 1390 objects
Benchmark results:
{
  "commit_hash": "fa0bc95",
  "version": "0.10.0.rc4",
  "benchmark_run[environment]": "2.2.3p173",
  "runs": [
    {
      "benchmark_type[category]": "caching on: caching serializers: gc off",
      "benchmark_run[result][iterations_per_second]": 664.871,
      "benchmark_run[result][total_allocated_objects_per_iteration]": 1853
    },
    {
      "benchmark_type[category]": "caching off: caching serializers: gc off",
      "benchmark_run[result][iterations_per_second]": 613.62,
      "benchmark_run[result][total_allocated_objects_per_iteration]": 1853
    },
    {
      "benchmark_type[category]": "caching on: non-caching serializers: gc off",
      "benchmark_run[result][iterations_per_second]": 752.267,
      "benchmark_run[result][total_allocated_objects_per_iteration]": 1390
    },
    {
      "benchmark_type[category]": "caching off: non-caching serializers: gc off",
      "benchmark_run[result][iterations_per_second]": 692.498,
      "benchmark_run[result][total_allocated_objects_per_iteration]": 1390
    }
  ]
}

Numbers vary somewhat across runs, but the differences are consistent.

Environment

  • ActiveModelSerializers Version 0.10.0.rc4, on ref fa0bc95
  • ruby -e "puts RUBY_DESCRIPTION"
    • ruby 2.2.3p173 (2015-08-18 revision 51636) [x86_64-darwin14]

OS Type & Version:

  • uname -a
    • Darwin mbp14 14.5.0 Darwin Kernel Version 14.5.0: Tue Sep 1 21:23:09 PDT 2015; root:xnu-2782.50.1~1/RELEASE_X86_64 x86_64 (Yosemite 10.10.5)

Integrated application and version

  • bundle show activemodel
    • .bundle/ruby/2.2.0/gems/activemodel-4.0.13

Backtrace

N/A

Additional helpful information

  • https://blog.codeship.com/building-a-json-api-with-rails-5/
    • By making these changes, we’ve changed our response time from 30ms to 50ms… wait, what? Yes, you heard me right. By adding cache, responses in my application have actually slowed down.

    • https://twitter.com/leighchalliday/status/642734572703236096 and https://twitter.com/joaomdmoura/status/642801896231727104
      • (screenshot: screen shot 2016-03-12 at 11 53 11 pm)
    • By looking at the flame graph with caching turned on, I could tell that 48 percent of the time was spent in the cache_check method or farther down in the stack trace. This seems to account for the slowdown from 30ms to 50ms: active_model_serializers-258f116c3cf5/lib/active_model/serializer/adapter.rb:110:in `cache_check' (48 samples - 48.00%). Here's an image of the flame graph, produced using the rack-mini-profiler gem with the flamegraph gem; I've highlighted in black the portion that's dealing with the cache. (screenshot: flame-graph-with-cache-with-box)

Cache developments since then:

However, benchmarking before and after commit 43312fa shows a regression:

  • before:
{
  "commit_hash": "43312fa^",
  "version": "0.10.0.rc3",
  "benchmark_run[environment]": "2.2.2p95",
  "runs": [
    {
      "benchmark_type[category]": "caching on: caching serializers: gc off",
      "benchmark_run[result][iterations_per_second]": 687.045,
      "benchmark_run[result][total_allocated_objects_per_iteration]": 1426
    },
    {
      "benchmark_type[category]": "caching off: caching serializers: gc off",
      "benchmark_run[result][iterations_per_second]": 688.588,
      "benchmark_run[result][total_allocated_objects_per_iteration]": 1426
    },
    {
      "benchmark_type[category]": "caching on: non-caching serializers: gc off",
      "benchmark_run[result][iterations_per_second]": 849.889,
      "benchmark_run[result][total_allocated_objects_per_iteration]": 1084
    },
    {
      "benchmark_type[category]": "caching off: non-caching serializers: gc off",
      "benchmark_run[result][iterations_per_second]": 769.596,
      "benchmark_run[result][total_allocated_objects_per_iteration]": 1084
    }
  ]
}
  • after:
{
  "commit_hash": "43312fa",
  "version": "0.10.0.rc3",
  "benchmark_run[environment]": "2.2.2p95",
  "runs": [
    {
      "benchmark_type[category]": "caching on: caching serializers: gc off",
      "benchmark_run[result][iterations_per_second]": 635.297,
      "benchmark_run[result][total_allocated_objects_per_iteration]": 1519
    },
    {
      "benchmark_type[category]": "caching off: caching serializers: gc off",
      "benchmark_run[result][iterations_per_second]": 601.3,
      "benchmark_run[result][total_allocated_objects_per_iteration]": 1519
    },
    {
      "benchmark_type[category]": "caching on: non-caching serializers: gc off",
      "benchmark_run[result][iterations_per_second]": 782.07,
      "benchmark_run[result][total_allocated_objects_per_iteration]": 1113
    },
    {
      "benchmark_type[category]": "caching off: non-caching serializers: gc off",
      "benchmark_run[result][iterations_per_second]": 771.094,
      "benchmark_run[result][total_allocated_objects_per_iteration]": 1113
    }
  ]
}

So maybe we should take a look at the usage in bulk_cache_fetcher.

And with more objects it gets worse:

BENCH_STRESS=true bin/bench

Benchmark results:

{
  "commit_hash": "e03c5f5",
  "version": "0.10.0.rc4",
  "benchmark_run[environment]": "2.2.3p173",
  "runs": [
    {
      "benchmark_type[category]": "caching on: caching serializers: gc off",
      "benchmark_run[result][iterations_per_second]": 164.688,
      "benchmark_run[result][total_allocated_objects_per_iteration]": 10755
    },
    {
      "benchmark_type[category]": "caching off: caching serializers: gc off",
      "benchmark_run[result][iterations_per_second]": 143.719,
      "benchmark_run[result][total_allocated_objects_per_iteration]": 10755
    },
    {
      "benchmark_type[category]": "caching on: non-caching serializers: gc off",
      "benchmark_run[result][iterations_per_second]": 232.669,
      "benchmark_run[result][total_allocated_objects_per_iteration]": 6690
    },
    {
      "benchmark_type[category]": "caching off: non-caching serializers: gc off",
      "benchmark_run[result][iterations_per_second]": 211.71,
      "benchmark_run[result][total_allocated_objects_per_iteration]": 6690
    }
  ]
}

Possibly related

Flamegraph

  • Flamegraph of master with bin/serve_benchmark start and the flamegraph gem
    • (screenshot: screen shot 2016-03-09 at 9 34 21 pm)
diff --git a/Gemfile b/Gemfile
index 3791eef..7be3d53 100644
--- a/Gemfile
+++ b/Gemfile
@@ -39,6 +39,8 @@ gem 'tzinfo-data', platforms: (@windows_platforms + [:jruby])
 group :bench do
   # https://github.com/rails-api/active_model_serializers/commit/cb4459580a6f4f37f629bf3185a5224c8624ca76
   gem 'benchmark-ips', require: false, group: :development
+  gem 'rack-mini-profiler', require: false
+  gem 'flamegraph'
 end

 group :test do
diff --git a/test/benchmark/app.rb b/test/benchmark/app.rb
index ae110ec..ffbc8cc 100644
--- a/test/benchmark/app.rb
+++ b/test/benchmark/app.rb
@@ -54,6 +54,14 @@ end

 require 'active_model_serializers'

+begin
+    require 'rack-mini-profiler'
+rescue LoadError # rubocop:disable Lint/HandleExceptions
+else
+  require 'flamegraph'
+  # just append ?pp=flamegraph
+end
+
 # Initialize app before any serializers are defined, for running across revisions.
 # ref: https://github.com/rails-api/active_model_serializers/pull/1478
 Rails.application.initialize!

bf4 avatar Mar 13 '16 06:03 bf4

Apparently @joaomdmoura had already discussed this in https://github.com/rails-api/active_model_serializers/issues/1020. I missed this since the issue title was 'Understanding caching', but the contents were that caching made things worse. So, this has been a known issue since July 2015. Sigh.

bf4 avatar Mar 13 '16 17:03 bf4

On interpreting Flamegraphs http://community.miniprofiler.com/t/how-to-deal-with-information-overload-in-flamegraphs/437?u=sam

bf4 avatar Mar 22 '16 14:03 bf4

Note: this benchmark is faulty since some legacy AMS idiosyncrasy made it so that the cached serializer actually did twice the work. Could you re-run @bf4?

beauby avatar Apr 20 '16 13:04 beauby

Sure. Related to my reference to https://github.com/rails-api/active_model_serializers/pull/1478 above, I'd like to remove per-serializer cache_store configuration. I just don't see the benefit for the complexity it adds.

bf4 avatar Apr 20 '16 14:04 bf4

For updated benchmarks, see #1698

bf4 avatar Apr 20 '16 16:04 bf4

@bf4 I picked this issue since it might be related to performance boosts. Are there any plans to come up with something along the lines of ActiveRecord's preload for serializers? I see a lot of has_one relations that could be streamlined in a single call not using multi fetches. I also see some Collection serializations not using it as well in my tests (0.10.0 release, not master).

zaaroth avatar Jun 16 '16 13:06 zaaroth

@zaaroth There are improvements in master which I just released in 0.10.1. Would love to discuss in the amserializers.herokuapp.com Slack. Thanks!

bf4 avatar Jun 16 '16 14:06 bf4

Is there any update on this issue? Does caching improve performance now?

GCorbel avatar Oct 25 '16 00:10 GCorbel

@GCorbel Caching will only improve performance when serializing objects with computation-heavy attributes. I'm not aware of its current status, although last I heard @bf4 had fixed it.

beauby avatar Oct 25 '16 01:10 beauby

@bf4 Is this still an issue? It's still listed as a warning in the caching guide in master, but I can't seem to find any updates.

stephendolan avatar Jan 04 '17 19:01 stephendolan

It's much better than when I first made the issue, but I'm not yet satisfied to close it until I have some benchmarks.


bf4 avatar Jan 04 '17 21:01 bf4

Wondering if this is still an issue; I can't find too much information around this.

mustela avatar Mar 12 '17 13:03 mustela

@mustela Basically, I haven't yet found an app I can test performance against in a way that makes me comfortable changing the language. The perf is much improved since I made the issue, but between life and lacking a good test example, I just haven't followed up.

bf4 avatar Mar 12 '17 19:03 bf4

@bf4 would you mind describing the app you are looking for? Maybe we can help with that.

mustela avatar Mar 12 '17 20:03 mustela

@mustela Probably the simplest thing to extract would be a setup with

  • records
  • a db, not necessarily sqlite
  • endpoint which serializes records, or somehow uses AMS
  • Serializers have caching (a simple superclass can have it off and a subclass have it on, for benchmarking purposes; or I can turn it on or off, no problem there)
  • cache_store is set, and is neither memory nor filesystem (e.g. redis)
  • Integration tests/request specs are probably a good starting point
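For concreteness, a minimal sketch of the setup described above. All names here are illustrative, not from an actual app; AMS 0.10 enables serializer-level caching with the `cache` class method, and the exact cache-store wiring depends on your Rails version and Redis gem.

```ruby
# config/environments/production.rb -- point the Rails cache at Redis
# (shown via the redis-rails gem; any non-memory, non-filesystem store
# satisfies the benchmark requirement above).
config.cache_store = :redis_store, ENV['REDIS_URL']

# app/serializers/post_serializer.rb -- superclass with caching off
class PostSerializer < ActiveModel::Serializer
  attributes :id, :title, :body
end

# Subclass with caching on, for A/B benchmarking as suggested above.
class CachedPostSerializer < PostSerializer
  cache key: 'posts', expires_in: 3.hours
end
```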

bf4 avatar Mar 13 '17 01:03 bf4

I created a simple app to exercise the issue I was facing. https://github.com/customink/amstest

metaskills avatar Apr 03 '17 13:04 metaskills

@metaskills Thanks so much for this! Added an issue there https://github.com/customink/amstest/issues/1

bf4 avatar Apr 03 '17 22:04 bf4

I can say that in my case caching worked like a charm. It saved me about 93% of serialization time. Using AMS 0.10.5. I'm serializing a lot of data though.

Before caching: (screenshot: screen shot 2017-04-12 at 11 08 30 am)

After caching: (screenshot: screen shot 2017-04-12 at 11 08 48 am)

(Images are from Skylight.io)

mrsweaters avatar Apr 12 '17 18:04 mrsweaters

@mrsweaters Fantastic! Are you able to describe in general terms the nature of what you're serializing, such that I can model it? E.g. db tables, fields, indices, associations, number of items, how you've configured your serializers, etc.

bf4 avatar Apr 12 '17 19:04 bf4

@bf4 I had to temporarily disable caching unfortunately because Carrierwave can't be serialized. Once I find a workaround I'll try to summarize my situation.

mrsweaters avatar Apr 14 '17 23:04 mrsweaters

@mrsweaters Do you have overridden attributes that are costly to compute?

beauby avatar Apr 15 '17 12:04 beauby

@bf4 - I can see how caching improves things in the test app provided by @metaskills, where the controller is busy doing some computation.

However, in my case, where I try to serialize 10,000 records or so, it is still faster to regenerate the JSON than to fetch it from memcached or redis. The sample app I used for this test was pretty straightforward, with a model having 5 attributes and no relationships. Is this expected?

harbirg avatar Apr 16 '17 21:04 harbirg

However, in my case - where I try to serialize 10,000 records or so, it is still faster to regenerate json than fetch from memcached or redis

I saw this too. Basically it was due to excessive caching of children and poor support for Russian-doll strategies and/or read_multi. We solved that first by caching at the top layer only, then by moving to JBuilder and a solution with read_multi support.
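A toy illustration of the read_multi point (this is not AMS internals, just a sketch of why one batched round trip beats N individual cache reads when serializing a collection):

```ruby
# Hypothetical cache where each call costs one network round trip.
class ToyCache
  attr_reader :round_trips

  def initialize
    @store = {}
    @round_trips = 0
  end

  def write(key, value)
    @store[key] = value
  end

  def read(key)
    @round_trips += 1 # one round trip per key
    @store[key]
  end

  def read_multi(*keys)
    @round_trips += 1 # one round trip for the whole batch
    keys.each_with_object({}) { |k, h| h[k] = @store[k] if @store.key?(k) }
  end
end

keys = (1..100).map { |i| "users/#{i}" }

cache = ToyCache.new
keys.each { |k| cache.write(k, "serialized #{k}") }
keys.each { |k| cache.read(k) } # per-record reads
puts cache.round_trips          # => 100

cache = ToyCache.new
keys.each { |k| cache.write(k, "serialized #{k}") }
cache.read_multi(*keys)         # one batched read
puts cache.round_trips          # => 1
```

With a real memcached or Redis round trip of ~0.5ms, the difference between 100 reads and 1 batched read can easily exceed the cost of re-serializing cheap attributes, which is consistent with the regressions reported in this thread.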

metaskills avatar Apr 16 '17 21:04 metaskills

I do see [active_model_serializers] Cache read_multi: [...entries from dalli memcached...] in my output, so I'm assuming this means it's performing a read_multi. Perhaps the outstanding issue is something like a Russian-doll caching strategy, where AMS would cache both the individual entries and the entire response.

harbirg avatar Apr 16 '17 22:04 harbirg

Hey everyone,

I'm trying to understand how caching works in AMS. I'm not sure if this is a bug or the intended behavior, but I've made a simple Rails API with a basic configuration: https://github.com/mustela/ams-cache.

The schema is simple: User => Memberships => Organizations

Cache enabled, Serializers and the controller which is returning the user + the organizations.

So basically when I request curl -X GET localhost:3000/users/1

Started GET "/users/1" for ::1 at 2017-04-19 23:29:07 +0200
Processing by UsersController#show as */*
  Parameters: {"id"=>"1"}
  User Load (0.6ms)  SELECT  "users".* FROM "users" WHERE "users"."id" = 1 LIMIT 1
[active_model_serializers]   Organization Load (0.7ms)  SELECT "organizations".* FROM "organizations" INNER JOIN "memberships" ON "organizations"."id" = "memberships"."organization_id" WHERE "memberships"."user_id" = 1 ORDER BY "organizations"."name" ASC
[active_model_serializers] Rendered UserSerializer with ActiveModelSerializers::Adapter::Attributes (5.12ms)
Completed 200 OK in 7ms (Views: 4.9ms | ActiveRecord: 1.3ms)

This is the response that I get every time I call that endpoint. I understand that user is being read to create the cache key, but the organizations are not being cached.

Postgres transaction log, for every request:

db_1               | LOG:  execute <unnamed>: SELECT  "users".* FROM "users" WHERE "users"."id" = 1 LIMIT 1
db_1               | LOG:  execute <unnamed>: SELECT "organizations".* FROM "organizations" INNER JOIN "memberships" ON "organizations"."id" = "memberships"."organization_id" WHERE "memberships"."user_id" = 1 ORDER BY "organizations"."name" ASC
db_1               | LOG:  execute <unnamed>: SELECT  "users".* FROM "users" WHERE "users"."id" = 1 LIMIT 1
db_1               | LOG:  execute <unnamed>: SELECT "organizations".* FROM "organizations" INNER JOIN "memberships" ON "organizations"."id" = "memberships"."organization_id" WHERE "memberships"."user_id" = 1 ORDER BY "organizations"."name" ASC
db_1               | LOG:  execute <unnamed>: SELECT  "users".* FROM "users" WHERE "users"."id" = 1 LIMIT 1
db_1               | LOG:  execute <unnamed>: SELECT "organizations".* FROM "organizations" INNER JOIN "memberships" ON "organizations"."id" = "memberships"."organization_id" WHERE "memberships"."user_id" = 1 ORDER BY "organizations"."name" ASC
db_1               | LOG:  execute <unnamed>: SELECT  "users".* FROM "users" WHERE "users"."id" = 1 LIMIT 1
db_1               | LOG:  execute <unnamed>: SELECT "organizations".* FROM "organizations" INNER JOIN "memberships" ON "organizations"."id" = "memberships"."organization_id" WHERE "memberships"."user_id" = 1 ORDER BY "organizations"."name" ASC

In redis the keys are being saved:

localhost:6379> keys *
1) "organizations/1-20170419212733959570/attributes/a74db0c5f71a4f9513eb81e760b03d2c"
2) "users/1-20170419212733980160/attributes/adcc32fd6ac06e7f189307a4bf1300e2"

Also, the cache prefix key I set here is not being used at all. As you can see, the redis keys don't include that prefix.

So wondering if anyone could explain what should be cached and what not.

Thanks!

mustela avatar Apr 19 '17 22:04 mustela

@mustela I think the reason caching isn't working on the User model is that its dependencies are not cached. If I'm right, you would need to add cache to the serializers for memberships and organizations as well.
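If that were the case, the fix would look something like this (an illustrative sketch only; as noted below, the linked repo may already do this via its abstract serializer):

```ruby
class ApplicationSerializer < ActiveModel::Serializer
  # Every serializer in the association tree opts into caching, so
  # child fragments can be reused instead of re-rendered with the parent.
  cache expires_in: 12.hours
end

class OrganizationSerializer < ApplicationSerializer
  attributes :id, :name
end

class UserSerializer < ApplicationSerializer
  attributes :id, :email
  has_many :organizations
end
```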

harbirg avatar Apr 20 '17 05:04 harbirg

@harbirg they have, unless there is another way to specify that. All the serializers inherit from https://github.com/mustela/ams-cache/blob/master/app/serializers/abstract_serializer.rb#L2

mustela avatar Apr 20 '17 05:04 mustela

@bf4 the app I published has (I think) all the things you are mentioning. If you are familiar with docker, you should be able to run the app easily. I can also generate more records or anything else you need. I would really love to understand how caching works in AMS.

Thanks

mustela avatar Apr 20 '17 13:04 mustela

@mustela - I forked your repro here: https://github.com/harbirg/ams-cache. I switched over to Memcached, as I did not have Redis installed, but that should not matter. Per the logs, I see that the User and Organization models are cached and read back with a cache hit. Are you not seeing the same behaviour? Also, reading back from the cache was slightly slower than regenerating the JSON response - likely because it's only a single user request. I've put up some benchmark results.

If you review the responses with caching enabled and disabled, the DB is accessed for both the User and Organization models first in either case. With caching, this is likely to check whether the cache entry is stale; if it is not, the memcached version is read back. In the non-caching case, it starts regenerating the JSON response right after the DB access.
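That point can be sketched in plain Ruby (an assumption about the general pattern, not AMS's actual code path): because the cache key embeds the record's updated_at, the record must be loaded from the DB before the cache can even be consulted, so caching can save serialization work but never the initial query.

```ruby
# Toy model of updated_at-based cache keys, mirroring the key shape
# seen in the Redis dump above ("organizations/1-20170419212733959570/...").
Record = Struct.new(:id, :updated_at, :name)

CACHE = {}

def cache_key(record)
  # The key changes whenever the record changes, so stale entries
  # are simply never read again (recyclable-key caching).
  "records/#{record.id}-#{record.updated_at.strftime('%Y%m%d%H%M%S%6N')}"
end

def serialize(record)
  # Stands in for the attribute computation that caching can skip.
  { 'id' => record.id, 'name' => record.name }
end

def cached_serialize(record)
  CACHE[cache_key(record)] ||= serialize(record)
end

record = Record.new(1, Time.utc(2017, 4, 19, 21, 27, 33), 'org one')
first  = cached_serialize(record)  # miss: serializes and stores
second = cached_serialize(record)  # hit: returns the stored hash
puts first.equal?(second)          # => true
```

This is why the Postgres log above shows both SELECTs on every request even with a warm cache: the reads are needed just to compute the keys.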

harbirg avatar Apr 23 '17 17:04 harbirg

Thanks @harbirg, your tests are correct: that's what I'm seeing, and as you can see, using caching is much slower than not using it. I'm trying to put more tests/benchmarks in place to help here.

mustela avatar Apr 25 '17 07:04 mustela