
Improve path-based routing

[Open] mayorova opened this issue 7 years ago · 9 comments

This PR introduced routing requests to different services based on the path, which allows having the same Public Base URL in multiple services.

This works, but not very efficiently. In the worst case it iterates over all services and all mapping rules, performing a regex match for each one of them. After obtaining the service id, the results of the regex matching are discarded, and the matching is repeated for all the rules of the selected service.

I think this can be improved.

As a first step it's probably worth running a benchmark to better understand how serious the problem is.

Then, we can think about how this can be improved. I am thinking of two ways:

  1. Use the find_service_cascade function as it is now, but store the results of usage and matched_patterns so they can be reused later, avoiding a second round of regex matching.

  2. On configuration load, build some kind of index based on the mapping rule patterns (across all services), to be used for service id lookup on incoming requests.
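For illustration, the second option could look something like the sketch below. All names here are hypothetical, not the actual APIcast API: the idea is simply to group all mapping rules by their first literal path segment at configuration load time, so that an incoming request only needs to run the regexes of the matching bucket instead of every rule of every service.

```lua
-- Hypothetical sketch: index mapping rules by the first literal path segment.
-- Assumes `services` is a list of { id = ..., mapping_rules = { { pattern = ... }, ... } }.

local function first_segment(path_or_pattern)
  -- '/foo/{id}/bar' -> 'foo'; returns nil when the first segment
  -- contains wildcards and cannot be indexed literally.
  return path_or_pattern:match('^/([^/{$]+)')
end

local function build_index(services)
  local index = {}
  for _, service in ipairs(services) do
    for _, rule in ipairs(service.mapping_rules) do
      -- Rules whose first segment is not literal go into a wildcard bucket.
      local key = first_segment(rule.pattern) or '*'
      index[key] = index[key] or {}
      table.insert(index[key], { service = service, rule = rule })
    end
  end
  return index
end

local function candidates_for(index, request_path)
  local key = first_segment(request_path)
  local exact = (key and index[key]) or {}
  local wildcard = index['*'] or {}
  -- Only these candidates need full regex matching.
  return exact, wildcard
end
```

With such an index, the per-request cost depends on the number of rules sharing a first segment, not on the total number of services times rules. The trade-off is extra work and memory at configuration load time.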

mayorova avatar Dec 05 '16 12:12 mayorova

It is possible we could use https://github.com/toritori0318/lua-resty-r3 or https://github.com/APItools/router.lua to build a routing tree and do a dispatch on that.

mikz avatar Jan 11 '17 08:01 mikz

Adding to 3.3 milestone as agreed with @andrewdavidmackenzie

davidor avatar May 22 '18 09:05 davidor

I think that when improving this we should come up with a better overall design.

We will need not only path-based routing, but also header-, session-, and user-based routing. We should come up with a pluggable design that allows registering code to perform this routing.

This is something that will be required for the ostia project.

We should evaluate several designs and see which is more flexible.

One would be a "conditional" policy. This policy would embed a policy chain within itself and execute it only when a certain condition is met. That would allow creating independent trees of policies.

But maybe independent trees are not what we want.

Maybe we could introduce a new concept of a "named policy chain", which is basically just a chain with a name (no host, no paths, ...), together with some concept of a router/dispatcher that sends the request to the correct named chain when certain conditions are met.
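To make the "conditional" policy option more concrete, its configuration could hypothetically look like the following (the policy name, the condition schema, and the field names here are invented for illustration; they are not a committed design):

```json
{
  "policy_chain": [
    {
      "name": "conditional",
      "configuration": {
        "condition": {
          "operations": [
            { "left": "{{ uri }}", "op": "matches", "right": "^/v2/" }
          ]
        },
        "policy_chain": [
          { "name": "apicast.policy.apicast" }
        ]
      }
    }
  ]
}
```

The embedded "policy_chain" would only run when the condition evaluates to true, which is what would allow building independent trees of policies out of one flat configuration.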

mikz avatar Jun 01 '18 08:06 mikz

@mayorova I understand your concerns about performance.

I think that the efficient way of solving this is using a prefix tree like the library @mikz mentioned: https://github.com/toritori0318/lua-resty-r3. However, notice that it's labeled as experimental. Also, I'd need to check all the kinds of regexes that we allow in the mapping rules to see how to build the prefix tree in our case.
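To illustrate why a prefix tree helps, here is a minimal sketch in plain Lua (not the r3 API) of a trie keyed by literal path segments. Rules containing regex parts would still need a fallback bucket, which is exactly the complication mentioned above:

```lua
-- Minimal trie over literal path segments; purely illustrative.
local function trie_new()
  return { children = {}, rules = {} }
end

local function split(path)
  local segments = {}
  for segment in path:gmatch('[^/]+') do table.insert(segments, segment) end
  return segments
end

local function trie_insert(trie, pattern, rule)
  local node = trie
  for _, segment in ipairs(split(pattern)) do
    node.children[segment] = node.children[segment] or trie_new()
    node = node.children[segment]
  end
  table.insert(node.rules, rule)
end

-- Walks the trie segment by segment, collecting the rules attached to
-- every prefix of the path, so shorter patterns still match.
local function trie_lookup(trie, path)
  local node, matched = trie, {}
  for _, rule in ipairs(node.rules) do table.insert(matched, rule) end
  for _, segment in ipairs(split(path)) do
    node = node.children[segment]
    if not node then break end
    for _, rule in ipairs(node.rules) do table.insert(matched, rule) end
  end
  return matched
end
```

With this structure, lookup cost is proportional to the number of segments in the request path rather than to the total number of services times mapping rules.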

Having said that, I wonder if the performance penalty is noticeable in typical use cases. My concern is that we might introduce a complex algorithm and nobody will benefit from it. That's why I went ahead and benchmarked the mapping rules matching.

For these tests, I'm going to emulate the case of an APIcast user with 100 services and 100 mapping rules in each of them. I believe that most users will have far fewer than that.

Here's the code, using the benchmark-ips lib by @mikz. I needed to modify the initializer of MappingRule, but that's not important:

require('resty.core')

local mapping_rules_matcher = require 'apicast.mapping_rules_matcher'
local MappingRule = require 'apicast.mapping_rule'

math.randomseed(os.time())

-- Define valid charset (numbers and letters).
local charset = {}
for c = 48, 57  do table.insert(charset, string.char(c)) end
for c = 65, 90  do table.insert(charset, string.char(c)) end
for c = 97, 122 do table.insert(charset, string.char(c)) end

local function randomString(length)
  local res = {}

  for _ = 1, length do
    local random_char = charset[math.random(1, #charset)]
    table.insert(res, random_char)
  end

  return table.concat(res)
end

local patterns = {}
for i = 1, 100 do
  table.insert(patterns, '/' .. randomString(32))
end

local rules = {}
for i = 1, 100 do
  table.insert(rules, MappingRule.new('GET', patterns[i], {}, {}, 'hits', 1))
end

require('ips')(function(b)
  b.time = 5
  b.warmup = 2

  -- Emulates 100 services: each call to matches() checks the 100 rules of
  -- one service against a path that matches none of them (the worst case,
  -- since every rule has to be tried).
  local function apply_mapping_rules()
    for _ = 1, 100 do
      mapping_rules_matcher.matches('GET', '/12345', {}, rules)
    end
  end

  b:report('mapping rules', function() return apply_mapping_rules() end)
end)

When I run this, I get around 500 i/s on my laptop. That means that matching 100 services with 100 rules each would take 1/500 = 0.002 s = 2 ms, and that's when all of them need to be checked because there isn't a match.

If this is accurate, I wonder whether this is worth optimizing. If nobody has reported a problem with this under a production workload, I'd rather lower its priority and leave it out of the 3.3 milestone. Do you think that we should test with higher numbers of services and rules?

Please let me know if you see any flaws in my reasoning.

davidor avatar Jul 25 '18 17:07 davidor

@davidor so is the conclusion that path based routing does not have performance issues and this is ready for QE?

MarkCheshire avatar Jul 27 '18 12:07 MarkCheshire

@MarkCheshire my conclusion is that we have no target numbers that this should meet. How many services, how many mapping rules, what patterns, ...? Using our examples it looks fine. It is possible there are some pathological cases where this is not fine for someone, but we need examples of those cases so we can see the difference between path routing enabled and disabled.

mikz avatar Jul 31 '18 13:07 mikz

I don't have a better suggestion than what David already covered in the performance test. Beyond that, we need to see whether customers start to experience a performance impact and then adjust the implementation accordingly.

MarkCheshire avatar Jul 31 '18 13:07 MarkCheshire

Exactly. I think we have some information to indicate it's "probably OK" and no information to indicate there is a problem.

So, I propose we close this as-is, have QE test it in 2.2 (separate request Mark made to them), not spend more time discussing it, benchmarking it, or implementing any changes, and then deal with any hypothetical problems if and when they appear in the future.

andrewdavidmackenzie avatar Jul 31 '18 14:07 andrewdavidmackenzie

OK. So let's leave this out of 3.3, but keep the issue open for future consideration. If anyone experiences a performance problem related to this, please let us know.

davidor avatar Jul 31 '18 14:07 davidor