kong icon indicating copy to clipboard operation
kong copied to clipboard

fix(core) don't assign ctx objects by reference in logs objects

Open fdeschamps opened this issue 3 years ago • 8 comments

Some ctx objects are assigned by reference in serialize function. Therefore any modification of the log in log plugins would affect the entire ctx and impact the behavior of other plugins. Using deep_copy function ensures that this does not happen.

Summary

If a plugins that use the pdk serialize function, then modify the object returned (like route or service), it would also modify the ctx and affect the behavior of other plugins. By using deep copy, we don't passe references to objects in the context.

I ran into this bug because I'm using tcp-log and wanted to remove some fields from the log. I used the custom_fields_by_lua feature like this :

{
  "config": {
    "custom_fields_by_lua": {
       "route.id": "return nil",
       "route.ws_id": "return nil",
       "route.service.id": "return nil",
       "service.id": "return nil",
       "service.ws_id": "return nil",
       "route.created_at": "return nil",
       "service.created_at": "return nil",
       "route.updated_at": "return nil",
       "service.updated_at": "return nil"
    },
    "timeout": 10000,
    "keepalive": 60000,
    "port": 514,
    "host": "log_collector",
    "tls": false,
    "tls_sni": null
  },
  "service": null,
  "route": null,
  "protocols": [
    "grpc",
    "grpcs",
    "http",
    "https"
  ],
  "enabled": true,
  "tags": null,
  "name": "tcp-log",
  "consumer": null
}'

While the log was edited like I wanted, I found out that other plugins (like rate-limiting) were not working anymore. By looking into the code I found out that since the route.id was deleted from the context, the rate-limiting plugin could not load the appropriate configuration.

Full changelog

  • Use deep_copy() to assign route, service, consumer and balancer_data objects to the log object.

fdeschamps avatar Apr 21 '22 08:04 fdeschamps

From the tests, it seems that utils.deep_copy is changing the type of empty json arrays, I'll look into that.

fdeschamps avatar May 02 '22 08:05 fdeschamps

The deep_copy function was not copying the metadata ARRAY_MT from cjson, which is weird because deep_copy is copying metatables by default. I found out that changing deep_copy to copy metatables by reference is fixing this problem, but I didn't want to patch deep_copy, this could has side effects. So I chose to set this metadata a second time after deep_copying the tries table. I had to use a temporary variable to verify if the tries variable is nil or a table before setting the metadata.

fdeschamps avatar May 02 '22 13:05 fdeschamps

Thinking out loud, using deep copy to prevent user to accidently modify it seems bit overkill. It's still possible to modify it directly using for example ngx.ctx.service. I would instead suggest improve on the docs to state that the returned the object should be treated readonly, or even better, attach a metatable to make it readonly (https://www.lua.org/pil/13.4.5.html).

fffonion avatar May 25 '22 10:05 fffonion

To hit the bug (unexpected behavior), I was only using a feature of the tcp-log plugin that permit to modify the log object. I'm not a plugin coder that can access the ngx.ctx variable directly. I'm only using the kong admin API to remove fields from the log via the custom_fields_by_lua feature of the tcp-log plugin.

We could change the patch to modify the tcp-log plugin instead and make a deep_copy of the log object there if the user requests a change on it (using the custom_fields_by_lua feature). But I think that patching the core function in the pdk addresses any future issues if others plugins want to alter the log object.

OR, if you don't want to use deep_copy, a solution is to explicitly assign all the services/routes subfields in the code. But the counterpart of this is having to update this part of code in the future when the routes/services objects will change...

In my opinion, I think that the routes/services objects are so small that using deep_copy on them doesn't have any impact on performance. We made some benchs on our end and saw no differences.

With that said, I can adapt this patch as you will suggest.

fdeschamps avatar Jun 01 '22 16:06 fdeschamps

I agree this is a bug. Small clarification:

would affect the entire ctx and impact the behavior of other plugins.

Modifying ctx so that other plugins get affected is one of the things plugins are supposed to do. For example, the authentication plugins will set ctx.authenticated_consumer and ctx.authenticated_credential so other plugins can use these values (rate limiting can limit by consumer, and so on).

The bug is that we're doing modifications on ctx when executing kong.log.serialize. This PDK function is supposed to return a new table, not make modifications to ctx. The reason why this happens is that some of the values set on the new table are copies (like client_ip), but others (like service) are references to tables which are inside ctx. Changing service for producing the serialized ctx should not change ctx. So I think this is a bug, and the fix does the correct thing.

Two things I'd like to see:

  • An entry on the CHANGELOG.md (Unreleased/Fixes/PDK)
  • At least one test that verifies that the fix is correct in https://github.com/Kong/kong/blob/master/t/01-pdk/02-log/05-set_serialize_value.t

kikito avatar Jun 02 '22 12:06 kikito

just a note, instead of deepcopy, LuaJITs table.clone may be enough.

bungle avatar Jun 02 '22 23:06 bungle

table.clone is a shallow copy and will fail for nested tables:

 /p/tmp  cat a.lua                                                                                                                          149ms ❮ 14:08:36
local table_clone = require "table.clone"

local a ={
    x = 1,
    y = {
        z = 3,
    },
}

local b = table.clone(a)

b.x = 100
b.y.z = 300

print(require("inspect")(a))
print(require("inspect")(b))

Outputs:

{
  x = 1,
  y = {
    z = 300
  }
}
{
  x = 100,
  y = {
    z = 300
  }
}

the "read-only" hack i mentioned above also doesn't take care of nested table as well.

I agree this is a bug, but I don't think always clone or copy the table is a good idea. Instead, we should make it readonly, if user want to modify it, user can clone or copy it on their own.

fffonion avatar Jun 06 '22 06:06 fffonion

I agree this is a bug, but I don't think always clone or copy the table is a good idea. Instead, we should make it readonly, if user want to modify it, user can clone or copy it on their own.

Agree; I think the added overhead of copying will be detrimental. What about instead changing the behavior of kong.log.set_serialize_value() to adopt a copy-on-write-ish behavior? When applying the values, we detect if the target is a table and perform a deep copy before making modifications. This action could be probably be memoized so that it's not performed more than once:

-- 1) no edits => no copy
local log_1 = kong.log.serialize()
assert(log_1.service == ngx.ctx.service) -- same object
assert(log_1.service.host == "my-hostname")

-- 2) first edit => copies the service table
kong.log.set_serialize_value("service.host", "my-new-hostname")
local log_2 = kong.log.serialize()
assert(log_2.service ~= ngx.ctx.service) -- different objects
assert(log_2.service == "my-new-hostname")

-- 3) second edit => no copy
kong.log.set_serialize_value("service.host", "my-really-new-hostname")
local log_3 = kong.log.serialize()
assert(log_3.service ~= ngx.ctx.service) -- different objects
assert(log_3.service == "my-really-new-hostname")
-- log_2.service was re-used because it's already a copy of the original
assert(log_3.service == log_2.service)
assert(log_2.service == "my-really-new-hostname")

Alternatively, instead of copying every table on write op, we could consult a lookup table of known "sensitive" tables instead of blindly copying everything (because we really only care about core entities, like service/route/consumer).

flrgh avatar Jun 06 '22 19:06 flrgh

@fdeschamps Do you have time to address the feedback provided above?

hbagdi avatar Oct 26 '22 18:10 hbagdi

@fdeschamps Do you have time to address the feedback provided above?

Not very much time and I don't have enough expertise either. Also we have the first patch I proposed with deep copy in production since April, and everything works well, so I prefer focusing on other tasks in my work.

fdeschamps avatar Oct 26 '22 18:10 fdeschamps

Closing because of lack of activity. We would like to accept this contribution if anyone wishes to pick this back up.

hbagdi avatar Mar 21 '23 21:03 hbagdi