fluent-bit icon indicating copy to clipboard operation
fluent-bit copied to clipboard

in_docker: stricter config json name matching

Open fopina opened this issue 2 years ago • 9 comments

When a container is part of a docker swarm service, service name was reported (instead of container name) and first char was lost

Currently "name"=>"kron_agents":

/opt/td-agent-bit/bin/td-agent-bit -i docker -o stdout -q
[0] docker.0: [1650624963.504885670, {"id"=>"afef4d705052", "name"=>"kron_agents", "cpu_used"=>1679490969020, "mem_used"=>104538112, "mem_limit"=>104857600}]

Expected "name"=>"dkron2_agents.o2y1zv88kwvw97w14brnqneen.attv68rjfwcyg227xgpcogl1w":

/opt/td-agent-bit/bin/td-agent-bit -e /opt/td-agent-bit/flb-in_docker2.so -i docker2 -o stdout -q
[0] docker2.0: [1650625107.506103721, {"id"=>"afef4d705052", "name"=>"dkron2_agents.o2y1zv88kwvw97w14brnqneen.attv68rjfwcyg227xgpcogl1w", "cpu_used"=>1680827091746, "mem_used"=>104534016, "mem_limit"=>104857600}]

The issue is that this plugin extracts container name by looking at its config json, finding the first "Name" entry and skipping 9 characters, which also skips the leading / in internal container names.

The problem is that when container belongs to a swarm service there's a JSON entry {Service: {Name: ...} that is listed before container name and the extract code does not expect that. Result being service name is extracted instead of container name and, as service names are not prefixed with /, the first character is lost.

This change updates the extract code to look for "Name":"/ specifically, assuming proper JSON parsing was not implemented from the start for performance reasons.

It has also been done in a standalone fork of the plugin so I could test/use straight away - https://github.com/fopina/fluent-bit-in-docker2


Enter [N/A] in the box, if an item is not applicable to your change.

Testing Before we can approve your change; please submit the following in a comment:

  • [N/A] Example configuration file for the change
  • [x] Debug log output from testing the change
  • [N/A] Attached Valgrind output that shows no leaks or memory corruption was found

If this is a change to packaging of containers or native binaries then please confirm it works for all targets.

  • [N/A] Attached local packaging test output showing all targets (including any new ones) build.

Documentation

  • [N/A] Documentation required for this feature

Backporting

  • [N/A] Backport to latest stable release.

Fluent Bit is licensed under Apache 2.0, by submitting this pull request I understand that this code will be released under the terms of that license.

fopina avatar Apr 22 '22 11:04 fopina

debug log

/opt/td-agent-bit/bin/td-agent-bit -i docker -o stdout -m '*' -o exit -m '*' -v
Fluent Bit v1.9.2
* Copyright (C) 2015-2022 The Fluent Bit Authors
* Fluent Bit is a CNCF sub-project under the umbrella of Fluentd
* https://fluentbit.io

[2022/04/22 12:06:00] [ info] Configuration:
[2022/04/22 12:06:00] [ info]  flush time     | 1.000000 seconds
[2022/04/22 12:06:00] [ info]  grace          | 5 seconds
[2022/04/22 12:06:00] [ info]  daemon         | 0
[2022/04/22 12:06:00] [ info] ___________
[2022/04/22 12:06:00] [ info]  inputs:
[2022/04/22 12:06:00] [ info]      docker
[2022/04/22 12:06:00] [ info] ___________
[2022/04/22 12:06:00] [ info]  filters:
[2022/04/22 12:06:00] [ info] ___________
[2022/04/22 12:06:00] [ info]  outputs:
[2022/04/22 12:06:00] [ info]      stdout.0
[2022/04/22 12:06:00] [ info]      exit.1
[2022/04/22 12:06:00] [ info] ___________
[2022/04/22 12:06:00] [ info]  collectors:
[2022/04/22 12:06:00] [ info] [fluent bit] version=1.9.2, commit=, pid=27218
[2022/04/22 12:06:00] [debug] [engine] coroutine stack size: 196608 bytes (192.0K)
[2022/04/22 12:06:00] [debug] [storage] [cio stream] new stream registered: docker.0
[2022/04/22 12:06:00] [ info] [storage] version=1.1.6, type=memory-only, sync=normal, checksum=disabled, max_chunks_up=128
[2022/04/22 12:06:00] [ info] [cmetrics] version=0.3.0
[2022/04/22 12:06:00] [debug] [docker:docker.0] created event channels: read=21 write=22
[2022/04/22 12:06:00] [debug] [stdout:stdout.0] created event channels: read=23 write=24
[2022/04/22 12:06:00] [debug] [exit:exit.1] created event channels: read=30 write=31
[2022/04/22 12:06:00] [debug] [router] match rule docker.0:stdout.0
[2022/04/22 12:06:00] [debug] [router] match rule docker.0:exit.1
[2022/04/22 12:06:00] [ info] [sp] stream processor started
[2022/04/22 12:06:00] [ info] [output:stdout:stdout.0] worker #0 started
[2022/04/22 12:06:00] [debug] [input chunk] update output instances with new chunk size diff=147
[2022/04/22 12:06:00] [debug] [input chunk] update output instances with new chunk size diff=129
[2022/04/22 12:06:00] [debug] [input chunk] update output instances with new chunk size diff=153
[2022/04/22 12:06:01] [debug] [task] created task=0x7fb6044700 id=0 OK
[2022/04/22 12:06:01] [debug] [output:stdout:stdout.0] task_id=0 assigned to thread #0
[0] docker.0: [1650625560.505582130, {"id"=>"afef4d705052", "name"=>"dkron2_agents.o2y1zv88kwvw97w14brnqneen.attv68rjfwcyg227xgpcogl1w", "cpu_used"=>1684874715504, "mem_used"=>104529920, "mem_limit"=>104857600}]
[1] docker.0: [1650625560.505766441, {"id"=>"f145bdacdd2d", "name"=>"surf_resync-rootbox.1.i1azo708bm4nyhn3zbjzbbakn", "cpu_used"=>4356669333, "mem_used"=>46792704, "mem_limit"=>209715200}]
[2] docker.0: [1650625560.505814773, {"id"=>"d6ff50d4b960", "name"=>"portainer_agent.o2y1zv88kwvw97w14brnqneen.xxanqkshung8ocz0q4ui1b1ek", "cpu_used"=>616098336485, "mem_used"=>33619968, "mem_limit"=>9223372036854771712}]
[2022/04/22 12:06:01] [debug] [out flush] cb_destroy coro_id=0
[2022/04/22 12:06:01] [debug] [input chunk] update output instances with new chunk size diff=147
[2022/04/22 12:06:01] [debug] [input chunk] update output instances with new chunk size diff=129
[2022/04/22 12:06:01] [debug] [input chunk] update output instances with new chunk size diff=153
[2022/04/22 12:06:01] [ info] [input] pausing docker.0
[2022/04/22 12:06:01] [debug] [task] created task=0x7fb6044770 id=1 OK
[2022/04/22 12:06:01] [debug] [output:stdout:stdout.0] task_id=1 assigned to thread #0
[2022/04/22 12:06:01] [ warn] [engine] service will shutdown in max 5 seconds
[0] docker.0: [1650625561.505994702, {"id"=>"afef4d705052", "name"=>"dkron2_agents.o2y1zv88kwvw97w14brnqneen.attv68rjfwcyg227xgpcogl1w", "cpu_used"=>1684879407984, "mem_used"=>104529920, "mem_limit"=>104857600}]
[2022/04/22 12:06:01] [debug] [out flush] cb_destroy coro_id=0
[2022/04/22 12:06:01] [debug] [task] destroy task=0x7fb6044700 (task_id=0)
[1] docker.0: [1650625561.506206642, {"id"=>"f145bdacdd2d", "name"=>"surf_resync-rootbox.1.i1azo708bm4nyhn3zbjzbbakn", "cpu_used"=>4356669333, "mem_used"=>46792704, "mem_limit"=>209715200}]
[2022/04/22 12:06:01] [debug] [out flush] cb_destroy coro_id=1
[2] docker.0: [1650625561.506268641, {"id"=>"d6ff50d4b960", "name"=>"portainer_agent.o2y1zv88kwvw97w14brnqneen.xxanqkshung8ocz0q4ui1b1ek", "cpu_used"=>616100355538, "mem_used"=>33619968, "mem_limit"=>9223372036854771712}]
[2022/04/22 12:06:01] [debug] [out flush] cb_destroy coro_id=1
[2022/04/22 12:06:01] [debug] [task] destroy task=0x7fb6044770 (task_id=1)
[2022/04/22 12:06:02] [ info] [engine] service has stopped (0 pending tasks)
[2022/04/22 12:06:02] [ info] [output:stdout:stdout.0] thread worker #0 stopping...
[2022/04/22 12:06:02] [ info] [output:stdout:stdout.0] thread worker #0 stopped

fopina avatar Apr 22 '22 11:04 fopina

@edsiper @patrick-stephens seems CODEOWNERS is broken on this one and did not request reviewers...

fopina avatar Apr 22 '22 11:04 fopina

It should have been caught by the default one so not sure why that didn't work, although maybe a Github issue.

patrick-stephens avatar Apr 22 '22 14:04 patrick-stephens

Ah, the configuration seems to specify a user that does not map for me at least:

/plugins/in_docker       @ashutoshdhundhara

So that is likely why, I'll raise a separate issue on updating that.

patrick-stephens avatar Apr 22 '22 14:04 patrick-stephens

I'm not familiar with docker swarm.

Note: config.v2.json seems to represent below struct Container. https://github.com/moby/moby/blob/v20.10.14/container/container.go#L61

It contains Name which we want to look up and NetworkSettings. https://github.com/moby/moby/blob/v20.10.14/container/container.go#L77 https://github.com/moby/moby/blob/v20.10.14/container/container.go#L75

NetworkSettings has a member Service. https://github.com/moby/moby/blob/v20.10.14/daemon/network/settings.go#L22

Service has a Name. https://github.com/moby/moby/blob/v20.10.14/daemon/cluster/provider/network.go#L33

Summary: config.v2.json contains below Name members.

{
    "NetworkSettings": {
         "Service": {
              "Name":"hoge"
         },
    },
    "Name":"/hoge"
}

nokute78 avatar May 04 '22 01:05 nokute78

thanks for the comment @nokute78 Was there a question or remark towards the PR there? As that is indeed the problem: a non-container “Name:” shows up before the container Name key and the current code extracts it (wrongly).

as container names always start with /, adding that to the extraction fixes it (without doing full json parsing)

fopina avatar May 04 '22 01:05 fopina

@fopina My note is to understand container.v2.json which is the file to get container information.

I didn't confirm that first character of container name should be /. Could you point a document or source ? If container name should start with /, I agree the patch.

In my understanding, this patch depends on the struct Container and name of container. If a member of Container or name rule will be changed, we may need more patches.

nokute78 avatar May 04 '22 01:05 nokute78

I haven’t looked it up documented, I’ll try to find it in the code itself.

But even without that confirmation, I believe it’s safe to assume that, from two different perspectives:

  1. My own experience: every docker inspect I’ve done (CLI and API) always return / in the container name
  2. Current plugin: if you see the number of caracters skipped during name extraction, you see there’s one more - so / is not included. Meaning current code is also expecting /, it’s not really changing that behavior

fopina avatar May 04 '22 01:05 fopina

This PR is stale because it has been open 45 days with no activity. Remove stale label or comment or this will be closed in 10 days.

github-actions[bot] avatar Aug 02 '22 02:08 github-actions[bot]

I was just now going to upgrade my fork from this plugin to fluent-bit v2 and realized it was only to address this issue that is still open.

Is it still up for consideration / Is it worth for me to resolve conflicts and update the PR?

fopina avatar May 08 '23 20:05 fopina

This PR is stale because it has been open 45 days with no activity. Remove stale label or comment or this will be closed in 10 days.

github-actions[bot] avatar Aug 08 '23 01:08 github-actions[bot]

let's fix the conflicts to revisit pls

edsiper avatar Oct 04 '23 20:10 edsiper

let's fix the conflicts to revisit pls

Has anything changed regarding @nokute78 concerns?

They're not related to code itself, so I don't see why resolving the conflicts would help before clearing that up

fopina avatar Oct 05 '23 00:10 fopina

This PR is stale because it has been open 45 days with no activity. Remove stale label or comment or this will be closed in 10 days.

github-actions[bot] avatar Jan 03 '24 01:01 github-actions[bot]