
[CNVM] Trivy local cache file size increases indefinitely

Open moukoublen opened this issue 10 months ago • 3 comments

Describe the bug

Trivy uses a local file (a bbolt DB) as a cache in the /tmp directory (/tmp/trivy/fanal/fanal.db), and this file grows in size with each cycle.

This fills up the tmpfs file system that holds the /tmp folder (1), so Cloudbeat can no longer download the new Trivy DB (which it does on each cycle). Cloudbeat then enters a crash loop and stops providing CNVM findings. It could also affect other applications hosted on the same instance that rely on /tmp for any crucial operation.

(1) /tmp (tmpfs) is a RAM disk (it lives in memory) with a maximum size, usually half of the host's total RAM.
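To watch how close /tmp is to its limit, something like the following could be used. This is only a rough sketch: it assumes a Linux host, and the helper names `fsUsage` and `usedPercent` are made up for illustration (they are not part of Cloudbeat or Trivy).

```go
package main

import (
	"fmt"
	"syscall"
)

// usedPercent returns the share of a filesystem that is in use,
// given its total and available sizes in bytes.
func usedPercent(total, avail uint64) float64 {
	if total == 0 || avail > total {
		return 0
	}
	return 100 * float64(total-avail) / float64(total)
}

// fsUsage reports total and available bytes for the filesystem holding path.
// Hypothetical helper built on the statfs(2) system call.
func fsUsage(path string) (total, avail uint64, err error) {
	var st syscall.Statfs_t
	if err = syscall.Statfs(path, &st); err != nil {
		return 0, 0, err
	}
	bsize := uint64(st.Bsize)
	return st.Blocks * bsize, st.Bavail * bsize, nil
}

func main() {
	total, avail, err := fsUsage("/tmp")
	if err != nil {
		fmt.Println("statfs failed:", err)
		return
	}
	fmt.Printf("/tmp: %.1f%% used of %d MiB\n", usedPercent(total, avail), total>>20)
}
```

On a tmpfs mount the reported total should correspond to the mount's size limit, so a steadily rising percentage would match the behavior described here.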

(Example screenshots of fanal.db size before and after some runs: Screenshot 2024-04-16 at 11 01 33 AM, Screenshot 2024-04-16 at 11 38 12 AM, Screenshot 2024-04-17 at 9 03 19 AM)

[ec2-user ~]$ sudo tree -h /tmp
/tmp
└── [   60]  trivy
    └── [   60]  fanal
        └── [ 1.9G]  fanal.db

12 directories, 1 file
[ec2-user ~]$ free -h
               total        used        free      shared  buff/cache   available
Mem:            15Gi       587Mi       8.4Gi       1.9Gi       6.3Gi        12Gi
Swap:             0B          0B          0B

Preconditions

Any cnvm deployment.

To Reproduce

  1. Create a cnvm deployment with agent version >= 8.12 (older releases still need to be checked as well).
  2. Wait for many runs to pass (how many depends on the cloud assets and the host's RAM size).

Expected behavior

Cloudbeat is able to work indefinitely and produce events on each cycle.

Workaround till the fix

Restarting the host machine deletes everything in /tmp, including fanal.db, so Cloudbeat can continue to work and produce findings.

moukoublen avatar Apr 17 '24 07:04 moukoublen

@orestisfl didn't you also create a ticket for this?

romulets avatar Apr 22 '24 07:04 romulets

> @orestisfl didn't you also create a ticket for this?

https://github.com/elastic/security-team/issues/8217

orestisfl avatar Apr 22 '24 10:04 orestisfl

I took a look into the ticket https://github.com/elastic/security-team/issues/8217

It seems the root cause is the same, but we just got a different error during the db update flow.

Trivy uses this to specify cache directory: https://github.com/aquasecurity/trivy/blob/d4da83c633a46ad4a61844d8d5502d87b99465a0/pkg/utils/fsutils/fs.go#L23-L29

func defaultCacheDir() string {
	tmpDir, err := os.UserCacheDir()
	if err != nil {
		tmpDir = os.TempDir()
	}
	return filepath.Join(tmpDir, "trivy")
}

In most cases this returns a cache directory on the regular filesystem (e.g. /root/.cache/trivy when run as root).

The exception is when os.UserCacheDir() returns an error, in which case Trivy falls back to os.TempDir(), i.e. /tmp.

On Linux, os.UserCacheDir() returns an error when neither the XDG_CACHE_HOME nor the HOME env var is defined: https://cs.opensource.google/go/go/+/master:src/os/file.go;l=501-510?q=UserCacheDir&ss=go%2Fgo

default: // Unix
    dir = Getenv("XDG_CACHE_HOME")
    if dir == "" {
        dir = Getenv("HOME")
        if dir == "" {
            return "", errors.New("neither $XDG_CACHE_HOME nor $HOME are defined")
        }
        dir += "/.cache"
    }

Which, in our case, they are not.

$ sudo cat /proc/$(pidof cloudbeat)/environ | tr '\0' '\n'
PWD=/opt/Elastic/Agent
SYSTEMD_EXEC_PID=2095
LANG=C.UTF-8
INVOCATION_ID=...
SHLVL=0
JOURNAL_STREAM=...
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin
AGENT_COMPONENT_ID=cloudbeat/vuln_mgmt_aws-default
AGENT_COMPONENT_TYPE=cloudbeat/vuln_mgmt_aws

When Cloudbeat runs under elastic-agent, it does not inherit all environment variables; neither HOME nor XDG_CACHE_HOME appears in the list above.

So that explains both error logs we had.

The definition of done for https://github.com/elastic/security-team/issues/8217 was to find the root cause of the issue (apart from solving it), and the root cause was found during the investigation that led to this ticket. So, if there is no objection, I will close that one as done, referring to this one.

moukoublen avatar Apr 22 '24 12:04 moukoublen

Verified by checking the fanal.db size over a period of a week:

Measures:
{
  "date": "Wed Jul 17 13:49:38 UTC 2024",
  "size": "113M"
}
{
  "date": "Thu Jul 18 08:49:00 UTC 2024",
  "size": "252M"
}
{
  "date": "Thu Jul 18 11:08:38 UTC 2024",
  "size": "86M"
}
{
  "date": "Thu Jul 18 13:08:39 UTC 2024",
  "size": "111M"
}
{
  "date": "Thu Jul 18 15:08:40 UTC 2024",
  "size": "158M"
}
{
  "date": "Thu Jul 18 21:36:46 UTC 2024",
  "size": "222M"
}
{
  "date": "Fri Jul 19 06:52:34 UTC 2024",
  "size": "222M"
}
{
  "date": "Fri Jul 19 19:06:46 UTC 2024",
  "size": "202M"
}
{
  "date": "Sat Jul 20 06:33:31 UTC 2024",
  "size": "202M"
}
{
  "date": "Sun Jul 21 08:32:26 UTC 2024",
  "size": "194M"
}
{
  "date": "Sun Jul 21 10:32:29 UTC 2024",
  "size": "59M"
}
{
  "date": "Sun Jul 21 12:32:31 UTC 2024",
  "size": "84M"
}
{
  "date": "Sun Jul 21 14:32:32 UTC 2024",
  "size": "141M"
}
{
  "date": "Sun Jul 21 16:32:33 UTC 2024",
  "size": "189M"
}
{
  "date": "Sun Jul 21 18:32:35 UTC 2024",
  "size": "189M"
}
{
  "date": "Sun Jul 21 20:32:36 UTC 2024",
  "size": "189M"
}
{
  "date": "Sun Jul 21 22:32:37 UTC 2024",
  "size": "189M"
}
{
  "date": "Mon Jul 22 00:32:38 UTC 2024",
  "size": "189M"
}
{
  "date": "Mon Jul 22 02:32:39 UTC 2024",
  "size": "189M"
}
{
  "date": "Mon Jul 22 04:32:41 UTC 2024",
  "size": "189M"
}
{
  "date": "Mon Jul 22 06:32:42 UTC 2024",
  "size": "189M"
}
{
  "date": "Mon Jul 22 08:32:44 UTC 2024",
  "size": "189M"
}
{
  "date": "Mon Jul 22 10:32:51 UTC 2024",
  "size": "59M"
}
{
  "date": "Mon Jul 22 12:32:57 UTC 2024",
  "size": "112M"
}
{
  "date": "Mon Jul 22 14:33:03 UTC 2024",
  "size": "166M"
}
{
  "date": "Mon Jul 22 16:33:09 UTC 2024",
  "size": "184M"
}

amirbenun avatar Jul 22 '24 17:07 amirbenun