gitea icon indicating copy to clipboard operation
gitea copied to clipboard

Add Audit Trail/Logging

Open KN4CK3R opened this issue 2 years ago • 81 comments

Fixes #8

This PR adds logging of audit events to provide documentary evidence of the sequence of important activities.

Notes for reviewers:

  • Contains some small refactorings to get the needed objects
  • I extracted the rotating file logger from modules/log to reuse it for the audit logging

Admin panel: grafik

Repo settings: grafik

KN4CK3R avatar Apr 21 '23 13:04 KN4CK3R

PR is looking good overall! This is an impressive amount of work. Lots of updates, I guess that's required for adding logging to nearly every action.

philip-peterson avatar Apr 25 '23 17:04 philip-peterson

The file output contains JSON which is expected to be read by another application and not intended to be viewed by a user. But if you are able to filter an usual log file, you can do it with the audit log too.

KN4CK3R avatar May 29 '23 06:05 KN4CK3R

The file output contains JSON which is expected to be read by another application and not intended to be viewed by a user. But if you are able to filter an usual log file, you can do it with the audit log too.

Is there any standard file format or tool to read or filter that?

lunny avatar Jun 13 '23 06:06 lunny

There is no standard format I know about.

Here are some examples how it looks from other software: https://www.mongodb.com/docs/manual/reference/audit-message/ https://developers.mural.co/enterprise/docs/audit-logs-examples (similar to what I have implemented) https://cloud.yandex.com/en/docs/audit-trails/concepts/format (similar in parts) https://learn.microsoft.com/en-us/microsoft-365/compliance/audit-log-detailed-properties?view=o365-worldwide https://docs.mattermost.com/comply/embedded-json-audit-log-schema.html (similar in parts)

KN4CK3R avatar Jun 26 '23 19:06 KN4CK3R

Are there any tests covering the audit log output? Not, I only skimmed the changeset, but could not see such tests.

silverwind avatar Aug 07 '23 22:08 silverwind

I think we should not store the audit log to files because it's difficult to filter via different conditions. i.e. date range, actors, ips, actions and etc. We have discussed a lot in #8 . Maybe we can discuss further before continuing this PR.

I tend to agree that it is probably better a log in the in database and visible in the UI at the relevant places (per-unit, e.g. repo, org, etc visible to owners and a global log in admin that combines unit logs).

silverwind avatar Aug 30 '23 22:08 silverwind

One more thing, many people have started trying to use Gitea in a cluster, local file makes it different to work in a cluster.

wxiaoguang avatar Aug 31 '23 01:08 wxiaoguang

Thank you for this PR!

IMO this audit log feature as currently implement is a great addition to gitea and would enable many new orgs to use it that have strict security requirements. Logging to a file (or to a plain socket/stdout) is indeed the best way as well established, good tooling to pick up these logs already exist.

Not storing the audit logs as they happen (to a file) would infact be deemed more of a security risk and be less expected, e.g if that is to happen, it should happen in addition to emitting the log line as the events happen (they are then shipped to a log/audit handling system, immediately).

Per cluster member logs are not a problem. If you operate a cluster, you should collect logs centrally anyhow.

beddari avatar Aug 31 '23 13:08 beddari

@KN4CK3R can you show a sample how the log format looks currently?

I guess the file could serve as a starting point and yes, in a properly set up distributed systems, logs will be centrally collected in things like ELK, so it's not such a big issue really.

Still I would like to see UI eventually, and that would require these logs to be stored in database as well.

silverwind avatar Aug 31 '23 14:08 silverwind

{"action":"repository:mirror:push:add","doer":{"type":"user","primary_key":2,"friendly_name":"Doer"},"scope":{"type":"repository","primary_key":3,"friendly_name":"TestUser/TestRepo"},"target":{"type":"push_mirror","primary_key":4,"friendly_name":""},"message":"Added push mirror for repository TestUser/TestRepo.","time":"0001-01-01T00:00:00Z"}

or beautified

{
  "action": "repository:mirror:push:add",
  "doer": {
    "type": "user",
    "primary_key": 2,
    "friendly_name": "Doer"
  },
  "scope": {
    "type": "repository",
    "primary_key": 3,
    "friendly_name": "TestUser/TestRepo"
  },
  "target": {
    "type": "push_mirror",
    "primary_key": 4,
    "friendly_name": ""
  },
  "message": "Added push mirror for repository TestUser/TestRepo.",
  "time": "0001-01-01T00:00:00Z"
}

KN4CK3R avatar Aug 31 '23 14:08 KN4CK3R

Nice, I like it. I guess it shouldn't be too hard to add a database table for it later, nested fields or the whole entry could be stored in a JSON SQL type potentially (once all supported databases support it).

silverwind avatar Aug 31 '23 15:08 silverwind

While this is nice it would be better to log it to db as already mentioned to be able to show audit logs to user/org/repo.

While it is ok for an organization to have an audit log in central place, it is also important for users on public instances to see such info about their repo/org or review their own actions to detect malicious activity with their account and in this case this won't be possible

lafriks avatar Aug 31 '23 23:08 lafriks

While this is nice it would be better to log it to db as already mentioned to be able to show audit logs to user/org/repo.

While it is ok for an organization to have an audit log in central place, it is also important for users on public instances to see such info about their repo/org or review their own actions to detect malicious activity with their account and in this case this won't be possible

That abstract can be added later and shouldn’t block this PR

(and can easily be done with this PR by external services, like datadog or loki)

kdumontnu avatar Sep 01 '23 01:09 kdumontnu

That is for some bigger instances or an expert users. For most Gitea instances, if they want to filter logs, they need to install many other tools if the logs stored on file but not databases.

When implementing the filter, how file mode implement a filter options?

So for small instance or for big instance? For beginners or experts?

lunny avatar Sep 01 '23 01:09 lunny

That is for some bigger instances or an expert users. For most Gitea instances, if they want to filter logs, they need to install many other tools if the logs stored on file but not databases.

When implementing the filter, how file mode implement a filter options?

So for small instance or for big instance? For beginners or experts?

Again, this PR lays the groundwork and provides a solution that is needed by many current users if this tool and is entirely consistent with what you’re asking for.

After this is merged (or in parallel), add a new setting to the config and an abstraction layer for writing to db instead of to disk.

To answer your second question, I’m sure there are some file streaming utilities that can monitor log files.

For those that require this log for compliance purposes, log file works well.

kdumontnu avatar Sep 01 '23 01:09 kdumontnu

That is for some bigger instances or an expert users. For most Gitea instances, if they want to filter logs, they need to install many other tools if the logs stored on file but not databases.

Once database logging is added that should be default, along with displaying those logs in the UI. Still, the file output is useful regardless because it is good practice to ship audit logs to centralized logging, so yes I see it as groundwork. How those logs are shipped is definitely out of scope for gitea, there are numerous tools available to ship logs to centralized logging.

silverwind avatar Sep 01 '23 23:09 silverwind

Currently waiting for #26873 to simplify secrets logging.

KN4CK3R avatar Sep 05 '23 15:09 KN4CK3R

@lunny @delvh I still think this needs a decision, as I'm sure it's difficult to maintain the branch. I'll put down some thoughts here to ensure I understand where things stand.

Problem

I see two use-cases:

  • Monitoring critical activity in gitea
    • Helpful for small instances, logs monitored manually in application
  • Monitoring critical activity external to gitea
    • Helpful for larger deployments or aggregating multiple instances
    • Required for some compliance purposes
    • More powerful alerts, filtering available in 3rd-party tools (eg. loki, data-dog)

This PR starts by solving the second use-case, as it is simpler and also a strong requirement for enterprise deployments (and we are sponsoring this development).

Solution

Both of these options share 80% common code, and only the interface should change (just like activity logs imo). The simplest and most effective solution we found was to write to disk (which can be grepped, backed-up, etc.), but I don't hold a strong opinion if this is NOT something we want, and there is a better solution to get auditing externally.

We envision a config that can be used to add different interfaces over time.

Next steps

Are there any specific issues with the explanation above?

kdumontnu avatar Sep 18 '23 16:09 kdumontnu

I guess you are referring to the decision of storing the audit log in the db vs. on disk?

As you said yourself, there are different use cases for it. My thought on this is that the DB offers several benefits out of the box that you have to work around otherwise:

  1. You know exactly where the audit is located and don't need to search your virtual filesystem for where the file could be
  2. Storage acquired not through Gitea itself does not need to be managed by Gitea
  3. It is synchronized among replicas, so it comes with HA support by default. With the file based approach, you need to sync the different audit logs
  4. It allows "grepping" for any imaginable sub-audit by default (SELECT * FROM audit WHERE <x>). For the file based audit, you need specialized tooling
  5. It is easier to dump the SQL result into a JSON file than it is the other way around.

So, overall I think the DB would be better than a normal file. However, I'm not too experienced when dealing with large clusters, perhaps DevOps experts know some ways to reap the benefits I've mentioned for DB storage for files as well.

delvh avatar Sep 18 '23 17:09 delvh

Additional benefit for database is to allow to show security/audit log for individual users/organizations/repos in UI like there is in many apps (in Github - https://github.com/settings/security-log) - if these events are written in files it's not realistic to implement such functionality

lafriks avatar Sep 19 '23 11:09 lafriks

So, overall I think the DB would be better than a normal file. However, I'm not too experienced when dealing with large clusters, perhaps DevOps experts know some ways to reap the benefits I've mentioned for DB storage for files as well.

Additional benefit for database is to allow to show security/audit log for individual users/organizations/repos in UI like there is in many apps (in Github - https://github.com/settings/security-log) - if these events are written in files it's not realistic to implement such functionality

@lafriks @delvh yes, I think we agree. However, the question we need to answer is “do we want both (since there are multiple uses)?” If “yes” then lets merge this PR so that we can add a database option next. If no, then I think someone needs to step up and add a new db interface.

kdumontnu avatar Sep 19 '23 11:09 kdumontnu

It is easier to dump the SQL result into a JSON file than it is the other way around.

I still stand by that. So, it makes the task/future maintainability easier if we implement the DB interface first.

delvh avatar Sep 19 '23 14:09 delvh

It is easier to dump the SQL result into a JSON file than it is the other way around.

I still stand by that. So, it makes the task/future maintainability easier if we implement the DB interface first.

Isn’t it even easier to write to both simultaneously if you need both? I had just assumed that if we wanted to add other logging destinations, it would be done in parallel that way one interface doesn’t depend on another.

kdumontnu avatar Sep 19 '23 15:09 kdumontnu

Yes, also true. In the end, we probably need a mechanism similar to the notifier interface specifically for audit logs.

delvh avatar Sep 19 '23 15:09 delvh

Do we accept filling the DB here as well without any UI? I would imagine if UI is in scope for this PR, it will get too complex.

silverwind avatar Sep 19 '23 15:09 silverwind

What UI? That is certainly out of scope for this PR.

delvh avatar Sep 19 '23 15:09 delvh

The per-unit UI that shows the audit log. E.g. in Repo there would be a "Audit Log" in the Activity menu to show all changes done to the repo. Good to hear it's out of scope.

silverwind avatar Sep 19 '23 16:09 silverwind

Implementing the database first and processing an entry afterwards would be easier for me to implement. But is that compliant? Isn't the purpose of an audit log to record an event and move it away from a possible attacker? I don't know if there are any regulations for that.

KN4CK3R avatar Sep 19 '23 16:09 KN4CK3R

If there is a hypothetical function writeAuditLog(entry) that is called whenever a entry is generated, first write it to file, then write it to db, maybe even concurrently. Never read from db for the file generation imho.

silverwind avatar Sep 19 '23 16:09 silverwind

That's already implemented. Your writeAuditLog is audit.Record (https://github.com/go-gitea/gitea/pull/24257/files#diff-94a4fe91fe50f09ce11cb56e36faf060259104c884de9e699a7290fac8ce9c76R118) and multiple appenders are called then by the queue. There is no DB appender at the moment but the file appender.

KN4CK3R avatar Sep 19 '23 16:09 KN4CK3R