Reconsider the query log format
The current way in which we handle query logs has several issues: it's racy, files can get pretty large, the search algorithm is too complex, and we can only do basic filtering and very little aggregation. But we also don't have a lot of alternatives. Ideally we would like to have something like an SQL database, à la SQLite, but that would require cgo, which comes with its own bag of issues. Package modernc.org/sqlite seems interesting, but it's a work in progress and currently doesn't support all our platforms.
I will update the issue when we think of something better. Perhaps SQLite is not completely out of the question if we find a way to cross-compile it without too much pain for the developers.
Package modernc.org/ql is also an interesting alternative.
To be honest, there are a couple of advantages to using simple text-based storage:
- Logs can be rotated easily (not that the current implementation handles this well, but still).
- Logs can be loaded and analyzed by other software (ELK, Splunk, etc.).
These advantages should also be considered when we choose what to do next; see the sketch below.
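To illustrate both points, here is a minimal sketch of such a text-based approach: newline-delimited JSON with size-based rotation. The entry fields and the rotation threshold are hypothetical, not AdGuard Home's actual schema or policy.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
	"time"
)

// logEntry is a hypothetical record; the real query log schema differs.
type logEntry struct {
	Time   time.Time `json:"T"`
	QHost  string    `json:"QH"`
	QType  string    `json:"QT"`
	Client string    `json:"IP"`
}

const maxSize = 64 << 20 // rotate after ~64 MiB (arbitrary threshold)

// appendEntry writes one JSON line and rotates the file by renaming it
// once it grows past maxSize.  A plain rename is all that external
// collectors (ELK, Splunk, etc.) need to pick up the finished chunk.
func appendEntry(path string, e logEntry) error {
	if fi, err := os.Stat(path); err == nil && fi.Size() >= maxSize {
		if err := os.Rename(path, path+".1"); err != nil {
			return err
		}
	}
	f, err := os.OpenFile(path, os.O_APPEND|os.O_CREATE|os.O_WRONLY, 0o644)
	if err != nil {
		return err
	}
	defer f.Close()
	// Encode appends a trailing newline, giving one entry per line.
	return json.NewEncoder(f).Encode(e)
}

func main() {
	err := appendEntry("querylog.json", logEntry{
		Time: time.Now(), QHost: "example.org", QType: "A", Client: "192.168.1.2",
	})
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
	}
}
```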
Perhaps consider using Bleve to index the BoltDB data, to enhance queries (more filters) and speed them up. Bleve is written in pure Go.
However, it still doesn't help with write performance, and it consumes more disk space.
Using a TSDB may help performance, but it can add significant complexity.
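For reference, a minimal sketch of the Bleve side of that idea, following Bleve's documented API. The record fields are hypothetical, and the BoltDB pairing (full records in BoltDB, only searchable fields in the index) is left out.

```go
package main

import (
	"fmt"
	"log"

	"github.com/blevesearch/bleve/v2"
)

func main() {
	// Create an on-disk index with the default mapping.
	mapping := bleve.NewIndexMapping()
	index, err := bleve.New("querylog.bleve", mapping)
	if err != nil {
		log.Fatal(err)
	}
	defer index.Close()

	// Index one record; the fields here are hypothetical.
	err = index.Index("entry-1", map[string]interface{}{
		"host":   "tracker.example.org",
		"client": "192.168.1.2",
		"type":   "A",
	})
	if err != nil {
		log.Fatal(err)
	}

	// Query it back with a full-text match query.
	query := bleve.NewMatchQuery("tracker.example.org")
	res, err := index.Search(bleve.NewSearchRequest(query))
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(res)
}
```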
@ameshkov,
Having benchmarked modernc.org/ql on a Linux/ARMv7 machine: single writes are somewhat slow, circa a few milliseconds, but when you use prepared statements and commit many data changes in a single transaction (the way the author of the database says is the proper way to use it), any single write only takes about 20 µs or so, which is quite good, I think. Especially considering that we already have an in-memory buffer in the current implementation of our query log.
We'll still need to do a lot of testing, obviously, and the architecture needs to be thought out carefully, but having an actual, proper, pure Go database that allows GROUP BY, COUNT, etc. would be great for the kinds of statistics we could show.
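For the record, the batching pattern described above looks roughly like this through database/sql. The driver name, schema, and type mappings are assumptions based on ql's documentation, so treat this as a sketch rather than the benchmarked code.

```go
package main

import (
	"database/sql"
	"log"
	"time"

	_ "modernc.org/ql/driver" // assumed to register the "ql" file-backed driver
)

func main() {
	db, err := sql.Open("ql", "querylog.db")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// ql expects data-modifying statements inside a transaction, which
	// also happens to be the batching pattern measured above.
	tx, err := db.Begin()
	if err != nil {
		log.Fatal(err)
	}
	_, err = tx.Exec(`CREATE TABLE IF NOT EXISTS querylog (at time, host string, qtype string);`)
	if err != nil {
		log.Fatal(err)
	}

	// One prepared statement, many inserts, one commit.
	stmt, err := tx.Prepare(`INSERT INTO querylog VALUES ($1, $2, $3);`)
	if err != nil {
		log.Fatal(err)
	}
	for i := 0; i < 1000; i++ {
		if _, err = stmt.Exec(time.Now(), "example.org", "A"); err != nil {
			log.Fatal(err)
		}
	}
	if err = tx.Commit(); err != nil {
		log.Fatal(err)
	}
}
```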
The ELK idea is very interesting; MySQL is also on my mind... It would be nice to have a couple of servers with a single storage point for the history. I'd like an infrastructure of the form waf/haproxy->adguard, but with a couple of servers behind the load balancer, each adguard instance keeps its own history of client queries. I tried simply mapping one work folder to both containers, but sometimes statistics get lost (possibly due to concurrent write access to the file; I haven't tried debugging it). Having two instances with a single config (that works fine) and a single query history would be very convenient.
Hi there! I had a question about this issue.
I've been working on this personally, since I want it as a feature for myself.
My issue is that the query log file gets giant. I have roughly 0.5 to 1 million DNS requests per day, and the file is pretty big. I don't really care about searching, and I'm trying to minimize storage use while keeping a feature set similar to what currently exists.
Not that it will necessarily change what I'm doing for myself, but do you care whether the storage format is human-readable (since it's just JSON right now)? It looks like you're considering SQL-like interfaces (I guess mostly for searching), so it seems not, but I wanted to check in.
Thanks!
@gfxlabs, the most likely approach that we will take will involve some sort of binary database storage. Possibly with an optional plain-text JSON stream, but we're still thinking about that.
I don't know if that would bring any space savings, as we're still fairly far from doing any meaningful benchmarking, but it will definitely simplify keeping the query log at a particular size, i.e. removing older records more eagerly.
@ainar-g I am currently taking an interest in log aggregation via Telegraf, InfluxDB, Grafana Loki, etc. with other network devices and services, and it currently works well because the log format is JSON, which is relatively easy for other tools to parse.
While the logs have room for improvement, as you and others have stated, I would like to add my voice that the final design for the logs should maintain (or improve) compatibility with external log aggregation/visualization tools and platforms (Telegraf, Loki, etc.). Maybe that can be done via a configuration option (where the user can choose the log format: a DB, JSON, etc.) or via a DB that offers a way for a client tool to query records and push them to an external platform.
@ainar-g I think this is a good suggestion. I think we can refactor the querylog module and support pushing the DNS records to other services.
And I think modernc.org/sqlite is good enough to use as the query log backend.
I'd be glad to handle this (it's important for me).
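Since modernc.org/sqlite is a drop-in database/sql driver, a backend built on it could serve aggregated statistics straight from SQL. A minimal sketch, assuming the "sqlite" driver name the package documents and a hypothetical schema:

```go
package main

import (
	"database/sql"
	"fmt"
	"log"

	_ "modernc.org/sqlite" // cgo-free SQLite; registers the "sqlite" driver
)

func main() {
	db, err := sql.Open("sqlite", "querylog.db")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// Hypothetical query log table.
	_, err = db.Exec(`CREATE TABLE IF NOT EXISTS querylog (at INTEGER, host TEXT, qtype TEXT);`)
	if err != nil {
		log.Fatal(err)
	}

	// The kind of aggregation a real SQL backend would make trivial:
	// top queried hosts, straight from the log.
	rows, err := db.Query(`SELECT host, COUNT(*) AS n FROM querylog GROUP BY host ORDER BY n DESC LIMIT 10;`)
	if err != nil {
		log.Fatal(err)
	}
	defer rows.Close()
	for rows.Next() {
		var host string
		var n int
		if err = rows.Scan(&host, &n); err != nil {
			log.Fatal(err)
		}
		fmt.Printf("%s\t%d\n", host, n)
	}
	if err = rows.Err(); err != nil {
		log.Fatal(err)
	}
}
```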
@bnkas, @Zheaoli, I understand your request, but please keep the topic of discussion in this issue related to the on-disk format. Collection of logs by external services is a broad and deep topic, and, if I recall correctly, there is already a feature request about that. See, for example, #3389.
This might be of interest to you:
Sqinn-Go is a Go (Golang) library for accessing SQLite databases in pure Go. It uses Sqinn https://github.com/cvilsmeier/sqinn under the hood. It starts Sqinn as a child process (os/exec) and communicates with Sqinn over stdin/stdout/stderr. The Sqinn child process then does the SQLite work.
If you want SQLite but do not want cgo, Sqinn-Go can be a solution.
https://github.com/cvilsmeier/sqinn-go
Also, this: https://gitlab.com/cznic/sqlite
Some benchmarks: https://datastation.multiprocess.io/blog/2022-05-12-sqlite-in-go-with-and-without-cgo.html
Source: https://news.ycombinator.com/item?id=37770454
Huh, looks interesting.
On the other hand, the pure Go SQLite port is gaining traction and is being very actively developed. @ainar-g argued that we should migrate to it, and I am starting to think that it sounds very reasonable.
https://gitlab.com/cznic/sqlite
Using the Go SQLite port is still my position as well, although it doesn't currently support all the platforms that AGH supports, which may be an issue.
chaiSQL also seems like an interesting choice once it's production-ready: https://github.com/chaisql/chai
Has anything happened here? It's been years since this was first discussed, and writing to a JSON file is not a modern solution. I have had several crashes of AdGuard Home in my environment in a short period of time. I have disabled all logging and statistics in the meantime, but having a reputable option for storing data would be nice.
I agree with that. Refactoring to a DB-based query and storage method is very important. Especially for a DNS server, the amount of log data is always very large, which results in a serious slowdown of log operations.
From a minimalist point of view, being able to send the query logs to any syslog server would be straightforward, I suppose. With the query log in syslog, there would be a plethora of tooling available to get the data into promtail/Loki, InfluxDB, Elastic, and many others.
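For what it's worth, the standard library already covers the sending side on Unix-like systems. A minimal sketch with a hypothetical collector address and message format (note that log/syslog is not available on Windows):

```go
package main

import (
	"fmt"
	"log"
	"log/syslog" // stdlib; Unix-like systems only
	"time"
)

func main() {
	// Forward query log entries to a remote syslog collector.  The
	// address is hypothetical; promtail/Loki, InfluxDB, Elastic, and
	// others can all ingest syslog.
	w, err := syslog.Dial("udp", "192.168.1.10:514",
		syslog.LOG_INFO|syslog.LOG_DAEMON, "adguardhome")
	if err != nil {
		log.Fatal(err)
	}
	defer w.Close()

	// A made-up key=value message format for one DNS query.
	msg := fmt.Sprintf("QH=example.org QT=A IP=192.168.1.2 T=%s",
		time.Now().Format(time.RFC3339))
	if err := w.Info(msg); err != nil {
		log.Fatal(err)
	}
}
```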