
Allow setting a database size limit, suspend DHT crawler if exceeded

Open · nodiscc opened this issue 1 year ago · 11 comments

  • [x] I have checked the existing issues to avoid duplicates
  • [x] I have redacted any info hashes and content metadata from any logs or screenshots attached to this issue

Is your feature request related to a problem? Please describe

Since the database size will tend to grow indefinitely as more and more torrents are indexed (does it ever stop once you've indexed the whole DHT? :thinking:), I would like to set a limit in the configuration file, after which bitmagnet would stop crawling and just emit a message in the logs on start/every few minutes.

Describe the solution you'd like

```yaml
dht_crawler:
  db_size_limit: 53687091200 # 50GiB in bytes; could also use a human-friendly format like 50G
```

With a log message once the limit is reached:

```
[INFO] DHT crawler suspended, configured maximum database size of 53687091200 bytes reached
```

Describe alternatives you've considered

Manually disabling the DHT crawler component (running with --keys=http_server in my systemd .service file) once I get low on disk space. However, a configuration setting would be more "set-and-forget" and could prevent running low on space in the first place (after which manual trimming of the database would be needed to reclaim some disk space, I guess).

Additional context

Related to https://github.com/bitmagnet-io/bitmagnet/issues/186 and https://github.com/bitmagnet-io/bitmagnet/issues/70

nodiscc avatar Feb 29 '24 20:02 nodiscc

You should get an aggressive notification somewhere that you've hit such a catastrophic failure. I guess that could be done in Grafana as a default.

Technetium1 avatar Mar 01 '24 05:03 Technetium1

catastrophic failure

It's not; it's normal operation (respecting configured limits). This deserves an INFO-level log message, at worst a WARNING.

nodiscc avatar Mar 01 '24 07:03 nodiscc

It should absolutely be a warning, as that's a serious problem! If you run out of space, there's potential for logging to fail, among other "catastrophic" things.

Technetium1 avatar Mar 01 '24 23:03 Technetium1

We misunderstand each other.

Hitting the maximum configured database size is fine; it needs an INFO message so that the user/admin is not left wondering "why is it not indexing anymore".

Running out of disk space is bad, but it's not bitmagnet's job to warn you about that; that's a job for your monitoring software. Anyway, setting a db size limit in bitmagnet would help prevent this.

nodiscc avatar Mar 02 '24 08:03 nodiscc

Rather than just suspending the DHT crawler, it would be nice to instead start purging old (by some definition of "old") resources from the collection; see the sketch below. Assuming this is technically feasible, it would be similar to a rolling log: the collection could only grow so large before old messages/files are removed. Not to mention, I would expect an inverse relationship between the age of an item in the DHT and the health of the resource.
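As a rough illustration of that rolling-purge idea, a periodic job might look something like the Go sketch below. The torrents table, created_at column, and connection string are hypothetical stand-ins, not bitmagnet's actual schema:

```go
// Hypothetical sketch of a rolling purge: when the database exceeds the
// configured limit, delete a batch of the oldest torrents. The "torrents"
// table and "created_at" column are stand-ins, not bitmagnet's real schema.
package main

import (
	"database/sql"
	"log"

	_ "github.com/lib/pq" // PostgreSQL driver
)

const dbSizeLimit int64 = 53687091200 // 50GiB, matching the proposed db_size_limit

func purgeIfOverLimit(db *sql.DB) error {
	var size int64
	// pg_database_size reports the on-disk size of the named database.
	if err := db.QueryRow(`SELECT pg_database_size('bitmagnet')`).Scan(&size); err != nil {
		return err
	}
	if size <= dbSizeLimit {
		return nil
	}
	// Over the limit: drop the 10000 oldest entries (hypothetical schema).
	_, err := db.Exec(`DELETE FROM torrents WHERE info_hash IN (
		SELECT info_hash FROM torrents ORDER BY created_at ASC LIMIT 10000)`)
	return err
}

func main() {
	db, err := sql.Open("postgres", "postgres://postgres@localhost/bitmagnet?sslmode=disable")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()
	if err := purgeIfOverLimit(db); err != nil {
		log.Fatal(err)
	}
}
```

One caveat: a plain DELETE only marks space as reusable inside PostgreSQL, so the figure reported by pg_database_size won't actually shrink until something like VACUUM FULL reclaims it; "purge until under the limit" is less straightforward than it sounds.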

akmad avatar Mar 07 '24 18:03 akmad

Just suspending the DHT crawler is fine. Let's not overcomplicate this.

old messages/files are removed

I don't expect the application to start losing data without manual intervention. As I said:

manual trimming of the database would be needed to reclaim some disk space, I guess

Database cleanup possibilities can be discussed in another issue.

nodiscc avatar Mar 07 '24 19:03 nodiscc

Could this be done by splitting the dht_crawler worker out into a separate Docker service, with a custom healthcheck that fails if the size limit is exceeded?

mgdigital avatar Mar 10 '24 12:03 mgdigital

separate Docker service

I don't use Docker; I run the binary as a systemd service, see ansible role here. I could hack together a script that checks the db size (something like the sketch below) and restarts bitmagnet without the dht_crawler if a certain size is exceeded, but I think the check should be in the application rather than depend on an external mechanism, which feels like a hack.
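For reference, a minimal Go sketch of such an external check, assuming the PostgreSQL database is named bitmagnet (the connection string and 50GiB limit are illustrative). A nonzero exit status means the limit was exceeded, which a systemd timer, or the Docker healthcheck suggested above, could act on:

```go
// dbsizecheck: exits nonzero when the bitmagnet database exceeds a size
// limit, so a systemd timer or Docker healthcheck can react, e.g. by
// restarting the service without the dht_crawler worker.
package main

import (
	"database/sql"
	"log"
	"os"

	_ "github.com/lib/pq" // PostgreSQL driver
)

const dbSizeLimit int64 = 53687091200 // 50GiB in bytes

func main() {
	// Illustrative DSN; adjust to the actual deployment.
	db, err := sql.Open("postgres", "postgres://postgres@localhost/bitmagnet?sslmode=disable")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	var size int64
	// pg_database_size reports the on-disk size of the named database.
	if err := db.QueryRow(`SELECT pg_database_size('bitmagnet')`).Scan(&size); err != nil {
		log.Fatal(err)
	}
	if size > dbSizeLimit {
		log.Printf("database size %d bytes exceeds limit %d", size, dbSizeLimit)
		os.Exit(1)
	}
}
```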

nodiscc avatar Mar 10 '24 13:03 nodiscc

I guess my concern is that any internal behaviour around this could get complex. It isn't currently supported to start/stop/restart individual workers without stopping the whole process, and I don't know that it needs to be, given that some external tool (Docker, Ansible, whatever) would be capable of doing this. Database size can go down as well as up, and having this behaviour trigger during something like a migration, where disk usage spikes and is then recovered, could have unintended effects.

a configuration setting would be more "set-and-forget"

I don't know that this could ever be a "set and forget" thing, as the worker won't be able to either resolve the disk space issue or restart itself, so it will require intervention unless you intend it to be stopped forever once the threshold is reached.

Running out of disk space is bad, but that's not bitmagnet's job to warn you about it, that's a job for your monitoring software.

I agree with this - I think some monitoring software (or a script) could also capably handle stopping the process?

I could hack together a script that checks the db size, and restarts bitmagnet without the dht_crawler if a certain size is exceeded

If going down this route I'd probably separate the crawler to its own process that can be started/stopped independently.

I think I'd need convincing that this use case is in enough demand, and that external orchestration would be sub-optimal, before requiring something implemented internally (even after other upcoming disk space mitigations have been implemented, see below).

As an aside, I'm currently building a custom workflow feature that (among other things) would allow you to auto-delete any torrent based on custom rules. It's not the feature you've described here, but it will certainly help with keeping the DB size manageable by deleting torrents that are stale or that you're not interested in.

mgdigital avatar Mar 10 '24 14:03 mgdigital

It isn't currently supported to start/stop/restart individual workers without stopping the whole process

Thanks, that limits our possibilities indeed.

I think some monitoring software (or script) could also capably handle the stopping of the process? If going down this route I'd probably separate the crawler to its own process that can be started/stopped independently.

I will try to get this working and post the results.

Although I still argue that there should be some way to suspend the DHT crawler (possibly without actually stopping the worker: just have it idle and poll the db size every 60s, as in the sketch below), given bitmagnet's tendency to eat up disk space indefinitely. Other services with a similar behavior, such as Elasticsearch, actively avoid storing new data when a certain disk usage threshold is reached.
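A minimal Go sketch of that idle-and-poll approach, under the assumption that the worker's crawl step can be factored into a callable unit; databaseSize and crawl here are hypothetical stand-ins, not bitmagnet's actual internals:

```go
// Hypothetical sketch of a crawler loop that suspends itself while the
// database is over the configured limit, rechecking the size every 60s.
// databaseSize() and crawl() are stand-ins, not bitmagnet's actual API.
package main

import (
	"log"
	"time"
)

const dbSizeLimit int64 = 53687091200 // 50GiB in bytes

func runCrawler(databaseSize func() (int64, error), crawl func()) {
	suspended := false
	for {
		size, err := databaseSize() // e.g. SELECT pg_database_size('bitmagnet')
		if err != nil {
			log.Printf("size check failed: %v", err)
			time.Sleep(60 * time.Second)
			continue
		}
		if size > dbSizeLimit {
			if !suspended {
				log.Printf("[INFO] DHT crawler suspended, configured maximum database size of %d bytes reached", dbSizeLimit)
				suspended = true
			}
			time.Sleep(60 * time.Second) // idle: keep the worker alive, just don't crawl
			continue
		}
		if suspended {
			log.Print("[INFO] DHT crawler resumed, database size back under limit")
			suspended = false
		}
		crawl() // one unit of crawling work before the next size check
	}
}

func main() {
	// Stub implementations for illustration only.
	databaseSize := func() (int64, error) { return 0, nil }
	crawl := func() { time.Sleep(time.Second) }
	runCrawler(databaseSize, crawl)
}
```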

nodiscc avatar Mar 13 '24 23:03 nodiscc

It isn't currently supported to start/stop/restart individual workers without stopping the whole process

Thanks, that limits our possibilities indeed.

Not to say it never could; I just think we'd need clear use cases and well-defined behaviour. I like the model of running individual workers that can simply be terminated if required for any externally determined reason, unless there's a good reason not to do it this way (partly because, for the moment at least, it keeps things simple).

mgdigital avatar Mar 13 '24 23:03 mgdigital