opentelemetry-collector-contrib
opentelemetry-collector-contrib copied to clipboard
prometheusremotewrite: release lock when wal empty
Description: <Describe what has changed.>
Previously, the wal tailing routine held an exclusive lock over the entire wal, while waiting for new sections to be written.
This led to a deadlock situtation, as a locked wal naturally can not be written to.
To remedy this, the lock is now only held during actual read attempts, not in between.
Furthermore, inotify watching of the wal dir has been replaced with channel-based signaling between the producers and the tailing routine, so that the latter blocks until any new writes have happened.
Link to tracking Issue:
Fixes https://github.com/open-telemetry/opentelemetry-collector-contrib/issues/19363 Fixes https://github.com/open-telemetry/opentelemetry-collector-contrib/issues/15277
Testing:
An "end to end" test has been added where the WAL is first initialized without any initial data, so it goes into waiting mode. Writes are performed afterwards and the read-back result is verified.
This previously led to a deadlock and now no longer does.
It has also been confirmed that goroutines no longer build up and actual http requests are now being made.
The committers listed above are authorized under a signed CLA.
- :white_check_mark: login: sh0rez / name: sh0rez (45fa20d0508257a5fb5632645e16b7d35f39ef47, 9dedddda01b07ae7d37c393e700ed2242e34851a, 0633065007376670bd4c2799ee8b57b05bf6aa93, f2040b62b76fb576648a02460e2ff5bc6cb48e52, 0c959a79a826c4c13ff1c093ffab2c881b6ea4aa, b548215aa950653aa28636dffb7d0b7e87f104e1, b8a476031c8534b23a3609035f7643c851f4e5e7, ff35276eacb0de0764d3a209f31a7bb1c7739108, b049dee7ecbc13a8cb8912944f69fb3afb9f899e)
Whats the state here? :) ping @sh0rez
Looks like the checks hung, re-opening to trigger the checks
This PR was marked stale due to lack of activity. It will be closed in 14 days.
@codeboten any chance you could take another look?
This PR was marked stale due to lack of activity. It will be closed in 14 days.
ping @sh0rez
This PR was marked stale due to lack of activity. It will be closed in 14 days.
This PR was marked stale due to lack of activity. It will be closed in 14 days.
Closed as inactive. Feel free to reopen if this PR is still being worked on.
@frzifus @Aneurysm9 @codeboten somehow dropped the ball on this, but it's still relevant.
rebased and addressed review comments, please take another look!
There seems to be linting failures here. I would love to have an approval from a code owner as well: @Aneurysm9 @rapphil
This PR was marked stale due to lack of activity. It will be closed in 14 days.
This PR was marked stale due to lack of activity. It will be closed in 14 days.
This PR was marked stale due to lack of activity. It will be closed in 14 days.
This PR was marked stale due to lack of activity. It will be closed in 14 days.
Closed as inactive. Feel free to reopen if this PR is still being worked on.