opentelemetry-collector-contrib icon indicating copy to clipboard operation
opentelemetry-collector-contrib copied to clipboard

prometheusremotewrite: release lock when wal empty

Open tombrk opened this issue 2 years ago • 12 comments
trafficstars

Description: <Describe what has changed.>

Previously, the wal tailing routine held an exclusive lock over the entire wal, while waiting for new sections to be written.

This led to a deadlock situtation, as a locked wal naturally can not be written to.

To remedy this, the lock is now only held during actual read attempts, not in between.

Furthermore, inotify watching of the wal dir has been replaced with channel-based signaling between the producers and the tailing routine, so that the latter blocks until any new writes have happened.

Link to tracking Issue:

Fixes https://github.com/open-telemetry/opentelemetry-collector-contrib/issues/19363 Fixes https://github.com/open-telemetry/opentelemetry-collector-contrib/issues/15277

Testing:

An "end to end" test has been added where the WAL is first initialized without any initial data, so it goes into waiting mode. Writes are performed afterwards and the read-back result is verified.

This previously led to a deadlock and now no longer does.

It has also been confirmed that goroutines no longer build up and actual http requests are now being made.

tombrk avatar Apr 12 '23 15:04 tombrk

CLA Signed

The committers listed above are authorized under a signed CLA.

  • :white_check_mark: login: sh0rez / name: sh0rez (45fa20d0508257a5fb5632645e16b7d35f39ef47, 9dedddda01b07ae7d37c393e700ed2242e34851a, 0633065007376670bd4c2799ee8b57b05bf6aa93, f2040b62b76fb576648a02460e2ff5bc6cb48e52, 0c959a79a826c4c13ff1c093ffab2c881b6ea4aa, b548215aa950653aa28636dffb7d0b7e87f104e1, b8a476031c8534b23a3609035f7643c851f4e5e7, ff35276eacb0de0764d3a209f31a7bb1c7739108, b049dee7ecbc13a8cb8912944f69fb3afb9f899e)

Whats the state here? :) ping @sh0rez

frzifus avatar May 24 '23 16:05 frzifus

Looks like the checks hung, re-opening to trigger the checks

codeboten avatar May 31 '23 16:05 codeboten

This PR was marked stale due to lack of activity. It will be closed in 14 days.

github-actions[bot] avatar Jun 20 '23 05:06 github-actions[bot]

@codeboten any chance you could take another look?

tombrk avatar Jul 04 '23 14:07 tombrk

This PR was marked stale due to lack of activity. It will be closed in 14 days.

github-actions[bot] avatar Jul 20 '23 05:07 github-actions[bot]

ping @sh0rez

frzifus avatar Jul 20 '23 08:07 frzifus

This PR was marked stale due to lack of activity. It will be closed in 14 days.

github-actions[bot] avatar Aug 04 '23 05:08 github-actions[bot]

This PR was marked stale due to lack of activity. It will be closed in 14 days.

github-actions[bot] avatar Aug 24 '23 05:08 github-actions[bot]

Closed as inactive. Feel free to reopen if this PR is still being worked on.

github-actions[bot] avatar Sep 09 '23 05:09 github-actions[bot]

@frzifus @Aneurysm9 @codeboten somehow dropped the ball on this, but it's still relevant.

rebased and addressed review comments, please take another look!

tombrk avatar May 02 '24 15:05 tombrk

There seems to be linting failures here. I would love to have an approval from a code owner as well: @Aneurysm9 @rapphil

jpkrohling avatar May 03 '24 07:05 jpkrohling

This PR was marked stale due to lack of activity. It will be closed in 14 days.

github-actions[bot] avatar May 23 '24 05:05 github-actions[bot]

This PR was marked stale due to lack of activity. It will be closed in 14 days.

github-actions[bot] avatar Jun 07 '24 05:06 github-actions[bot]

This PR was marked stale due to lack of activity. It will be closed in 14 days.

github-actions[bot] avatar Jun 22 '24 05:06 github-actions[bot]

This PR was marked stale due to lack of activity. It will be closed in 14 days.

github-actions[bot] avatar Jul 07 '24 05:07 github-actions[bot]

Closed as inactive. Feel free to reopen if this PR is still being worked on.

github-actions[bot] avatar Jul 22 '24 05:07 github-actions[bot]