envoy icon indicating copy to clipboard operation
envoy copied to clipboard

[ip_tagging] Support dynamic file based ip tags

Open nezdolik opened this issue 8 months ago • 11 comments
trafficstars

Commit Message: Support async file based reloading of ip tags in ip tagging filter (done via Datasource). The supported file formats for ip tags are yaml and json. One instance of loaded ip tags file exists per process, the bookkeeping logic is handled by custom singleton that hashes and caches loaded file instances based on ip tags file path. Additional Description: Risk Level: Medium (Can be low if guarded by runtime flag) Testing: Unit + integration tests Docs Changes: Done Release Notes: TBD Platform Specific Features: NA

nezdolik avatar Feb 26 '25 15:02 nezdolik

As a reminder, PRs marked as draft will not be automatically assigned reviewers, or be handled by maintainer-oncall triage.

Please mark your PR as ready when you want it to be reviewed!

:cat:

Caused by: https://github.com/envoyproxy/envoy/pull/38574 was opened by nezdolik.

see: more, trace.

CC @envoyproxy/api-shepherds: Your approval is needed for changes made to (api/envoy/|docs/root/api-docs/). envoyproxy/api-shepherds assignee is @mattklein123 CC @envoyproxy/api-watchers: FYI only for changes made to (api/envoy/|docs/root/api-docs/).

:cat:

Caused by: https://github.com/envoyproxy/envoy/pull/38574 was opened by nezdolik.

see: more, trace.

CC @envoyproxy/runtime-guard-changes: FYI only for changes made to (source/common/runtime/runtime_features.cc).

:cat:

Caused by: https://github.com/envoyproxy/envoy/pull/38574 was synchronize by nezdolik.

see: more, trace.

CI failure looks legit. /wait

kyessenov avatar May 13 '25 14:05 kyessenov

cc @ravenblackx @adisuissa

nezdolik avatar May 15 '25 15:05 nezdolik

I can look once CI is green?

/wait

mattklein123 avatar May 19 '25 16:05 mattklein123

@mattklein123 @adisuissa the ci has been fixed, now ready for review, ptal.

nezdolik avatar May 30 '25 08:05 nezdolik

Just wanted to clarify one moment with proposed implementation, while loading of file happens on main thread via datasource, there is a custom thread that performs periodic re-parsing of that file content into a trie like structure. One may wonder why is it running on custom thread? Am writing this feature for Docker use case, where ip tags file contains ~66k entries to parse and I believe main thread should be doing housekeeping work and not performing extension heavy computational tasks. cc @wbpcode @adisuissa @yanavlasov

nezdolik avatar Jun 11 '25 13:06 nezdolik

Just wanted to clarify one moment with proposed implementation, while loading of file happens on main thread via datasource, there is a custom thread that performs periodic re-parsing of that file content into a trie like structure. One may wonder why is it running on custom thread? Am writing this feature for Docker use case, where ip tags file contains ~66k entries to parse and I believe main thread should be doing housekeeping work and not performing extension heavy computational tasks. cc @wbpcode @adisuissa @yanavlasov

Thanks, it makes sense.

yanavlasov avatar Jun 12 '25 15:06 yanavlasov

/docs

phlax avatar Jun 17 '25 11:06 phlax

Docs for this Pull Request will be rendered here:

https://storage.googleapis.com/envoy-pr/38574/docs/index.html

The docs are (re-)rendered each time the CI envoy-presubmit (precheck docs) job completes.

:cat:

Caused by: a https://github.com/envoyproxy/envoy/pull/38574#issuecomment-2979978149 was created by @phlax.

see: more, trace.

Managed to repro segfault in integration test (faster feedback than building and deploying custom images). Unfortunately I will not be able to run parsing on custom thread. Custom thread needs to access data source and stats data via TLS and TLS implementation has built in assertions for which thread is registering for & accessing the thread local data (it does not allow custom threads).

nezdolik avatar Jun 23 '25 14:06 nezdolik

Managed to repro segfault in integration test (faster feedback than building and deploying custom images). Unfortunately I will not be able to run parsing on custom thread. Custom thread needs to access data source and stats data via TLS and TLS implementation has built in assertions for which thread is registering for & accessing the thread local data (it does not allow custom threads).

You can post a message from your custom thread to the main thread to update TLS

yanavlasov avatar Jun 23 '25 15:06 yanavlasov

@yanavlasov with custom thread we will get chained dispatchers invocations:

  ip_tags_reload_timer_ = main_dispatcher.createTimer([this]() -> void {
    //reading data from datasource can only be done on main or worker thread
    const auto new_data = tags_loader_.getDataSourceData();
    ip_tags_reload_dispatcher.post([this, &new_data]{
    auto new_tags_or_error = tags_loader_.refreshTags(new_data);
    if (new_tags_or_error.status().ok()) {
      updateIpTags(new_tags_or_error.value());
      // stats update can only run on main or worker thread
     main_dispatcher.post([]{
      //updates reload stats
     reload_success_cb_();
});
 
    } else {
      ENVOY_LOG(debug, "Failed to reload ip tags, using old data: {}",
                new_tags_or_error.status().message());
      // stats update can only run on main or worker thread
         main_dispatcher.post([]{
      //updates reload stats
     reload_error_cb_();
});
    }
});

    ip_tags_reload_timer_->enableTimer(ip_tags_refresh_interval_ms_);
  });

nezdolik avatar Jun 23 '25 16:06 nezdolik

@yanavlasov this is ready for another round. Lmk if you would still prefer to run on custom thread. Performance-wise it makes sense just makes code much more complex.

nezdolik avatar Jun 23 '25 17:06 nezdolik

The change looks good to me. Needs CI to be stabilized and API review.

/wait

yanavlasov avatar Jun 27 '25 20:06 yanavlasov

just a quick note, i have been on a 2 weeks summer vacation, coming back to this next week.

nezdolik avatar Jul 16 '25 22:07 nezdolik

This pull request has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in 7 days if no further activity occurs. Please feel free to give a status update now, ping for review, or re-open when it's ready. Thank you for your contributions!

github-actions[bot] avatar Aug 16 '25 00:08 github-actions[bot]

This pull request has been automatically closed because it has not had activity in the last 37 days. Please feel free to give a status update now, ping for review, or re-open when it's ready. Thank you for your contributions!

github-actions[bot] avatar Aug 23 '25 04:08 github-actions[bot]

Bunch of fixes in latest commit:

  • Disarming of timer on geoip provider destruction
  • Decoupling provider and filterConfig classes, since instances of such classes have different life time:
    • got rid of provider -> filterConfig callbacks for incrementing reload stats, on LDS update filterConfig will be destroyed and callback references will be invalid (unless callbacks are properly posted to filterConfig dispatcher). Moved tracking of such stats to provider instead.
    • provider is no longer holding reference to loader (owned by filterConfig), but instead creates its own instance of loader to avoid segfaults on LDS updates.
    • creation and tracking of hit metrics is moved to filter onHeader method. There before incrementing each hit stat we check is such counter exists. This is to accommodate for case when tags have been reloaded in background and some new counters may need to be registered. This is a simpler approach, which may introduce overhead in request path, but in reality is just one extra check in a map for counter existence. Another approach would have been to create and pass filterConfig callbacks to provider on each filterConfig creation/LDS update and then invoke the callbacks via thread dispatcher and register new counters. For now leaned towards simpler approach.

nezdolik avatar Sep 11 '25 20:09 nezdolik

@wbpcode would you do another pass on the API?

nezdolik avatar Sep 14 '25 13:09 nezdolik

@wbpcode @yanavlasov addressed review comments.

nezdolik avatar Sep 22 '25 12:09 nezdolik

I think we get agreement on the refactor to the DataSourceProvider because I got a 👍 . It's will be a big improvement because almost all DataSource based configuration could benefit from it.

/wait

wbpcode avatar Sep 28 '25 03:09 wbpcode

This pull request has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in 7 days if no further activity occurs. Please feel free to give a status update now, ping for review, or re-open when it's ready. Thank you for your contributions!

github-actions[bot] avatar Oct 28 '25 04:10 github-actions[bot]

This pull request has been automatically closed because it has not had activity in the last 37 days. Please feel free to give a status update now, ping for review, or re-open when it's ready. Thank you for your contributions!

github-actions[bot] avatar Nov 04 '25 04:11 github-actions[bot]

I think we get agreement on the refactor to the DataSourceProvider because I got a 👍 . It's will be a big improvement because almost all DataSource based configuration could benefit from it.

/wait

https://github.com/envoyproxy/envoy/pull/41961

nezdolik avatar Nov 11 '25 12:11 nezdolik