feat(azure_logs_ingestion sink): Initial `azure_logs_ingestion` sink

Open jlaundry opened this issue 8 months ago • 27 comments

Summary

The current azure_monitor_logs sink uses the Data Collector API, which has been deprecated and will be removed in September 2026.

This sink uses the replacement Logs Ingestion API.

While I did consider making this a drop-in replacement for the existing sink, users need to make numerous breaking infrastructure changes, including:

  • Creating new Data Collection Endpoint and Data Collection Rule resources
  • Moving from a workspace-based secret key to an OAuth credential (App Registration, Managed Identity, etc.)
  • (optionally) Re-configuring logs to use the built-in tables, instead of _CL custom tables.

Change Type

  • [ ] Bug fix
  • [X] New feature
  • [ ] Non-functional (chore, refactoring, docs)
  • [ ] Performance

Is this a breaking change?

  • [ ] Yes
  • [X] No

How did you test this PR?

  1. Following the Tutorial steps, create a Log Analytics workspace, App Registration, Data Collection Endpoint, and Data Collection Rule
  2. Set the AZURE_TENANT_ID, AZURE_CLIENT_ID, and AZURE_CLIENT_SECRET environment variables from the App Registration
  3. Use the following vector.yaml:

sources:
  stdin:
    type: stdin

sinks:
  azure:
    type: azure_logs_ingestion
    inputs:
      - stdin
    endpoint: https://dce-e42z.westus2-1.ingest.monitor.azure.com
    dcr_immutable_id: dcr-00000000000000000000000000000000
    stream_name: Custom-vector_CL
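
For reference, a minimal sketch of how the three sink options above would map onto a Logs Ingestion API upload URL. The function name is hypothetical, and the `api-version` value is an assumption taken from the public REST API rather than from this PR's code, so treat it as an illustration and not as the sink's actual implementation.

```rust
/// Hypothetical helper: builds the Logs Ingestion API upload URL from the
/// `endpoint`, `dcr_immutable_id`, and `stream_name` options.
/// Assumption: the `2023-01-01` api-version; the sink may use a different one.
fn ingestion_url(endpoint: &str, dcr_immutable_id: &str, stream_name: &str) -> String {
    format!(
        "{}/dataCollectionRules/{}/streams/{}?api-version=2023-01-01",
        // Tolerate a trailing slash on the configured endpoint.
        endpoint.trim_end_matches('/'),
        dcr_immutable_id,
        stream_name
    )
}

fn main() {
    let url = ingestion_url(
        "https://dce-e42z.westus2-1.ingest.monitor.azure.com",
        "dcr-00000000000000000000000000000000",
        "Custom-vector_CL",
    );
    println!("{url}");
}
```

The request itself would be a POST of a JSON array of events to that URL, with a bearer token obtained from the OAuth credential.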

Does this PR include user facing changes?

  • [X] Yes. Please add a changelog fragment based on our guidelines.
  • [ ] No. A maintainer will apply the "no-changelog" label to this PR.

References

  • Closes: https://github.com/vectordotdev/vector/issues/20978
  • Mentioned in: https://github.com/vectordotdev/vector/issues/20625 - while this PR doesn't resolve this issue for azure_blob, by using the Azure Identity crate, this sink supports passwordless credentials.

jlaundry avatar Apr 20 '25 23:04 jlaundry

CLA assistant check
All committers have signed the CLA.

bits-bot avatar Apr 20 '25 23:04 bits-bot

We will need some documentation files. See an example here (all files under website). Note that base/ is generated by make generate-component-docs.

Apologies, I thought this page was auto-generated as well - added now.

Is the intention to completely replace the azure_monitor_logs sink? If that's the case, maybe we can mark the existing one as deprecated in favor of this new sink.

Good point, added 🙂

jlaundry avatar Apr 22 '25 20:04 jlaundry

Hi @jlaundry, we received this report https://github.com/vectordotdev/vector/issues/23036 and we will be reverting to older azure_* crate versions. Does this affect your PR?

pront avatar May 13 '25 18:05 pront

Hi @jlaundry, we received this report #23036 and we will be reverting to older azure_* crate versions. Does this affect your PR?

From memory, there is a minor refactor required in https://github.com/jlaundry/vector/blob/02562be6447af36404d8b5668434e317c87a45b2/src/sinks/azure_logs_ingestion/config.rs#L139 to change it back to azure_identity::create_default_credential()?;, and possibly subsequent type changes... but reverting to 0.17 or 0.19 won't fundamentally block this PR, as thankfully I was using the raw REST API 🙂

It's probably easiest if you roll back the package first, and then I'll rebase, retest, and push.

jlaundry avatar May 14 '25 01:05 jlaundry

FYI - Reverted deps https://github.com/vectordotdev/vector/pull/23039

pront avatar May 14 '25 20:05 pront

FYI, I started preparing the rebase, but I've seen that the Azure Rust team have recently decided to change how authentication works with the azure_identity SDK: https://github.com/Azure/azure-sdk-for-rust/issues/2283

Depending on what they decide, we may need to explicitly configure credentials, either via the vector config file or environment variables. The azure_blob sink will need a similar change (unless using connection_string config).

So instead of releasing an initial sink, and then requiring another config/environment change, I'll wait until there's stability in the SDK before moving forward with this PR.

(I'm not giving up!!!)

jlaundry avatar May 23 '25 22:05 jlaundry

Hi @jlaundry, Thanks a lot for your initiative! The company I work for needs exactly this sink. I went over your PR and I haven't seen explicit auth fields (like connection_string for AzureBlobStorage or shared_key for AzureMonitorLogs). Are they implicit in some way? Or are client secrets not supported for authentication? I'm asking because our use case is running vector on an AWS machine and connecting to multiple Azure sinks, so if only AAD or ENV_VAR based authentication is currently supported, we wouldn't be able to use it. Would you consider adding an explicit option to use client secrets per sink? I'd be happy to contribute to that.

By the way, I see that the discussion here is closed. Does it mean you will go forward with merging your PR?

Thanks a lot! Joel

yoelk avatar May 29 '25 09:05 yoelk

So instead of releasing an initial sink, and then requiring another config/environment change, I'll wait until there's stability in the SDK before moving forward with this PR.

Makes sense. I feel that these crates are unfortunately a bit unstable and each version update is risky. Thank you for your interest in contributing 👍

pront avatar Jun 11 '25 17:06 pront

Hello @jlaundry, @pront. Is the above-mentioned blocking point still applicable? I don't have a lot of experience in Rust, but I would be interested in working on it.

Renizmy avatar Jul 08 '25 15:07 Renizmy

Hi all, currently the azure_* crates are not in a good state. Coincidentally, @thomasqueirozb was looking at this issue today. He will comment on this PR if we have a solution.

pront avatar Jul 09 '25 20:07 pront

For those playing along at home (hi @yoelk @Renizmy), a summary of the current issues and why we're blocked:

  • Vector's azure_blob sink currently uses the azure_storage and azure_storage_blobs crates, which are deprecated/legacy/EOL. The last version released was 0.21.0, which aligns to the 0.21.0 azure_core and azure_identity crates.
  • The proposed replacement azure_storage_blob crate is a ground-up re-implementation, currently in its infancy; Microsoft have a big scary warning that there are bugs, and this crate must not be used in production.
  • In addition, the azure_core and azure_identity 0.22.0 crates changed the Traits of various components, and refactored the project structure, making the updated crates incompatible with the last azure_storage out of the box.
  • I've seen other projects in the same boat do things like compatibility shims to use azure_storage 0.21.0 with more recent azure_core (which is what @thomasqueirozb is working on in https://github.com/vectordotdev/vector/pull/23351)... which works, but adds technical debt to each project. The azure_storage crate will need to be removed eventually.
  • But, the bigger issue: for those unfamiliar with Azure workloads, there are a multitude of different ways to get credentials, depending on the deployment type and usage requirements (Managed Identities, Workload Identities, Azure CLI, certificate files... all the way down to good old OAuth Client ID & Secret). Usually, these are abstracted by the language's SDK through the DefaultAzureCredential class.
    • The Go SDK and Python SDK documentation have better descriptions and examples of how this works if you're interested
  • Starting with azure_identity 0.22.0, the Microsoft team decided to make DefaultAzureCredential different for the Rust SDK, and only use development credentials, for unspecified security reasons. While a ChainedTokenCredential was proposed (again, similar to the pattern established in the Go/Python/.NET/JavaScript SDKs), this was also removed.
  • The net result is that Rust projects that upgrade to azure_identity >= 0.22.0 will need to explicitly add configuration for the user to specify what credential type they're intending to use, and then implement a switch to instantiate the appropriate Credential - otherwise, current production deployments that use Managed/Workload Identities or AZURE_* environment variables will just stop working.
  • ... and this morning, I see that Microsoft are considering re-designing the identity library, using cross-compiled .NET code (🫤), so more turbulence is on the horizon...
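
To make the "explicit configuration plus a switch" point concrete, here is a rough sketch of what such a selector could look like. Everything in it is hypothetical: the `CredentialKind` enum, the option strings, and the `select_credential` function are illustrative names, not types from azure_identity or from this PR, and a real implementation would construct the SDK's credential objects instead of a plain enum.

```rust
use std::env;

/// Hypothetical credential choices a sink config could expose.
/// These are illustrative names, not azure_identity types.
#[derive(Debug, PartialEq)]
enum CredentialKind {
    ClientSecret {
        tenant_id: String,
        client_id: String,
        client_secret: String,
    },
    ManagedIdentity,
    WorkloadIdentity,
    AzureCli,
}

/// Maps a user-supplied option string to a credential choice.
/// `lookup` abstracts over the environment so the function is easy to test.
fn select_credential(
    kind: &str,
    lookup: impl Fn(&str) -> Option<String>,
) -> Result<CredentialKind, String> {
    // Fail with a readable message when a required variable is missing.
    let required = |name: &str| lookup(name).ok_or_else(|| format!("{name} is not set"));
    match kind {
        "client_secret" => Ok(CredentialKind::ClientSecret {
            tenant_id: required("AZURE_TENANT_ID")?,
            client_id: required("AZURE_CLIENT_ID")?,
            client_secret: required("AZURE_CLIENT_SECRET")?,
        }),
        "managed_identity" => Ok(CredentialKind::ManagedIdentity),
        "workload_identity" => Ok(CredentialKind::WorkloadIdentity),
        "azure_cli" => Ok(CredentialKind::AzureCli),
        other => Err(format!("unknown credential kind: {other}")),
    }
}

fn main() {
    // Read the AZURE_* variables from the real process environment.
    match select_credential("managed_identity", |name| env::var(name).ok()) {
        Ok(cred) => println!("selected: {cred:?}"),
        Err(err) => eprintln!("config error: {err}"),
    }
}
```

The trade-off is that the user has to state their intent up front, instead of relying on a chain that silently tries each credential type in turn.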

What I think this means for this PR, and my thoughts/opinions for the wider project:

  1. Upgrading azure_identity in #23351 will break current users of the azure_blob sink unless they are using a connection_string. Given that this is on the critical path to update Vector to use http 1.x, this is probably still worth doing - but existing users will need to migrate their config to use a connection string.
  2. Once #23351 has been merged, I can then restart development on this PR, and I'll spend some time designing some reusable config options for selecting the appropriate Credential.
  3. Finally, once the azure_storage_blob crate reaches production stability, that's probably the point to migrate the azure_blob sink, and as part of that introduce the various identity config options.

Also note: The existing azure_monitor_logs sink is unaffected by all this drama, because it (only) uses a shared key credential. However, the upstream API is still going to be deprecated September 2026.

jlaundry avatar Jul 11 '25 22:07 jlaundry

  1. Upgrading azure_identity in #23351 will break current users of the azure_blob sink unless they are using a connection_string. Given that this is on the critical path to update Vector to use http 1.x, this is probably still worth doing - but existing users will need to migrate their config to use a connection string.

Hi @jlaundry, I missed the context on this one. How does this break existing users? E.g. assume we keep the connection_string and go ahead with that PR. The old configs will still load. Are you saying that it will break in production? Making connection_string mandatory has other benefits so we will probably go ahead with making it mandatory.

pront avatar Sep 04 '25 19:09 pront

Hi @jlaundry, I missed the context on this one. How does this break existing users? Making connection_string mandatory has other benefits so we will probably go ahead with making it mandatory.

@pront the current documented behavior of the azure_blob sink is that if the storage_account is specified, it will attempt to load credentials for the account in the following ways, in order:

This is based on the azure_identity <= 0.21.0 behavior of DefaultAzureCredential. Once upgraded, and unless we create our own Credential wrapper struct that re-implements the old behavior, this will change to:

Based on past experience with customers, I expect 60-70% of production users are using connection_string, and among the remaining it's evenly split between environment variables and Managed Identities - but there's no real way to know for sure until it breaks. I don't think anyone has a valid use case for using an az CLI identity outside development environments.

So yes, I support removing the storage_account field, and forcing everyone to use connection_string until we have a pattern for using environment variables and Managed Identities.

jlaundry avatar Sep 04 '25 20:09 jlaundry

FYI @jlaundry: this https://github.com/vectordotdev/vector/pull/23351 was merged

pront avatar Sep 09 '25 16:09 pront

Very nice work. Looking forward to seeing this upstream. Right now I am using Logstash with the microsoft-sentinel-log-analytics-logstash-output-plugin plugin as a temporary solution.

sb1-nicolai avatar Sep 30 '25 15:09 sb1-nicolai

Hi.

Can we expect this to be included in the next release of Vector?

sb1-nicolai avatar Nov 04 '25 10:11 sb1-nicolai

Hi.

Can we expect this to be included in the next release of Vector?

Hi @sb1-nicolai, this depends on @jlaundry and the community. The Vector team is not actively working on this PR.

pront avatar Nov 04 '25 14:11 pront

While I am still keen to finish this feature in time for the older API deprecation, I don't want to load the project with an unmanageable/unsupported mess. Unfortunately, not much has changed since I wrote https://github.com/vectordotdev/vector/pull/22912#issuecomment-3064019563

The azure_storage_blob crate still doesn't have feature parity with azure_storage_blobs, and the Microsoft Rust team aren't making their roadmap clear. Other projects are continuing to vendor the old azure_storage_blobs crate with shims.

And on the azure_identity side, it's also unclear if they're moving ahead with their (horrible, IMHO) plan to wedge .NET cross-compiled code in, and it appears the product group are resisting adopting the AZURE_TOKEN_CREDENTIALS convention that they've established with the other languages.

@sb1-nicolai if Azure Log Ingestion is important to you and your team, may I please suggest you reach out to your Microsoft CSAM, and ask them to escalate to the product group.

Otherwise... Vector has fantastic support for other cloud logging platforms... 😉

jlaundry avatar Nov 05 '25 01:11 jlaundry

Hi @jlaundry

Following the discussion about AZURE_TOKEN_CREDENTIALS and per-sink authentication:

I see there's interest in using AZURE_TOKEN_CREDENTIALS to control credential chain behavior, but I'm struggling to understand how this would be compatible with per-sink authentication. For example:

  • If we set AZURE_TOKEN_CREDENTIALS="prod", ALL sinks will use the prod chain (Environment → WorkloadIdentity → ManagedIdentity)
  • If we set AZURE_TOKEN_CREDENTIALS="ManagedIdentityCredential", ALL sinks will use only ManagedIdentity
  • I don't see a way to have Sink A use one credential type and Sink B use another
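
The concern in the bullets above can be sketched in a few lines. This is only an illustration under stated assumptions: `resolve_selector` and the idea of a per-sink override option are hypothetical, and nothing like them exists in Vector or azure_identity today.

```rust
/// Hypothetical: picks the credential selector for one sink.
/// `sink_override` stands in for a per-sink config option (which does not
/// exist today); `global` stands in for the process-wide
/// AZURE_TOKEN_CREDENTIALS value.
fn resolve_selector<'a>(
    sink_override: Option<&'a str>,
    global: Option<&'a str>,
) -> Option<&'a str> {
    // A per-sink value wins; otherwise fall back to the process-wide one.
    sink_override.or(global)
}

fn main() {
    // One process-wide value, shared by every sink in the same Vector process.
    let global = Some("prod");

    // With only the environment variable, both sinks get the same selector.
    let sink_a = resolve_selector(None, global);
    let sink_b = resolve_selector(None, global);
    assert_eq!(sink_a, sink_b);

    // Only a (hypothetical) per-sink option lets sink B diverge from sink A.
    let sink_b = resolve_selector(Some("ManagedIdentityCredential"), global);
    println!("sink A: {sink_a:?}, sink B: {sink_b:?}");
}
```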

This creates an incompatibility with @yoelk's requirement for different authentication per sink.

Thoughts?

Renizmy avatar Nov 21 '25 08:11 Renizmy