MailKit icon indicating copy to clipboard operation
MailKit copied to clipboard

Add Activity/OpenTelemetry support

Open pengweiqhca opened this issue 2 years ago • 43 comments

Is it possible to wrap the calling server around the Activity? So that, I can get elapsed time per calling.

pengweiqhca avatar Jan 05 '23 10:01 pengweiqhca

You could subclass any class in MailKit to add whatever functionality you want.

I don't want to take a dependency on OpenTelemetry because most of my users won't want it.

jstedfast avatar Jan 05 '23 14:01 jstedfast

@jstedfast it doesn't require any external dependency. In-order to implement tracing and metrics in dotnet you use constructs from dotnet, namely System.Diagnostics.Metrics and System.Diagnostics.ActivitySource.

EugeneKrapivin avatar Jan 03 '24 11:01 EugeneKrapivin

@EugeneKrapivin right, but I'm talking about not wanting to add the OpenTelemetry NuGet dependency.

Maybe I'm just misunderstanding the feature request?

jstedfast avatar Jan 03 '24 15:01 jstedfast

@jstedfast To get the library instrumented (tracing, metrics, logs) you don't have to use any external dependencies as the maintainer of this library. The users of your library, if they chose to export those traces, logs and metrics will use OpenTelemetry libraries.

I'm working on a small POC, and actually using MailKit (that's how I wound up here), so I have a handy example:

In a class called SmtpSenderService I have:

public SmtpSenderService(ILogger<SmtpSenderService> logger, IMeterFactory meterFactory)
{
    _logger = logger;

    _meterFactory = meterFactory;
    _meter = _meterFactory.Create("SMTPSender"); // remember this part

    _createClientHistogram = _meter.CreateHistogram<double>("smtp_create_client", "ms");
    _connectClientHistogram = _meter.CreateHistogram<double>("smtp_connect_client", "ms");
    _sendMessageHistogram = _meter.CreateHistogram<double>("smtp_send", "ms");
    _exceptionsCounter = _meter.CreateCounter<long>("smtp_exception", "exception");
}
private static ActivitySource source = new("SmptSenderService", "1.0.0"); // remember this part

ILogger, IMeterFactory, Meter and ActivitySource are part of

using System.Diagnostics;
using System.Diagnostics.Metrics;
using Microsoft.Extensions.Logging;

namespaces, which require no external dependencies. In some SendMessage method I have the following code:

if (_client is null)
{                
    var startCreate = Stopwatch.GetTimestamp();
   
    _client ??= new SmtpClient();
    
    var startConnect = Stopwatch.GetTimestamp();
    
    using (var _ = source.StartActivity("Smtp.Connect", ActivityKind.Client))
        await _client.ConnectAsync(config.ServerUrl, int.Parse(config.ServerPort), false, cts.Token);
    
    using (var _ = source.StartActivity("Smtp.Authenticate", ActivityKind.Client))
        await _client.AuthenticateAsync(config.Username, config.Password, cts.Token);
    
    var elapsedConnect = Stopwatch.GetElapsedTime(startConnect);
    var elapsedCreate = Stopwatch.GetElapsedTime(startCreate);

    _createClientHistogram.Record(elapsedCreate.TotalMilliseconds);
    _connectClientHistogram.Record(elapsedConnect.TotalMilliseconds, new KeyValuePair<string, object?>("flow", "create"));
}

and

using var sendActivity = source.StartActivity("Smtp.Send", ActivityKind.Client);
try
{
    var r = await _client.SendAsync(message, cts.Token);
    var total = Stopwatch.GetElapsedTime(start);
    
    sendActivity?.AddTag("status", "success");
    sendActivity?.AddEvent(new ActivityEvent("email sent"));
    _sendMessageHistogram.Record(total.TotalMicroseconds, new KeyValuePair<string, object?>("status", "success"));
}
catch (SmtpCommandException ex) when (ex.ErrorCode == SmtpErrorCode.RecipientNotAccepted)
{
    _logger.LogError(ex, "failed sending an email with error {errorCode}", ex.ErrorCode);
    var total = Stopwatch.GetElapsedTime(start);

    _sendMessageHistogram.Record(total.TotalMicroseconds, new KeyValuePair<string, object?>("status", "failed"));
    _exceptionsCounter.Add(1,
            new KeyValuePair<string, object?>("exception", ex.GetType().Name),
            new KeyValuePair<string, object?>("flow", "send"));

    sendActivity?.AddTag("status", "failed");
    sendActivity?.AddTag("exception", ex.GetType().Name);
    sendActivity?.AddTag("smtp.error_code", ex.ErrorCode);

    return new(false, $"failed with {ex.ErrorCode}");
}
// the catch clauses continue, just to make sure I catch all the exceptions possible and log them properly

As I've said before, none of this code requires any external dependencies. Having this code in MailKit could really help with observability and monitoring for mailkit users.

In a different place in my service (the host part) I have my dependencies on OpenTelemetry:

builder.Services
    .AddOpenTelemetry()
    .WithTracing(tracing =>
    {
        tracing.AddSource("SmptSenderService");
    })
    .WithMetrics(metrics =>
    {
        metrics
            .AddMeter("SMTPSender");
    });

these lines make sure that when traces and metrics are exported, my metrics and traces are also exported. Once exported I have my metrics: image

and some fancy traces: image

and because logs are also enriched when exported: image

Having MailKit instrumentation out of the box would be really helpful in reducing boilerplate code for monitoring the system with traces and quality metrics.

EugeneKrapivin avatar Jan 03 '24 16:01 EugeneKrapivin

Okay, how do you propose adding this to MailKit?

jstedfast avatar Jan 03 '24 16:01 jstedfast

Well I'd suggest:

  • Metrics - wrapping IO operations in timers (histograms of TimeSpan in my code examples)
    • adding exception/error counters with appropriate tagging
  • Tracing - wrapping IO operations with Activities and Activity Events.
  • Adding logs using standard logging APIs in dotnet, which ultimately get collected and exported (processes that should not concern you as a library maintainer)

This is probably a huge ask, and I think it should be opened for discussion to see if it's worth the investment (aka will users of the library use it). For the purposes of my POC it was worth it because it gave me the data I needed. However, it wasn't a clean insert :) . There was a lot of copy-paste engineering on my part to cover all exceptions and operations (Create, Connect, NoOp, Send, Disconnect).

As a trend the dotnet community pushes toward adding instrumentation to libraries (especially IO bound libraries) so that distributed tracing (which is priceless in debugging microservices plane) could be done through OpenTelemetry which is the defacto standard for observability in virtually all languages.

just as an example, my PoC uses: ASP.net minimal APIs, KafkaFlow, Orleans, Azure Tables (for orleans storage) and MailKit. Except MailKit all the others have OOTB Tracing instrumentation, passing the tracing context between ASP.net, kafkaflow, orleans and AzureTables SDK was effortless. All I had to do is add their names to the OpenTelemetry configuration. After wrapping some MailKit calls in activities:

image

which is accompanied by a correlated log:

{
	"body": "failed sending an email with error SenderNotAccepted",
	"traceid": "b62dd4170b6c0b0da30a6dce0c0fcdf5",
	"spanid": "b37a86f71fc2c09f",
	"severity": "Error",
	"flags": 1,
	"attributes": {
		"ParentId": "e5e6b9e3c5dce273",
		"SpanId": "b37a86f71fc2c09f",
		"TraceId": "b62dd4170b6c0b0da30a6dce0c0fcdf5",
		"errorCode": "SenderNotAccepted",
		"exception.message": "Invalid sender [email protected]",
		"exception.stacktrace": "MailKit.Net.Smtp.SmtpCommandException: Invalid sender [email protected]\r\n   at MailKit.Net.Smtp.SmtpClient.SendAsync(FormatOptions options, MimeMessage message, MailboxAddress sender, IList`1 recipients, CancellationToken cancellationToken, ITransferProgress progress)\r\n   at MailKit.Net.Smtp.SmtpClient.SendAsync(FormatOptions options, MimeMessage message, MailboxAddress sender, IList`1 recipients, CancellationToken cancellationToken, ITransferProgress progress)\r\n   at Processor.Grains.SmtpSenderService.SendMessage(SmtpConfig config, EmailRequest emailRequest) in C:\\Users\\nocgod\\source\\repos\\EmailServicePoc\\Processor\\Processor.Grains\\SmtpSenderService.cs:line 138",
		"exception.type": "SmtpCommandException",
		"{OriginalFormat}": "failed sending an email with error {errorCode}"
	},
	"resources": {
		"service.instance.id": "243d4bd2-79e4-4ed1-a308-4b8d37bc7665",
		"service.name": "processor.host",
		"telemetry.sdk.language": "dotnet",
		"telemetry.sdk.name": "opentelemetry",
		"telemetry.sdk.version": "1.7.0"
	},
	"instrumentation_scope": {
		"name": "Processor.Grains.SmtpSenderService"
	}
}

EugeneKrapivin avatar Jan 03 '24 16:01 EugeneKrapivin

Well I'd suggest:

  • Metrics - wrapping IO operations in timers (histograms of TimeSpan in my code examples)
    • adding exception/error counters with appropriate tagging

Right, but how should MailKit provide this info to the app?

  • Tracing - wrapping IO operations with Activities and Activity Events.

Again, how should MailKit provide this info? via events? via some new IClientMetrics interface that can be set on any of the clients?

I'm not sure what would make the most sense.

Adding logs using standard logging APIs in dotnet, which ultimately get collected and exported (processes that should not concern you as a library maintainer)

Currently, MailKit does not depend on Microsoft.Extensions.Logging or Microsoft.Extensions.DependencyInjection, so this would be awkward.

I've been thinking about moving to DependencyInjection as an alternative to MimeKit.ParserOptions.Register() and MimeKit.Cryptography.CryptographyContext.Register() (especially with the request to make MimeKit depend on MimeKitLite rather than being more-or-less API duplicates of each other), but so far I haven't gotten around to that or even figured out exactly how I want to do that...

What do those other libraries you are using do? It sounds like they just assume you'll be logging telemetry/metrics/etc and only really leave up the configuration to the developers, but MailKit is in a bit of a different situation because it's not really designed specifically for ASP.NET apps where you can pretty much count on everyone wanting that kind of telemetry logging.

jstedfast avatar Jan 03 '24 17:01 jstedfast

Right, but how should MailKit provide this info to the app?

it should not concern the library. As long as you use System.Diagnostics.Metrics.Meter you and document properly the name of the meter, all that I need as an app developer is the name of the meter. Same goes for tracing, as long as you use activities and properly document the source name, that's enough. The following is code from my PoC:

builder.Services
    .AddOpenTelemetry()
    .WithMetrics(metrics =>
    {
        metrics
            .AddRuntimeInstrumentation()
            .AddMeter("Microsoft.Orleans")
            .AddBuiltInMeters();
    })
    .WithTracing(tracing =>
    {
        if (builder.Environment.IsDevelopment())
        {
            // We want to view all traces in development
            tracing.SetSampler(new AlwaysOnSampler());
        }

        tracing
            .AddSource("Azure.*")
            .AddSource("Microsoft.Orleans.Runtime")
            .AddSource("Microsoft.Orleans.Application")
            .AddSource(KafkaFlowInstrumentation.ActivitySourceName)
            .AddAspNetCoreInstrumentation()
            .AddGrpcClientInstrumentation()
            .AddHttpClientInstrumentation(o => {
                o.FilterHttpRequestMessage = (_) => Activity.Current?.Parent?.Source?.Name != "Azure.Core.Http";
            });
    });

Notice that Microsoft Orleans is added as a string (which they provide in their documentation) while KafkaFlow opted for releasing a little nuget with a public static readonly string to avoid magic strings. This setup code IS dependent on OpenTelemetry, however it should not concern you as a library developer.

Currently, MailKit does not depend on Microsoft.Extensions.Logging or Microsoft.Extensions.DependencyInjection, so this would be awkward.

you could expose a logging adapter and a logging abstraction like KafkaFlow does.

I've been thinking about moving to DependencyInjection as an alternative to MimeKit.ParserOptions.Register() and MimeKit.Cryptography.CryptographyContext.Register() (especially with the request to make MimeKit depend on MimeKitLite rather than being more-or-less API duplicates of each other), but so far I haven't gotten around to that or even figured out exactly how I want to do that...

I guess this will require a bigger refactoring that should probably be out-of-scope for the purposes of observability instrumentation.

What do those other libraries you are using do? It sounds like they just assume you'll be logging telemetry/metrics/etc and only really leave up the configuration to the developers, but MailKit is in a bit of a different situation because it's not really designed specifically for ASP.NET apps where you can pretty much count on everyone wanting that kind of telemetry logging.

for tracing and metrics they use standard dotnet APIs:

  • https://learn.microsoft.com/en-us/dotnet/core/diagnostics/distributed-tracing#getting-started-for-net-library-developers
  • https://learn.microsoft.com/en-us/dotnet/core/diagnostics/metrics-instrumentation

which are not dependant on ASP.NET specifically, as you can use them in a plain worker service without dependency on the Web SDKs.

The logging is a design decision you could make:

  • embrace the microsoft logging API
  • abstract logging and provide an adapter to microsoft API

EugeneKrapivin avatar Jan 03 '24 17:01 EugeneKrapivin

@EugeneKrapivin can you help define which metrics (and other info) you want/need? I'll put this in the queue for a future v5.0 release but first I need to know what specific metrics developers care about having.

(Note: it seems like there is growing demand for a unification of MimeKit and MimeKitLite which is going to require me to work on a v5.0 sooner rather than later to split MimeKit into MimeKit.Core and MimeKit.Cryptography - this seems like the perfect opportunity to add in metrics & tracing data).

jstedfast avatar Jan 08 '24 17:01 jstedfast

@jstedfast sorry for the late reply. I'd add:

  • Metrics:
    • histograms to measure any operation time in the library (like SMTP connect, send, no op, disconnect, etc').
    • counters (monotonically increasing counters) for errors and exceptions with tags/labels to make sure the metrics are dissectible for the user to setup alerts in case a high error rate is detected
    • I'd add gauges to hold count of the number of active clients/connects there are, it'd also be useful to monitor run-away situations with connections
  • Traces - any request should open an activity that would make it possible to see Mailkit IO interactions in a distributed trace (think about a flow of: ASP.Net minimal APIs -> KafkaFlow -> Microsoft Orleans grain -> [ Azure storage table SDK, SMTP client] like I've pasted earlier in this thread)
  • Logs - Use microsoft ILogger logging abstraction, Which is easily configurable to be exported open telemetry parameters using standard dotnet and open telemetry libraries (exporting is not a concern of mail kit).

I'm adding some code from my POC project in which I've added some metrics, traces and logs around some simple usage of the SMTP client. Please do not take this example as best practice =) I'd suggest taking a look at the best practices from msft documentation :)

// note that the SmtpClient is not thread safe (we shouldn't be using reentrant here)
[StatelessWorker]
public class SmtpSenderService : Grain, ISmptSenderService, IDisposable
{
    private bool _disposedValue;
    private SmtpClient _client;
    private readonly ILogger<SmtpSenderService> _logger;

    private readonly IMeterFactory _meterFactory;

    private Meter _meter;
    private Histogram<double> _createClientHistogram;
    private Histogram<double> _connectClientHistogram;
    private Histogram<double> _sendMessageHistogram;
    private Counter<long> _exceptionsCounter;

    public SmtpSenderService(ILogger<SmtpSenderService> logger, IMeterFactory meterFactory)
    {
        _logger = logger;

        _meterFactory = meterFactory;
        _meter = _meterFactory.Create("SMTPSender");

        _createClientHistogram = _meter.CreateHistogram<double>("smtp_create_client", "ms");
        _connectClientHistogram = _meter.CreateHistogram<double>("smtp_connect_client", "ms");
        _sendMessageHistogram = _meter.CreateHistogram<double>("smtp_send", "ms");
        _exceptionsCounter = _meter.CreateCounter<long>("smtp_exception", "exception");
    }
    private static ActivitySource source = new("SmptSenderService", "1.0.0");

    public async Task<SendResult> SendMessage(SmtpConfig config, EmailRequest emailRequest)
    {
        var cts = new CancellationTokenSource(TimeSpan.FromSeconds(10));
        try
        {
            if (_client is null)
            {                
                var startCreate = Stopwatch.GetTimestamp();
               
                _client ??= new SmtpClient();

                var startConnect = Stopwatch.GetTimestamp();
                
                using (var _ = source.StartActivity("Smtp.Connect", ActivityKind.Client))
                    await _client.ConnectAsync(config.ServerUrl, int.Parse(config.ServerPort), false, cts.Token);
                
                using (var _ = source.StartActivity("Smtp.Authenticate", ActivityKind.Client))
                    await _client.AuthenticateAsync(config.Username, config.Password, cts.Token);
                
                var elapsedConnect = Stopwatch.GetElapsedTime(startConnect);
                var elapsedCreate = Stopwatch.GetElapsedTime(startCreate);

                _createClientHistogram.Record(elapsedCreate.TotalMilliseconds);
                _connectClientHistogram.Record(elapsedConnect.TotalMilliseconds, new KeyValuePair<string, object?>("flow", "create"));
            }
            
            var reconnect = false;
            try
            {
                using var _ = source.StartActivity("Smtp.NoOp", ActivityKind.Client);
                await _client.NoOpAsync();
            }
            catch (Exception ex)
            {
                reconnect = true;
            }

            if (!_client.IsConnected || reconnect)
            {
                var startConnect = Stopwatch.GetTimestamp();
                
                using (var _ = source.StartActivity("Smtp.Connect", ActivityKind.Client))
                    await _client.ConnectAsync(config.ServerUrl, int.Parse(config.ServerPort), false, cts.Token);
                
                using (var _ = source.StartActivity("Smtp.Authenticate", ActivityKind.Client))
                    await _client.AuthenticateAsync(config.Username, config.Password, cts.Token);

                var elapsedConnect = Stopwatch.GetElapsedTime(startConnect);
                _connectClientHistogram.Record(elapsedConnect.TotalMilliseconds, new KeyValuePair<string, object?>("flow", "create"));
            }

            DelayDeactivation(TimeSpan.FromSeconds(30));
        }
        catch (AuthenticationException ex)
        {
            _exceptionsCounter.Add(1,
                new KeyValuePair<string, object?>("exception", ex.GetType().Name),
                new KeyValuePair<string, object?>("flow", "connect"));

            _logger.LogError(ex, "failed authenticating with smtp server");
        }
        catch (Exception ex) when (ex is OperationCanceledException || ex is TimeoutException)
        {
            _logger.LogError(ex, "failed sending message {messageId} due to a timeout", emailRequest.MessageId);
            _exceptionsCounter.Add(1,
                    new KeyValuePair<string, object?>("exception", ex.GetType().Name),
                    new KeyValuePair<string, object?>("flow", "connect"));
            return new(false, "timeout");
        }
        catch (Exception ex)
        {
            _logger.LogError(ex, "failed sending message {messageId} due to an exception", emailRequest.MessageId);
            _exceptionsCounter.Add(1,
                    new KeyValuePair<string, object?>("exception", ex.GetType().Name),
                    new KeyValuePair<string, object?>("flow", "connect"));
            return new(false, "exception");
        }

        var message = CreateMimeMessage(config, emailRequest);

        var start = Stopwatch.GetTimestamp();
        using var sendActivity = source.StartActivity("Smtp.Send", ActivityKind.Client);
        try
        {
            var r = await _client.SendAsync(message, cts.Token);

            var total = Stopwatch.GetElapsedTime(start);
            
            sendActivity?.AddTag("status", "success");
            sendActivity?.AddEvent(new ActivityEvent("email sent"));
            _sendMessageHistogram.Record(total.TotalMicroseconds, new KeyValuePair<string, object?>("status", "success"));
        }
        catch (SmtpCommandException ex) when (ex.ErrorCode == SmtpErrorCode.RecipientNotAccepted)
        {
            _logger.LogError(ex, "failed sending an email with error {errorCode}", ex.ErrorCode);
            var total = Stopwatch.GetElapsedTime(start);

            _sendMessageHistogram.Record(total.TotalMicroseconds, new KeyValuePair<string, object?>("status", "failed"));
            _exceptionsCounter.Add(1,
                    new KeyValuePair<string, object?>("exception", ex.GetType().Name),
                    new KeyValuePair<string, object?>("flow", "send"));

            sendActivity?.AddTag("status", "failed");
            sendActivity?.AddTag("exception", ex.GetType().Name);
            sendActivity?.AddTag("smtp.error_code", ex.ErrorCode);

            return new(false, $"failed with {ex.ErrorCode}");
        }
        catch (SmtpCommandException ex) when (ex.ErrorCode == SmtpErrorCode.SenderNotAccepted)
        {
            _logger.LogError(ex, "failed sending an email with error {errorCode}", ex.ErrorCode);
            var total = Stopwatch.GetElapsedTime(start);

            _sendMessageHistogram.Record(total.TotalMicroseconds, new KeyValuePair<string, object?>("status", "failed"));
            _exceptionsCounter.Add(1,
                new KeyValuePair<string, object?>("exception", ex.GetType().Name),
                new KeyValuePair<string, object?>("flow", "send"));

            sendActivity?.AddTag("status", "failed");
            sendActivity?.AddTag("exception", ex.GetType().Name);
            sendActivity?.AddTag("smtp.error_code", ex.ErrorCode);

            return new(false, $"failed with {ex.ErrorCode}");
        }
        catch (SmtpCommandException ex) when (ex.ErrorCode == SmtpErrorCode.MessageNotAccepted)
        {
            _logger.LogError(ex, "failed sending an email with error {errorCode}", ex.ErrorCode);
            var total = Stopwatch.GetElapsedTime(start);

            _sendMessageHistogram.Record(total.TotalMicroseconds, new KeyValuePair<string, object?>("status", "failed"));
            _exceptionsCounter.Add(1,
                new KeyValuePair<string, object?>("exception", ex.GetType().Name),
                new KeyValuePair<string, object?>("flow", "send"));

            sendActivity?.AddTag("status", "failed");
            sendActivity?.AddTag("exception", ex.GetType().Name);
            sendActivity?.AddTag("smtp.error_code", ex.ErrorCode);

            return new(false, $"failed with {ex.ErrorCode}");
        }
        catch (SmtpCommandException ex) when (ex.ErrorCode == SmtpErrorCode.UnexpectedStatusCode)
        {
            _logger.LogError(ex, "failed sending an email with error {errorCode}", ex.ErrorCode);
            var total = Stopwatch.GetElapsedTime(start);

            _sendMessageHistogram.Record(total.TotalMicroseconds, new KeyValuePair<string, object?>("status", "failed"));
            _exceptionsCounter.Add(1,
                new KeyValuePair<string, object?>("exception", ex.GetType().Name),
                new KeyValuePair<string, object?>("flow", "send"));

            sendActivity?.AddTag("status", "failed");
            sendActivity?.AddTag("exception", ex.GetType().Name);
            sendActivity?.AddTag("smtp.error_code", ex.ErrorCode);

            return new(false, $"failed with {ex.ErrorCode}");
        }
        catch (SmtpProtocolException ex)
        {
            _logger.LogError(ex, "failed sending message {messageId} due to smtp protocol exception", emailRequest.MessageId);
            var total = Stopwatch.GetElapsedTime(start);

            _sendMessageHistogram.Record(total.TotalMicroseconds, new KeyValuePair<string, object?>("status", "failed"));
            _exceptionsCounter.Add(1,
                new KeyValuePair<string, object?>("exception", ex.GetType().Name),
                new KeyValuePair<string, object?>("flow", "send"));

            sendActivity?.AddTag("status", "failed");
            sendActivity?.AddTag("exception", ex.GetType().Name);

            return new(false, ex.Message);
        }
        catch (IOException ex)
        {
            _logger.LogError(ex, "failed sending message {messageId} due to an IO exception", emailRequest.MessageId);
            var total = Stopwatch.GetElapsedTime(start);

            _sendMessageHistogram.Record(total.TotalMicroseconds);
            _exceptionsCounter.Add(1,
                new KeyValuePair<string, object?>("exception", ex.GetType().Name),
                new KeyValuePair<string, object?>("flow", "send"));

            sendActivity?.AddTag("status", "failed");
            sendActivity?.AddTag("exception", ex.GetType().Name);

            return new(false, ex.Message);
        }
        catch (Exception ex) when (ex is OperationCanceledException || ex is TimeoutException)
        {
            _logger.LogError(ex, "failed sending message {messageId} due to a timeout", emailRequest.MessageId);
            _exceptionsCounter.Add(1,
                    new KeyValuePair<string, object?>("exception", ex.GetType().Name),
                    new KeyValuePair<string, object?>("flow", "send"));

            sendActivity?.AddTag("status", "failed");
            sendActivity?.AddTag("exception", ex.GetType().Name);

            return new(false, "timeout");
        }
        finally
        {
            sendActivity?.Dispose(); // avoid nesting of Disconnect under the Send activity
            using var _ = source.StartActivity("Smtp.Disconnect", ActivityKind.Client);
            await _client.DisconnectAsync(true, cts.Token);
        }

        return new(true, "sent");
    }

    private static MimeMessage CreateMimeMessage(SmtpConfig config, EmailRequest emailRequest)
    {
        var message = new MimeMessage();
        message.From.Add(new MailboxAddress(config.FromName, config.From));
        message.To.Add(new MailboxAddress(emailRequest.RecipientName, emailRequest.Recipient));
        message.Subject = emailRequest.Subject;

        message.Body = new TextPart("plain")
        {
            Text = emailRequest.Body
        };
        return message;
    }

    protected virtual void Dispose(bool disposing)
    {
        if (!_disposedValue)
        {
            if (disposing)
            {
                _logger.LogWarning("disposing of connection");
                _client?.Dispose();
            }
            _disposedValue = true;
        }
    }

    public void Dispose()
    {
        // Do not change this code. Put cleanup code in 'Dispose(bool disposing)' method
        Dispose(disposing: true);
        GC.SuppressFinalize(this);
    }
}

EugeneKrapivin avatar Jan 24 '24 08:01 EugeneKrapivin

@EugeneKrapivin Thanks for taking the time to respond and share your ideas. I'll definitely be trying to read up on Microsoft's guides on best practices.

jstedfast avatar Jan 26 '24 14:01 jstedfast

Thanks, @jstedfast, for considering adding Telemetry support in Mailkit v5.0. All other libraries that we use to make external calls, such as HttpClient for HTTP calls, MassTransit for RabbitMQ, Stackexchange.Redis for Redis calls, and MysqlConnector for MySQL, have added support for telemetry, and we have started instrumenting them. It would be great if we could have this added in Mailkit.

prashanth-nani avatar Jan 30 '24 13:01 prashanth-nani

Ah, cool, I should look at how HttpClient does stuff as that might make be a close match with MailKit's client APIs.

jstedfast avatar Jan 30 '24 15:01 jstedfast

Some docs on dotnet's built-in networking (including HttpClient) metrics: https://learn.microsoft.com/en-us/dotnet/core/diagnostics/built-in-metrics-system-net#metric-httpclientopen_connections

Also seems like this Meter stuff is only in net8.0?

jstedfast avatar Feb 11 '24 14:02 jstedfast

No. We are running .NET 6 and using Meter for producing custom application metrics. What’s new in .NET 8 is the IMeterFactory interface.

prashanth-nani avatar Feb 11 '24 15:02 prashanth-nani

A few things I'm not clear about:

  1. A Meter has a name and is used to create Counters and Histograms, which also have names. How do those names inter-relate? If I have a Meter with the name mailkit.net.smtp, does that in any way namespace the Counters and Histograms? Or would I want to name them mailkit.net.smtp.client.send.duration?
  2. Should these Meters, Counters, and Histograms be static on SmtpClient? Or should they be per-instance? If they are per-instance, how does that play out if you've got multiple client instances sending messages?

jstedfast avatar Feb 12 '24 02:02 jstedfast

I would still like answers to my questions in the previous comment, but these are the metrics that I've come up with so far (mostly focused on SmtpClient at the moment since that's what you guys are focused on):

Socket Metrics

Metric: mailkit.net.socket.connect

Name Instrument Type Unit Description
mailkit.net.socket.connect Histogram ms The number of milliseconds taken for a socket to connect to a remote host.
Attribute Type Description Examples Presence
socket.connect.result string The connection result. succeeded, failed, cancelled Always
network.peer.address string Peer IP address of the socket connection. 10.5.3.2 Always
server.address string The host name that the socket is connecting to. smtp.gmail.com Always
server.port int The port that the socket is connecting to. 465 Always
error.type string The type of exception encountered. System.Net.Sockets.SocketException If an error occurred.
socket.error string The socket error code. ConnectionRefused, TimedOut, ... If one was received.

This metric measures the time it takes to connect a socket to a remote host.

SmtpClient Metrics

Metric: mailkit.net.smtp.client.connect.counter

Name Instrument Type Unit Description
mailkit.net.smtp.client.connect.counter Counter {connection} Number of outbound SMTP connections that are currently active.
Attribute Type Description Examples Presence
network.peer.address string Peer IP address of the client connection. 10.5.3.2 When available
server.address string The host name that the client is connected to. smtp.gmail.com Always
server.port int The port that the client is connecting to. 465 Always
url.scheme string The URL scheme of the protocol used. smtp or smtps Always

This metric tracks the number of successful connections made by the SmtpClient.

Metric: mailkit.net.smtp.client.connect.failures

Name Instrument Type Unit Description
mailkit.net.smtp.client.connect.failures Counter {failure} The number of times a connection failed to be established to an SMTP server.
Attribute Type Description Examples Presence
server.address string The host name that the client is connected to. smtp.gmail.com Always
server.port int The port that the client is connecting to. 465 Always
url.scheme string The URL scheme of the protocol used. smtp or smtps Always
error.type string The type of exception encountered. MailKit.Net.Smtp.SmtpCommandException If an error occurred.
smtp.status_code int The SMTP status code returned by the server. 530, 550, 553, ... If one was received.
socket.error string The socket error code. ConnectionRefused, TimedOut, ... If one was received.

This metric counts the number of failed connection attempts made by the SmtpClient.

Metric: mailkit.net.smtp.client.session.duration

Name Instrument Type Unit Description
mailkit.net.smtp.client.session.duration Histogram s The duration of successfully established connections to an SMTP server.
Attribute Type Description Examples Presence
network.peer.address string Peer IP address of the client connection. 10.5.3.2 When available
server.address string The host name that the client is connected to. smtp.gmail.com Always
server.port int The port that the client is connecting to. 465 Always
url.scheme string The URL scheme of the protocol used. smtp or smtps Always
error.type string The type of exception encountered. MailKit.Net.Smtp.SmtpCommandException If an error occurred.
smtp.status_code int The SMTP status code returned by the server. 530, 550, 553, ... If one was received.
socket.error string The socket error code. ConnectionRefused, TimedOut, ... If one was received.

This metric tracks the session duration of each SmtpClient connection and records whether that session ended voluntarily or due to an error.

Metric: mailkit.net.smtp.client.send.messages

Name Instrument Type Unit Description
mailkit.net.smtp.client.send.messages Counter {message} The number of messages successfully sent to an SMTP server.
Attribute Type Description Examples Presence
network.peer.address string Peer IP address of the client connection. 10.5.3.2 When available
server.address string The host name that the client is connected to. smtp.gmail.com Always
server.port int The port that the client is connecting to. 465 Always
url.scheme string The URL scheme of the protocol used. smtp or smtps Always

This metric tracks the number of messages successfully sent by an SmtpClient.

Metric: mailkit.net.smtp.client.send.failures

Name Instrument Type Unit Description
mailkit.net.smtp.client.send.failures Counter {failure} The number of times that sending a message to an SMTP server failed.
Attribute Type Description Examples Presence
network.peer.address string Peer IP address of the client connection. 10.5.3.2 When available
server.address string The host name that the client is connected to. smtp.gmail.com Always
server.port int The port that the client is connecting to. 465 Always
url.scheme string The URL scheme of the protocol used. smtp or smtps Always
error.type string The type of exception encountered. MailKit.Net.Smtp.SmtpCommandException If an error occurred.
smtp.status_code int The SMTP status code returned by the server. 530, 550, 553, ... If one was received.
socket.error string The socket error code. ConnectionRefused, TimedOut, ... If one was received.

This metric tracks the number of times an SmtpClient encountered an error while trying to send a message.

Metric: mailkit.net.smtp.client.send.duration

Name Instrument Type Unit Description
mailkit.net.smtp.client.send.duration Histogram ms The amount of time it takes to send a message to an SMTP server.
Attribute Type Description Examples Presence
network.peer.address string Peer IP address of the client connection. 10.5.3.2 When available
server.address string The host name that the client is connected to. smtp.gmail.com Always
server.port int The port that the client is connecting to. 465 Always
url.scheme string The URL scheme of the protocol used. smtp or smtps Always
error.type string The type of exception encountered. MailKit.Net.Smtp.SmtpCommandException If an error occurred.
smtp.status_code int The SMTP status code returned by the server. 530, 550, 553, ... If one was received.
socket.error string The socket error code. ConnectionRefused, TimedOut, ... If one was received.

This metric tracks the amount of time that it takes to send a message and records any error details if the message was not sent successfully.

jstedfast avatar Feb 12 '24 15:02 jstedfast

First, I'd like to say that I'm not completely happy with the metric names I've come up with, especially the .counter and .failures metric names.

I'd also like to get some feedback on attribute names/values, especially the Type for both smtp.status_code and socket.error.

I started out with socket.error and mapped those SocketError enum values to string because the interger values are worthless to me. That said, the same is probably true for the SmtpStatusCode enum values that I decided to map to int for most of you guys.

Does it make sense for both of these to be int? In hindsight, I'm thinking it does.

The Socket Metrics are because the way I implemented it was to implement it at the MailKit.Net.SocketUtils-level rather than it the higher-level to avoid duplicating as much code. Also if a connect is aborted due to the SmtpClient.Timeout value or the CancellationToken, we don't actually get accurate information if I was to do it at the SmtpClient-level.

Thinking back on my questions from an earlier comment, I'm guessing all of my metrics should be "global" and not "instance", but I'm not 100% positive on that.

Then of course, if they are "global", then does it make sense to add attributes to determine which instance a metric was logged from? e.g. perhaps have a Guid (or int?) clientId?

jstedfast avatar Feb 12 '24 15:02 jstedfast

Ok, so reading over https://github.com/open-telemetry/semantic-conventions/blob/main/docs/general/metrics.md#naming-rules-for-counters-and-updowncounters

I guess I should probably make the following name changes:

  • mailkit.net.smtp.client.connect.counter -> mailkit.net.smtp.client.connect.count
  • mailkit.net.smtp.client.connect.failures -> mailkit.net.smtp.client.connect.faults ?
  • mailkit.net.smtp.client.send.messages -> mailkit.net.smtp.client.send.count
  • mailkit.net.smtp.client.send.failures -> mailkit.net.smtp.client.send.faults ?

(I think technically failures is ok, but maybe faults is better?)

I also wonder if maybe smtp.client.session.duration should just become smtp.client.duration?

jstedfast avatar Feb 12 '24 16:02 jstedfast

instead of using faults use a tag/label status=failed it will be easier to dissect the metrics using tags. if you go back to my earlier example, you'd see how I used tags to add extra data on top of the metric.

mailkit.net.client.send.count {status=[success|failure], protocol=[smtp|pop3]}
mailkit.net.client.send.rate {status=[success|failure], protocol=[smtp|pop3]} 
mailkit.net.client.send_duration {status=[success|failure], protocol=[smtp|pop3], unit=ms}

just an idea.

PS: a bit of math :) a linearly increasing counter could give you a rate using a derivative operation. a rate meter could give you a count using an integral operation. sry for the math tidbits 🤣

EugeneKrapivin avatar Feb 13 '24 07:02 EugeneKrapivin

Okay, so I'll drop the .failures.

That still doesn't answer the most important question which is should these instruments (Counters/Histograms) be per-instance or global?

jstedfast avatar Feb 13 '24 15:02 jstedfast

Just to bookmark this for myself, this is the code for metrics reporting in HttpClient:

  • https://github.com/dotnet/runtime/blob/main/src/libraries/System.Net.Http/src/System/Net/Http/SocketsHttpHandler/Metrics/SocketsHttpHandlerMetrics.cs
  • https://github.com/dotnet/runtime/blob/main/src/libraries/System.Net.Http/src/System/Net/Http/Metrics/MetricsHandler.cs

jstedfast avatar Feb 13 '24 15:02 jstedfast

Updated list of metrics for SmtpClient:

SmtpClient Metrics

Metric: mailkit.net.smtp.client.connect.count

Name Instrument Type Unit Description
mailkit.net.smtp.client.connect.count Counter {attempt} The number of times a client has attempted to connect to an SMTP server.
Attribute Type Description Examples Presence
network.peer.address string Peer IP address of the client connection. 142.251.167.109 When available
server.address string The host name that the client is connected to. smtp.gmail.com Always
server.port int The port that the client is connected to. 25, 465, 587 Always
url.scheme string The URL scheme of the protocol used. smtp or smtps Always
error.type string The type of exception encountered. System.Net.Sockets.SocketException If an error occurred.
smtp.status_code int The SMTP status code returned by the server. 530, 550, 553, ... If one was received.
socket.error int The socket error code. 10054, 10060, 10061, ... If one was received.

This metric tracks the number of times an SmtpClient has attempted to connect to an SMTP server.

Metric: mailkit.net.smtp.client.connect.duration

Name Instrument Type Unit Description
mailkit.net.smtp.client.connect.duration Histogram ms The amount of time it takes for the client to connect to an SMTP server.
Attribute Type Description Examples Presence
network.peer.address string Peer IP address of the client connection. 142.251.167.109 When available
server.address string The host name that the client is connected to. smtp.gmail.com Always
server.port int The port that the client is connected to. 25, 465, 587 Always
url.scheme string The URL scheme of the protocol used. smtp or smtps Always
error.type string The type of exception encountered. System.Net.Sockets.SocketException If an error occurred.
smtp.status_code int The SMTP status code returned by the server. 530, 550, 553, ... If one was received.
socket.error int The socket error code. 10054, 10060, 10061, ... If one was received.

This metric tracks the amount of time it takes an SmtpClient to connect to an SMTP server.

Metric: mailkit.net.smtp.client.connection.duration

Name Instrument Type Unit Description
mailkit.net.smtp.client.connection.duration Histogram s The duration of successfully established connections to an SMTP server.
Attribute Type Description Examples Presence
network.peer.address string Peer IP address of the client connection. 142.251.167.109 When available
server.address string The host name that the client is connected to. smtp.gmail.com Always
server.port int The port that the client is connected to. 25, 465, 587 Always
url.scheme string The URL scheme of the protocol used. smtp or smtps Always
error.type string The type of exception encountered. MailKit.Net.Smtp.SmtpCommandException If an error occurred.
smtp.status_code int The SMTP status code returned by the server. 530, 550, 553, ... If one was received.
socket.error int The socket error code. 10054, 10060, 10061, ... If one was received.

This metric tracks the connection duration of each SmtpClient connection and records any error details if the connection was terminated involuntarily.

Metric: mailkit.net.smtp.client.send.count

Name Instrument Type Unit Description
mailkit.net.smtp.client.send.count Counter {message} The number of messages sent to an SMTP server.
Attribute Type Description Examples Presence
network.peer.address string Peer IP address of the client connection. 142.251.167.109 When available
server.address string The host name that the client is connected to. smtp.gmail.com Always
server.port int The port that the client is connected to. 25, 465, 587 Always
url.scheme string The URL scheme of the protocol used. smtp or smtps Always
error.type string The type of exception encountered. MailKit.Net.Smtp.SmtpCommandException If an error occurred.
smtp.status_code int The SMTP status code returned by the server. 530, 550, 553, ... If one was received.
socket.error int The socket error code. 10054, 10060, 10061, ... If one was received.

This metric tracks the number of messages sent by an SmtpClient and records any error details if the message was not sent successfully.

Metric: mailkit.net.smtp.client.send.duration

Name Instrument Type Unit Description
mailkit.net.smtp.client.send.duration Histogram ms The amount of time it takes to send a message to an SMTP server.
Attribute Type Description Examples Presence
network.peer.address string Peer IP address of the client connection. 142.251.167.109 When available
server.address string The host name that the client is connected to. smtp.gmail.com Always
server.port int The port that the client is connected to. 25, 465, 587 Always
url.scheme string The URL scheme of the protocol used. smtp or smtps Always
error.type string The type of exception encountered. MailKit.Net.Smtp.SmtpCommandException If an error occurred.
smtp.status_code int The SMTP status code returned by the server. 530, 550, 553, ... If one was received.
socket.error int The socket error code. 10054, 10060, 10061, ... If one was received.

This metric tracks the amount of time that it takes an SmtpClient to send a message and records any error details if the message was not sent successfully.

jstedfast avatar Feb 13 '24 23:02 jstedfast

That still doesn't answer the most important question which is should these instruments (Counters/Histograms) be per-instance or global?

I'm sorry missed that question. I think you should share the Meters between instances, and it looks like the documentation says the APIs are thread-safe. I'd share the Meter (or use IMeterFactory which exists in net8 only😞), and create the instruments (counters, meters, histograms) in the constructor of the instance itself. But I'd be really happy to get some community input on this because I'm not sure whether this is the best practice.

The APIs to create instruments and record measurements are thread-safe. In .NET libraries, most instance methods require synchronization when invoked on the same object from multiple threads, but that's not needed in this case.

source: https://learn.microsoft.com/en-us/dotnet/core/diagnostics/metrics-instrumentation

I've seen your table, thats impressive :)

You should be very careful with tags with high cardinality: like IP and hostname. These might cause strain on the metrics database which may lead to degraded performance on the DB. I've seen (and made) mistakes where runaways tags values would cause crappy performance in influxDB, I'd assume the same could happen in Prometheus as well. source: https://learn.microsoft.com/en-us/dotnet/core/diagnostics/metrics-instrumentation#:~:text=Beware%20of%20having%20very%20large%20or%20unbounded%20combinations%20of%20tag%20values%20being%20recorded%20in%20practice.

EugeneKrapivin avatar Feb 14 '24 07:02 EugeneKrapivin

You should be very careful with tags with high cardinality: like IP and hostname. These might cause strain on the metrics database which may lead to degraded performance on the DB.

I saw that in the docs. I'm not sure if it's worth reporting the network.peer.address, but other than the socket.error code, I'm following the HttpClient pattern.

All that said, now that I'm looking more at ActivitySource and was also working on adding metrics for Authenticate(), I'm wondering... will this balloon into insanity? HttpClient literally only has 3 metrics:

public readonly UpDownCounter<long> OpenConnections = meter.CreateUpDownCounter<long>(
            name: "http.client.open_connections",
            unit: "{connection}",
            description: "Number of outbound HTTP connections that are currently active or idle on the client.");

        public readonly Histogram<double> ConnectionDuration = meter.CreateHistogram<double>(
            name: "http.client.connection.duration",
            unit: "s",
            description: "The duration of successfully established outbound HTTP connections.");

        public readonly Histogram<double> RequestsQueueDuration = meter.CreateHistogram<double>(
            name: "http.client.request.time_in_queue",
            unit: "s",
            description: "The amount of time requests spent on a queue waiting for an available connection.");

And then there's the issue of nastiness from #if NET6_0_OR_GREATER everywhere because I still need to support netstandard2.0/2.1 and net4x. Things are gonna get real ugly once I start adding ActivitySource.StartActivity()

jstedfast avatar Feb 14 '24 23:02 jstedfast

I could probably kill the connect.count and send.count metrics, right?

We could keep connect.duration and maybe kill the socket.error and smtp.status_code tags and just leave error.type (but maybe rename to exception.type)?

I could add authenticate.duration (something I just started adding) but not bother with authenticate.count.

SmtpClient has very little worth tracing (Send() is pretty much it other than maybe Verify() and Expand() but those are rarely used).

Pop3Client gets more complex with many more "activities" and commands that map to Pop3Client methods, so if every public API gets a metric and an activity, things will explode in code I need to write.

Then ImapClient... yikes, I'll go to my grave being unfinished with adding a metric/activity for each operation that there's a public API for.

jstedfast avatar Feb 14 '24 23:02 jstedfast

I'd highly recommend keeping the send.count. This would help us in creating dashboards for the number of messages sent.

MassTransit exports messages sent and receive counters https://masstransit.io/documentation/configuration/observability

prashanth-nani avatar Feb 15 '24 10:02 prashanth-nani

I'm just curious, but is anyone wanting tracing and metrics (and logging?) from MailKit doing anything but SMTP?

jstedfast avatar Feb 15 '24 13:02 jstedfast

I saw that in the docs. I'm not sure if it's worth reporting the network.peer.address, but other than the socket.error code, I'm following the HttpClient pattern.

I think starting with less metrics with good tags is an amazing starting point. Better than adding useless metrics and then trying to get rid of them. As a good rule of thumb I'm always looking for RED metrics because it is really easy to build meaningful alerts on top of them. Another very useful metric would probably be # active connections # of created clients, these might be an indicator of wrong library usage (maybe?).

I'm just curious, but is anyone wanting tracing and metrics (and logging?) from MailKit doing anything but SMTP?

if you assume that mainly SMTP is used, it would be a good start point and then just wait for community feedback/MRs for pop and imap. In any case, logging and traces should be part of the observability suite, so that each call could be traced to a successful/failed email sending and to find logs that caused it :)

EugeneKrapivin avatar Feb 15 '24 14:02 EugeneKrapivin

Let us take a moment to rewind a bit and take a step back and discuss what is actually needed because I think I'm perhaps overengineering this based on how little HttpClient seems to report.

Lets start with something that @EugeneKrapivin said in the beginning as well:

This is probably a huge ask, and I think it should be opened for discussion to see if it's worth the investment (aka will users of the library use it).

Part of the reason I'm asking about SMTP is because I get the distinct feeling that everyone in this discussion thread so far is entirely focused on SMTP and, honestly, that probably makes the most sense because many MailKit users are coming from the old System.Net.Mail.SmtpClient APIs.

Part of me wants to say "just wrap SmtpClient.Connect/Authenticate/Send in your own custom metrics/tracing/logging logic" because I'm not entirely sure that lower-level details are all that useful or interesting to you guys (they would be to me, but I'm not wearing the App Developer or Services Developer hat when I'm working on MailKit).

That said, if 100's of developers are going to be wrapping SmtpClient.Connect/Authenticate/Send/etc in order to collect basic metrics/tracing/logging data, then it probably really does make sense for MailKit to just add it to reduce all this code duplication.

So in that sense, I can see this "enhancement request" as being a "value-add" for MailKit and that's what interests me.

Now on to design.

If I was designing this for myself (as in, if this telemetry data was going to my own personal Azure Data Explorer), it's pretty much a guarantee that the information that I'd want is hugely different from what you guys want/need. At best, what you guys want/need will be a tiny subset of what I'd want. At worst, the data we want might be completely different.

For example, I'm not sure I'd care at all about # of messages sent. Sure, it'd be a fun statistic I could use for bragging rights, but not much else. But you guys seem to care about this metric.

Even Send/SendAsync() duration is not hugely useful for me. It's not completely useless, but on its own, it won't tell me if the time spent sending the message is due to network slowness, server response latency, the number of recipients being huge (e.g. 100's of recipients in the To/Cc/Bcc lists which each require a separate RCPT TO command), or if it's just the size of the message being 400 MB vs 4K.

These details matter (especially for my uses debugging/fixing MailKit issues).

Tracing and Logging would bring context into the picture, but again, what details do you guys care about?

Then again, presumably you guys will also already have some sort of Logging and Tracing infrastructure in your apps/services that will presumably already provide some context that would overlap whatever I add to MailKit.

So my question is: What value can MailKit provide here? Is it some details/context that your apps/services can't easily provide at a higher level? Or is it more to avoid code duplication/"boilerplate"?

I'm guessing it has more to do with reducing "boilerplate"/duplication between services than anything else.

jstedfast avatar Feb 15 '24 15:02 jstedfast