Steeltoe Address S3994/S3995/S3996/S3997/S4005: Use Uri instead of string

Address existing violations of the following, related rules:

in the codebase and set severity to Warning in Steeltoe.Debug.ruleset and Steeltoe.Release.ruleset.

To find existing violations, enable the rule (see above) and rebuild src/Steeltoe.All.sln to make them appear in the Output window.

To address the violations, choose from the following on a case-by-case basis:

Fix the violation by changing the code to not violate the rule
- When the method calls into a .NET method that has an overload for Uri, call that instead and update the method signature to take an Uri instead of a string. Consider adding an overload that takes a string, which calls this method
Suppress the violation in code using #pragma warning disable/restore, preceded by a justification comment if not obvious

Note: This issue mitigates security risks, though it requires more thorough investigation on the best path forward. Should we enable (some of) the rules or are there too many false positives, leading to numerous suppressions?

Jul 28 '22 13:07 bart-vmware

As of 5d792a85e72728c3ecfd191ba9c32e3e0bf6843c I'm seeing an aggregate of 368 warnings, with 231 of them being in test projects. Is it worth addressing these rules for test projects? Exactly half of all violations are in Steeltoe.Management.Endpoint.Test

Here's the counts by rule: S3994: 58 S3995: 16 S3996: 61 S3997: 0 S4005: 233

Aug 18 '22 21:08 TimHess

Thanks for providing the stats. I consider them a rough indication, let's not forget that these rules propagate. For example, when adding an overload that takes an Uri parameter in low-level code, new violations of other rules kick in.

To answer your question: All rules except S4005 concern sanitizing inputs, so they are relevant for library code. S4005 applies to the call-site, so is relevant for tests as well.

I inspected a few violations in Steeltoe.Management.Endpoint.Test, for example:

HttpResponseMessage response = await client.GetAsync("/actuator/metrics");

This is easy to fix:

HttpResponseMessage response = await client.GetAsync(new Uri("/actuator/metrics"));

However in Steeltoe.Messaging.RabbitMQ.Test, I'm seeing multiple cases where the same URL is reused, for example:

HttpResponseMessage result = await client.GetAsync($"http://localhost:15672/api/queues/%3F/{queue.QueueName}");
int n = 0;

while (n++ < 100 && result.StatusCode == HttpStatusCode.NotFound)
{
    await Task.Delay(100);
    result = await client.GetAsync($"http://localhost:15672/api/queues/%2F/{queue.QueueName}");
}

The queue name does not change between invocations, so it improves readability to extract both URLs into a single Uri variable that is initialized upfront. Currently it takes a bit of effort to spot that both URLs are the same. Extracting into a variable provides the opportunity to give it a meaningful name, which improves readability further.

Aside from the security concern, taking Uri parameters instead of strings makes input validation a lot easier for Steeltoe library code. We'd only need to check basic things like the protocol/scheme, whether it is absolute/relative etc. This ensures "fail-fast" which makes tracing down the source of an error easier for us and Steeltoe consumers. The same applies to our tests: I'd rather see it fail in new Uri() than at a call stack somewhere deep into Steeltoe code, RabbitMQ code or somewhere else. This helps in adhering to the principle that test failures should be easy to understand (without needing to fire up a debugger or inspect the source code of the test).

When the number of violations of the other rules in tests is small, it probably doesn't hurt to make them compliant instead of disabling the rule for test projects. Before I joined the team, rules configuration had become so complex that nobody realized how flawed it was. Duplicate suppressions, and worse: unintended suppressions. So I'd like to keep the configuration as simple as possible, not going down the path again to suppress rules at various levels unless there's no good other way. We have one exception today, and that's for requiring XML comments. That doesn't make any sense in test projects, while the number of violations is massive.

There are several things to consider when reviewing violations. There may be more, I haven't seen them all, which is why it needs some more investigation. So far, I've seen the following cases:

A Steeltoe API accepts a string and passes it along to an underlying .NET or third-party API, which also provides an Uri overload. The fix is to change the Steeltoe API signature from string to Uri, optionally adding a string-based overload that calls into the Uri overload.
A Steeltoe API accepts a string and passes it to an underlying .NET or third-party API that can only take a string, and the URL is opaque to the Steeltoe API. The URL value is irrelevant to Steeltoe code and we only have a vague idea of what the URL really means and what validations are involved. We know that these validations may change over time, but we don't care because the URL value is irrelevant to Steeltoe code (aside from a basic null check). Steeltoe just passes it through unmodified. Although this scenario is a design smell, it would be a case where the rule doesn't apply.
A Steeltoe API accepts a string and passes it to an underlying .NET or third-party API that can only take a string, and the URL value itself has meaning within Steeltoe code. For example, it extracts a query string parameter, requires HTTPS to be used, or needs to know whether the URL is absolute or relative. The right approach here would be to rely on .NET parsing of the string instead of the insecure practice of scanning the string ourselves. So given there's a need to parse the Uri, the better option would be to change the signature from string to Uri, so that callers are responsible for building a valid Uri.

Aug 19 '22 08:08 bart-vmware