opentelemetry-collector-contrib icon indicating copy to clipboard operation
opentelemetry-collector-contrib copied to clipboard

[pkg/ottl] OTTL MD5 converter

Open edmocosta opened this issue 1 year ago • 1 comments
trafficstars

Component(s)

pkg/ottl

Is your feature request related to a problem? Please describe.

Although MD5 is no longer a recommended/safe hashing algorithm, it is still required for compatibility purposes. OTTL currently does not support it.

Describe the solution you'd like

A new OTTL converter/function MD5(attributes), which when invoked would transform the underlying string data into a MD5 hash value.

Describe alternatives you've considered

No response

Additional context

No response

edmocosta avatar Jun 27 '24 16:06 edmocosta

Pinging code owners:

  • pkg/ottl: @TylerHelmuth @kentquirk @bogdandrutu @evan-bradley

See Adding Labels via Comments if you do not have permissions to add labels yourself.

github-actions[bot] avatar Jun 27 '24 16:06 github-actions[bot]

Hey @edmocosta,

it is still required for compatibility purposes

Do you mind explaining what specifically you need to support for compatibility purposes?

My preference is to keep this as unsupported/deprecated and use more secure hashing algorithms.

MovieStoreGuy avatar Jun 30 '24 08:06 MovieStoreGuy

Hi @MovieStoreGuy!

I completely agree with you regarding using a more secure/non-colliding hashing algorithm, especially for encryption. But MD5 was widely used for fingerprinting/checksum files for example, and forcing users to change their systems to use a more secure/non-colliding hashing would requires re-hashing and/or performing some kind of migration, which depending on the case, might not be feasible in the short term.

IMO, adding the MD5 function has similar purposes as adding SHA1, and although both are considered unsafe (being MD5 unsafer), they're still in use - unfortunately.

edmocosta avatar Jul 01 '24 08:07 edmocosta

@edmocosta Thanks for the detailed explanation. I think the one difference between MD5 and SHA1 here is that SHA1 hashes are supported by the attributes processor, and the transform processor needs to achieve 1:1 functionality before we will replace it. That said, I think we should at least consider this to boost compatibility with existing systems.

I do have a few questions of my own:

  1. What's the attack vector for hash collisions that could result from using MD5 hashes? Is it risky enough that we should avoid this, or remote enough that we can carefully continue?
  2. If we include this, what's the deprecation timeline?
  3. Would we feel better about this if we put it behind a permanent feature gate that forces the user to acknowledge the risks?

@MovieStoreGuy @TylerHelmuth what do you think?

evan-bradley avatar Jul 01 '24 14:07 evan-bradley

It is cumbersome, but there is also an option to fork the transformprocessor and add the desired Converter for your own use.

TylerHelmuth avatar Jul 01 '24 18:07 TylerHelmuth

An MD5 converter was identified here as a missing feature https://github.com/open-telemetry/opentelemetry-collector-contrib/issues/31930

flexitrev avatar Jul 02 '24 13:07 flexitrev

Hi @evan-bradley!

Thanks for chime in! Here's my two cents:

I think the one difference between MD5 and SHA1 here is that SHA1 hashes are supported by the attributes processor, and the transform processor needs to achieve 1:1 functionality before we will replace it.

That's exactly what I meant for compatibility purpose. Although SHA1 is also not recommended, it was used while its usages are not replaced by another hashing algorithm.

What's the attack vector for hash collisions that could result from using MD5 hashes? Is it risky enough that we should avoid this, or remote enough that we can carefully continue?

I don't see how it could be used to attack the collector itself. We would only provide a function to allow users to hash a string into a MD5 digest, if they decided to use it for cryptography purposes, that would be a risk, but IMHO, I do believe it should be up to them to weight and assume that risk. Other tools and databases on the market does support the MD5 function, and are not considered insecure just because that.

If we include this, what's the deprecation timeline?

IMO, that's pretty hard to tell, MD5 has long been considered unsafe, but it's still out there and being supported by several tools/databases etc.

Finally, I'm not standing for MD5 or encouraging its usage, but I do think that supporting it might be useful for users that still have to maintain legacy softwares with MD5 hashes.

edmocosta avatar Jul 02 '24 13:07 edmocosta

Hey folks, any thoughts on this? It seems it was identified as missing feature here.

edmocosta avatar Jul 26 '24 08:07 edmocosta

It's definitely the case that MD5 should not be used for anything secure. But it's available everywhere, from the command line to essentially every programming language, and if what you're doing is looking for a "did this string change" kind of validation, it's a really useful tool. We should support it without stressing about it.

kentquirk avatar Jul 26 '24 13:07 kentquirk

I agree that we should include it. Thanks for digging into this @edmocosta.

evan-bradley avatar Jul 26 '24 14:07 evan-bradley

Thanks folks for considering this feature! I did submit a PR a time ago, if everyone agrees on that, I think we could re-open it!

edmocosta avatar Jul 26 '24 14:07 edmocosta

Sounds good, I've reopened the PR. Could you make any necessary updates and we can start the review process?

evan-bradley avatar Jul 26 '24 14:07 evan-bradley

Sounds good, I've reopened the PR. Could you make any necessary updates and we can start the review process?

It seems the PR code is still valid, I've updated the branch. Thanks!

edmocosta avatar Jul 26 '24 15:07 edmocosta