Invalid (or not?) URI tags reported from multiple services

Open kam193 opened this issue 1 year ago • 4 comments

Describe the bug I see many warnings from the service server about "Invalid tag data" reported by multiple services, mostly on the URI tag. While I understand that this is normal occasionally, I believe some of these tags are incorrectly treated as invalid, and others point to possible improvements in the way services detect or clean tags.

Here are a couple of examples from logs:

1. Looking correct

Invalid tag data from FrankenStrings: {'network': {'static': {'uri': ['http://localhost:7777', 'http://localhost:5173', 'https://localhost:8000', 'http://localhost:3000']}}}
 from FrankenStrings: {'network': {'static': {'uri': ['http://localhost:5000/registry/api']}}}
from FrankenStrings: {'network': {'static': {'uri': ['http://0.0.0.0:8085', 'http://0.0.0.0:8082']}}}
from FrankenStrings: {'network': {'static': {'uri': ['https://medium.com/@juanc.olamendy/revolutionizing-retrieval-the-mastering-hypothetical-document-embeddings-hyde-b1fc06b9a6cc']}}}"
from FrankenStrings: {'network': {'static': {'uri': ['http://localhost:4566/foo%2Fbar/ed']}}}
from FrankenStrings: {'network': {'static': {'uri': ['http://metadata.google.internal/computeMetadata/v1/instance/attributes/', 'http://metadata.google.internal/computeMetadata/v1/instance/network-interfaces/0/network', 'http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/email', 'http://metadata.google.internal/computeMetadata/v1/instance/zone', 'http://metadata.google.internal/computeMetadata/v1/project/project-id']}}}
FrankenStrings: {'network': {'static': {'uri': ['https://artifactory.global.square/artifactory/api/pypi/block-pypi/simple']}}}"

2. Probably a suggestion for improvements

Invalid tag data from Characterize: {'network': {'static': {'uri': ['https://nls-', 'https://maps-']}}}
Invalid tag data from FrankenStrings: {'network': {'static': {'uri': ['https://obs.']}}}
from DeobfuScripter: {'network': {'static': {'uri': ['https://github.r']}}}
from FrankenStrings: {'network': {'static': {'uri': ['http://i00210gd/role/DocumentInformation', 'http://i00210gd/role/EntityInformation', 'http://i00210gd/role/OtherInformation', 'http://i00210gd/20081231']}}}
from EmlParser: {'network': {'static': {'domain': ['0.00', '0.000010', '0.04', '0.06', '0.062509', '0.08', '0.090', '0.13', '0.14', '0.17', '0.170', '0.18', '0.19', '0.25', '0.27', '0.275', '0.30', '0.32', '0.335', '0.34', [..] '98.940002', 'credentials.json', 'creds.expired', 'dd.shape', 'email.mime.base', 'email.mime.image', 'email.mime.multipart', 'email.mime.text', 'google.auth.transport.requests', [...]
FrankenStrings: {'network': {'static': {'uri': ['http://www.freemeteo.com.']}}}"
FrankenStrings: {'network': {'static': {'uri': ['http://localhost:']}}}
EmlParser: {'network': {'static': {'domain': ['2.41', 'cadical.hpp', 'ccadical.cpp', 'solver.cpp']}}}
FrankenStrings: {'network': {'static': {'uri': ['http://ocsp.digicert.com0A', 'http://ocsp.digicert.com0C', 'http://ocsp.digicert.com0X', 'http://ocsp.digicert.com0']}}}
ELFPARSER: {'network': {'static': {'ip': ['12.0.267.16']}}}
FrankenStrings: {'network': {'static': {'uri': ['http://192.168.0.xx:80', 'http://192.168.0.xx:80/v1']}}}
from ELFPARSER: {'network': {'static': {'ip': ['4.1.0.3856']}}}
from FrankenStrings: {'network': {'static': {'uri': ['http://test.ac.uk:']}}}
from FrankenStrings: {'network': {'static': {'uri': ['https://....']}}}
from FrankenStrings: {'network': {'static': {'uri': ['http://cps.root-x1.letsencrypt.org0', 'http://e1.o.lencr.org0']}}}
Invalid tag data from FrankenStrings: {'network': {'static': {'domain': ['eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJzdXBhYmFzZSIsInJlZiI6Im5qc2l6YmJlaG1tbHdzdnRreHlrIiwicm9sZSI6ImFub24iLCJpYXQiOjE2OTk2NDAzODYsImV4cCI6MjAxNTIxNjM4Nn0.MT']}}}
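To get an overview of which services produce the most rejected tags, the warnings can be tallied. The following is only a rough sketch: it assumes each log line has the shape shown in the excerpts above (`from <service>: {...}` with a Python-style dict), and the names `LINE_RE`, `flatten` and `tally_invalid_tags` are made up for illustration. Lines whose dict is truncated (such as the EmlParser example) are skipped.

```python
import ast
import re
from collections import Counter

# Assumed line shape, taken from the excerpts above: "... from <service>: {<dict>}"
LINE_RE = re.compile(r"from (?P<service>\w+): (?P<tags>\{.*\})")


def flatten(tags, prefix=""):
    """Yield (dotted_tag_name, values) pairs from a nested tag dict."""
    for key, value in tags.items():
        name = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            yield from flatten(value, name)
        else:
            yield name, value


def tally_invalid_tags(lines):
    """Count rejected tag values per (service, tag type)."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.search(line)
        if not match:
            continue
        try:
            tags = ast.literal_eval(match.group("tags"))
        except (ValueError, SyntaxError):
            continue  # truncated or otherwise unparsable dict
        for tag_type, values in flatten(tags):
            counts[(match.group("service"), tag_type)] += len(values)
    return counts


sample = [
    "Invalid tag data from Characterize: {'network': {'static': {'uri': ['https://nls-', 'https://maps-']}}}",
    "Invalid tag data from FrankenStrings: {'network': {'static': {'uri': ['https://obs.']}}}",
    "Invalid tag data from ELFPARSER: {'network': {'static': {'ip': ['12.0.267.16']}}}",
]
for (service, tag_type), count in tally_invalid_tags(sample).items():
    print(service, tag_type, count)
```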

To Reproduce I don't have an example file for every case, but I can try to look for some.

Expected behavior Less noise in the logs. I don't know whether these warnings affect anything else in the system.

It looks to me like:

  1. Validation is rejecting:
    • URIs with a port or with one-word domains (like localhost); most of them may not be interesting, but localhost could be important;
    • URIs with special characters (@);
    • domains like .internal.
  2. FrankenStrings tends to:
    • include the . at the end of a domain, leading to an empty TLD;
    • append the characters that follow a domain when parsing certificates (e.g. http://ocsp.digicert.com0A);
    • leave an empty port number (e.g. http://test.ac.uk:).
  3. EmlParser and ELFPARSER are very liberal and love to treat real numbers as domains (I believe we still cannot browse floats).
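The categories above can be checked mechanically. Below is a minimal sketch, not Assemblyline's actual validator: `diagnose_uri` is a hypothetical helper built on the standard library's `urllib.parse.urlsplit`, the reason strings are invented for illustration, and IPv6 literals are deliberately not handled.

```python
from urllib.parse import urlsplit


def diagnose_uri(uri: str) -> list[str]:
    """Return heuristic reasons why a URI's host part looks mis-extracted."""
    reasons = []
    netloc = urlsplit(uri).netloc
    hostport = netloc.rsplit("@", 1)[-1]  # drop userinfo, if any
    host, colon, port = hostport.partition(":")

    if colon and not port:
        reasons.append("empty port")
    elif port and not port.isdigit():
        reasons.append("non-numeric port")

    if host.endswith("."):
        reasons.append("trailing dot")

    labels = host.rstrip(".").split(".")
    if any(label == "" for label in labels):
        reasons.append("empty label")
    elif len(labels) == 1:
        reasons.append("single-label host")
    if any(label.startswith("-") or label.endswith("-") for label in labels):
        reasons.append("hyphen at label edge")

    # A last label mixing letters and digits (e.g. "com0A") is suspicious,
    # unless it is all digits (IPv4-like) or a punycode TLD ("xn--...").
    tld = labels[-1]
    if len(labels) > 1 and not (tld.isdigit() or tld.isalpha() or tld.startswith("xn--")):
        reasons.append("suspicious TLD")
    return reasons


for uri in [
    "http://localhost:7777",
    "http://test.ac.uk:",
    "http://www.freemeteo.com.",
    "http://ocsp.digicert.com0A",
    "https://nls-",
    "https://....",
]:
    print(uri, diagnose_uri(uri))
```

An empty list means only that none of these heuristics fired, not that the URI is valid; for instance, http://192.168.0.xx:80 passes because "xx" looks like an ordinary alphabetic TLD.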

Environment (please complete the following information if pertinent):

  • Assemblyline Version: 4.4.0.89
  • All services in the newest versions

Additional context This is probably not very important, but improving domain extraction in some of these cases could be helpful.

kam193 • Jan 12 '24 12:01

Hello!

With respect to the localhost, 0.0.0.0, and .internal domains: by design, Assemblyline doesn't tag network indicators that are not public-facing. We haven't come across a specific use case yet and are rejecting them to avoid false positives. Could you describe a use case and the expected behaviour for those tags? E.g. tagged alongside public network tags, tagged but safelisted, or tagged separately from public network tags.

The rejected @ in a path is a mismatch between our validation and the spec. It will be fixed.
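For reference, RFC 3986 allows "@" inside path segments; it only separates userinfo from the host when it appears in the authority part. A small illustration using Python's standard `urllib.parse` (this shows the parsing rule only, not Assemblyline's validation code):

```python
from urllib.parse import urlsplit

# "@" appears after the first "/", so it belongs to the path, not to userinfo.
in_path = urlsplit(
    "https://medium.com/@juanc.olamendy/revolutionizing-retrieval-the-"
    "mastering-hypothetical-document-embeddings-hyde-b1fc06b9a6cc"
)
print(in_path.hostname)  # medium.com
print(in_path.username)  # None

# "@" inside the authority part does introduce userinfo.
in_authority = urlsplit("https://user@medium.com/")
print(in_authority.hostname)  # medium.com
print(in_authority.username)  # user
```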

The tags listed under "suggestion for improvements" look like a mix of issues; thank you for reporting them.

cccs-jh • Jan 13 '24 00:01

Hey,

Of the non-public-facing domains, I kept only those (although 0.0.0.0 should also have been left out, my fault) because they can be used in malicious actions targeting servers and cloud environments. The explanation you provided sounds good to me. I'd suggest, as a low priority, optionally allowing such domains, but I'm also okay with leaving it as is.
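To make the idea concrete, here is a rough sketch of how hosts could be split into public, non-public and invalid, so that non-public ones could be tagged separately or safelisted. It is purely illustrative and not how Assemblyline works: `classify_host` and `NON_PUBLIC_SUFFIXES` are hypothetical names, and the suffix list is an assumption.

```python
import ipaddress

# Illustrative suffix list only; a real deployment would choose its own.
NON_PUBLIC_SUFFIXES = (".internal", ".local", ".localhost")


def classify_host(host: str) -> str:
    """Classify a host as 'public', 'non-public' or 'invalid' (heuristic)."""
    host = host.lower().rstrip(".")
    try:
        address = ipaddress.ip_address(host)
    except ValueError:
        address = None
    if address is not None:
        return "public" if address.is_global else "non-public"
    # Only digits and dots, but not a valid IP: floats, version numbers, etc.
    if host and all(part.isdigit() for part in host.split(".")):
        return "invalid"
    if host == "localhost" or host.endswith(NON_PUBLIC_SUFFIXES):
        return "non-public"
    return "public"


for host in ["localhost", "0.0.0.0", "metadata.google.internal",
             "medium.com", "12.0.267.16", "0.00"]:
    print(host, classify_host(host))
```

Names such as cadical.hpp would still come out as "public" here; catching those would need a check against a list of known TLDs, which this sketch leaves out.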

kam193 • Jan 13 '24 10:01

@cccs-jh is this done?

cccs-kevin • Feb 13 '24 16:02

No. The problem with correct IOCs being rejected is now fixed, but I have not finished fixing all the services that create incorrect IOCs.

cccs-jh • Feb 14 '24 17:02