aem-core-wcm-components icon indicating copy to clipboard operation
aem-core-wcm-components copied to clipboard

Brittle, Bugged, and Confusing Link Processing Logic

Open HitmanInWis opened this issue 1 year ago • 3 comments

Version: 2.25.5-SNAPSHOT

Issue 1: Logic that attempts to preserve the location of the hash in front of the query string incorrectly moves the hash if it is a substring of the path

assertEquals("https://test.com?categ=cat1#test",
    underTest.sanitize("https://test.com?categ=cat1#test", request));

org.opentest4j.AssertionFailedError: 
Expected :https://test.com?categ=cat1#test
Actual   :https://test.com#test?categ=cat1

Issue 2: url in LinkImpl is sanitized/encoded, but mappedUrl and exernalizedUrl are not. Because we return url for display on the AEM site (e.g. Button component), but mappedUrl for model.json export and JS datalayer, this results in the above bug rendering "https://test.com/#test?categ=cat1" (incorrect) on the website but putting "https://test.com?categ=cat1#test" (correct) in the datalayer and model export

Issue 3: Also, mapped/externalized URLs are not encoded, nor are they consistent with each other (map encodes path, externalize encodes nothing).

// PASS
assertEquals("https://test.com/ab%7Ccd?recipient=me%7Cyou",
    underTest.sanitize("https://test.com/ab|cd?recipient=me|you", request));

// FAIL
assertEquals("https://test.com/ab%7Ccd?recipient=me%7Cyou",
    underTest.externalize("https://test.com/ab|cd?recipient=me|you", request));

Expected :https://test.com/ab%7Ccd?recipient=me%7Cyou
Actual   :https://test.com/ab|cd?recipient=me|you  // encodes neither path nor query

// FAIL
assertEquals("https://test.com/ab%7Ccd?recipient=me%7Cyou",
    underTest.map("https://test.com/ab|cd?recipient=me|you", request));

Expected :https://test.com/ab%7Ccd?recipient=me%7Cyou
Actual   :https://test.com/ab%7Ccd?recipient=me|you // encodes path, but not query

Issue 4: There's also a bug where logic that attempts to preserve the location of the hash in front of the query string does not encode the hash

// PASS
assertEquals("https://test.com/#ab%7Cc", // converts | to %7C
    underTest.sanitize("https://test.com/#ab|c", request));

// FAIL
assertEquals("https://test.com/#ab%7Cc?xyz",
    underTest.sanitize("https://test.com/#ab|c?xyz", request));

Expected :https://test.com/#ab%7Cc?xyz
Actual   :https://test.com/#ab|c?xyz  // fails to convert | to %7C

Issue 5: All in all the code is very confusing as well - For URL https://test.com/#/downloads/file.html?name=/content/file.zip LinkUtil.escape is being called with:

  • Path: https://test.com/#/downloads/file.html
  • Query: name=/content/file.zip
  • Fragment: /downloads/file.html Really the Path should be https://test.com/, but I believe it's being passed with the hash in it for the logic that is trying to preserve the hash in front of the query. This also begs a question of why we're supporting query after hash anyway - in the above URL, by HTTP convention isn't the hash actually #/downloads/file.html?name=/content/file.zip with no query string? Or if we want to assume this was meant to be a hash and query, and we're trying to be nice to authors by "fixing" it for them, then we should fix in the output URL.

To summarize, solutions for Issue 5 are one of the following:

  • Treat query after hash as part of the hash - there should be no query string, and the entire hash (with the query part) should be encoded via the hash encoding rules in code
  • Treat query after hash as an accident, leaving the query as a true query string, and correct the order of query/hash in the output.

HitmanInWis avatar Jun 21 '24 13:06 HitmanInWis

Issue 6: Asset link test at com.adobe.cq.wcm.core.components.internal.link.LinkBuilderImplTest#testLinkToAsset is not running, b/c it is missing the @Test annotation.

If you add @Test to run it, it breaks with the following exception:

java.lang.IllegalArgumentException: width or height <= 0

	at com.day.image.Layer.init(Layer.java:3181)
	at com.day.image.Layer.<init>(Layer.java:501)
	at io.wcm.testing.mock.aem.builder.ContentBuilder.createDummyRasterImage(ContentBuilder.java:405)
	at io.wcm.testing.mock.aem.builder.ContentBuilder.createDummyImage(ContentBuilder.java:398)
	at io.wcm.testing.mock.aem.builder.ContentBuilder.asset(ContentBuilder.java:308)
	at io.wcm.testing.mock.aem.builder.ContentBuilder.asset(ContentBuilder.java:295)
	at com.adobe.cq.wcm.core.components.internal.link.LinkBuilderImplTest.testLinkToAsset(LinkBuilderImplTest.java:111)

I believe it can be fixed by simply changing

Asset asset = context.create().asset("/content/dam/asset.jpg", 0, 0, "image/jpeg");

to

Asset asset = context.create().asset("/content/dam/asset.jpg", 1, 1, "image/jpeg");

HitmanInWis avatar Jun 21 '24 16:06 HitmanInWis

Issue 7: Hash is duplicated when it is in front of a query string and there is no URL path

// PASS
assertEquals("https://test.com?categ=cat1#top",
        underTest.sanitize("https://test.com?categ=cat1#top", request));

assertEquals("https://test.com#top?categ=cat1",
        underTest.sanitize("https://test.com#top?categ=cat1", request));

// FAIL
Expected :https://test.com#top?categ=cat1
Actual   :https://test.com#top?categ=cat1#top // duplicate hash

HitmanInWis avatar Jun 21 '24 21:06 HitmanInWis

Issue 8: (minor issue) A fallback LinkImpl is always constructed in LinkBuilderImpl::buildLink b/c call to orElse()

            return pathProcessors.stream()
                    .filter(pathProcessor -> pathProcessor.accepts(decodedPath, request))
                    .findFirst()
                    .map(pathProcessor -> new LinkImpl(
                            pathProcessor.sanitize(decodedPath, request),
                            LinkManagerImpl.isExternalLink(decodedPath) ? decodedPath : pathProcessor.map(decodedPath, request),
                            pathProcessor.externalize(decodedPath, request),
                            this.reference,
                            pathProcessor.processHtmlAttributes(decodedPath, htmlAttributes)))
                    .orElse(new LinkImpl(path, path, path, this.reference, htmlAttributes));   // this is always constructed, though rarely needed

HitmanInWis avatar Jun 24 '24 15:06 HitmanInWis