site-kit-wp icon indicating copy to clipboard operation
site-kit-wp copied to clipboard

Permuting site URLs can fail for some domains and environments

Open aaemnnosttv opened this issue 3 years ago • 8 comments

Bug Description

I recently discovered a few tests that only fail locally with some versions of PHP related to testing Module::permute_site_url.

After digging into it, I found the problem was related to parse_url which was malforming parts of the URL when given certain UTF-8 multi-byte characters as used in some IDN domains. This goes back to an old bug in PHP and also seems to be related to the server's configured locale.

Relevant test output
There were 2 failures:

1) Google\Site_Kit\Tests\Core\Modules\ModuleTest::test_permute_site_url with data set "http://éxämplę.test" ('http://éxämplę.test', array('http://éxämplę.test', 'https://éxämplę.test', 'http://www.éxämplę.test', 'https://www.éxämplę.test', 'http://xn--xmpl-loa2a55a.test', 'https://xn--xmpl-loa2a55a.test', 'http://www.xn--xmpl-loa2a55a.test', 'https://www.xn--xmpl-loa2a55a.test'))
Failed asserting that two arrays are equal.
--- Expected
+++ Actual
@@ @@
 Array (
-    0 => 'http://www.xn--xmpl-loa2a55a.test'
-    1 => 'http://www.éxämplę.test'
-    2 => 'http://xn--xmpl-loa2a55a.test'
-    3 => 'http://éxämplę.test'
-    4 => 'https://www.xn--xmpl-loa2a55a.test'
-    5 => 'https://www.éxämplę.test'
-    6 => 'https://xn--xmpl-loa2a55a.test'
-    7 => 'https://éxämplę.test'
+    0 => 'http://www.xn--xmpl?_-bua2b.test'
+    1 => 'http://www.éxämpl?_.test'
+    2 => 'http://xn--xmpl?_-bua2b.test'
+    3 => 'http://éxämpl?_.test'
+    4 => 'https://www.xn--xmpl?_-bua2b.test'
+    5 => 'https://www.éxämpl?_.test'
+    6 => 'https://xn--xmpl?_-bua2b.test'
+    7 => 'https://éxämpl?_.test'
 )

/.../wordpress/wp-content/plugins/google-site-kit/vendor/wp-phpunit/wp-phpunit/includes/abstract-testcase.php:814
/.../wordpress/wp-content/plugins/google-site-kit/tests/phpunit/integration/Core/Modules/ModuleTest.php:214
/.../wordpress/wp-content/plugins/google-site-kit/vendor/phpunit/phpunit/phpunit:51

2) Google\Site_Kit\Tests\Core\Modules\ModuleTest::test_permute_site_url with data set "http://éxämplę.test/sub-directory" ('http://éxämplę.test/sub-directory', array('http://éxämplę.test/sub-directory', 'https://éxämplę.test/sub-directory', 'http://www.éxämplę.test/sub-directory', 'https://www.éxämplę.test/sub-directory', 'http://xn--xmpl-loa2a55a.test...ectory', 'https://xn--xmpl-loa2a55a.tes...ectory', 'http://www.xn--xmpl-loa2a55a....ectory', 'https://www.xn--xmpl-loa2a55a...ectory'))
Failed asserting that two arrays are equal.
--- Expected
+++ Actual
@@ @@
 Array (
-    0 => 'http://www.xn--xmpl-loa2a55a.test/sub-directory'
-    1 => 'http://www.éxämplę.test/sub-directory'
-    2 => 'http://xn--xmpl-loa2a55a.test/sub-directory'
-    3 => 'http://éxämplę.test/sub-directory'
-    4 => 'https://www.xn--xmpl-loa2a55a.test/sub-directory'
-    5 => 'https://www.éxämplę.test/sub-directory'
-    6 => 'https://xn--xmpl-loa2a55a.test/sub-directory'
-    7 => 'https://éxämplę.test/sub-directory'
+    0 => 'http://www.xn--xmpl?_-bua2b.test/sub-directory'
+    1 => 'http://www.éxämpl?_.test/sub-directory'
+    2 => 'http://xn--xmpl?_-bua2b.test/sub-directory'
+    3 => 'http://éxämpl?_.test/sub-directory'
+    4 => 'https://www.xn--xmpl?_-bua2b.test/sub-directory'
+    5 => 'https://www.éxämpl?_.test/sub-directory'
+    6 => 'https://xn--xmpl?_-bua2b.test/sub-directory'
+    7 => 'https://éxämpl?_.test/sub-directory'
 )

/.../wordpress/wp-content/plugins/google-site-kit/vendor/wp-phpunit/wp-phpunit/includes/abstract-testcase.php:814
/.../wordpress/wp-content/plugins/google-site-kit/tests/phpunit/integration/Core/Modules/ModuleTest.php:214
/.../wordpress/wp-content/plugins/google-site-kit/vendor/phpunit/phpunit/phpunit:51


Do not alter or remove anything below. The following sections will be managed by moderators only.

Acceptance criteria

  • A new URL class should be created in the Core\Util namespace with a single static parse method for now
    • The signature and behavior should match that of (wp_)parse_url with the main difference being that it is safe to use with multi-byte UTF-8 IDN domains
    • Any given URLs that do not contain problematic characters could continue to use wp_parse_url internally
  • All usage of (wp_)parse_url should be refactored to use the new utility

Implementation Brief

  • Create a new class named URL in the includes/Core/Util folder having the Core\Util namespace as per the AC.
  • Create a static method parse with the same signature as wp_parse_url(), i.e. it should accept $url and $component as parameters.
    • Check if $url contains multibyte chars, i.e. if mb_strlen($url, 'UTF-8') != strlen($url).
    • If this is not the case, simply delegate the call to wp_parse_url() and return its output.
    • If the string does contain multibyte chars, use the same logic within wp_parse_url() to parse the $url. But instead of calling PHP's inbuilt parse_url() method, create and call a function that encodes the URL and decodes the parts later. An example of this function can be found in this comment.
    • Find and replace the usage of wp_parse_url() and parse_url() methods with URL::parse().

Test Coverage

  • Add phpunit tests for URL::parse() with similar URL's used in Google\Site_Kit\Tests\Core\Modules\ModuleTest::test_permute_site_url.

QA Brief

Changelog entry

aaemnnosttv avatar Feb 02 '22 19:02 aaemnnosttv

cc: @felixarntz

aaemnnosttv avatar Feb 03 '22 09:02 aaemnnosttv

@aaemnnosttv One comment on the ACs:

One exception could be URLs that are guaranteed to not be IDN

Shouldn't we remove this? I feel like it's better to stick to that convention if we introduce it. I'd argue we can't really expect a URL to not be IDN.

felixarntz avatar Feb 03 '22 15:02 felixarntz

@felixarntz I was thinking URLs for the SK service for example but I'm also not opposed to always using the utility in all cases and letting it determine if special handling is necessary or not. In that case, it would be nice to update our PHPCS rules to require this is used instead of parse_url or wp_parse_url but that might be too much to do here.

aaemnnosttv avatar Feb 03 '22 17:02 aaemnnosttv

@aaemnnosttv Yeah, I feel that would be better handled separately. Would you be okay removing that above note though from the ACs? I think we should aim to use the new helper throughout.

felixarntz avatar Feb 03 '22 18:02 felixarntz

@felixarntz updated 👍

aaemnnosttv avatar Feb 03 '22 18:02 aaemnnosttv

Hey @jimmymadon are you still planning on working on this? If not, please unassign yourself so someone else can pick this up - thanks!

FlicHollis avatar May 26 '22 08:05 FlicHollis

I won't be able to pick this up this week - so will un-assign myself for now. I was not able to get very far on this issue.

jimmymadon avatar May 26 '22 10:05 jimmymadon

IB ✔️

eugene-manuilov avatar Jun 11 '22 15:06 eugene-manuilov

QA Update: ❌

@jimmymadon I have a few observations that I'd like to share:

I am being redirected to the WordPress Dashboard, in all cases the oAuth page does appear.

  • When I set up Site Kit with Analytics, at the point of setting up Analytics, I am redirected back to the WordPress Dashboard. I do not get sent through the set up flow for Analytics.
  • The same happens when I set up Site Kit without Analytics, but search console seems to be set up successfully.
  • In addition, when I set up any other module from settings or a CTA, it starts to go through the oAuth, but I am redirected back to WP Dashboard, and in settings it is showing that the set up is incomplete.

I quickly ran through another test site on develop and could not recreate this behaviour.

The User Input actually redirected back to Site Kit dashboard but I am assuming that's because we do not go though the permission process.

Let me know if you'd like access to the site that I have been using for testing.

wpdarren avatar Sep 13 '22 14:09 wpdarren

Just thought I'd chime in here that the related tests that currently fail on main for me pass on develop now 🎉

aaemnnosttv avatar Sep 13 '22 16:09 aaemnnosttv

@wpdarren The behaviour you described is also happening on the latest stable version of SiteKit - 1.83.0 (I deactivated the tester plugin and updated SK to the latest stable version). So this does not seem to be a regression of this issue. Retried again after deactivating all the existing plugins on the test site - but the unexpected redirects still happen.

Ideally we could create a separate issue - however, it will be impossible to QA this if we don't find a fix. So I will keep this assigned to me and investigate further.

jimmymadon avatar Sep 17 '22 23:09 jimmymadon

@jimmymadon Sorry, I didn't check if it was happening with the latest version - I just assumed that it was linked to this change. Like you said though it will be impossible to QA without a fix. Happy to go with your direction on this.

wpdarren avatar Sep 19 '22 10:09 wpdarren

@wpdarren after discussing on the call today, the behavior you're seeing isn't caused by the changes here and isn't what this issue was created to fix so @jimmymadon is going to create a new issue for that specifically. That may cause this to be a bit difficult to QA from your side, but as long as there are no regressions introduced (for IDN and non-IDN domains) we can consider this one done as I verified the tests that were previously failing for me are passing now. If you'd like we can add QA:Eng to have an engineer verify the ACs are met in terms of changes in the codebase?

aaemnnosttv avatar Sep 19 '22 15:09 aaemnnosttv

@wpdarren Have created a new issue #5868. Could you please check the Steps to reproduce. When I was testing this, I did have some inconsistencies in the redirects. I even managed to connect Analytics once (after a few failures). So not sure what is going on here.

@aaemnnosttv Would you like me to add ACs to this new issue?

jimmymadon avatar Sep 19 '22 15:09 jimmymadon

@aaemnnosttv I have added the QA:Eng label on this ticket to complete the second part of the QAB 👍

@jimmymadon I think you've captured the steps perfectly.

wpdarren avatar Sep 20 '22 09:09 wpdarren

QA:Eng :white_check_mark:

As stated above by @aaemnnosttv the QA:Eng scenario is confirmed done with the tests now passing on his older PHP environment where they were previously failing.

... as long as there are no regressions introduced (for IDN and non-IDN domains) we can consider this one done as I verified the tests that were previously failing for me are passing now.

I have also taken another look at the code which all LGTM. Additionally confirmed no use of wp_parse_url remains in the Site Kit codebase, nor parse_url outside of the deliberate usage in the new URL class.

@wpdarren, have sent back your way as the non-Eng QA is still marked as a :x:

techanvil avatar Sep 21 '22 16:09 techanvil

QA Update: ✅

Verified:

  • With site URL / domain name that has foreign UTF-8 multi-byte characters:
    • Plugin setup (see note above about behaviour and new ticket created)
    • User input survey works as expected.
    • Entity Search box (verifying the entity URLs on the entity dashboard) works as expected
    • AdSense Earnings Report works as expected.
    • Creating a new Analytics and Analytics 4 property from within the plugin. See note above for plugin setup.
    • Creating a TagManager container works as expected.

wpdarren avatar Sep 22 '22 05:09 wpdarren