site-kit-wp
site-kit-wp copied to clipboard
Permuting site URLs can fail for some domains and environments
Bug Description
I recently discovered a few tests that only fail locally with some versions of PHP related to testing Module::permute_site_url
.
After digging into it, I found the problem was related to parse_url
which was malforming parts of the URL when given certain UTF-8 multi-byte characters as used in some IDN domains. This goes back to an old bug in PHP and also seems to be related to the server's configured locale.
Relevant test output
There were 2 failures:
1) Google\Site_Kit\Tests\Core\Modules\ModuleTest::test_permute_site_url with data set "http://éxämplę.test" ('http://éxämplę.test', array('http://éxämplę.test', 'https://éxämplę.test', 'http://www.éxämplę.test', 'https://www.éxämplę.test', 'http://xn--xmpl-loa2a55a.test', 'https://xn--xmpl-loa2a55a.test', 'http://www.xn--xmpl-loa2a55a.test', 'https://www.xn--xmpl-loa2a55a.test'))
Failed asserting that two arrays are equal.
--- Expected
+++ Actual
@@ @@
Array (
- 0 => 'http://www.xn--xmpl-loa2a55a.test'
- 1 => 'http://www.éxämplę.test'
- 2 => 'http://xn--xmpl-loa2a55a.test'
- 3 => 'http://éxämplę.test'
- 4 => 'https://www.xn--xmpl-loa2a55a.test'
- 5 => 'https://www.éxämplę.test'
- 6 => 'https://xn--xmpl-loa2a55a.test'
- 7 => 'https://éxämplę.test'
+ 0 => 'http://www.xn--xmpl?_-bua2b.test'
+ 1 => 'http://www.éxämpl?_.test'
+ 2 => 'http://xn--xmpl?_-bua2b.test'
+ 3 => 'http://éxämpl?_.test'
+ 4 => 'https://www.xn--xmpl?_-bua2b.test'
+ 5 => 'https://www.éxämpl?_.test'
+ 6 => 'https://xn--xmpl?_-bua2b.test'
+ 7 => 'https://éxämpl?_.test'
)
/.../wordpress/wp-content/plugins/google-site-kit/vendor/wp-phpunit/wp-phpunit/includes/abstract-testcase.php:814
/.../wordpress/wp-content/plugins/google-site-kit/tests/phpunit/integration/Core/Modules/ModuleTest.php:214
/.../wordpress/wp-content/plugins/google-site-kit/vendor/phpunit/phpunit/phpunit:51
2) Google\Site_Kit\Tests\Core\Modules\ModuleTest::test_permute_site_url with data set "http://éxämplę.test/sub-directory" ('http://éxämplę.test/sub-directory', array('http://éxämplę.test/sub-directory', 'https://éxämplę.test/sub-directory', 'http://www.éxämplę.test/sub-directory', 'https://www.éxämplę.test/sub-directory', 'http://xn--xmpl-loa2a55a.test...ectory', 'https://xn--xmpl-loa2a55a.tes...ectory', 'http://www.xn--xmpl-loa2a55a....ectory', 'https://www.xn--xmpl-loa2a55a...ectory'))
Failed asserting that two arrays are equal.
--- Expected
+++ Actual
@@ @@
Array (
- 0 => 'http://www.xn--xmpl-loa2a55a.test/sub-directory'
- 1 => 'http://www.éxämplę.test/sub-directory'
- 2 => 'http://xn--xmpl-loa2a55a.test/sub-directory'
- 3 => 'http://éxämplę.test/sub-directory'
- 4 => 'https://www.xn--xmpl-loa2a55a.test/sub-directory'
- 5 => 'https://www.éxämplę.test/sub-directory'
- 6 => 'https://xn--xmpl-loa2a55a.test/sub-directory'
- 7 => 'https://éxämplę.test/sub-directory'
+ 0 => 'http://www.xn--xmpl?_-bua2b.test/sub-directory'
+ 1 => 'http://www.éxämpl?_.test/sub-directory'
+ 2 => 'http://xn--xmpl?_-bua2b.test/sub-directory'
+ 3 => 'http://éxämpl?_.test/sub-directory'
+ 4 => 'https://www.xn--xmpl?_-bua2b.test/sub-directory'
+ 5 => 'https://www.éxämpl?_.test/sub-directory'
+ 6 => 'https://xn--xmpl?_-bua2b.test/sub-directory'
+ 7 => 'https://éxämpl?_.test/sub-directory'
)
/.../wordpress/wp-content/plugins/google-site-kit/vendor/wp-phpunit/wp-phpunit/includes/abstract-testcase.php:814
/.../wordpress/wp-content/plugins/google-site-kit/tests/phpunit/integration/Core/Modules/ModuleTest.php:214
/.../wordpress/wp-content/plugins/google-site-kit/vendor/phpunit/phpunit/phpunit:51
Do not alter or remove anything below. The following sections will be managed by moderators only.
Acceptance criteria
- A new
URL
class should be created in theCore\Util
namespace with a single staticparse
method for now- The signature and behavior should match that of
(wp_)parse_url
with the main difference being that it is safe to use with multi-byte UTF-8 IDN domains - Any given URLs that do not contain problematic characters could continue to use
wp_parse_url
internally
- The signature and behavior should match that of
- All usage of
(wp_)parse_url
should be refactored to use the new utility
Implementation Brief
- Create a new class named
URL
in theincludes/Core/Util
folder having theCore\Util
namespace as per the AC. - Create a static method
parse
with the same signature aswp_parse_url()
, i.e. it should accept$url
and$component
as parameters.- Check if
$url
contains multibyte chars, i.e. ifmb_strlen($url, 'UTF-8') != strlen($url)
. - If this is not the case, simply delegate the call to
wp_parse_url()
and return its output. - If the string does contain multibyte chars, use the same logic within
wp_parse_url()
to parse the$url
. But instead of calling PHP's inbuiltparse_url()
method, create and call a function that encodes the URL and decodes the parts later. An example of this function can be found in this comment. - Find and replace the usage of
wp_parse_url()
andparse_url()
methods withURL::parse()
.
- Check if
Test Coverage
- Add phpunit tests for
URL::parse()
with similar URL's used inGoogle\Site_Kit\Tests\Core\Modules\ModuleTest::test_permute_site_url
.
QA Brief
Changelog entry
cc: @felixarntz
@aaemnnosttv One comment on the ACs:
One exception could be URLs that are guaranteed to not be IDN
Shouldn't we remove this? I feel like it's better to stick to that convention if we introduce it. I'd argue we can't really expect a URL to not be IDN.
@felixarntz I was thinking URLs for the SK service for example but I'm also not opposed to always using the utility in all cases and letting it determine if special handling is necessary or not. In that case, it would be nice to update our PHPCS rules to require this is used instead of parse_url
or wp_parse_url
but that might be too much to do here.
@aaemnnosttv Yeah, I feel that would be better handled separately. Would you be okay removing that above note though from the ACs? I think we should aim to use the new helper throughout.
@felixarntz updated 👍
Hey @jimmymadon are you still planning on working on this? If not, please unassign yourself so someone else can pick this up - thanks!
I won't be able to pick this up this week - so will un-assign myself for now. I was not able to get very far on this issue.
IB ✔️
QA Update: ❌
@jimmymadon I have a few observations that I'd like to share:
I am being redirected to the WordPress Dashboard, in all cases the oAuth page does appear.
- When I set up Site Kit with Analytics, at the point of setting up Analytics, I am redirected back to the WordPress Dashboard. I do not get sent through the set up flow for Analytics.
- The same happens when I set up Site Kit without Analytics, but search console seems to be set up successfully.
- In addition, when I set up any other module from settings or a CTA, it starts to go through the oAuth, but I am redirected back to WP Dashboard, and in settings it is showing that the set up is incomplete.
I quickly ran through another test site on develop and could not recreate this behaviour.
The User Input actually redirected back to Site Kit dashboard but I am assuming that's because we do not go though the permission process.
Let me know if you'd like access to the site that I have been using for testing.
Just thought I'd chime in here that the related tests that currently fail on main for me pass on develop now 🎉
@wpdarren The behaviour you described is also happening on the latest stable version of SiteKit - 1.83.0 (I deactivated the tester plugin and updated SK to the latest stable version). So this does not seem to be a regression of this issue. Retried again after deactivating all the existing plugins on the test site - but the unexpected redirects still happen.
Ideally we could create a separate issue - however, it will be impossible to QA this if we don't find a fix. So I will keep this assigned to me and investigate further.
@jimmymadon Sorry, I didn't check if it was happening with the latest version - I just assumed that it was linked to this change. Like you said though it will be impossible to QA without a fix. Happy to go with your direction on this.
@wpdarren after discussing on the call today, the behavior you're seeing isn't caused by the changes here and isn't what this issue was created to fix so @jimmymadon is going to create a new issue for that specifically. That may cause this to be a bit difficult to QA from your side, but as long as there are no regressions introduced (for IDN and non-IDN domains) we can consider this one done as I verified the tests that were previously failing for me are passing now. If you'd like we can add QA:Eng to have an engineer verify the ACs are met in terms of changes in the codebase?
@wpdarren Have created a new issue #5868. Could you please check the Steps to reproduce. When I was testing this, I did have some inconsistencies in the redirects. I even managed to connect Analytics once (after a few failures). So not sure what is going on here.
@aaemnnosttv Would you like me to add ACs to this new issue?
@aaemnnosttv I have added the QA:Eng label on this ticket to complete the second part of the QAB 👍
@jimmymadon I think you've captured the steps perfectly.
QA:Eng :white_check_mark:
As stated above by @aaemnnosttv the QA:Eng
scenario is confirmed done with the tests now passing on his older PHP environment where they were previously failing.
... as long as there are no regressions introduced (for IDN and non-IDN domains) we can consider this one done as I verified the tests that were previously failing for me are passing now.
I have also taken another look at the code which all LGTM. Additionally confirmed no use of wp_parse_url
remains in the Site Kit codebase, nor parse_url
outside of the deliberate usage in the new URL
class.
@wpdarren, have sent back your way as the non-Eng QA is still marked as a :x:
QA Update: ✅
Verified:
- With site URL / domain name that has foreign UTF-8 multi-byte characters:
- Plugin setup (see note above about behaviour and new ticket created)
- User input survey works as expected.
- Entity Search box (verifying the entity URLs on the entity dashboard) works as expected
- AdSense Earnings Report works as expected.
- Creating a new Analytics and Analytics 4 property from within the plugin. See note above for plugin setup.
- Creating a TagManager container works as expected.