search-replace-command icon indicating copy to clipboard operation
search-replace-command copied to clipboard

Better communicate failure to search-replace JSON-serialized URLs

Open danielbachhuber opened this issue 7 years ago • 12 comments

From Post Status Slack:

javorszky [5 hours ago] Gravity forms confirmation: goes to https://livesite.com/confirmation-page
I'm on dev. Dev is https://site.test
`wp search-replace https://livesites.com https://site.text`
It does not turn gravity form's form confirmation setting
javorszky [5 hours ago] because the setting is stored in the `rg_form_meta` table in the `confirmations` column as `json`, so the url will look like
javorszky [5 hours ago] `https:\/\/livesite.com\/confirmation-page`

We should better communicate this scenario in our documentation. In fact, it may be time for a dedicated document on search-replacing URLs in the database. Or even, create a dedicated wp search-replace url command, because replacing URLs is such a common use case.

cc @javorszky

danielbachhuber avatar Nov 17 '17 18:11 danielbachhuber

Hm, would it be something that could be actually replaced by the command? I know it already handles serialised data, so (famous last words incoming), how hard would it be to replace the json one?

The good thing is that it doesn't depend on string length, so theoretically both the from and to can be escaped and replaced again. Though that would mean twice as long pass along the entire dbase.

javorszky avatar Nov 20 '17 13:11 javorszky

I know it already handles serialised data, so (famous last words incoming), how hard would it be to replace the json one?

With the current command implementation, you'd need to use regex to also search-replace slashed URLs.

With this being said, if you're only replacing the domain (which it looks like you are in your example), you could drop the https:// from both strings and solely search-replace the domain.

danielbachhuber avatar Nov 21 '17 20:11 danielbachhuber

This just occurred to me:

you could drop the https:// from both strings and solely search-replace the domain.

It will break accounts if I'm signing in via email, because without the protocol, I'm also replacing livesite.com to staging.com in here: [email protected] -> [email protected].

javorszky avatar Jan 23 '18 01:01 javorszky

however during the search-replace loop we could check whether the data is json by checking whether it successfully decoded the data and then looping over and setting a json flag, so decode -> replace -> recode -> store

javorszky avatar Jan 23 '18 01:01 javorszky

@javorszky - the user table is a good edge case, and one I think most developers forget about, it'd be great to specifically warn when affecting user data, though that might be out of scope for this issue.

Pushplaybang avatar Jan 27 '18 19:01 Pushplaybang

UPDATED 12/5/19

From my testing, to properly handle changing a url, you need more information than just the TLD. To your point @danielbachhuber, with the right regex that matches every way a url can be (within reason) it can (mostly) be handled all at once.

DOMAIN=site.com
NEW_DOMAIN=newsite.com
PREFIX=http

wp search-replace --all-tables --regex "https?(:?(%3A%2F%2F)?(\\\/|\/)*)(www\.)?$DOMAIN" "$PREFIX\$1$NEW_DOMAIN"
  • Match against http and https
  • Match every way :// can be stored in the database. This is based on the sites I've personally come across, and could be expanded. Whatever this ends up being is stored in $1 for use later
    • url encoded
    • escaped slashes
    • unescaped slashes
  • Match with or without www
  • Match the searched TLD

In the replace portion I reconstruct the url using:

  • Normalized prefix
  • The matched encoding of ://
  • The new domain

@javorszky I then run a follow up command to replace any instance of the domain which isn't an email address!

wp search-replace --all-tables --regex "(?<!@|%40)$DOMAIN" "$NEW_DOMAIN"

These two commands need to be run for each url on the website, so that means if multisite is enabled, it must be run with those urls as well:

DOMAIN=multisite.newsite.com
# first command
# second command

Of course now after reading this thread I now also have to think about user data as well! (thanks @Pushplaybang) I really welcome any feedback and hopefully this can become an easier process through some logic additions to the search-replace command.

devkinetic avatar Nov 15 '19 23:11 devkinetic

We should be mindful when talking about URL vs general text search&replace. I guess the former can be done with proper escaping (although I suspect it is fragile and may break with some character combinations), while the latter may not be always possible. For example, replacing with strings containing quotes or newlines will render JSON strings invalid.

Since the purpose of the command is to do general string S&R, I fail to see how it can be done other than identifying a JSON string, decoding it, doing S&R in the array, and encoding it back.

flaviovs avatar Jun 17 '20 00:06 flaviovs

facing same issue. following

MehbubRashid avatar May 21 '22 11:05 MehbubRashid

@devkinetic Thank you for the commands. I was running those, but on multiple search and replace commands. Will the regex option be faster than executing the commands separately?

Edit: I think this could be the answer: [--regex] Runs the search using a regular expression (without delimiters). Warning: search-replace will take about 15-20x longer when using –regex. from https://developer.wordpress.org/cli/commands/search-replace/ Thanks

planetahuevo avatar Oct 06 '22 14:10 planetahuevo

Stumbled across this and wanted to say the regex above can be very slow for large datasets (as it warns).

Over hundreds of sites I've never had a problem always using the following 4 commands to change a URL everywhere:

wp search-replace '://www.oldsite.com' '://www.newsite.com'
wp search-replace '://oldsite.com' '://newsite.com'

wp search-replace ':\/\/oldsite.com' ':\/\/newsite.com'
wp search-replace ':\/\/www.oldsite.com' ':\/\/www.newsite.com'

If using PHP, you could:

$replacements = array(
  "://oldsite.com"  => "://newsite.com",
  "://www.oldsite.com" => "://www.newsite.com",
);

foreach ( $replacements as $search => $replace ) {
  WP_CLI::runcommand( "search-replace '$search' '$replace'");
  WP_CLI::runcommand( addcslashes( "search-replace '$search' '$replace'", '/' ) );
}

I've found there is often a mix of www and non-www versions so I check for both and just automate the commands above. Can adjust slightly if using a subdomain

mrsdizzie avatar Feb 09 '23 02:02 mrsdizzie

Definitely comes down to use case. In my dealings I had to account for mixed content issues, it wasn't just moving a project from one url to another, it was normalizing as well as updating, so something comprehensive. I think that at times the detection of serialized data was also fragile in and of itself which may be separate issue entirely.

At the time, I had many WordPress projects I was inheriting in various states very quickly and didn't desire to debug every edge case that cropped up. It may be that you only need to use the regex version once and then use faster methods since you trust the database and plugins in use.

I am glad my commands helped, it shows that this is a consistent issue, and hopefully I saved some of your time. It would be so much better if WordPress just stopped using absolute links but it's just how things are I guess.

devkinetic avatar Feb 09 '23 02:02 devkinetic

Or even, create a dedicated wp search-replace url command, because replacing URLs is such a common use case.

Created https://github.com/wp-cli/search-replace-command/issues/186 for discussing a dedicated command

danielbachhuber avatar Jul 14 '23 15:07 danielbachhuber