
Handle config option "useslash" when renaming pages

Open • Saruspete opened this issue 6 years ago • 7 comments

Hello,

Upon renaming, it seems the rewriting of existing links does not handle the config "useslash" properly. I can see only one usage of this parameter in https://github.com/michitux/dokuwiki-plugin-move/blob/master/helper/handler.php#L142, but it does not seem to be working correctly.

Instead of using a fixed ":" in the code, it would be better to assign this separator to a variable / constant and use it instead.
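A minimal sketch of that suggestion, assuming a boolean that stands in for DokuWiki's `useslash` setting; `nsSeparator()` and `joinId()` are hypothetical names, not functions of the plugin:

```php
<?php
// Hypothetical helper: choose the namespace separator once, from a flag
// standing in for the "useslash" config, instead of hard-coding ':'.
function nsSeparator(bool $useslash): string
{
    return $useslash ? '/' : ':';
}

// Hypothetical helper: join namespace parts and a page name with that separator.
function joinId(array $parts, bool $useslash): string
{
    return implode(nsSeparator($useslash), $parts);
}

echo joinId(['wiki', 'syntax', 'start'], true), "\n";  // wiki/syntax/start
echo joinId(['wiki', 'syntax', 'start'], false), "\n"; // wiki:syntax:start
```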

Thanks for the work :)

Saruspete • Nov 22 '18 00:11

So you are suggesting when the useslash is on, all adapted links should use / as separator instead of :? Or are you saying that links with / as namespace separator are not or not properly changed? I'm asking because I personally prefer to have : in the wiki text even though I also like to have / in URLs.

michitux • Nov 23 '18 21:11

Ideally, a rewritten link would keep the separator it was written with before. In my case, the namespace change and the link rewrite worked, but the new links used :, whereas I had used /.

You could use a regex to detect which separator each link uses, like:

$s = "This is text [[hello:world:omgwtfbbq/hihi/hoho:mouarf|and a link]] for this simple example [[hello:world:omgwtfbbq:custom]] and another type [[hello/world/omgwtfbbq/index|of link]] yay";
preg_match_all('#\[\[(\w+([:/])?)(?1)*(?:\|[^]]+?)?\]\]#', $s, $m, PREG_SET_ORDER);
print_r($m);
/*
Array
(
    [0] => Array
        (
            [0] => [[hello:world:omgwtfbbq/hihi/hoho:mouarf|and a link]]
            [1] => hello:
            [2] => :
        )

    [1] => Array
        (
            [0] => [[hello:world:omgwtfbbq:custom]]
            [1] => hello:
            [2] => :
        )

    [2] => Array
        (
            [0] => [[hello/world/omgwtfbbq/index|of link]]
            [1] => hello/
            [2] => /
        )

)
*/
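Instead of capturing only the first separator, each link target could be classified as colon-only, slash-only, or mixed. A minimal sketch with a hypothetical `linkSeparators()` function; it works on raw wiki text, so it shares the limitations michitux points out later in the thread (code blocks, relative links, uncleaned ids):

```php
<?php
// Hypothetical helper: report which namespace separator(s) each [[link]]
// target uses. Raw-text sketch only; it ignores code blocks and other syntax.
function linkSeparators(string $text): array
{
    // Group 1: the link target, i.e. everything up to an optional "|title".
    preg_match_all('#\[\[([^|\]]+)(?:\|[^\]]*)?\]\]#', $text, $m);

    $result = [];
    foreach ($m[1] as $target) {
        $colon = strpos($target, ':') !== false;
        $slash = strpos($target, '/') !== false;
        if ($colon && $slash) {
            $result[$target] = 'mixed';
        } elseif ($slash) {
            $result[$target] = '/';
        } elseif ($colon) {
            $result[$target] = ':';
        } else {
            $result[$target] = 'none';
        }
    }
    return $result;
}

$s = "This is text [[hello:world:omgwtfbbq/hihi/hoho:mouarf|and a link]] for this simple example [[hello:world:omgwtfbbq:custom]] and another type [[hello/world/omgwtfbbq/index|of link]] yay";
print_r(linkSeparators($s));
```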

But you can also just use preg_replace to do the whole job (which is simpler, imho). A simple but naive version:

$search = 'world([:/])omgwtfbbq';
$replace = 'moto\1rola\1revival';
echo preg_replace('#'.$search.'#', $replace, $s); // the pattern needs delimiters
// This is text [[hello:moto:rola:revival/hihi/hoho:mouarf|and a link]] for this simple example [[hello:moto:rola:revival:custom]] and another type [[hello/moto/rola/revival/index|of link]] yay

Or a more complex but correct version, based on the regex from preg_match_all:

$search = 'world([:/])omgwtfbbq';
$replace = 'moto\3rola\3revival';
echo preg_replace('#(\[\[(\w*[:/]?)(?2)*)'.$search.'((?2)*(?:\|[^]]+?)?\]\])#', '\1'.$replace.'\4', $s);
// This is text [[hello:moto:rola:revival/hihi/hoho:mouarf|and a link]] for this simple example [[hello:moto:rola:revival:custom]] and another type [[hello/moto/rola/revival/index|of link]] yay

$search = 'hello([:/])world\3omgwtfbbq';
$replace = 'basic';
echo preg_replace('#(\[\[(\w*[:/]?)(?2)*)'.$search.'((?2)*(?:\|[^]]+?)?\]\])#', '\1'.$replace.'\4', $s);
// This is text [[basic/hihi/hoho:mouarf|and a link]] for this simple example [[basic:custom]] and another type [[basic/index|of link]] yay

$search = 'hello()';
$replace = 'ohai';
echo preg_replace('#(\[\[(\w*[:/]?)(?2)*)'.$search.'((?2)*(?:\|[^]]+?)?\]\])#', '\1'.$replace.'\4', $s);
// This is text [[ohai:world:omgwtfbbq/hihi/hoho:mouarf|and a link]] for this simple example [[ohai:world:omgwtfbbq:custom]] and another type [[ohai/world/omgwtfbbq/index|of link]] yay

So you just have to construct the $search and $replace vars, the rules being:

  • do not change the number of capturing groups in $search: always exactly 1.
  • if you need to match more than 1 separator in $search (e.g. 3 namespaces or more), use \3 to refer to the first separator
  • use \3 as the separator in $replace
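Following those rules, both fragments could be generated from the old and new page ids. A minimal sketch with a hypothetical `buildSearchReplace()` helper, assuming both ids are written with `:`. It writes `\g{3}` and `${3}`, the braced spellings of `\3` in PCRE patterns and in `preg_replace()` replacements, so a following digit cannot be misread as part of the group number:

```php
<?php
// Hypothetical helper (not part of the plugin): build $search and $replace
// from two colon-separated page ids, following the rules above.
function buildSearchReplace(string $oldId, string $newId): array
{
    // Escape each old part so characters such as '.' match literally.
    $old = array_map(
        fn (string $part): string => preg_quote($part, '#'),
        explode(':', $oldId)
    );
    $new = explode(':', $newId);

    if (count($old) === 1) {
        // Nothing to capture: an empty group keeps exactly one group.
        $search = $old[0] . '()';
    } else {
        // The first separator is captured (group 3 in the full pattern);
        // later separators must repeat it.
        $search = $old[0] . '([:/])' . implode('\g{3}', array_slice($old, 1));
    }
    // Every new separator reuses the captured one.
    $replace = implode('${3}', $new);

    return [$search, $replace];
}

[$search, $replace] = buildSearchReplace('world:omgwtfbbq', 'moto:rola:revival');
echo $search, "\n";  // world([:/])omgwtfbbq
echo $replace, "\n"; // moto${3}rola${3}revival
```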

Here are more details on the regex: '#(\[\[(\w*[:/]?)(?2)*)'.$search.'((?2)*(?:\|[^]]+?)?\]\])#'

  • '#( : # is the regex delimiter, and we start capturing group 1.
  • \[\[ : links start with [[. As [ is a regex special char, we escape it.
  • (\w*[:/]?) : capturing group 2: any number of word characters (letters, digits, underscore; no spaces), followed by an optional separator : or /. End of group 2.
  • (?2)* : repeat the pattern of group 2 any number of times (0 to infinity).
  • )'.$search.'( : end of group 1, then the search value (which must contain a capturing group 3 that will be our selected separator), then the start of group 4.
  • (?2)* : inside group 4, repeat the pattern of group 2 any number of times.
  • (?:\|[^]]+?)? : (?: starts a non-capturing group Z, matching a literal | followed by one or more characters that are not a closing bracket ([^]]), non-greedy (+?); ) ends group Z, and the final ? makes the whole group Z optional.
  • \]\])# : ]] for the end of the link, end of group 4, end of the regex.
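The same pattern, assembled piece by piece so each part from the list above carries its own comment. `rewriteLinks()` is a hypothetical wrapper name, and the `|title` group is written as the non-capturing `(?:...)` the explanation describes:

```php
<?php
// Hypothetical wrapper around the link-rewriting regex, built piece by piece.
function rewriteLinks(string $text, string $search, string $replace): ?string
{
    $pattern = '#'                 // '#' is the pattern delimiter
        . '(\[\[(\w*[:/]?)(?2)*)'  // group 1: "[[" plus leading namespaces;
                                   // group 2: word chars + optional separator
        . $search                  // must contain exactly one group: group 3
        . '((?2)*'                 // group 4 opens: trailing namespaces/page
        . '(?:\|[^]]+?)?'          // optional "|title", non-capturing
        . '\]\])'                  // "]]", then group 4 closes
        . '#';

    // Keep groups 1 and 4 around the match; only the $search part changes.
    // preg_replace() returns null if the pattern fails to compile.
    return preg_replace($pattern, '\1' . $replace . '\4', $text);
}

$s = "This is text [[hello:world:omgwtfbbq/hihi/hoho:mouarf|and a link]] for this simple example [[hello:world:omgwtfbbq:custom]] and another type [[hello/world/omgwtfbbq/index|of link]] yay";
echo rewriteLinks($s, 'hello()', 'ohai'), "\n";
```

Text outside [[...]] is left alone, e.g. `rewriteLinks('plain hello:world text', 'hello()', 'ohai')` returns its input unchanged.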

Hope this helps.

Saruspete • Nov 25 '18 01:11

@saruspete thank you for giving us a regexp tutorial.

@michitux I'd vote for won't implement. I agree with you that useslash is mostly cosmetics for the URL. The canonical way to write a pageid is using colons.

splitbrain • Nov 25 '18 08:11

@splitbrain so you'd rather provide a feature to end users (useslash), then limit its use and go against the end user's choice, for the sole sake of cosmetics?

(If you take this code as a regex tutorial, you may come play with us at regexcrossword.com.)

Saruspete • Nov 25 '18 19:11

@Saruspete Concerning the regex: this is not how this plugin works, otherwise it would cause a ton of issues with link syntax in code blocks etc. This plugin uses DokuWiki's parser, which combines all the different syntaxes into one big and complex regex. This is why the plugin is able to pick up your links even though / is not handled explicitly.
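To illustrate the point about code blocks, a small sketch (the sample text is invented for illustration) showing that a regex applied to the raw page text also rewrites link syntax that only appears inside a `<code>` block:

```php
<?php
// Raw wiki text: one real link and one example of link syntax in a code block.
$text = "See [[wiki:start]].\n<code>\nExample syntax: [[wiki:start]]\n</code>\n";

// Naive rename of wiki:start to wiki:home directly on the raw text.
$naive = preg_replace('#\[\[wiki:start\]\]#', '[[wiki:home]]', $text);

echo $naive;
// Both occurrences change, including the one inside <code>, which a
// parser-based rewrite is meant to leave untouched.
```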

I would consider two simple options for implementing this:

  • If useslash is enabled and there is a / in the link, in the end all : in the link are converted to /.
  • Provide a new configuration option that triggers this conversion.
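The first option could look roughly like this, assuming the plugin has already computed the new colon-separated id and still has the original link target at hand; `applySlashStyle()` is a hypothetical name, and it only looks at the link target, not at the `|title` part:

```php
<?php
// Hypothetical sketch of the first option: if useslash is enabled and the
// original target contained a '/', convert every ':' in the new id to '/'.
function applySlashStyle(string $originalTarget, string $newId, bool $useslash): string
{
    if ($useslash && strpos($originalTarget, '/') !== false) {
        return str_replace(':', '/', $newId);
    }
    // Otherwise keep the canonical colon form.
    return $newId;
}

echo applySlashStyle('hello/world/omgwtfbbq/index', 'moto:rola:revival:index', true), "\n";
// moto/rola/revival/index
echo applySlashStyle('hello:world:custom', 'moto:rola:custom', true), "\n";
// moto:rola:custom
```

A link without namespaces shows the drawback: `applySlashStyle('start', 'ns:start', true)` returns `ns:start`, because the original target gives no hint that slashes were preferred.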

It is not feasible to convert a mixture of / and : back into a mixture, as this can become quite complex (e.g. imagine a middle part of the link needs to be removed). Note that all internal DokuWiki functions only use : and convert / to : as a first step; that's why there is almost no code for that in the plugin, and simply keeping them is not an option.

Both of these options have their disadvantages, the first cannot handle the conversion of a simple link without namespaces, the second option introduces another configuration option and I would prefer to avoid that additional complexity for the user. Nevertheless I wouldn't say no if somebody provided an implementation including some unit tests.

michitux • Nov 25 '18 20:11

@Saruspete Note also that even if the regex was only applied to links themselves, what you propose is unable to handle relative links or links that are not valid page ids but become a matching id once they have been cleaned (like Foo Bar which becomes foo_bar).
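A small sketch of the second point. `simpleCleanId()` is a hypothetical, heavily simplified stand-in for DokuWiki's id cleaning (it only trims, lowercases and turns spaces into underscores), used to show that a regex for the cleaned id misses a link written as Foo Bar:

```php
<?php
// Hypothetical, simplified id cleaning: NOT DokuWiki's real implementation.
function simpleCleanId(string $raw): string
{
    return str_replace(' ', '_', strtolower(trim($raw)));
}

$text = 'See [[Foo Bar]] and [[foo_bar|the same page]].';

// A regex for the cleaned id only finds the link written as foo_bar...
$rawHits = preg_match_all('#\[\[foo_bar(?:\|[^\]]*)?\]\]#', $text);

// ...while cleaning each link target first finds both links.
preg_match_all('#\[\[([^|\]]+)(?:\|[^\]]*)?\]\]#', $text, $m);
$cleanHits = count(array_filter(
    $m[1],
    fn (string $target): bool => simpleCleanId($target) === 'foo_bar'
));

echo $rawHits, ' vs ', $cleanHits, "\n"; // 1 vs 2
```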

michitux • Nov 25 '18 20:11

@michitux Indeed, makes sense. I was missing all these internals. Thanks a lot for the explanation!

When you say configuration option, is it for the whole wiki, in the configuration panel? Or just a default value for a checkbox on the move form (displayed only if useslash is enabled)?

If it's too many changes for such a "one-shot" feature, I can live with a sed on the data files.

Saruspete • Nov 25 '18 20:11