dokuwiki-plugin-dw2pdf
Remove all embedded links except those from the auto-generated contents page
We've got a wiki-based documentation system that we use internally as a wiki, but we also group pages together to create one long page that we can then export as a PDF.
The problem is that all the internal links appear in the PDF with the name of our local server. That server is not reachable by the people we send the PDF to, so the links are misleading and annoy anyone using the PDF.
I've seen another post about adding ?nolink (or &nolink) to the end of image definitions to stop them appearing in the PDF, but then they won't be linked in our online version either.
Is there a way to get DW2PDF to strip all internal links before creating the PDF, so that the document only keeps:
- just the links from the contents page to other pages in the PDF, or
- the links above, plus links to public web pages (i.e. stripping only internal wiki links, or links starting with a particular domain pattern)
OK, so I have a partial solution:
I added to dw2pdf/conf/default.php:

```php
$conf['excludelinksto'] = '';
```

and to dw2pdf/conf/metadata.php:

```php
$meta['excludelinksto'] = array('string');
```
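The option can then hold a comma-separated list of domains to strip. As a hypothetical example (the host names below are placeholders), it could be set in conf/local.php in the usual DokuWiki way:

```php
// strip PDF links pointing at these (placeholder) internal hosts
$conf['plugin']['dw2pdf']['excludelinksto'] = 'wiki.internal.example.com,intranet.example.com';
```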
Then I added a line to action.php (at around line 490), after

```php
// loop over all pages
$counter = 0;
$no_pages = count($this->list);
```

to split the domains passed in through the config:
```php
// added to pull array of domains from config CJ 20180214 1631
$excludelinksto = explode(',', $this->getConf('excludelinksto'));
```
and after

```php
if ($pagehtml == '') {
    continue;
}
$pagehtml .= $this->page_depend_replacements($template['cite'], $page);
```

I added:
```php
// added to strip all links matching domains listed in config from pagehtml CJ 20180214 1456
foreach ((array)$excludelinksto as $excludelink) {
    // escape any / characters in the domains
    if ($excludelink != "") {
        $excludelink = preg_replace("/", "\/", $excludelink);
        $pattern = "/(<a [^>]*" . $excludelink . ".*?>)(.*?)<\/a>/";
        $pagehtml = preg_replace($pattern, "$2", $pagehtml);
    }
}
```
and it now strips links when creating a PDF. However, it's stripping all links when it should only strip some... (The culprit is the preg_replace("/", "\/", ...) call: "/" is not a valid pattern, so preg_replace() returns NULL, the domain then concatenates into the pattern as an empty string, and the resulting regex matches every link.)
OK, it now works: I replaced the regexp that escaped the / characters with a trim(), and used # as the regexp delimiter instead of /:
```php
// added to strip all links matching domains listed in config from pagehtml CJ 20180214 1456
foreach ((array)$excludelinksto as $excludelink) {
    // remove any trailing / characters in the domain names in case the link is to the root of a domain
    //dbglog("Entered the loop with exclude link of $excludelink");
    // only replace if there is a domain to search for
    if ($excludelink != "") {
        // trim trailing / from domain
        $excludelink = trim($excludelink, "/");
        //dbglog("trimmed exclude link is : $excludelink");
        // set pattern to search for
        $pattern = "#(<a [^>]*" . $excludelink . ".*?>)(.*?)<\/a>#";
        //dbglog("Filtered with pattern : $pattern");
        $pagehtml = preg_replace($pattern, "$2", $pagehtml);
    }
}
```
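One further hardening worth considering (my suggestion, not part of the patch above): pass each domain through preg_quote() so that regex metacharacters, such as the dots in a host name, are matched literally, and constrain the match to stay inside the opening tag:

```php
// defensive variant of the pattern construction (sketch):
// preg_quote() escapes regex metacharacters and takes the pattern
// delimiter as its second argument; [^>]* keeps the match inside
// the opening <a ...> tag
$quoted   = preg_quote(trim($excludelink, '/'), '#');
$pattern  = '#(<a [^>]*' . $quoted . '[^>]*>)(.*?)</a>#';
$pagehtml = preg_replace($pattern, '$2', $pagehtml);
```

If the config value might contain spaces after the commas, running array_map('trim', $excludelinksto) after the explode() would also help.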
Can this be added to the next release? (please)
There is a lower level at which you can intercept the link creation and replace links to wiki pages with plain text.
The links to wiki pages that are added to the generated PDF document (using the Bookcreator plugin) are already rewritten as links with only a #hash, so you only have to remove the other links to wiki pages. Complexity: _formatLink() is used for more types of links.
https://github.com/splitbrain/dokuwiki-plugin-dw2pdf/commit/60e59de7d6295c0b3cfb1268109a5e4457ac4705#diff-25eb2cfd41b242f700f7c778c4416c17
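As a rough illustration of that lower-level approach (an untested sketch: the class name is made up, and it assumes rendering goes through Doku_Renderer_xhtml::_formatLink() with DOKU_URL as the wiki base):

```php
class renderer_plugin_dw2pdf_striplinks extends Doku_Renderer_xhtml {
    public function _formatLink($link) {
        $url = isset($link['url']) ? $link['url'] : '';
        // keep the in-document #hash links that dw2pdf/Bookcreator generate
        if ($url !== '' && $url[0] !== '#' && strpos($url, DOKU_URL) === 0) {
            // link into the internal wiki: emit only the link text
            return $link['name'];
        }
        return parent::_formatLink($link);
    }
}
```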
Hello. We came across the same problem: we do not want everyone to be able to follow some internal links (and worse, some of them lead to "temporary pages"). I eventually discovered your solution. It's very useful, works perfectly well, and yes, it deserves to be part of a next release! Best regards, Philippe d'Anfray