Turning URLs into links

Open mfenner opened this issue 11 years ago • 55 comments

I want URLs in citations to be links, and I have monkey patched the reference_tag method to allow this:

module Jekyll
  class Scholar
    module Utilities

      alias_method :original_reference_tag, :reference_tag

      def reference_tag(entry)
        return missing_reference unless entry

        # Change URLs into markdown links
        entry["url"] = "[#{entry["url"]}](#{entry["url"]})" if entry["url"]

        entry = entry.convert(*bibtex_filters) unless bibtex_filters.empty?
        reference = CiteProc.process entry.to_citeproc,
          :style => style, :locale => config['locale'], :format => 'html'

        content_tag reference_tagname, reference,
          :id => [prefix, entry.key].compact.join('-')
      end
    end
  end
end

You can see that I change entry["url"] into a markdown link, which is then picked up by the markdown processor. You can see this at work at http://blog.martinfenner.org/about.html. Is this the correct way of doing this? I'm using Pandoc as the markdown processor, and in its default configuration it doesn't recognize bare URLs.

We could do something similar for entry["doi"], as some styles display the DOI.

mfenner avatar Aug 05 '13 05:08 mfenner

Absolutely. Jekyll-Scholar started out as a very quick plugin; the idea was that it should allow you to re-use the utilities and override selected methods if you need custom behaviour. So your solution is basically what I assumed people would do. Another cool solution I've seen is by @ashinkarov here #23 using a custom CSL style. When we finally finish the citeproc-ruby rewrite it will be much easier to hook into the cite processing and adapt individual components.

As for adding such customizations to the jekyll-scholar gem itself, I'm not sure whether it is really worth supporting this functionality, or whether it is better to make it easy for people to override everything for their use case. The big problem here is that I'm probably the least experienced jekyll-scholar user myself.

My rule of thumb is that I'll gladly add anything that is generic enough to be useful to most users as long as it is not difficult to maintain, so if you turn this into a pull request and add a test to make maintenance easier I'll gladly merge it :-)

inukshuk avatar Aug 05 '13 20:08 inukshuk

I think that the problem is that the proposed solution is not generic enough. For example on my page, I put "[url]" text that leads to the paper I am referring to. Assuming your patch would be merged-in, my publication list would be broken.

I believe it is arbitrarily hard to hard-code this sort of thing in a way that makes everyone happy. So as I wrote earlier, the solution that we really need is a dynamic modification of csl files. In essence it's all about representation of the data -- we are not changing the essence of the data.

ashinkarov avatar Aug 05 '13 20:08 ashinkarov

I am confused, or maybe I don't understand CSL well enough. What I think I am proposing is that every time a URL is written to the markdown file we are generating, we add standard markdown tags for a link around it. Why should this conflict with other URLs in the CSL? And is there a case for URLs in citations to not be links?

mfenner avatar Aug 05 '13 20:08 mfenner

CSL is just a template which is filled with the data from each individual bibitem. For example the content of the title may be taken in pads, or italic might be applied or something else. When it comes to URLs, if they are not ignored, they are passed through as plain text, possibly with a "url:" prefix added (it all depends on the style we are using).

Now you are proposing to preprocess url content and instead of clear text, you want to give '<a href="$url">$url</a>' to the CSL file. As a result you will have an effect as you have on your webpage. CSL would just put this text in the overall html.

Now, my problem here would be that I don't want to see '<a href="$url">$url</a>'; I would rather see '<a href="$url">[url]</a>' like I have on my page: http://ashinkarov.github.io/publications/ The other guy might want to present the url in yet another way on his page.

If you do this processing step at the level of CSL, then you don't have to modify the data that is coming to this CSL. Now, if you hardcode these things, you will need to find a way to parametrise your code, which might or might not cover all the cases.

So, the generic solution would be to specify xml code snippets in your config.yml (or somewhere else), which would add/replace parts of XML tree of the CSL files.

I hope that I explained it well enough. Feel free to ask more.

ashinkarov avatar Aug 05 '13 21:08 ashinkarov

In general I tend to agree with @ashinkarov here that reference data should be kept clean of any sort of markup (html, markdown, latex or otherwise); however, in this case this is completely internal to jekyll-scholar and jekyll-scholar as a blogging tool obviously generates html via markdown. So I think the proposed solution is a pretty neat way of turning the URLs into proper links by default.

Actually, I have thought about this a little and we may even accomplish this as a filter in bibtex-ruby; we could then simply add the name of the filter to the jekyll-scholar configuration to preprocess URLs and DOIs in this way. This way you wouldn't even need to override reference_tag and at the same time it would not interfere with other approaches at all.

@mfenner do you see any problem if the conversion is applied 'permanently' (of course only to the bibliography in memory not on disk) instead of every time in reference tag? Then we could just add a filter to handle the conversion after the bibliography is parsed.

inukshuk avatar Aug 06 '13 09:08 inukshuk

@ashinkarov your arguments make sense. I am using plain CSL styles without modifications so wasn't considering custom formatting.

@inukshuk a filter in bibtex-ruby would be fine. I can also continue monkey patching jekyll-scholar.

mfenner avatar Aug 06 '13 11:08 mfenner

@mfenner if you like, you could try something like this. Put a file markdown.rb (file name is not relevant) into your ext directory. There you need to create a class Markdown like this:

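module YourNamespaceHere
  class Markdown < BibTeX::Filter
    def apply(value)
      # ADD_URL_PATTERN_HERE is a placeholder for a URL regex; the parentheses
      # capture the match so that \1 can refer to it in the replacement.
      return value unless value.to_s =~ /(ADD_URL_PATTERN_HERE)/
      value.to_s.gsub(/(ADD_URL_PATTERN_HERE)/, '[\1](\1)')
    end
  end
end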

And then in your jekyll configuration set:

scholar:
  bibtex_filters:
    - latex
    - markdown

This would turn all values (not only in the URL field) into a markdown link permanently as far as jekyll scholar is concerned. If this works I think we could add this filter to jekyll-scholar by default for you to enable with the bibtex_filters options on demand.

inukshuk avatar Aug 06 '13 11:08 inukshuk

@inukshuk My _plugins/jekyll_scholar.rb now looks like this:

require 'jekyll/scholar'
require 'uri'

module MarkdownFilter
  class Markdown < BibTeX::Filter
    def apply(value)
      url = value.to_s.slice(URI.regexp(['http','https','ftp']))
      value = url ? "[#{url}](#{url})" : value
    end
  end
end

This works and is now live on my jekyll blog. I have two small issues:

  • I'm replacing value with url. I'm not sure whether a URL can be part of a field in Citeproc, e.g. the title.
  • This code has problems with old DOIs in funky formats e.g.
    http://dx.doi.org/10.1002/(SICI)1096-9098(199702)64:2<122::AID-JSO6>3.0.CO;2-D 

Auto-linking with Github-flavored markdown is also broken for this URL.

mfenner avatar Aug 07 '13 10:08 mfenner

Nice! I did not know about URI.regexp

The issue I see is that, in theory, you could have something like this (in the note field for instance):

Available at https://a.tld/b and at http://c.tld/d

That's why I would rather replace the URLs with #gsub. I guess this pertains to the point you raised above as well. As far as I know, a CSL processor makes no assumptions as to whether or not a field contains a URL, so all processors should just pass the markdown link through unscathed.

As for the second issue, perhaps we can add a separate pattern for DOIs; something like /\b(http[^\s]+doi\.org\/[^\s]+)/?
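To make the #gsub point concrete, here is a minimal standalone sketch, not the filter itself. It assumes a simplified hand-written URL pattern in place of URI.regexp and a DOI pattern loosely modelled on the one suggested above; the names GENERIC_URL, DOI_URL and markdown_links are made up for this example.

```ruby
# Simplified URL pattern; an assumption standing in for URI.regexp.
GENERIC_URL = %r{(?:https?|ftp)://[^\s<>"]+}

# Anything up to the next whitespace that points at a doi.org host,
# loosely following the DOI pattern suggested in this thread.
DOI_URL = %r{\bhttps?://\S*doi\.org/\S+}

# Try the DOI pattern first, then fall back to the generic one.
ANY_URL = Regexp.union(DOI_URL, GENERIC_URL)

# Wrap every URL found in the text in markdown link syntax.
def markdown_links(text)
  text.to_s.gsub(ANY_URL) { |url| "[#{url}](#{url})" }
end

puts markdown_links('Available at https://a.tld/b and at http://c.tld/d')
```

Because #gsub visits every match, both URLs in the note are wrapped, and the surrounding text is left alone.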

inukshuk avatar Aug 07 '13 10:08 inukshuk

OK, I changed the code to replace the URL, even if it is only part of the field value:

require 'jekyll/scholar'
require 'uri'

module MarkdownFilter
  class Markdown < BibTeX::Filter
    def apply(value)
      url = value.to_s.slice(URI.regexp(['http','https','ftp']))
      return value unless url
      value.to_s.gsub(/(#{url})/, '[\1](\1)')
    end
  end
end

URI.regexp currently doesn't find some URLs, e.g. those that contain parentheses (because they confuse the regex matching) or really strange things like the URL above. Too bad, but at least they are displayed as text. Will look at other patterns for URL matching.

mfenner avatar Aug 07 '13 10:08 mfenner

OK, now I have the bibtex filter working the way I want:

require 'jekyll/scholar'
require 'uri'

module MarkdownFilter
  class Markdown < BibTeX::Filter
    def apply(value)
      value.to_s.gsub(URI.regexp(['http','https','ftp'])) { |c| "[#{$&}](#{$&})" }
    end
  end
end

This code is pretty dense, but I didn't want to regex twice. This only works with valid URLs, but I see that as a feature, not a bug.

The problem of DOIs in funky formats I solved differently as I produced the URL myself: I changed the code to generate a URL from a DOI:

url = Addressable::URI.escape "http://dx.doi.org/#{doi}"

instead of

url = "http://dx.doi.org/#{doi}"

URI.escape is deprecated and CGI.escape does something slightly different, hence the addressable gem.
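For readers without the addressable gem, the idea of escaping a DOI before putting it into a URL can be sketched with the standard library alone. This is only an illustration: the allow-list of characters below is an assumption made for this example and is not claimed to match what Addressable::URI.escape does, and doi_url is a hypothetical helper name.

```ruby
# Characters outside this allow-list get percent-encoded.
# The list is an assumption for this sketch, not Addressable's rules.
UNSAFE_IN_DOI = %r{[^A-Za-z0-9\-._~/:;()]}

# Build a resolver URL from a DOI, percent-encoding unsafe characters
# byte by byte (so multi-byte characters are handled as well).
def doi_url(doi)
  escaped = doi.gsub(UNSAFE_IN_DOI) do |char|
    char.bytes.map { |byte| format('%%%02X', byte) }.join
  end
  "http://dx.doi.org/#{escaped}"
end

puts doi_url('10.1002/(SICI)1096-9098(199702)64:2<122::AID-JSO6>3.0.CO;2-D')
```

With the angle brackets encoded, the resulting URL no longer contains characters that trip up simple URL patterns.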

mfenner avatar Aug 07 '13 12:08 mfenner

This looks pretty good!

I think it would be great to add the filter to jekyll-scholar — the only question is whether or not we should enable it by default.

inukshuk avatar Aug 07 '13 13:08 inukshuk

That looks good, assuming that URI.regexp works fine on various links :) Regarding enabling by default -- can we have a simple flag that lets us toggle this behaviour? I don't mind if it's turned on by default, if it's documented and I can turn it off easily.

Cheers, Artem.

ashinkarov avatar Aug 07 '13 13:08 ashinkarov

Yes, the nice thing is that @mfenner turned it into a BibTeX-Ruby filter so we already have the option in place: bibtex_filters. Right now its default is set to [:latex] and we could change it to [:latex, :markdown]. If you don't need it you can simply set it to the current default option again.

inukshuk avatar Aug 07 '13 13:08 inukshuk

Ah, ok, I didn't remember that we have bibtex_filters available via configuration.

ashinkarov avatar Aug 07 '13 14:08 ashinkarov

@inukshuk will you add this filter to bibtex-ruby or jekyll-scholar? It would be good to add Cucumber tests to confirm that this filter can handle a variety of links. I would be happy to write the tests, but not before the weekend.

I'm for enabling by default since jekyll-scholar is always using markdown.

mfenner avatar Aug 07 '13 14:08 mfenner

@mfenner cucumber tests would be great. I was thinking of adding it to jekyll-scholar and I also think it would be a reasonable default — we just need to make sure people know how to switch to the old configuration.

inukshuk avatar Aug 07 '13 14:08 inukshuk

OK, give me a few days and I will try a pull request that includes cucumber tests. Unless you want to add the filter yourself, then the pull request will only contain the tests.

mfenner avatar Aug 07 '13 14:08 mfenner

No hurry : ) just let me know if I can do anything to help.

inukshuk avatar Aug 07 '13 14:08 inukshuk

I tried this and it just shows the markup in the bibliography list.

I have:

@article{test,
      author = {test},
      title = {test},
      URL = {http://google.com}
}

which shows as [1]test, test, (n.d.). [http://google.com](http://google.com). without converting the url into a clickable link.

Do I need to configure something else?

x12a1f avatar May 14 '15 08:05 x12a1f

I think we never actually added the filter, so what you need to do is add it to your _plugins/ext.rb file:

require 'jekyll/scholar'
require 'uri'

module MarkdownFilter
  class Markdown < BibTeX::Filter
    def apply(value)
      value.to_s.gsub(URI.regexp(['http','https','ftp'])) { |c| "[#{$&}](#{$&})" }
    end
  end
end

And then enable it in _config.yml:

scholar:
  bibtex_filters:
    - latex
    - markdown

inukshuk avatar May 14 '15 09:05 inukshuk

yes, that is what I did

x12a1f avatar May 14 '15 09:05 x12a1f

@mfenner is it OK if I add the plugin? Looks like I should write a test for it anyway : )

inukshuk avatar May 14 '15 10:05 inukshuk

I think for this to work the way @mfenner intended this needs to happen inside a markdown file, because you need the markdown processor to then convert the link.

Alternatively, you could adjust the filter above to generate an HTML link tag instead. Something like:

value.to_s.gsub(URI.regexp(['http','https','ftp'])) { |c| "<a href=\"#{$&}\">#{$&}</a>" }

Better yet, you could adjust your bibliography template
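A standalone sketch of the HTML variant, for output that is not passed through a markdown processor afterwards. It assumes a simplified URL pattern instead of URI.regexp, and it assumes the input is plain, not yet HTML-escaped text; html_links and LINKABLE_URL are hypothetical names for this example.

```ruby
require 'erb'

# Simplified URL pattern; an assumption standing in for URI.regexp.
LINKABLE_URL = %r{(?:https?|ftp)://[^\s<>"]+}

# Replace every URL with an HTML anchor. The URL is HTML-escaped so that
# characters such as & do not produce invalid markup in the attribute.
def html_links(text)
  text.to_s.gsub(LINKABLE_URL) do |url|
    safe = ERB::Util.html_escape(url)
    %(<a href="#{safe}">#{safe}</a>)
  end
end

puts html_links('http://google.com')
```

Only the matched URLs are touched; any text around them is returned unchanged.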

inukshuk avatar May 14 '15 11:05 inukshuk

@inukshuk feel free to add the plugin.

mfenner avatar May 14 '15 11:05 mfenner

Changing the filter to generate an HTML link tag works.

Thanks.

x12a1f avatar May 15 '15 06:05 x12a1f

I tried to add the markdown filter to make it work in this commit of my blog: https://github.com/rriemann/blog.riemann.cc/commit/e6d937db5bf321db1c40901e4226fa47f9b889c7

My bib file has doi fields e.g. doi = {10.1007/978-3-319-58469-0_12}, that get rendered as https://doi.org/10.1007/978-3-319-58469-0_12 but that are still not clickable.

The page is online here: https://blog.riemann.cc/output/

What is missing? Should I use the jekyll_plugins group in the Gemfile?

rriemann avatar Mar 17 '18 22:03 rriemann

The markdown filter is already part of Jekyll-Scholar so you don't need to add it yourself; you only need to enable it in your configuration.

What the filter does is convert URLs to markdown link syntax (these are later turned into HTML links when the markdown is being processed). This filter does not turn DOIs into URLs though. If that's happening on your page, it's probably another filter at work (which likely runs after the markdown filter, so those DOI URLs won't be detected). You could make sure your DOIs are converted before the markdown filter processes the values or have the DOI filter create markdown links in the first place.
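Why the order matters can be shown with a small standalone sketch. Both steps below are hypothetical stand-ins written for this example (they are not the actual Jekyll-Scholar or CSL code): one turns a bare DOI into a URL, the other wraps URLs in markdown link syntax.

```ruby
# Hypothetical stand-in for whatever turns a bare DOI into a URL.
def doi_to_url(value)
  value.sub(%r{\A10\.\d{4,9}/\S+\z}) { |doi| "https://doi.org/#{doi}" }
end

# Simplified stand-in for the markdown filter.
def to_markdown_link(value)
  value.gsub(%r{https?://[^\s<>"]+}) { |url| "[#{url}](#{url})" }
end

# Apply the named steps to the value, one after the other.
def run_filters(value, steps)
  steps.reduce(value) { |acc, step| send(step, acc) }
end

doi = '10.1007/978-3-319-58469-0_12'
puts run_filters(doi, %i[doi_to_url to_markdown_link]) # DOI becomes a markdown link
puts run_filters(doi, %i[to_markdown_link doi_to_url]) # DOI becomes a plain URL only
```

In the second order the markdown step runs while the value is still a bare DOI, so it finds nothing to link, and the URL created afterwards stays plain text.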

inukshuk avatar Mar 19 '18 09:03 inukshuk

The CSL default converts DOIs to links https://doi.org/10.1007/978-3-319-59665-5_3. So I think links should just work if the markdown filter is activated.

rriemann avatar Mar 19 '18 10:03 rriemann

You can modify the csl file and add a piece of code that outputs dois in the preferred format. See how I handle urls: https://github.com/ashinkarov/ashinkarov.github.com/blob/source/plugins/ieee.csl#L180 You can do similar stuff for dois. Then you just point bibproc to use this file.

ashinkarov avatar Mar 19 '18 11:03 ashinkarov