knitcitations
Error inserting reference with special characters
First, thanks a lot for the package - very useful!
Today I found a problem when inserting a reference by DOI. This is my Rmd:
output: pdf_document
bibliography: references.bib
library(knitcitations)
cleanbib()
cite_options(citation_format = "pandoc")
Test: `r citet("10.1111/j.1461-0248.2007.01060.x")`.
write.bibtex(file="references.bib")
which gives this error:
Error in utf8ToInt(x) : invalid UTF-8 string
Calls: <Anonymous> ... encoded_text_to_latex -> as.vector -> sapply -> lapply -> FUN -> utf8ToInt
Execution halted
I think it may have something to do with the special characters in author names (Müller-Schärer)... Any clue how I could fix this? I couldn't find any help yet.
Many thanks in advance
Paco
My session info:
R version 3.1.3 (2015-03-09)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1
locale:
[1] LC_COLLATE=English_United Kingdom.1252 LC_CTYPE=English_United Kingdom.1252
[3] LC_MONETARY=English_United Kingdom.1252 LC_NUMERIC=C
[5] LC_TIME=English_United Kingdom.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] knitcitations_1.0.5
loaded via a namespace (and not attached):
[1] bibtex_0.4.0 bitops_1.0-6 digest_0.6.8 htmltools_0.2.6 httr_0.6.1
[6] lubridate_1.3.3 memoise_0.2.1 plyr_1.8.1 Rcpp_0.11.5 RCurl_1.95-4.5
[11] RefManageR_0.8.45 RJSONIO_1.3-0 rmarkdown_0.5.1 stringr_0.6.2 tools_3.1.3
[16] XML_3.98-1.1 yaml_2.1.13
I cannot reproduce this error; everything works fine with this citation on my end (see my sessionInfo() below). It is probably due to your locale. I don't recognize your locale settings (I'm not familiar with Windows locales), but see ?Sys.setlocale; locales are responsible for how such special characters are parsed.
Here's my whole session:
> library(knitcitations)
> cleanbib()
> cite_options(citation_format = "pandoc")
> citet("10.1111/j.1461-0248.2007.01060.x")
[1] "@Broennimann_2007"
> write.bibtex(file="references.bib")
Writing 1 Bibtex entries ... OK
Results written to file 'references.bib'
> sessionInfo()
R version 3.1.3 RC (2015-03-06 r67947)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Debian GNU/Linux 8 (jessie)
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] knitcitations_1.0.5 bibtex_0.4.0 RefManageR_0.8.45
loaded via a namespace (and not attached):
[1] bitops_1.0-6 digest_0.6.8 httr_0.6.1 lubridate_1.3.3
[5] memoise_0.2.1 plyr_1.8.1 Rcpp_0.11.5 RCurl_1.95-4.5
[9] RJSONIO_1.3-0 stringr_0.6.2 tools_3.1.3 XML_3.98-1.1
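The locale explanation above can be checked directly in R. This is a minimal sketch, not code from the thread: it assumes R >= 3.3.0 for validUTF8(), and the byte string is a hand-made stand-in for an author name as a Windows-1252 session might hold it.

```r
# Hypothetical illustration: the bytes of "Müller" in latin1 / Windows-1252,
# where 0xFC is "ü". This byte sequence is not valid UTF-8.
latin1_name <- rawToChar(as.raw(c(0x4d, 0xfc, 0x6c, 0x6c, 0x65, 0x72)))

# Is the current session running in a UTF-8 locale?
is_utf8_locale <- l10n_info()[["UTF-8"]]

# validUTF8() reports whether a string's bytes form valid UTF-8
valid_before <- validUTF8(latin1_name)

# Convert explicitly from latin1 to UTF-8 instead of relying on the locale
utf8_name <- iconv(latin1_name, from = "latin1", to = "UTF-8")
valid_after <- validUTF8(utf8_name)
```

Stating the source encoding explicitly in iconv() makes the result independent of whichever locale the session happens to use.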
Thanks for the quick reply!
The problem occurs specifically when calling write.bibtex (which in turn calls RefManageR by @mwmclean):
> library(knitcitations)
> cleanbib()
> cite_options(citation_format = "pandoc")
> citet("10.1111/j.1461-0248.2007.01060.x")
[1] "@Broennimann_2007"
> write.bibtex(file="references.bib")
Writing 1 Bibtex entries ... Error in utf8ToInt(x) : invalid UTF-8 string
> traceback()
14: utf8ToInt(x)
13: FUN("O. Broennimann and U. A. Treier and H. Müller-Schärer and W. Thuiller and A. T. Peterson and A. Guisan"[[1L]],
...)
12: lapply(X = X, FUN = FUN, ...)
11: sapply(x, do_utf8)
10: as.vector(switch(encoding, latin1 = sapply(x, do_latin1), latin2 = sapply(x,
do_latin2), latin9 = sapply(x, do_latin9), `UTF-8` = sapply(x,
do_utf8), utf8 = sapply(x, do_utf8), stop("unimplemented encoding")))
9: encoded_text_to_latex(format_author(object[[i]]), "UTF-8")
8: FUN(X[[1L]], ...)
7: lapply(object, format_bibentry1)
6: unlist(lapply(object, format_bibentry1))
5: head(unlist(lapply(object, format_bibentry1)), -1L)
4: toBiblatex(bib, ...)
3: writeLines(toBiblatex(bib, ...), fh)
2: WriteBib(entry, file = file, append = append, ...)
1: write.bibtex(file = "references.bib")
I will investigate with locales. You're probably right that's the root of the problem (e.g. see http://stackoverflow.com/questions/5205159/how-can-i-find-out-the-internal-code-representation-of-a-windows-1252-character).
I'll let you know if I manage to fix it.
Thanks!
Well, it seems Windows doesn't help to get this sorted... But at least I managed to make it work following your suggestion of changing locales: Sys.setlocale("LC_ALL", locale = "C"). Although the special characters are still not parsed correctly in the final reference, at least the PDF is produced now.
I paste here the code in case other Windows users find it useful, or someone finds a better solution:
> library(knitcitations)
> cleanbib()
> cite_options(citation_format = "pandoc")
> Sys.setlocale("LC_ALL", locale = "C")
[1] "C"
> citet("10.1111/j.1461-0248.2007.01060.x")
[1] "@Broennimann_2007"
> write.bibtex(file="references.bib")
Writing 1 Bibtex entries ... OK
Results written to file 'references.bib'
The special characters (ü) are not parsed correctly:
Broennimann, O., U. A. Treier, H. **M<U+00FC>ller-Sch<U+00E4>rer**, W. Thuiller, A. T. Peterson, and A. Guisan. 2007. “Evidence of Climatic Niche Shift During Biological Invasion.” Ecol Letters 10 (8). Wiley-Blackwell: 701–9.
So it's not perfect, but at least it works. Thanks again for your help.
No problem. Windows really should have some locale that supports UTF-8. Have you tried asking about this on Stack Overflow?
Hi Carl,
An update on this issue. I have tried many references and no longer get an error in write.bibtex (maybe after upgrading to R 3.2.0?). So that's good :)
Errors still happen when pandoc attempts to produce the final PDF with the bibliography, in cases where some of the references produced by knitcitations contain 'strange' characters. An example:
output: pdf_document
bibliography: references.bib
library(knitcitations)
cleanbib()
cite_options(citation_format = "pandoc")
This is a test `r citet("10.1111/nph.12929")`.
References
write.bibtex(file="references.bib")
This Rmd is knitted to md successfully, but then RStudio gives the following error:
! Undefined control sequence.
l.116 Francisco Rodr\iguez
pandoc.exe: Error producing PDF from TeX source
Error: pandoc document conversion failed with error 43
When you look at references.bib you can see that some author names include strange characters:
@Article{Gavin_2014,
  doi = {10.1111/nph.12929},
  url = {http://dx.doi.org/10.1111/nph.12929},
  year = {2014},
  month = {jul},
  publisher = {Wiley-Blackwell},
  volume = {204},
  number = {1},
  pages = {37--54},
  author = {Daniel G. Gavin and Matthew C. Fitzpatrick and Paul F. Gugger and Katy D. Heath and Francisco Rodr\'\iguez-S{\a'a}nchez and Solomon Z. Dobrowski and Arndt Hampe and Feng Sheng Hu and Michael B. Ashcroft and Patrick J. Bartlein and Jessica L. Blois and Bryan C. Carstens and Edward B. Davis and Guillaume {de Lafontaine} and Mary E. Edwards and Matias Fernandez and Paul D. Henne and Erin M. Herring and Zachary A. Holden and Woo-seok Kong and Jianquan Liu and Donatella Magri and Nicholas J. Matzke and Matt S. McGlone and Fr{\a'e}d{\a'e}rik Saltr{\a'e} and Alycia L. Stigall and Yi-Hsin Erica Tsai and John W. Williams},
  title = {Climate refugia: joint inference from fossil records, species distribution models and phylogeography},
  journal = {New Phytologist},
}
which are causing these errors.
Anyway, I just wanted to update you and let you know that write.bibtex works fine now, even though I'm still getting errors later (with pandoc). But at least now it's not that difficult to correct these weird characters in the references.bib file manually before calling pandoc.
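The manual correction described above could be scripted. The following is only a sketch under stated assumptions: fix_bib_escapes() is a hypothetical helper, not part of knitcitations or RefManageR, and its table covers only the three escape sequences visible in the entry above (`{\a'a}`, `{\a'e}` and `\'\i`). Whether pandoc then accepts the result on a given Windows setup has not been verified in this thread.

```r
# Hypothetical helper: replace the problematic LaTeX escapes seen in
# references.bib with the corresponding Unicode characters.
fix_bib_escapes <- function(lines) {
  # Assumed mapping, built from the one entry shown above; extend as needed
  escapes <- c(
    "{\\a'a}" = "\u00e1",  # a with acute accent
    "{\\a'e}" = "\u00e9",  # e with acute accent
    "\\'\\i"  = "\u00ed"   # i with acute accent
  )
  for (pattern in names(escapes)) {
    # fixed = TRUE matches the pattern literally, so braces and
    # backslashes need no regex escaping
    lines <- gsub(pattern, escapes[[pattern]], lines, fixed = TRUE)
  }
  lines
}

example_in  <- "Francisco Rodr\\'\\iguez-S{\\a'a}nchez and Fr{\\a'e}d{\\a'e}rik Saltr{\\a'e}"
example_out <- fix_bib_escapes(example_in)
```

To apply it to the file, one option is to read it with readLines("references.bib", encoding = "UTF-8"), pass the lines through the helper, and write them back with writeLines(enc2utf8(lines), "references.bib", useBytes = TRUE) so the UTF-8 bytes are not re-encoded by the locale.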
I'll come back if I find a solution to this. Feel free to close this issue if you think it's not related to knitcitations anymore.
Thanks!
Guys, was there ever a resolution to this? As @rudolfli's link shows, this is causing an issue downstream. If there's a workaround, I can implement it in that package?
Hi,
I recall it was a Windows-specific issue, hard to solve because of Windows' intrinsic limitations (there are lots of threads on Stack Overflow about UTF-8 and Windows). What I tried was post-processing the BibTeX references as downloaded by knitcitations to remove the special characters before they are processed by pandoc.
I paste below the function I made to go over all references and convert problematic fields to UTF-8 (using iconv), but I don't remember if it worked fine in all cases:
#' Encode author names in UTF-8.
#'
#' Encode author names, titles and journal names in BibEntry objects as UTF-8. Especially useful on Windows systems that do not support UTF-8.
#'
#' @import RefManageR
#' @param refs A BibEntry object.
#' @export
#' @return A BibEntry object.
#' @examples \dontrun{
#' library(knitcitations)
#' cleanbib()
#' cite_options(citation_format = "pandoc")
#' #citet("10.1111/nph.12929") # doesn't work
#' citep("10.1016/j.tree.2006.09.010")
#' citet("10.1111/j.1461-0248.2007.01060.x")
#' ref <- knitcitations:::get_bib()
#' ref.utf8 <- BibEntry_to_UTF8(ref)
#'
#'}
BibEntry_to_UTF8 <- function(refs) {
  # seq_along() avoids iterating over 1:0 when refs is empty
  for (i in seq_along(refs)) {
    # Collapse the author list into one "A and B" string before re-encoding
    authors <- paste(refs[[i]]$author, collapse = " and ")
    # iconv() without `from` converts from the session's native encoding
    refs[[i]]$author  <- iconv(authors, to = "UTF-8")
    refs[[i]]$title   <- iconv(refs[[i]]$title, to = "UTF-8")
    refs[[i]]$journal <- iconv(refs[[i]]$journal, to = "UTF-8")
  }
  refs
}
Hope this helps somehow. I'd be grateful if you find a solution to this!
I hit the same issue here today, but I don't quite follow why it's an issue that cannot be resolved. When I run readLines() on Windows without specifying the encoding, I have problems with the Unicode characters, but when I run readLines() with the encoding, I get the expected characters. Unfortunately, I don't see a way to give the text output from readLines() to citep().
Rename this from .txt to .bib: Janssen_2013.txt
# Bad
readLines("Janssen_2013.bib")
# Good
readLines("Janssen_2013.bib", encoding="UTF-8")
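The difference between the two calls can be reproduced without the attachment. This is a small sketch using a temporary file as a stand-in for Janssen_2013.bib; the file contents are made up for illustration. In readLines(), the encoding argument does not convert anything: it only marks non-ASCII strings as UTF-8 so that R interprets their bytes correctly.

```r
# Hypothetical stand-in for a UTF-8 encoded .bib file
tmp <- tempfile(fileext = ".bib")
original <- "author = {H. M\u00fcller-Sch\u00e4rer}"

# useBytes = TRUE writes the UTF-8 bytes as-is, whatever the locale
writeLines(enc2utf8(original), tmp, useBytes = TRUE)

# Declaring the encoding marks the strings as UTF-8 on input
marked <- readLines(tmp, encoding = "UTF-8")
marked_encoding <- Encoding(marked)

unlink(tmp)
```

Without the encoding argument, a session in a non-UTF-8 locale would treat the same bytes as native-encoded text, which is where the garbled characters come from.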