
Encoding issue (not UTF-8) and repeated entries

Open GegznaV opened this issue 5 years ago • 5 comments

Describe the bug

  1. Encoding issue in displaying non-ASCII characters.
  2. Repeated entries of the same source (in the bib file each source is entered only once).

To Reproduce

Call the citr RStudio add-in from the attached project: citr--UTF-8--bug.zip

Expected behavior

  1. Correct encoding (UTF-8) for all characters.
  2. Each entry is shown exactly once.

Screenshots

[image]

Encoding is set to UTF-8 in settings: [image]

Additional context

R             3.6.3
RStudio       1.2.5033
citr          0.3.2
- Session info ----------------------------------
 setting  value                       
 version  R version 3.6.3 (2020-02-29)
 os       Windows 10 x64              
 system   x86_64, mingw32             
 ui       RStudio                     
 language (EN)                        
 collate  English_United States.1252  
 ctype    English_United States.1252  
 tz       Europe/Helsinki             
 date     2020-03-15    

GegznaV, Mar 15 '20 12:03

Another encoding issue may be here:

https://github.com/crsh/citr/blob/0afd6f97b35b294c655fa5831fb9db3d818e70c4/R/insert_citation.R#L107

Shouldn't it be:

parent_document <- readLines(parents_path[parents], warn = FALSE, encoding = getOption("citr.encoding")) 
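For context, a minimal sketch of what the `encoding` argument of `readLines()` does: it only marks the strings it reads as being in that encoding and does not re-encode the input. The temporary file and its sample text are hypothetical stand-ins for a parent document, not part of the attached project.

```r
# Hypothetical stand-in for a parent document: one line of UTF-8 text.
txt <- "Statistin\u0117s analiz\u0117s"
tmp <- tempfile(fileext = ".Rmd")
writeBin(charToRaw(txt), tmp)  # write the raw UTF-8 bytes without re-encoding

# `encoding` converts nothing; it only marks the resulting strings as UTF-8.
lines <- readLines(tmp, warn = FALSE, encoding = "UTF-8")

Encoding(lines)        # declared encoding of the line that was read
identical(lines, txt)  # compare with the original string
unlink(tmp)
```

Without the `encoding` argument, the same bytes would be interpreted in the session's native encoding, which is Windows-1252 in the session info above.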

GegznaV, Mar 15 '20 14:03

#68 fixes the duplicated entries and also addresses one more potential encoding issue.

The encoding issue reported above comes from RefManageR::ReadBib(), which does not respect the value of .Encoding:

RefManageR::ReadBib("book.bib", check = FALSE, .Encoding = "UTF-8")

## [1] V. Čekanavičius and G. Murauskas. _Statistika ir jos taikymai I_. Vilnius:
## TEV, 2006, p. 240. ISBN: 9986-546-93-1.

##  / truncated /

## [5] V. Janilionis, V. Morkevicius, and R. Rauleckas. “III dalis. StatistinÄ—s
## analizÄ—s pavyzdžių naudojant pavyzdin\ce skaitmenin\ce duomenų baz\ce
## medžiaga”. In: _StatistinÄ— kiekybinių duomenų analizÄ— su SPSS ir Stata_.
## Kaunas, 2008. Chap. 10. Daugia, p. 393. <URL:
## http://www.lidata.eu/index.php?file=files/mokymai/stat/stat.html{\&}course{\_}file=stat{\_}III{\_}10.html>.

#  / truncated /

## Warning messages:
## 1: Janilionis2008-III-10: unknown macro '\c' 
## 2: Janilionis2008-III-10: unknown macro '\c' 
## 3: Janilionis2008-III-10: unknown macro '\c' 

This encoding issue is related to #53
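The garbled sequences in entry [5] are consistent with UTF-8 bytes being interpreted as Windows-1252, the native encoding in the session info above. The following is a minimal sketch of that effect using base R only; it illustrates the symptom under that assumption and makes no claim about what RefManageR does internally. The variable names are illustrative.

```r
# "Statistinės" with the letter U+0117, stored as UTF-8 (bytes C4 97 for that letter).
original <- "Statistin\u0117s"
utf8_bytes <- charToRaw(original)

# Interpret those same bytes as Windows-1252 and convert the result to UTF-8.
# In Windows-1252, byte C4 is U+00C4 and byte 97 is U+2014.
garbled <- iconv(list(utf8_bytes), from = "CP1252", to = "UTF-8")

print(garbled)
identical(garbled, "Statistin\u00c4\u2014s")
```

The two characters produced for the single letter U+0117 match the pattern visible in the ReadBib() output above.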

GegznaV, Mar 15 '20 15:03

Thanks for the PR. I've hardcoded the expected encoding of parent documents to UTF-8, because rmarkdown assumes UTF-8 encoding anyway and because the option citr.encoding specifies the encoding of the Bib file.

crsh, Jun 04 '20 08:06

Hi @GegznaV, has this issue been resolved (except for the upstream encoding issue)?

crsh, Jul 15 '20 12:07

It seems that only the upstream issue is left.

GegznaV, Jul 30 '20 02:07