
Add Dutch language

Open msberends opened this issue 2 years ago • 6 comments

Hi, thanks for this great package!

I'm a Dutch researcher and R package developer. I added the Dutch language to your package.

To be specific, I:

  • Pulled your repo 10 minutes ago (commit https://github.com/ddsjoberg/gtsummary/commit/0cab8a610c4424f1c0e9ed88687f32e9cc3190a2)
  • Added the Dutch language (nl = formal ISO code) to data-raw/gtsummary_translated.xlsx
  • Updated the documentation of R/theme_gtsummary.R
  • Ran devtools::document() to have man/theme_gtsummary.Rd automatically updated
  • Ran the code in data-raw/internal_data.R to update the internal data R/sysdata.rda

One suggestion: I would make the translation file a CSV instead of an Excel file, as it would allow normal git version control. Excel files are zipped files and git (and GitHub) cannot show the difference between files.

I did not yet update NEWS.md.


Reviewer Checklist (if an item does not apply, mark it as complete)

  • [ ] Ensure all package dependencies are installed by running renv::install()
  • [X] PR branch has pulled the most recent updates from master branch. Ensure the pull request branch and your local version match and both have the latest updates from the master branch.
  • [X] If an update was made to tbl_summary(), was the same change implemented for tbl_svysummary()?
  • [X] If a new function was added, function included in _pkgdown.yml
  • [X] If a bug was fixed, a unit test was added for the bug check
  • [ ] Run pkgdown::build_site(). Check the R console for errors, and review the rendered website.
  • [X] Code coverage is suitable for any new functions/features. Review coverage with withr::with_envvar(new = c("NOT_CRAN" = "true"), covr::report()). Begin in a fresh R session without any packages loaded.
  • [X] R CMD Check runs without errors, warnings, and notes
  • [X] usethis::use_spell_check() runs with no spelling errors in documentation

When the branch is ready to be merged into master:

  • [ ] Update NEWS.md with the changes from this pull request under the heading "# gtsummary (development version)". If there is an issue associated with the pull request, reference it in parentheses at the end of the update (see NEWS.md for examples).
  • [ ] Increment the version number using usethis::use_version(which = "dev")
  • [ ] Run codemetar::write_codemeta()
  • [ ] Run usethis::use_spell_check() again
  • [ ] Approve Pull Request
  • [ ] Merge the PR. Please use "Squash and merge".

msberends avatar Aug 05 '22 07:08 msberends

Awesome, thank you so much! FYI, I probably will have a moment to review next week.

The translation file was originally a CSV. But we kept running into encoding issues when Excel would open the CSV: some characters would be changed. Not sure why xlsx files don't suffer from the same issue, but the encoding problems went away when we made the switch 🤷🏼‍♂️

ddsjoberg avatar Aug 05 '22 10:08 ddsjoberg

Great, thanks!

Flat files have to be saved explicitly in a certain encoding. In Sublime Text for example (and in RStudio as well), there is a “Save with Encoding” command in the toolbar. It should be UTF-8 to support non-Latin characters.

We have also translated our R package AMR, and this is the file we use: https://github.com/msberends/AMR/blob/main/data-raw/translations.tsv. As you can see, it also contains Russian and Swedish, which don’t cause any problems. So perhaps that would solve it? It’s also very convenient to see small language fixes show up in git diffs.
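
As a minimal sketch of the point about explicit encodings, the following base R snippet writes a small translation table to a flat file as UTF-8 and reads it back. The table, its column names, and its strings are hypothetical and are not taken from gtsummary or AMR. It also assumes R is running in a UTF-8 locale; in other locales the non-ASCII strings may not round-trip cleanly.

```r
# Hypothetical translation table: one column per ISO language code.
# Non-ASCII characters are written as \u escapes so that this script
# itself does not depend on the encoding of the source file.
translations <- data.frame(
  en = c("Unknown", "Median (IQR)"),
  nl = c("Onbekend", "Mediaan (IQR)"),
  sv = c("Ok\u00e4nd", "Median (IQR)"),
  ru = c("\u041d\u0435\u0438\u0437\u0432\u0435\u0441\u0442\u043d\u043e", "Median (IQR)"),
  stringsAsFactors = FALSE
)

# Write the flat file with an explicit encoding rather than relying on
# whatever the editor or spreadsheet program picks by default.
path <- tempfile(fileext = ".csv")
write.csv(translations, path, row.names = FALSE, fileEncoding = "UTF-8")

# Read it back, again stating the encoding explicitly.
round_trip <- read.csv(
  path,
  fileEncoding = "UTF-8",
  encoding = "UTF-8",
  stringsAsFactors = FALSE
)

print(round_trip)
```

Because the result is a plain text file, a one-word translation fix would appear as a one-line change in a git diff.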

msberends avatar Aug 05 '22 11:08 msberends

One more suggestion (I could open a separate issue or PR, let me know): to fully support foreign languages, number formatting and some other words should be adapted as well. For example, most Western European countries use a decimal comma, and a semicolon or full stop where English uses a comma. Internationally, for a median with IQR you would write 6.12 (3.23, 8.32) while in the Netherlands for example, this would be 6,12 (3,23; 8,32). Might seem strange for ‘English’ eyes, but in a Dutch, French or Spanish text, the English format seems strange to me :) It’s sometimes even dangerous, as 1,000 means a thousand in English and a one with 3 decimals in Dutch.
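
To make the difference concrete, here is a minimal base R sketch of the two conventions. The helper `format_median_iqr()` is hypothetical and not part of gtsummary; it only illustrates how a decimal mark, a big mark, and an IQR separator combine, using base R's `formatC()`.

```r
# Hypothetical helper (not a gtsummary function): formats a median with
# its IQR as "median (Q1<sep>Q3)" using the given marks and separator.
format_median_iqr <- function(med, q1, q3,
                              decimal.mark, big.mark, iqr.sep,
                              digits = 2) {
  fmt <- function(x) {
    formatC(x, format = "f", digits = digits,
            decimal.mark = decimal.mark, big.mark = big.mark)
  }
  paste0(fmt(med), " (", fmt(q1), iqr.sep, fmt(q3), ")")
}

# English-style convention: decimal point, comma as separator
english <- format_median_iqr(6.12, 3.23, 8.32,
                             decimal.mark = ".", big.mark = ",",
                             iqr.sep = ", ")

# Dutch-style convention: decimal comma, semicolon as separator
dutch <- format_median_iqr(6.12, 3.23, 8.32,
                           decimal.mark = ",", big.mark = ".",
                           iqr.sep = "; ")

print(english)
print(dutch)

# The ambiguity: one thousand in English style and the number one with
# three decimals in Dutch style are written with the same characters.
thousand_en <- formatC(1000, format = "f", digits = 0,
                       big.mark = ",", decimal.mark = ".")
one_nl <- formatC(1, format = "f", digits = 3,
                  big.mark = ".", decimal.mark = ",")
print(c(thousand_en, one_nl))
```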

You could of course add these single characters such as the comma to the translation file. I also noticed that the word “to” (in a date range) is not being translated. There might be others as well.

msberends avatar Aug 05 '22 11:08 msberends

> One more suggestion (I could open a separate issue or PR, let me know): to fully support foreign languages, number formatting and some other words should be adapted as well. For example, most Western European countries use a decimal comma, and a semicolon or full stop where English uses a comma. Internationally, for a median with IQR you would write 6.12 (3.23, 8.32) while in the Netherlands for example, this would be 6,12 (3,23; 8,32). Might seem strange for ‘English’ eyes, but in a Dutch, French or Spanish text, the English format seems strange to me :) It’s sometimes even dangerous, as 1,000 means a thousand in English and a one with 3 decimals in Dutch.

We ran into an issue where speakers of the same language expected different big marks, decimal marks, and IQR separators, so I added these as arguments to the language theme:

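library(gtsummary)

# Spanish labels, with "." as big mark, "," as decimal mark, "; " between IQR bounds
theme_gtsummary_language("es", big.mark = ".", decimal.mark = ",", iqr.sep = "; ")

# Summarize two columns of the built-in `trial` dataset under the theme
trial %>%
  select(age, marker) %>%
  tbl_summary()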

[image]

ddsjoberg avatar Aug 05 '22 12:08 ddsjoberg

> You could of course add these single characters such as the comma to the translation file. I also noticed that the word “to” (in a date range) is not being translated. There might be others as well.

Support for dates came after translations, and I forgot to implement this! Would you mind opening a new issue for it?

ddsjoberg avatar Aug 05 '22 12:08 ddsjoberg

Sure! And thanks for the arguments about dec and big mark, I overlooked these!

msberends avatar Aug 05 '22 12:08 msberends

Hey hey @msberends ! Apologies for the delay on this review. A Dutch colleague was going to review, but no longer has the bandwidth to do so.

Everything looks good to me! Thank you so much for the wonderful pull request. If I had to describe this PR, I think I would say that "Er gaat niks boven PR!" (roughly, "Nothing beats PR!") ;)

ddsjoberg avatar Sep 14 '22 01:09 ddsjoberg

Hey there, no problem at all!

Your Dutch is on point 😃 Thanks for the great project!

msberends avatar Sep 14 '22 05:09 msberends