gtsummary
Add Dutch language
Hi, thanks for this great package!
I'm a Dutch researcher and R package developer. I added the Dutch language to your package.
To be specific, I:
- Pulled your repo 10 minutes ago (commit https://github.com/ddsjoberg/gtsummary/commit/0cab8a610c4424f1c0e9ed88687f32e9cc3190a2)
- Added the Dutch language (nl = formal ISO code) to `data-raw/gtsummary_translated.xlsx`
- Updated the documentation of `R/theme_gtsummary.R`
- Ran `devtools::document()` to have `man/theme_gtsummary.Rd` automatically updated
- Ran the code in `data-raw/internal_data.R` to update the internal data `R/sysdata.rda`
One suggestion: I would make the translation file a CSV instead of an Excel file, as it would allow normal git version control. Excel files are zipped files and git (and GitHub) cannot show the difference between files.
I did not yet update `NEWS.md`.
Reviewer Checklist (if an item does not apply, mark it as complete)
- [ ] Ensure all package dependencies are installed by running `renv::install()`
- [X] PR branch has pulled the most recent updates from master branch. Ensure the pull request branch and your local version match and both have the latest updates from the master branch.
- [X] If an update was made to `tbl_summary()`, was the same change implemented for `tbl_svysummary()`?
- [X] If a new function was added, function included in `_pkgdown.yml`
- [X] If a bug was fixed, a unit test was added for the bug check
- [ ] Run `pkgdown::build_site()`. Check the R console for errors, and review the rendered website.
- [X] Code coverage is suitable for any new functions/features. Review coverage with `withr::with_envvar(new = c("NOT_CRAN" = "true"), covr::report())`. Begin in a fresh R session without any packages loaded.
- [X] R CMD Check runs without errors, warnings, and notes
- [X] `usethis::use_spell_check()` runs with no spelling errors in documentation
When the branch is ready to be merged into master:
- [ ] Update `NEWS.md` with the changes from this pull request under the heading "`# gtsummary (development version)`". If there is an issue associated with the pull request, reference it in parentheses at the end of the update (see `NEWS.md` for examples).
- [ ] Increment the version number using `usethis::use_version(which = "dev")`
- [ ] Run `codemetar::write_codemeta()`
- [ ] Run `usethis::use_spell_check()` again
- [ ] Approve Pull Request
- [ ] Merge the PR. Please use "Squash and merge".
Awesome, thank you so much! FYI, I probably will have a moment to review next week.
The translation file was originally a CSV. But we kept running into encoding issues when Excel would open the CSV: some characters would be changed. Not sure why xlsx files don't suffer from the same issue, but the encoding problems went away when we made the switch 🤷🏼♂️
Great, thanks!
Flat files have to be saved explicitly in a certain encoding. In Sublime Text for example (and in RStudio as well), there is a “Save with Encoding” command in the toolbar. It should be UTF-8 to support non-Latin characters.
We also have translated our R package `AMR`, and this is the file we use: https://github.com/msberends/AMR/blob/main/data-raw/translations.tsv. As you can see, it also contains Russian and Swedish, which don’t give a problem. So not sure if that would solve it? It’s very nice and convenient to see small language fixes be printed in git diffs.
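To make the flat-file idea concrete, here is a minimal base R sketch of a UTF-8 round trip through a tab-separated file. The table contents, the column names and the `translate()` helper are all hypothetical; this is not the layout of the gtsummary or AMR translation files. The non-ASCII character is written as a `\u` escape so the script itself stays plain ASCII.

```r
# Illustrative translation table (hypothetical columns, not the real file layout).
translations <- data.frame(
  en = c("Characteristic", "Unknown"),
  nl = c("Kenmerk", "Onbekend"),
  es = c("Caracter\u00edstica", "Desconocido"),
  stringsAsFactors = FALSE
)

# Write a tab-separated flat file, explicitly encoded as UTF-8.
path <- tempfile(fileext = ".tsv")
write.table(translations, path, sep = "\t", quote = FALSE,
            row.names = FALSE, fileEncoding = "UTF-8")

# Read it back, declaring the same encoding.
round_trip <- read.delim(path, fileEncoding = "UTF-8", stringsAsFactors = FALSE)

# Hypothetical lookup helper: translate an English term into another language.
translate <- function(term, language, table = round_trip) {
  table[[language]][match(term, table$en)]
}

translate("Unknown", "nl")
```

As long as the file is only ever written and read with an explicit UTF-8 encoding, the accented characters should survive, and each changed translation shows up as a one-line git diff. Opening the file in Excel in between is outside what this sketch covers.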
One more suggestion (I could open a separate issue or PR, let me know): to fully support foreign languages, number formatting and some other words should be adapted as well. For example, most Western European countries use a decimal comma, and a semicolon or full stop where English uses a comma. Internationally, for a median with IQR you would write `6.12 (3.23, 8.32)` while in the Netherlands for example, this would be `6,12 (3,23; 8,32)`. Might seem strange for ‘English’ eyes, but in a Dutch, French or Spanish text, the English format seems strange to me :) It’s sometimes even dangerous, as `1,000` means a thousand in English and a one with 3 decimals in Dutch.
You could of course add these single characters such as the comma to the translation file. I also noticed that the word “to” (in a date range) is not being translated. There might be others as well.
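A small base R sketch of what I mean, using only `formatC()`. The helper `format_median_iqr()` and its arguments are made up for this comment and are not part of gtsummary.

```r
# Hypothetical helper (not part of gtsummary): format a median with IQR
# using a configurable decimal mark and IQR separator.
format_median_iqr <- function(median, q1, q3, digits = 2,
                              decimal.mark = ".", iqr.sep = ", ") {
  fmt <- function(x) {
    formatC(x, format = "f", digits = digits, decimal.mark = decimal.mark)
  }
  paste0(fmt(median), " (", fmt(q1), iqr.sep, fmt(q3), ")")
}

english <- format_median_iqr(6.12, 3.23, 8.32)
dutch <- format_median_iqr(6.12, 3.23, 8.32, decimal.mark = ",", iqr.sep = "; ")
english  # "6.12 (3.23, 8.32)"
dutch    # "6,12 (3,23; 8,32)"

# The ambiguity: two different numbers produce the same string.
thousand_en <- formatC(1000, format = "f", digits = 0, big.mark = ",")
one_nl <- formatC(1, format = "f", digits = 3, decimal.mark = ",")
thousand_en  # "1,000"
one_nl       # "1,000"
```

The last two lines show why the separators need to be set together with the language: the string alone does not tell the reader which convention was used.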
> One more suggestion (I could open a separate issue or PR, let me know): to fully support foreign languages, number formatting and some other words should be adapted as well. For example, most Western European countries use a decimal comma, and a semicolon or full stop where English uses a comma. Internationally, for a median with IQR you would write `6.12 (3.23, 8.32)` while in the Netherlands for example, this would be `6,12 (3,23; 8,32)`. Might seem strange for ‘English’ eyes, but in a Dutch, French or Spanish text, the English format seems strange to me :) It’s sometimes even dangerous, as `1,000` means a thousand in English and a one with 3 decimals in Dutch.
We ran into an issue where users of the same language wanted different big marks, decimal marks, and IQR separators, so I just added these as arguments to the language theme:
```r
library(gtsummary)

# Spanish labels with "." as big mark, "," as decimal mark, "; " as IQR separator
theme_gtsummary_language("es", big.mark = ".", decimal.mark = ",", iqr.sep = "; ")

trial %>%
  select(age, marker) %>%
  tbl_summary()
```
> You could of course add these single characters such as the comma to the translation file. I also noticed that the word “to” (in a date range) is not being translated. There might be others as well.
Support for dates came after translations, and I forgot to implement this! Would you mind opening a new issue for it?
Sure! And thanks for pointing out the decimal mark and big mark arguments, I overlooked these!
Hey hey @msberends ! Apologies for the delay on this review. A Dutch colleague was going to review, but no longer has the bandwidth to do so.
Everything looks good to me! Thank you so much for the wonderful pull request. If I had to describe this PR, I think I would say that "Er gaat niks boven PR!" ;)
Hey there, no problem at all!
Your Dutch is on point 😃 Thanks for the great project!