docxtractr
Alternative way of supporting doc files
Thanks a lot for such a great package.
I was trying out docxtractr::read_docx on doc files in Windows 10 using LibreOffice Version: 6.2.5.2 (x64).
It was horribly slow (due to LibreOffice, I guess) unless LibreOffice was already open (started manually, outside R). Once I close LibreOffice and run the same code in R again, it is slow again.
fn <- "rough/messy_files/doc.doc"
library(tictoc)
# LibreOffice never opened since the last PC reboot
tic()
tmp <- docxtractr::read_docx(fn)
toc()
# 285.63 sec elapsed
# 4.7 min !
# LibreOffice open
tic()
tmp <- docxtractr::read_docx(fn)
toc()
# 1.1 sec elapsed
# LibreOffice closed after having been open
tic()
tmp <- docxtractr::read_docx(fn)
toc()
# 24.21 sec elapsed
That is OK for a single file, but definitely not for batches of files. I was wondering whether an alternative way of supporting doc files could be offered to users,
such as docx4j, as mentioned in this repository. The system dependency on LibreOffice would then go away, and I believe it would also run more smoothly.
Ref https://github.com/hrbrmstr/docxtractr/issues/5
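In the meantime, one possible workaround for batches of files might be to convert them all in a single LibreOffice call, so the startup cost is paid once instead of per file. The sketch below is untested against the timings above and rests on assumptions: `soffice_convert_args()` and `convert_docs()` are hypothetical helpers (not part of docxtractr), and `soffice` is assumed to be on the PATH (on Windows the full path to `soffice.exe` would probably have to be passed instead).

```r
# Hypothetical helper (not part of docxtractr): build the command-line
# arguments for one headless LibreOffice run that converts all files to docx.
soffice_convert_args <- function(files, outdir) {
  quote_type <- if (.Platform$OS.type == "windows") "cmd" else "sh"
  c("--headless", "--convert-to", "docx",
    "--outdir", shQuote(outdir, type = quote_type),
    shQuote(files, type = quote_type))
}

# Hypothetical helper: run the conversion once for the whole batch and
# return the paths of the docx files LibreOffice is expected to write.
convert_docs <- function(files, outdir = tempfile("docx_"), soffice = "soffice") {
  dir.create(outdir, showWarnings = FALSE, recursive = TRUE)
  files <- normalizePath(files, mustWork = TRUE)
  status <- system2(soffice, soffice_convert_args(files, outdir))
  if (status != 0) stop("LibreOffice conversion failed with status ", status)
  file.path(outdir, paste0(tools::file_path_sans_ext(basename(files)), ".docx"))
}

# Usage (assumes LibreOffice is installed):
# doc_files  <- list.files("rough/messy_files", pattern = "\\.doc$", full.names = TRUE)
# docx_files <- convert_docs(doc_files)
# docs       <- lapply(docx_files, docxtractr::read_docx)
```

Whether this is actually faster than calling read_docx on each doc file would need to be measured; it only avoids starting LibreOffice once per file.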