readtext
Errors interrupting the text extraction process.
I am trying to extract text from a large number of Word files (1500) stored in one folder, using readtext (after building the file list with list.files).
I am getting errors with some files (examples below). The problem is that when an error occurs, the extraction process stops. I can identify the problematic file by setting verbosity = 3, but then I have to restart the extraction to find the next problematic file.
My question: is there a way to keep the process from being interrupted when an error is encountered?
I set ignore_missing_files = TRUE, but this did not fix the problem.
Examples of the errors encountered:
write error in extracting from zip file Error: 'C:\Users--- c/word/document.xml' does not exist.
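One possible workaround, sketched here rather than taken from readtext itself, is to call the reader on one file at a time and wrap each call in base R's tryCatch, so that a failing file is recorded instead of stopping the whole batch. The function name `read_each_safely` and its `reader` argument are hypothetical; the default assumes readtext is installed and accepts a single file path.

```r
# Hypothetical helper: read files one at a time so a single failure
# does not abort the batch. `reader` defaults to readtext::readtext
# (assumed installed) and is only evaluated if no reader is supplied.
read_each_safely <- function(files, reader = readtext::readtext) {
  # Each element is either the reader's result or the caught error object.
  results <- lapply(files, function(f) {
    tryCatch(reader(f), error = function(e) e)
  })
  ok <- !vapply(results, inherits, logical(1), what = "error")
  list(
    # Successfully read documents, in input order.
    texts = results[ok],
    # One row per failed file, with the error message that was raised.
    failures = data.frame(
      file = files[!ok],
      error = vapply(results[!ok], conditionMessage, character(1)),
      stringsAsFactors = FALSE
    )
  )
}

# Usage sketch (the folder path is a placeholder):
# files <- list.files("path/to/folder", full.names = TRUE)
# out <- read_each_safely(files)
# out$failures   # all problematic files, collected in a single pass
```

Reading file by file is slower than one vectorised call, but it reports every problematic file in one run instead of requiring a restart after each error.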
I second the general idea of readtext coming with some error catching mechanism, because it can waste hours reading in a big batch of files only to then fail at some point with nothing to show for it.
A typical issue for me is an .rtf file saved as .doc by the creator which antiword cannot process and thus exits with an error; in this particular case it would be nice if readtext automatically tried the rtf reader when antiword fails (and guesses it's actually an rtf file).
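Until something like that exists in readtext, the fallback can be approximated by hand. The sketch below is an assumption-laden illustration, not readtext behaviour: it checks whether a file starts with the RTF signature `{\rtf` and, if the first read fails, retries on a temporary copy with an .rtf extension. The names `looks_like_rtf` and `read_with_rtf_fallback` are hypothetical, and the retry assumes the reader picks its parser from the file extension.

```r
# Hypothetical helper: TRUE if the file begins with the RTF signature "{\rtf".
looks_like_rtf <- function(path) {
  con <- file(path, "rb")
  on.exit(close(con))
  header <- readBin(con, what = "raw", n = 5L)
  identical(header, charToRaw("{\\rtf"))
}

# Hypothetical fallback: try `reader` first; if it fails and the file
# looks like RTF, retry on a temporary copy named *.rtf.
# Assumes the reader chooses its parser from the file extension.
read_with_rtf_fallback <- function(path, reader = readtext::readtext) {
  tryCatch(
    reader(path),
    error = function(e) {
      # Not RTF: re-raise the original error unchanged.
      if (!looks_like_rtf(path)) stop(e)
      tmp <- tempfile(fileext = ".rtf")
      on.exit(unlink(tmp))
      file.copy(path, tmp)
      reader(tmp)
    }
  )
}
```

Note that a document read through the temporary copy will carry the temporary file name, so any document identifier derived from the file name would need to be restored afterwards.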