readtext

Errors interrupting the text extraction process.

Open • bes827 opened this issue 4 years ago • 1 comment

I am trying to extract text from a large number of Word files (1,500) in one folder using readtext, after building a file list with list.files().

Some files produce errors (examples below), and when an error occurs, the whole extraction stops. I can identify the problematic file by setting verbosity = 3, but then I have to restart the extraction to find the next problematic file.

My question: is there a way to keep the process from stopping when an error is encountered?

I set ignore_missing_files = TRUE, but this did not fix the problem.

Examples of the errors encountered:

write error in extracting from zip file
Error: 'C:\Users--- c/word/document.xml' does not exist.
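One way to keep a batch going, sketched here outside of readtext itself: read the files one at a time and wrap each call in base R's tryCatch(), recording the failures instead of stopping. read_files_safely() and its reader argument are hypothetical names for illustration; reader defaults to readtext::readtext(), and this assumes each call returns a data frame that can be stacked with rbind().

```r
# Hypothetical helper (not part of readtext): read each file separately so
# that one bad file does not abort the whole batch.
read_files_safely <- function(files, reader = readtext::readtext) {
  failed <- character(0)
  results <- lapply(files, function(f) {
    tryCatch(
      reader(f),
      error = function(e) {
        # Note the failure and return NULL instead of stopping the loop.
        message("Skipping ", f, ": ", conditionMessage(e))
        failed <<- c(failed, f)
        NULL
      }
    )
  })
  # Drop the NULL entries left by failed files, then stack the rest.
  ok <- Filter(Negate(is.null), results)
  list(
    texts = if (length(ok) > 0) do.call(rbind, ok) else NULL,
    failed = failed
  )
}
```

Called as read_files_safely(list.files(folder, full.names = TRUE)), it returns the texts that could be read plus a failed vector listing every file to inspect afterwards, so a single run reports all problematic files instead of one per restart.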

bes827 • Jul 23 '20 01:07

I second the idea of readtext having some error-catching mechanism: you can waste hours reading in a big batch of files, only for it to fail partway through with nothing to show for it.

A typical issue for me is an .rtf file that its creator saved with a .doc extension. antiword cannot process it and exits with an error. In this particular case it would be nice if readtext guessed that the file is actually RTF and automatically tried the RTF reader when antiword fails.
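Until something like this exists, the fallback could be approximated in user code. The sketch below assumes readtext picks its RTF reader from the .rtf file extension. When reading fails, it checks whether the file starts with the RTF signature {\rtf, and if so copies it to a temporary .rtf file and reads that copy. is_rtf() and read_with_rtf_fallback() are hypothetical names, and reader defaults to readtext::readtext().

```r
# Hypothetical helper: does the file start with the RTF signature "{\rtf"?
is_rtf <- function(path) {
  sig <- charToRaw("{\\rtf")
  first_bytes <- readBin(path, what = "raw", n = length(sig))
  identical(first_bytes, sig)
}

# Hypothetical helper: try the normal reader first; if it fails on a file
# that is really RTF, re-read a temporary copy with an .rtf extension.
read_with_rtf_fallback <- function(path, reader = readtext::readtext) {
  tryCatch(
    reader(path),
    error = function(e) {
      if (!is_rtf(path)) stop(e)  # not RTF: re-raise the original error
      tmp <- tempfile(fileext = ".rtf")
      on.exit(unlink(tmp))        # remove the temporary copy afterwards
      file.copy(path, tmp)
      out <- reader(tmp)
      # Report the original file name, not the temporary one.
      out$doc_id <- basename(path)
      out
    }
  )
}
```

Sniffing the first bytes rather than trusting the extension is what lets a misnamed .doc be routed to the RTF reader, while genuine .doc failures still surface as errors.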

michalovadek • Nov 08 '22 12:11