Success exit code even on fatal error

Open rozbb opened this issue 3 years ago • 4 comments

I have code that shells out to trafilatura for a given URL. It would be nice to be able to tell whether trafilatura succeeded. Currently, the binary's exit code does not reflect success. If I run trafilatura on a page that fails, e.g.,

trafilatura --json -u "https://www.sec.gov/Archives/edgar/data/1418091/000110465922078413/tm2220599d1_ex99-p.htm"

I see no data at all, and the exit code is 0, i.e., echo $? prints 0.

It would be better to have trafilatura do a sys.exit(-1) or something whenever a fatal error occurs. My current workaround is to treat JSON parsing errors as trafilatura extraction errors, since the empty string is invalid JSON.
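
For anyone in the same situation, here is a rough sketch of that workaround in Python. It is only an illustration, not part of trafilatura: the function names `parse_extraction_output` and `extract` and the 60-second timeout are my own assumptions, and the only CLI flags used are the `--json -u` ones from the command above. It treats a non-zero exit code, empty output, or invalid JSON as a failed extraction, so it should keep working whether or not the exit code is meaningful.

```python
import json
import subprocess
from typing import Optional


def parse_extraction_output(returncode: int, stdout: str) -> Optional[dict]:
    """Return the parsed JSON document, or None if the extraction looks failed.

    Hypothetical helper: a non-zero exit code or unparseable output both
    count as failure. The empty string is invalid JSON, so it is caught here.
    """
    if returncode != 0:
        return None
    try:
        result = json.loads(stdout)
    except json.JSONDecodeError:
        return None
    # Anything other than a JSON object is treated as a failure too.
    return result if isinstance(result, dict) else None


def extract(url: str, timeout: float = 60.0) -> Optional[dict]:
    """Shell out to the trafilatura CLI and return its JSON output, if any."""
    try:
        proc = subprocess.run(
            ["trafilatura", "--json", "-u", url],
            capture_output=True,
            text=True,
            timeout=timeout,  # assumed value, adjust as needed
        )
    except (OSError, subprocess.TimeoutExpired):
        # Binary not found, not executable, or the call took too long.
        return None
    return parse_extraction_output(proc.returncode, proc.stdout)
```

Keeping the parsing separate from the subprocess call makes the failure logic easy to check without network access.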

Thank you so much for your work!

rozbb avatar Jul 10 '22 05:07 rozbb

Hi @rozbb, in this case the download seems to fail. Thanks for your suggestion, I agree that it would be best to return another exit code.

adbar avatar Jul 11 '22 17:07 adbar

Hi @rozbb, the commit above should fix this. You can benefit from it by installing the latest version straight from the repository.

adbar avatar Aug 01 '22 15:08 adbar

Fantastic, thank you! Seems to work in my test cases. Feel free to close.

rozbb avatar Aug 01 '22 17:08 rozbb

I still need to add a line about it in the docs but will close the issue thereafter.

adbar avatar Aug 02 '22 11:08 adbar