`list index out of range`

Open · mdrishti opened this issue 1 month ago · 1 comment

Hi,

I am trying to run auto-corpus on some HTML files downloaded from PMC (e.g. PMC1201259). The run fails with the error `list index out of range`. I am not sure what I might be doing wrong, and I would appreciate any help here.

-DT

mdrishti · Nov 05 '25 13:11

Hi,

I am running the following command in iTerm2 on macOS: `auto-corpus -c autocorpus/configs/config_pmc.json -t "outputTest" -f tests/data/public/html/PMC/PMC8885717.html`. I have tried both this input file, which is the one provided in the repo, and my own input HTML files.

A couple of points here:

a) For non-experts, it is frustrating to sift through the code to find the error. The log file contains exactly what the standard output shows (INFO, ERROR and WARNING lines) and nothing more, which does not help with troubleshooting.

b) There is no explanation for arguments such as -b. Sure, it is a config parameter, but what options can be given there other than PMC, beyond what is already shown in the examples? IMO this is, again, counterintuitive.

c) Modifying the code does not help either: my changes are not reflected after I rebuild the package.
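For point (c), one thing that can be checked is which copy of the package Python actually imports. The following is only a minimal sketch: the helper `locate_package` is hypothetical, and the import name `autocorpus` is an assumption inferred from the config path in the command above, not something confirmed by the project.

```python
import importlib.util
from pathlib import Path
from typing import Optional


def locate_package(name: str) -> Optional[Path]:
    """Return the file Python would load for a top-level package, or None.

    Hypothetical helper for illustration; it only inspects the import
    system and does not import or run the package itself.
    """
    spec = importlib.util.find_spec(name)
    if spec is None or spec.origin is None:
        return None
    return Path(spec.origin)


if __name__ == "__main__":
    # "autocorpus" is an assumed import name, inferred from the
    # autocorpus/configs/ path used in the command above.
    path = locate_package("autocorpus")
    if path is None:
        print("autocorpus is not importable from this environment")
    else:
        print(f"autocorpus is loaded from: {path}")
```

If the printed path points into `site-packages` rather than into the cloned repository, edits made in the clone would not be picked up at run time. In that situation an editable install (`pip install -e .` from the repository root) is the usual remedy, although the project may rely on a different build tool.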

On top of that, if this really is a simple mistake in how I framed the command, or some other silly error on the end user's part, then it is even more frustrating: a lot of time has gone into troubleshooting without getting anywhere, when a short piece of documentation or a better explanation of how to run the tool could have helped.

This tool is useful and, IMO, very much needed, given that there aren't other tools that can convert HTML to BioC, a format that is useful for parsing PMC full text. But even extensive documentation doesn't help if the crucial bits on how to troubleshoot the tool are missing.

-DT

mdrishti · Nov 07 '25 14:11