
Disabling Anypath REST API possible?

Open Querela opened this issue 5 years ago • 0 comments

Some REST API paths are available both via jobdir and via anypath. I'm not sure whether anypath is required, since in theory all job files should be reachable via the jobdir URLs. Can I disable anypath? It exposes file system details and may provide access to any file on the system (I'm not sure about that).

https://github.com/internetarchive/heritrix3/blob/adac067ea74b5a89f631ef771e2f598819bac6c4/engine/src/main/java/org/archive/crawler/restlet/EngineApplication.java#L76-L88

Examples:

https://localhost:8443/engine/job/myjob/jobdir/crawler-beans.cxml
https://localhost:8443/engine/anypath//heritrix/jobs/myjob/crawler-beans.cxml

Both refer to the job myjob in the job directory /heritrix/jobs/, with the file crawler-beans.cxml in the root of the job dir.
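To illustrate the concern, here is a minimal sketch of the kind of containment check that a jobdir-style route implies and an anypath-style route lacks. This is not Heritrix code: the class `JobDirGuard` and the method `isInsideJobDir` are hypothetical names invented for this example, and the paths assume a POSIX file system with the job directory from the examples above.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of Heritrix.
class JobDirGuard {

    /**
     * Returns true only if the requested path, resolved against the job
     * directory and normalized, still lies inside the job directory.
     */
    static boolean isInsideJobDir(Path jobDir, String requested) {
        Path base = jobDir.toAbsolutePath().normalize();
        // resolve() returns the argument itself when it is absolute,
        // so an absolute request such as /etc/passwd escapes the base.
        Path target = base.resolve(requested).normalize();
        return target.startsWith(base);
    }

    public static void main(String[] args) {
        Path jobDir = Paths.get("/heritrix/jobs/myjob");
        // A file in the root of the job dir.
        System.out.println(isInsideJobDir(jobDir, "crawler-beans.cxml"));
        // An absolute path elsewhere on the system.
        System.out.println(isInsideJobDir(jobDir, "/etc/passwd"));
        // A relative path that climbs out of the job dir.
        System.out.println(isInsideJobDir(jobDir, "../otherjob/crawler-beans.cxml"));
    }
}
```

With a check like this, only the first request would be served. Whether the actual jobdir route performs an equivalent check is something I have not verified.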

I know that the Wiki even warns about, and shows examples of, how scripting can be used to run arbitrary(?) system commands. ref

Querela, Jan 09 '21 10:01