Support for JSON serialization with REST API responses?
The Heritrix REST API currently supports only application/xml responses (besides HTML?). Would it be possible to add a JSON serialization, too? I find JSON a lot easier to work with than XML and think it would be a useful addition.
Looking at the examples in the documentation, I saw no attributes being used, so a conversion to JSON should be straightforward. We would probably need to add the extra dependency org.restlet:org.restlet.ext.json (which also pulls in org.json:json; neither currently has known security vulnerabilities) to implement it the same way as for XML (source (1), source (2), plus some other locations).
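To illustrate why the absence of attributes keeps the mapping simple, here is a minimal sketch in Python (purely illustrative, not Heritrix code; element_to_value is a hypothetical helper). It assumes the conventions visible in the documented responses: leaf elements hold text, and repeated value children represent a list. Unlike a serializer that works from the Java objects, it has no type information, so numbers stay strings here.

```python
import json
import xml.etree.ElementTree as ET


def element_to_value(elem):
    """Map an attribute-free XML element to a JSON-compatible value."""
    children = list(elem)
    if not children:
        # Leaf element: keep its text (empty elements become an empty string).
        return (elem.text or "").strip()
    if all(child.tag == "value" for child in children):
        # Repeated <value> children, as in <availableActions>, become an array.
        return [element_to_value(child) for child in children]
    # Anything else becomes an object keyed by the child tag names.
    return {child.tag: element_to_value(child) for child in children}


# Abridged engine response, taken from the XML example in this issue.
xml_text = """<engine>
  <heapReport>
    <usedBytes>13595632</usedBytes>
    <maxBytes>268435456</maxBytes>
  </heapReport>
  <availableActions>
    <value>rescan</value>
    <value>add</value>
    <value>create</value>
  </availableActions>
</engine>"""

engine = element_to_value(ET.fromstring(xml_text))
print(json.dumps(engine, indent=2))
```

Because no element carries attributes, the sketch never has to decide how to represent them, which is the part of XML-to-JSON conversion that usually needs a convention.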
I did start a first implementation; there's not that much to change, really. But the XML serialization fixes the order of properties (manually), which raises two questions. (1) Is it important that JSON guarantees the same property order? By default it seems to sort them alphabetically, which should be stable (for parsing, if that is a concern). Note that the JSON library (probably) does not preserve the original insertion order, so keeping it might not be possible without switching to another library.
And (2): should responses be wrapped in an engine or job key, similar to the XML root element?
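On question (1): for clients that parse the response, key order should not matter, since JSON objects are generally treated as unordered. The Python sketch below illustrates this from the client side only, using an abridged heapReport; the variable names are made up for the example.

```python
import json

# Two serializations of the same (abridged) heapReport with different key order.
as_emitted = '{"usedBytes": 14586312, "totalBytes": 100663296, "maxBytes": 268435456}'
reordered = '{"maxBytes": 268435456, "usedBytes": 14586312, "totalBytes": 100663296}'

# Parsed objects compare equal regardless of the key order in the text.
same_content = json.loads(as_emitted) == json.loads(reordered)

# If byte-stable text is wanted (for diffs or snapshots), a client can
# re-serialize with sorted keys and compare the canonical strings instead.
canonical_a = json.dumps(json.loads(as_emitted), sort_keys=True)
canonical_b = json.dumps(json.loads(reordered), sort_keys=True)

print(same_content, canonical_a == canonical_b)
```

So a fixed order would mainly matter for humans reading the output or for tools that compare raw text, not for ordinary parsing.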
XML Engine
curl -k -u admin:admin --anyauth --location -H "Accept: application/xml" https://localhost:8443/engine
<?xml version="1.0" standalone='yes'?>
<engine>
  <heritrixVersion>3.13.1-SNAPSHOT-2025-12-17T14:28:51Z</heritrixVersion>
  <heapReport>
    <usedBytes>13595632</usedBytes>
    <totalBytes>100663296</totalBytes>
    <maxBytes>268435456</maxBytes>
  </heapReport>
  <jobsDir>/home/user/heritrix3/dist/target/heritrix-3.13.1-SNAPSHOT/jobs</jobsDir>
  <jobsDirUrl>https://localhost:8443/engine/jobsdir/</jobsDirUrl>
  <availableActions>
    <value>rescan</value>
    <value>add</value>
    <value>create</value>
  </availableActions>
  <jobs></jobs>
</engine>
JSON Engine
curl -k -u admin:admin --anyauth --location -H "Accept: application/json" https://localhost:8443/engine
{
  "availableActions": [
    "rescan",
    "add",
    "create"
  ],
  "heapReport": {
    "usedBytes": 14586312,
    "totalBytes": 100663296,
    "maxBytes": 268435456
  },
  "jobsDirUrl": "https://localhost:8443/engine/jobsdir/",
  "heritrixVersion": "3.13.1-SNAPSHOT-2025-12-17T14:28:51Z",
  "jobsDir": "/home/user/heritrix3/dist/target/heritrix-3.13.1-SNAPSHOT/jobs",
  "jobs": []
}
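For comparison with the curl call, a client could request the JSON variant roughly as follows. This is a sketch in Python using only the standard library. It assumes the server asks for HTTP digest authentication (curl's --anyauth simply uses whatever the server offers) and it disables certificate verification in the same spirit as curl's -k. The helper names build_request, build_opener and fetch_json are hypothetical.

```python
import json
import ssl
import urllib.request


def build_request(url):
    """Build a GET request that asks for the JSON representation."""
    return urllib.request.Request(url, headers={"Accept": "application/json"})


def build_opener(url, user, password):
    """Opener with digest auth and certificate checks disabled (like curl -k)."""
    context = ssl.create_default_context()
    context.check_hostname = False  # must be disabled before verify_mode
    context.verify_mode = ssl.CERT_NONE
    passwords = urllib.request.HTTPPasswordMgrWithDefaultRealm()
    passwords.add_password(None, url, user, password)
    return urllib.request.build_opener(
        urllib.request.HTTPSHandler(context=context),
        urllib.request.HTTPDigestAuthHandler(passwords),
    )


def fetch_json(url, user, password):
    """Send the request and decode the JSON body (needs a running instance)."""
    with build_opener(url, user, password).open(build_request(url)) as response:
        return json.load(response)


# Building the request does not touch the network.
request = build_request("https://localhost:8443/engine")
print(request.get_header("Accept"))
```

With a running instance, fetch_json("https://localhost:8443/engine", "admin", "admin")["heapReport"]["usedBytes"] would then read the heap usage directly, without any XML parsing step.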
XML Job
curl -k -u admin:admin --anyauth --location -H "Accept: application/xml" https://localhost:8443/engine/job/test
<?xml version="1.0" standalone='yes'?>
<job>
  <shortName>test</shortName>
  <statusDescription>Unbuilt</statusDescription>
  <availableActions>
    <value>build</value>
    <value>launch</value>
  </availableActions>
  <launchCount>0</launchCount>
  <lastLaunch/>
  <isProfile>false</isProfile>
  <primaryConfig>/home/user/heritrix3/dist/target/heritrix-3.13.1-SNAPSHOT/jobs/test/crawler-beans.cxml</primaryConfig>
  <primaryConfigUrl>https://localhost:8443/engine/job/test/jobdir/crawler-beans.cxml</primaryConfigUrl>
  <url>https://localhost:8443/engine/job/test/job/test</url>
  <jobLogTail></jobLogTail>
  <uriTotalsReport/>
  <sizeTotalsReport>
    <dupByHash>0</dupByHash>
    <dupByHashCount>0</dupByHashCount>
    <novel>0</novel>
    <novelCount>0</novelCount>
    <notModified>0</notModified>
    <notModifiedCount>0</notModifiedCount>
    <total>0</total>
    <totalCount>0</totalCount>
    <sizeOnDisk>0</sizeOnDisk>
  </sizeTotalsReport>
  <rateReport/>
  <loadReport/>
  <elapsedReport/>
  <threadReport/>
  <frontierReport/>
  <crawlLogTail></crawlLogTail>
  <configFiles></configFiles>
  <isLaunchInfoPartial>false</isLaunchInfoPartial>
  <isRunning>false</isRunning>
  <isLaunchable>true</isLaunchable>
  <hasApplicationContext>false</hasApplicationContext>
  <alertCount>0</alertCount>
  <checkpointFiles></checkpointFiles>
  <reports></reports>
  <heapReport>
    <usedBytes>14723912</usedBytes>
    <totalBytes>52428800</totalBytes>
    <maxBytes>268435456</maxBytes>
  </heapReport>
</job>
JSON Job
curl -k -u admin:admin --anyauth --location -H "Accept: application/json" https://localhost:8443/engine/job/test
{
  "availableActions": [
    "build",
    "launch"
  ],
  "launchCount": 0,
  "isProfile": false,
  "reports": [],
  "jobLogTail": [],
  "sizeTotalsReport": {
    "notModifiedCount": 0,
    "total": 0,
    "notModified": 0,
    "dupByHashCount": 0,
    "novelCount": 0,
    "totalCount": 0,
    "sizeOnDisk": 0,
    "dupByHash": 0,
    "novel": 0
  },
  "checkpointFiles": [],
  "url": "https://localhost:8443/engine/job/test/job/test",
  "crawlLogTail": [],
  "primaryConfig": "/home/user/heritrix3/dist/target/heritrix-3.13.1-SNAPSHOT/jobs/test/crawler-beans.cxml",
  "primaryConfigUrl": "https://localhost:8443/engine/job/test/jobdir/crawler-beans.cxml",
  "statusDescription": "Unbuilt",
  "heapReport": {
    "usedBytes": 16858208,
    "totalBytes": 52428800,
    "maxBytes": 268435456
  },
  "configFiles": [],
  "hasApplicationContext": false,
  "isRunning": false,
  "isLaunchable": true,
  "alertCount": 0,
  "shortName": "test",
  "isLaunchInfoPartial": false
}
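On question (2), the difference for a client is one extra lookup. The sketch below (Python, illustrative only) compares the flat shape shown above with a hypothetical wrapped shape that mirrors the XML root element; unwrap is a made-up helper that accepts either, which could be handy while the decision is still open.

```python
import json


def unwrap(payload, root):
    """Return the response body whether or not it is wrapped in `root`."""
    # Wrapped form: the only top-level key is the root name.
    if isinstance(payload, dict) and set(payload) == {root}:
        return payload[root]
    return payload


# Flat form, abridged from the JSON job response above.
flat = json.loads('{"shortName": "test", "isRunning": false}')
# Hypothetical wrapped form, mirroring the <job> root element of the XML.
wrapped = json.loads('{"job": {"shortName": "test", "isRunning": false}}')

print(unwrap(flat, "job") == unwrap(wrapped, "job"))
```

The wrapped form tells a client what kind of object it received without looking at the URL, while the flat form is shorter to navigate; neither seems hard to consume.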