CDXJ: Error: no such capture field: method
When posting a CDXJ file (generated with pywb 2.6.7) to the OutbackCDX on DockerHub (v0.11.0?) like so
curl -X POST --data-binary @index.cdxj http://localhost:8080/coll
I'm seeing the following error get printed to the console:
At line: com,google-analytics)/collect?__wb_method=post&__wb_post_data=dj0xjl92pwo5nizhaxa9mszhptc2ndcxodg1myz0pxbhz2v2awv3jl9zptemzgw9ahr0chmlm0elmkylmkzhcg9klm5hc2euz292jtjgyxbvzcuyrmfwmjiwmza3lmh0bwwmzha9jtjgyxbvzcuyrmfwmjiwmza3lmh0bwwmdww9zw4tdxmmzgu9vvrgltgmzhq9qvbprcuzqsuymdiwmjilmjbnyxjjacuymdclmjatjtiwqsuymexpb24lmjbpbiuyme9yaw9ujnnkpte2lwjpdczzcj0xmzywedewmjamdna9mta1mhg4odamamu9mczfdxrtyt0xmtm2otk1ndmumtc1ntcxnza2mi4xnjuymtq0nja0lje2ntixndq2mjaumty1mje0ndyymc4xjl91dg16ptexmzy5otu0my4xnjuymtq0njiwljeums51dg1jc3ilm0qozglyzwn0ksu3q3v0bwnjbiuzrchkaxjly3qpjtdddxrty21kjtnekg5vbmupjl91dg1odd0xnjuymtq0njk0mdg5jl91pvfbq0nbuufcfizqawq9jmdqawq9jmnpzd0xnzu1nze3mdyylje2ntixndq2mdqmdglkpvvbltmzntizmtq1ltemx2dpzd00ntc1ndm3mc4xnjuymtq0nja0jmnkmt1oqvnbjmnkmj1oqvnbjtiwlsuymgfwb2qubmfzys5nb3ymy2qzptiwmtgxmdewjtiwdjqumsuymc0lmjbvbml2zxjzywwlmjbbbmfsexrpy3mmy2q0pxvuc3bly2lmawvkjtnbyxbvzc5uyxnhlmdvdizjzdu9dw5zcgvjawzpzwqlm0fhcg9klm5hc2euz292jmnknj1odhrwcyuzqsuyriuyrmrhcc5kawdpdgfsz292lmdvdiuyrlvuaxzlcnnhbc1gzwrlcmf0zwqtqw5hbhl0awnzlu1pbi5qcyzjzdc9ahr0chmlm0emej0xmjc2mdq0mjew 20220510010455 {"url":"https://www.google-analytics.com/collect","mime":"image/gif","status":"200","digest":"B5HJFHOVXMSWJ55LTR3DHDQE4KJKIKWO","length":"651","offset":"49132028","method":"POST","requestBody":"__wb_post_data=dj0xJl92PWo5NiZhaXA9MSZhPTc2NDcxODg1MyZ0PXBhZ2V2aWV3Jl9zPTEmZGw9aHR0cHMlM0ElMkYlMkZhcG9kLm5hc2EuZ292JTJGYXBvZCUyRmFwMjIwMzA3Lmh0bWwmZHA9JTJGYXBvZCUyRmFwMjIwMzA3Lmh0bWwmdWw9ZW4tdXMmZGU9VVRGLTgmZHQ9QVBPRCUzQSUyMDIwMjIlMjBNYXJjaCUyMDclMjAtJTIwQSUyMExpb24lMjBpbiUyME9yaW9uJnNkPTE2LWJpdCZzcj0xMzYweDEwMjAmdnA9MTA1MHg4ODAmamU9MCZfdXRtYT0xMTM2OTk1NDMuMTc1NTcxNzA2Mi4xNjUyMTQ0NjA0LjE2NTIxNDQ2MjAuMTY1MjE0NDYyMC4xJl91dG16PTExMzY5OTU0My4xNjUyMTQ0NjIwLjEuMS51dG1jc3IlM0QoZGlyZWN0KSU3Q3V0bWNjbiUzRChkaXJlY3QpJTdDdXRtY21kJTNEKG5vbmUpJl91dG1odD0xNjUyMTQ0Njk0MDg5Jl91PVFBQ0NBUUFCfiZqaWQ9JmdqaWQ9JmNpZD0xNzU1NzE3MDYyLjE2NTIxNDQ2MDQmdGlkPVVBLTMzNTIzMTQ1LTEmX2dpZD00NTc1NDM3MC4xNjUyMTQ0NjA0JmNkMT1OQVNBJmNkMj1OQVNBJTIwLSUyMGFwb2QubmFzYS5nb3YmY2QzPTIwMTgxMDEwJTIwdjQuMSUyMC0lMjBVbml2ZXJzYWwlMjBBbmFseXRpY3MmY2Q0PXVuc3BlY2lmaWVkJTNBYXBvZC5uYXNhLmdvdiZjZDU9dW5zcGVjaWZpZWQlM0FhcG9kLm5hc2EuZ292JmNkNj1odHRwcyUzQSUyRiUyRmRhcC5kaWdpdGFsZ292LmdvdiUyRlVuaXZlcnNhbC1GZWRlcmF0ZWQtQW5hbHl0aWNzLU1pbi5qcyZjZDc9aHR0cHMlM0Emej0xMjc2MDQ0MjEw","filename":"apod.warc.gz"}
java.lang.IllegalArgumentException: no such capture field: method
at outbackcdx.Capture.put(Capture.java:548)
at outbackcdx.Capture.fromCdxjLine(Capture.java:434)
at outbackcdx.Capture.fromCdxLine(Capture.java:385)
at outbackcdx.Webapp.post(Webapp.java:249)
at outbackcdx.Webapp.lambda$new$3(Webapp.java:102)
at outbackcdx.Web$Route.handle(Web.java:312)
at outbackcdx.Web$Router.handle(Web.java:236)
at outbackcdx.Webapp.handle(Webapp.java:594)
at outbackcdx.Web$Server.serve(Web.java:50)
at outbackcdx.NanoHTTPD$HTTPSession.execute(NanoHTTPD.java:848)
at outbackcdx.NanoHTTPD$1$1.run(NanoHTTPD.java:207)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:829)
Other CDXJ files seem to work normally however.
If it's helpful to have the WARC and CDXJ files please let me know!
OutbackCDX does not (yet) support storing arbitrarily named fields.
Note the "Things it doesn't do (yet): CDXJ" in the README. :-) While it can now map CDX11 fields to CDXJ for input/output it doesn't actually support storing arbitrary CDXJ data.
I don't have any short term plans to implement this myself but would be happy to accept a pull request.
Now I'm confused why another CDXJ file worked.
I think I understand now: current OutbackCDX can store CDXJ data of a known shape? And the method property is not something it is expecting?
Yes if the CDXJ input is limited to just the basic CDX11 fields it works.
Commit 9d73df3 added support for storing arbitrary extra CDXJ fields using a CBOR-based record encoding when can be enabled with --index-version 5. This is still experimental and a little more work is needed to actually make use of the method and requestBody fields when constructing the urlkey for compatibility with pywb.