vespa
vespa copied to clipboard
Let Vespa HTTP client exit with non-zero return code on parsing errors
As reported by @rafan, the Vespa HTTP client will, by design, return with a standard zero return code even if the data fed har parsing errors:
Tue Oct 01 12:21:23 UTC 2019 Result received: 0 (0 failed so far, 1 sent, success rate 0.00 docs/sec).
Failure: Result for 'id:docker.ouroath.com:image::foobar
1569932482487 Trace starting
1569932483817 Asked about retrying for cluster ID 0, number of retries is 1 Detail:
Detail resultType=FATAL_ERROR exception='No field 'foobar' in the structure of type 'image'' endpoint=someendpoint:443 ssl=true resultTimeLocally=1569932483817
1569932483835 Got detail: Detail resultType=FATAL_ERROR exception='No field 'foobar' in the structure of type 'image'' endpoint=someendpoint:443 ssl=true resultTimeLocally=1569932483817
:
Detail resultType=FATAL_ERROR exception='No field 'foobar' in the structure of type 'image'' endpoint=someendpoint:443 ssl=true resultTimeLocally=1569932483817
Tue Oct 01 12:21:24 UTC 2019 Result received: 1 (1 failed so far, 1 sent, success rate 0.00 docs/sec).
$ echo $?
0
While this default behavior shouldn't change, we should add an optional command line parameter that would make vespa-http-client
exit with a non-zero return code upon parsing errors.
I can pick this up as part of HackTogether event. I am wondering if only parsing error is needed to be checked or any kind of document feeder error is enough to return 1 as exit code (if command line argument asks for it) ? i.e. Can we just check getFailedDocumentCount and if its positive the return code will become 1 (if asked in cmd line args)
We can divide errors into there kinds, perhaps:
- Wrong input so no document operations can be issued.
- Wrong config parameters such that no operations can even succeed.
- Temporary problems on the receiving side such that some or all operations fail.
- Errors on some document operations such that those operations fail.
We want to return with 1. on the first category. We'd also want it on category 2., but it's hard to separate from category 3. In 3. we want to keep trying rather than failing as problems may be temporary. In category 4. we absolutely do not want to exit with code 1.
I see.
I am wondering that since we do not want to change vespa's current behaviour, would it be enough to edit the exit code just before exiting based on getFailedDocumentCount
:
(Full File: Runner.java)
if (commandLineArgs.getFile() != null) {
double fileSizeMb = ((double) new File(commandLineArgs.getFile()).length()) / 1024.0 / 1024.0;
System.err.println("Sent " + fileSizeMb + " MB in " + transferTimeSec + " seconds.");
System.err.println("Speed: " + ((fileSizeMb / transferTimeSec) * 8.0) + " Mbits/sec, + HTTP overhead " +
"(not taking compression into account)");
}
}
callback.printProgress();
// Proposed changes ----->
if(callback.getFailedDocumentCount() > 0) {
System.exit(1);
}
}
}
This preserves the current Vespa behavior i.e. it doesn't prematurely ends the feeding and only sends 1 as exit code if some doc has failed after doing everything else that vespa does today(either after retry or parser errors).
(Of course after there will be a check to see, if someone has supplied the required command line argument.)
But if you feed a million documents and one fail I don't think we should exit with 1. We only want to do it if the input can't be parsed.
I see.
Is FATAL_ERROR
here directly indicative of the parser error or could it mean something else as well? (if yes, then this can't be used)
Another way not so good is to look at the exception string and see if it indicates parser error. Will have to check if Exception object has this info in some other form.
That error is in the Result, which represents the response from the server side. If the input can't be parsed we don't create requests and there is no response.
Original comment's exception log shows error in Result only:
Detail resultType=FATAL_ERROR exception='No field 'foobar' in the structure of type 'image'' endpoint=someendpoint:443 ssl=true resultTimeLocally=1569932483817
Ok, I read the original description now, you are right: If we only do it when an option is set we can exit with 1 on server-side parse issues (or really any FATAL_ERROR as you suggested above).
(We could also do it without the option set in the case where all. feed operations fail with FATAL_ERROR.)
One more doubt, why would the user only want exit code to be 1 in case of parsing error and not in other errors (temporary server down etc)? As I understand it (I could be wrong), the user wants the exit code to be 1 so that he can re-feed those docs? So, why does it matter what kind of error it is..
If you care about whether operations are applied successfully you must implement error handling on the level of individual operations, by handling the Replies (typically counting the failures and alerting if there are too many). This is because there may be many operations and multiple destinations, and some may fail but not others.
If every operation fails everywhere it's never wrong to exit with 1.