vespa icon indicating copy to clipboard operation
vespa copied to clipboard

Let Vespa HTTP client exit with non-zero return code on parsing errors

Open frodelu opened this issue 5 years ago • 10 comments

As reported by @rafan, the Vespa HTTP client will, by design, return with a standard zero return code even if the data fed har parsing errors:

Tue Oct 01 12:21:23 UTC 2019 Result received: 0 (0 failed so far, 1 sent, success rate 0.00 docs/sec).
Failure: Result for 'id:docker.ouroath.com:image::foobar
1569932482487 Trace starting 
1569932483817 Asked about retrying for cluster ID 0, number of retries is 1 Detail:
Detail resultType=FATAL_ERROR exception='No field 'foobar' in the structure of type 'image'' endpoint=someendpoint:443 ssl=true resultTimeLocally=1569932483817
1569932483835 Got detail: Detail resultType=FATAL_ERROR exception='No field 'foobar' in the structure of type 'image'' endpoint=someendpoint:443 ssl=true resultTimeLocally=1569932483817

:
    Detail resultType=FATAL_ERROR exception='No field 'foobar' in the structure of type 'image'' endpoint=someendpoint:443 ssl=true resultTimeLocally=1569932483817

Tue Oct 01 12:21:24 UTC 2019 Result received: 1 (1 failed so far, 1 sent, success rate 0.00 docs/sec).
$ echo $?
0

While this default behavior shouldn't change, we should add an optional command line parameter that would make vespa-http-client exit with a non-zero return code upon parsing errors.

frodelu avatar Oct 02 '19 10:10 frodelu

I can pick this up as part of HackTogether event. I am wondering if only parsing error is needed to be checked or any kind of document feeder error is enough to return 1 as exit code (if command line argument asks for it) ? i.e. Can we just check getFailedDocumentCount and if its positive the return code will become 1 (if asked in cmd line args)

mkagenius avatar Mar 22 '21 17:03 mkagenius

We can divide errors into there kinds, perhaps:

  1. Wrong input so no document operations can be issued.
  2. Wrong config parameters such that no operations can even succeed.
  3. Temporary problems on the receiving side such that some or all operations fail.
  4. Errors on some document operations such that those operations fail.

We want to return with 1. on the first category. We'd also want it on category 2., but it's hard to separate from category 3. In 3. we want to keep trying rather than failing as problems may be temporary. In category 4. we absolutely do not want to exit with code 1.

bratseth avatar Mar 22 '21 18:03 bratseth

I see.

I am wondering that since we do not want to change vespa's current behaviour, would it be enough to edit the exit code just before exiting based on getFailedDocumentCount : (Full File: Runner.java)

           if (commandLineArgs.getFile() != null) {
                double fileSizeMb = ((double) new File(commandLineArgs.getFile()).length()) / 1024.0 / 1024.0;
                System.err.println("Sent " + fileSizeMb + " MB in " + transferTimeSec + " seconds.");
                System.err.println("Speed: " + ((fileSizeMb / transferTimeSec) * 8.0) + " Mbits/sec, + HTTP overhead " +
                                   "(not taking compression into account)");
            }
        }
        callback.printProgress();
        // Proposed changes ----->
        if(callback.getFailedDocumentCount() > 0) {
                System.exit(1);
        }

    }

}

This preserves the current Vespa behavior i.e. it doesn't prematurely ends the feeding and only sends 1 as exit code if some doc has failed after doing everything else that vespa does today(either after retry or parser errors).

(Of course after there will be a check to see, if someone has supplied the required command line argument.)

mkagenius avatar Mar 23 '21 18:03 mkagenius

But if you feed a million documents and one fail I don't think we should exit with 1. We only want to do it if the input can't be parsed.

bratseth avatar Mar 23 '21 18:03 bratseth

I see.

Is FATAL_ERROR here directly indicative of the parser error or could it mean something else as well? (if yes, then this can't be used)

Another way not so good is to look at the exception string and see if it indicates parser error. Will have to check if Exception object has this info in some other form.

mkagenius avatar Mar 23 '21 19:03 mkagenius

That error is in the Result, which represents the response from the server side. If the input can't be parsed we don't create requests and there is no response.

bratseth avatar Mar 24 '21 08:03 bratseth

Original comment's exception log shows error in Result only:

Detail resultType=FATAL_ERROR exception='No field 'foobar' in the structure of type 'image'' endpoint=someendpoint:443 ssl=true resultTimeLocally=1569932483817

mkagenius avatar Mar 24 '21 10:03 mkagenius

Ok, I read the original description now, you are right: If we only do it when an option is set we can exit with 1 on server-side parse issues (or really any FATAL_ERROR as you suggested above).

(We could also do it without the option set in the case where all. feed operations fail with FATAL_ERROR.)

bratseth avatar Mar 24 '21 14:03 bratseth

One more doubt, why would the user only want exit code to be 1 in case of parsing error and not in other errors (temporary server down etc)? As I understand it (I could be wrong), the user wants the exit code to be 1 so that he can re-feed those docs? So, why does it matter what kind of error it is..

mkagenius avatar Mar 28 '21 08:03 mkagenius

If you care about whether operations are applied successfully you must implement error handling on the level of individual operations, by handling the Replies (typically counting the failures and alerting if there are too many). This is because there may be many operations and multiple destinations, and some may fail but not others.

If every operation fails everywhere it's never wrong to exit with 1.

bratseth avatar Mar 29 '21 05:03 bratseth