
Returns incomplete data

Open ezh opened this issue 6 years ago • 8 comments

Sometimes pepper returns data before the jid is available. It looks like this:

{
    "return": [
        {
            "minion": false
        }
    ]
}

It is because of this (in SaltStack API server log):

2018-04-15 03:11:13,231 [salt.client      :1055][WARNING ][42] jid does not exist
2018-04-15 03:11:13,236 [salt.client      :1060][WARNING ][42] Returner unavailable:

IMHO def poll_for_returns(self, api, load) in cli.py should be updated accordingly.
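A minimal sketch of the kind of update being suggested (the function shape here is illustrative, not pepper's actual internals): treat an empty lookup as "the master has not written the jid record yet" and keep polling until the timeout expires, rather than failing on the first empty response.

```python
import time

def poll_for_returns(lookup_jid, expected_minions, timeout=60, interval=1):
    """Poll a job until all expected minions return or the timeout expires.

    lookup_jid: zero-argument callable returning {minion_id: return_data};
    it may return {} while the master has not yet created the jid record.
    """
    returns = {}
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        returns.update(lookup_jid())
        if set(returns) >= set(expected_minions):
            return returns, True   # all expected minions reported
        time.sleep(interval)
    return returns, False          # incomplete after timeout

# Simulate a master that is slow to create the jid record:
calls = []
def slow_lookup():
    calls.append(1)
    # the first two polls hit the "jid does not exist" window
    return {} if len(calls) < 3 else {"minion1": True}

result, complete = poll_for_returns(slow_lookup, ["minion1"], timeout=10, interval=0)
```

The point is that the early empty responses are absorbed by the loop instead of being reported to the user as `"minion": false`.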

ezh avatar Apr 15 '18 04:04 ezh

Fixed in #142

ezh avatar Apr 15 '18 12:04 ezh

Have you tried passing --fail-if-incomplete in order to use the poll_for_returns?

gtmanfred avatar Jun 13 '18 19:06 gtmanfred

--fail-if-incomplete is useless in this case:

  1. we have an extremely busy/slow SaltStack instance
  2. pepper submits a job to SaltStack
  3. pepper gets an empty response from SaltStack
  4. pepper fails with --fail-if-incomplete, and you then have to manually list all JIDs, search for the submitted job, check their return codes...
  5. after a few milliseconds SaltStack creates the record that the new job exists... but it is too late for the user

ezh avatar Jun 17 '18 16:06 ezh

If you have a busy salt master, you should do the same thing that is recommended on the regular salt command line: use the local_async client. You will get the jid back, along with the minions that should return.

$ pepper --client local_async \* test.ping
local_async:
    ----------
    jid:
        20180617200846098088
    minions:
        - minion1
        - minion2

Then you can look up the job in the job client, and wait for all those minions to have returns, and know that it has finished running.

gtmanfred avatar Jun 17 '18 20:06 gtmanfred

Sure, especially for production use.

But it is not convenient in some cases, for example when you are debugging, or changing master options themselves via the REST API in real time, and trying to recover the environment as fast as possible.

Please close if this behavior is ok for users.

ezh avatar Jun 17 '18 21:06 ezh

  1. I thought that this fix was suitable because it fixed interactive mode, which had been useless before.
  2. I provided a PR that was already based on -t TIMEOUT, --timeout=TIMEOUT

With regard to your note about an infinite loop: from the user's point of view there is no difference between an infinite loop of REST API calls and a single call that hangs forever. Also, I doubt that hiding backend failures is good practice. If SaltStack is stuck in a loop of incomplete responses, then it is broken, and that is out of scope for pepper.

ezh avatar Jun 18 '18 05:06 ezh

@whiteinge can you give us your opinion on this?

Thanks, Daniel

gtmanfred avatar Jun 18 '18 13:06 gtmanfred

Synchronously waiting for Salt returns is always something of an educated guess. This is true of the salt CLI as well as salt-api (and thus Pepper). Salt does its best with the default settings; however, the timeout kwarg can be used to customize Salt's default guess. This kwarg is supported by LocalClient, RunnerClient, and (I think) WheelClient.

It is common, in my experience, to need to specify that value when working with heavily-loaded or otherwise slow Masters, or when executing especially long-running functions. (Note, it is a top-level kwarg, e.g., [{"client": "local", "tgt": "*", "fun": "test.sleep", "kwarg": {"length": 900}, "timeout": 1000}].)
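To make the placement concrete, here is that same lowstate built in Python. This only constructs the request body, it does not talk to salt-api; the key detail is that "timeout" sits at the top level of the lowstate chunk, not inside "kwarg".

```python
import json

# Lowstate chunk from the comment above: "timeout" is a sibling of
# "client"/"tgt"/"fun", while "kwarg" holds only the function's arguments.
lowstate = [{
    "client": "local",
    "tgt": "*",
    "fun": "test.sleep",
    "kwarg": {"length": 900},
    "timeout": 1000,
}]

body = json.dumps(lowstate)
```

A common mistake is to nest timeout inside kwarg, where it would be passed to the execution function instead of to LocalClient.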

It looks like Pepper's -t flag is not currently wired up to add that kwarg. That is probably an oversight and would be a sensible addition. This would also bring parity between Pepper and Salt's usage of that flag.

My suggestion:

  1. Test the workflow that you're trying to do without Pepper by sending HTTP calls directly to salt-api.

  2. Ensure that the timeout kwarg to Salt is correctly able to make Salt wait long enough to collect the returns from the function you're running.

    1. If not, Salt itself will need tweaking or another strategy will be needed for this (see below).
    2. If so, add support for sending timeout to Pepper and repeat the test using that addition.
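For step 1, a direct HTTP call can be sketched like this with only the standard library. The host, credentials, and eauth backend below are placeholders; the /run endpoint with inline credentials follows salt-api's rest_cherrypy conventions. The request is only constructed here, not sent.

```python
import json
import urllib.request

# Placeholder endpoint and credentials for illustration.
url = "https://salt-master.example.com:8000/run"
lowstate = [{
    "client": "local",
    "tgt": "*",
    "fun": "test.ping",
    "timeout": 300,          # top-level timeout, as discussed above
    "username": "saltuser",
    "password": "secret",
    "eauth": "pam",
}]

req = urllib.request.Request(
    url,
    data=json.dumps(lowstate).encode(),
    headers={"Content-Type": "application/json"},
    method="POST",
)
# urllib.request.urlopen(req) would actually send it; omitted here.
```

Comparing the raw response of this call against Pepper's output makes it easy to tell whether a missing return is Salt's doing or Pepper's.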

Fallback:

As previously mentioned, all the machinery above is an educated guess. Separate timeouts exist at various levels in Salt (CLI, LocalClient/RunnerClient, API, HTTP) and getting them all lined up can be tricky in non-optimal conditions (system load on the Master or Minion, an unreliable network, when using Syndic, etc). Job return delivery is fire-and-forget in Salt, so the only completely reliable way to watch for job returns is either to watch the event bus or to poll the job cache (as Daniel previously suggested).
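The event-bus approach can be sketched as follows. salt-api streams events as text, each event carrying a tag and a JSON body; the exact tag:/data: line layout below is an assumption for illustration. The idea is to collect salt/job/&lt;jid&gt;/ret/&lt;minion&gt; events until every expected minion has reported, which does not depend on any timeout guess.

```python
import json

def collect_job_returns(lines, jid, expected_minions):
    """Consume an iterable of event-stream lines, keeping only return
    events for one jid; stop once all expected minions have reported."""
    prefix = "salt/job/%s/ret/" % jid
    returns = {}
    tag = None
    for line in lines:
        if line.startswith("tag: "):
            tag = line[len("tag: "):].strip()
        elif line.startswith("data: ") and tag and tag.startswith(prefix):
            minion = tag[len(prefix):]
            returns[minion] = json.loads(line[len("data: "):])
            if set(returns) >= set(expected_minions):
                break
    return returns

# Stub stream imitating two job-return events for the jid shown earlier:
stream = [
    "tag: salt/job/20180617200846098088/ret/minion1",
    'data: {"return": true}',
    "",
    "tag: salt/job/20180617200846098088/ret/minion2",
    'data: {"return": true}',
    "",
]
rets = collect_job_returns(stream, "20180617200846098088",
                           ["minion1", "minion2"])
```

In a real deployment the lines would come from a long-lived HTTP connection to salt-api's /events endpoint rather than a list.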

whiteinge avatar Jun 18 '18 19:06 whiteinge