fah-issues
Clients returning error results consecutively continue to receive new WU assignments
Expected Behavior
- Clients returning more than `max-slot-errors <integer=10>` faulty results should have been paused automatically. Judging by the timestamps, these clients do not appear to have been paused and then unpaused or restarted.
- Clients returning more than a certain number of bad results should be blacklisted for a predefined duration (similar to the staggered waiting period when requesting assignments).
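
To make the first expectation concrete, here is a minimal sketch of the pause logic as I understand it. This is not the actual FAHClient code: `SlotErrorTracker` and its methods are hypothetical names, and it assumes that errors are counted consecutively, that a successful result resets the count, and that the slot pauses once the count reaches the limit.

```python
from dataclasses import dataclass

MAX_SLOT_ERRORS = 10  # default value quoted in the option above


@dataclass
class SlotErrorTracker:
    """Hypothetical per-slot error counter (illustration only)."""

    max_errors: int = MAX_SLOT_ERRORS
    consecutive_errors: int = 0
    paused: bool = False

    def record_result(self, ok: bool) -> None:
        """Update the counter after a WU is returned."""
        if ok:
            # Assumption: a good result resets the consecutive count.
            self.consecutive_errors = 0
            return
        self.consecutive_errors += 1
        # Assumption: pause as soon as the limit is reached.
        if self.consecutive_errors >= self.max_errors:
            self.paused = True

    def may_request_work(self) -> bool:
        """A paused slot should not ask for new assignments."""
        return not self.paused


tracker = SlotErrorTracker()
for _ in range(MAX_SLOT_ERRORS):
    tracker.record_result(ok=False)
print(tracker.may_request_work())
```

With logic along these lines, a slot that fails every WU would stop requesting work after the tenth failure, which is not what the timestamps show.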
Current Behavior
Several clients appear to be returning bad results consecutively, but the slot is not paused automatically after 10 errors and/or the client IP is not blacklisted. Consequently, these clients request and return thousands of WUs without doing any useful work.
Here are stats for p13429 over the past week where clients have not returned a single successful result
The following examples are for p13439 WUs assigned between 2020-12-14 00:00:00 and 2020-12-14 03:00:00:
9871DF5E7080AB71 | Crazybyter | 239672 | NVidia GPU | Windows
AB17CD5E18DCEEC4 | bangdonate6 | 0 | NVidia GPU | Windows
C36AC95E5291759B | osucycling | 2075 | NVidia GPU | Linux
Possible Solution (Optional)
- Investigate and address why the slot pause code (`max-slot-errors <integer=10>`) isn't triggered. Since this parameter cannot be changed via the UI, it is unlikely to have been changed from the default.
- Investigate what is causing the failures and address this in the core, if possible. Alternatively, prevent assignments to such clients.
- Determine if AS blacklisting should cover cases such as this.
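
If AS blacklisting were extended to this case, one possible shape is a per-client block whose duration grows with repeated failures, similar in spirit to the staggered waiting period. The sketch below is only an illustration under stated assumptions: `FailureBlacklist`, `ClientRecord`, the threshold, and the block durations are all hypothetical and are not taken from the AS code.

```python
from dataclasses import dataclass, field


@dataclass
class ClientRecord:
    consecutive_failures: int = 0
    blocked_until: float = 0.0  # timestamp in seconds


@dataclass
class FailureBlacklist:
    """Hypothetical server-side blacklist (illustration only)."""

    threshold: int = 10           # assumed; mirrors the max-slot-errors default
    base_block_s: float = 3600.0  # assumed first block duration
    max_block_s: float = 86400.0  # assumed upper bound on a block
    clients: dict = field(default_factory=dict)

    def record_result(self, client_id: str, ok: bool, now: float) -> None:
        rec = self.clients.setdefault(client_id, ClientRecord())
        if ok:
            # A good result clears the failure history and any block.
            rec.consecutive_failures = 0
            rec.blocked_until = 0.0
            return
        rec.consecutive_failures += 1
        over = rec.consecutive_failures - self.threshold
        if over >= 0:
            # Double the block for each failure past the threshold, capped.
            duration = min(self.base_block_s * 2 ** min(over, 16),
                           self.max_block_s)
            rec.blocked_until = now + duration

    def may_assign(self, client_id: str, now: float) -> bool:
        rec = self.clients.get(client_id)
        return rec is None or now >= rec.blocked_until


# Usage with small, made-up numbers so the effect is easy to follow.
bl = FailureBlacklist(threshold=3, base_block_s=60.0, max_block_s=600.0)
for _ in range(3):
    bl.record_result("client-A", ok=False, now=0.0)
print(bl.may_assign("client-A", now=30.0))  # still inside the first block
```

Keying on a client identifier rather than the IP is a design choice in this sketch; which key the AS should use is part of the open question.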
Steps To Reproduce
Unknown. Unsure if this issue is caused by the client, the GPU core, the AS code or a combination of all of these.
Context
Observed when checking OSG's results.