
Text reward kudos to take generation time into account

Open arekku255 opened this issue 2 years ago • 12 comments

Job received from https://horde.koboldai.net for 512 tokens and 2048 max context. Starting generation...

(EOS token triggered!)
Time Taken - Processing:2.4s (2ms/T), Generation:28.1s (57ms/T), Total:30.5s (16.0T/s)
Submitted generation to https://horde.koboldai.net with id ??? and contributed for 2.5

Job received from https://horde.koboldai.net for 100 tokens and 1024 max context. Starting generation...

(EOS token triggered!)
Time Taken - Processing:0.9s (72ms/T), Generation:2.1s (47ms/T), Total:3.0s (14.9T/s)
Submitted generation to https://horde.koboldai.net with id ??? and contributed for 1.0

I know this is a volunteer service, but the disparity in rewards for serving those two requests might be problematic. Kudos-wise, it seems better to generate 100 tokens ten times for 10 kudos than to generate 512 tokens once for 2.5 kudos (not counting the EOS token).
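To make the disparity concrete, here is the kudos-per-second arithmetic from the two jobs logged above:

```python
# Kudos per second, using the totals reported in the logs above.
large_job = 2.5 / 30.5   # 512-token request: ~0.082 kudos/s
small_job = 1.0 / 3.0    # 100-token request: ~0.333 kudos/s

# Ten small jobs (~30 s total, 10 kudos) vs one large job (30.5 s, 2.5 kudos):
# the small jobs pay roughly 4x as much for about the same wall-clock time.
```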

If I wanted to maximize kudos gained, I would turn down the maximum generation size to get more of the smaller requests, which are rewarded better per unit of generation time.

It would be nice if the interests of a selfish worker operator aligned with the interests of the network. Forcing the user who wants 500 tokens to submit 5 requests is in my interest, but it creates more requests for the network to handle.

If the reward were somehow related to how long the generation took, it would incentivize setting up workers able to handle big requests.

arekku255 avatar Sep 08 '23 08:09 arekku255

The algorithm that calculates kudos rewards is open source. If you can come up with a better idea, feel free to send a PR, but I don't know enough math to figure this out myself.

db0 avatar Sep 08 '23 10:09 db0

Sure, I'm willing to give it a try.

What I was thinking is that some constant multiplied by the number of tokens generated, plus some BLAS (prompt processing) factor, could be a decent approximation.
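A minimal sketch of that shape of formula, with made-up constants purely to illustrate; the real weights would need to be fitted against measured generation times:

```python
# Illustrative only: every constant below is an assumption, not a measured value.
def estimate_kudos(tokens_generated: int, prompt_tokens: int) -> float:
    base = 0.1                  # flat overhead per job
    per_generated_token = 0.02  # reward per generated token
    per_prompt_token = 0.002    # "BLAS" factor for prompt processing
    return base + per_generated_token * tokens_generated + per_prompt_token * prompt_tokens
```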

If /v2/generate/text/submit could be extended to also submit how many tokens were generated, as well as how many tokens were in the initial context, this could then be forwarded through /v2/generate/status/{id} to tell the initial client how many tokens were generated and how big their context was.
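For illustration only, the extra fields could look roughly like this; the field names below (generated_tokens, prompt_tokens) are placeholders, not the actual AI-Horde API:

```python
# Hypothetical payload shapes; field names are assumptions, not the real API.

# Worker -> POST /v2/generate/text/submit
submit_payload = {
    "id": "<job id>",
    "generation": "<generated text>",
    "generated_tokens": 100,  # new: tokens the worker actually produced
    "prompt_tokens": 1024,    # new: tokens in the initial context
}

# Client <- GET /v2/generate/status/{id}, with the same numbers forwarded back
status_response = {
    "done": True,
    "generations": [
        {"text": "<generated text>", "generated_tokens": 100, "prompt_tokens": 1024},
    ],
}
```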

From a client perspective, knowing whether the number of generated tokens equals the number of requested tokens is useful, as that would indicate the response hasn't finished and I would need to initiate another request.

As of now I don't think any server actually supplies these numbers in its API responses, but worst case the bridge could supply them during a transition period. Tokenization doesn't seem to take very long compared to generation or even the submit latency.

I don't expect to be able to get started until next weekend, and I can't promise I'll actually be able to pull it off without help.

arekku255 avatar Sep 08 '23 14:09 arekku255

Yes, I am planning to add a way to receive the number of tokens generated, but we need to ensure workers can't easily lie about it.

Take your time. We're not in a rush :)

db0 avatar Sep 08 '23 14:09 db0

With torch already in the AI-Horde stack (used by the kudos model), a tokenizer (from, say, the transformers package) can simply count the tokens in the response instead of relying on a report from the worker.
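A minimal sketch of that approach, assuming the transformers package is importable on the horde side; the checkpoint name here is just an example, not necessarily the one the horde would settle on:

```python
from transformers import AutoTokenizer

# Load once at startup; counting afterwards is cheap compared to generation.
tokenizer = AutoTokenizer.from_pretrained("hf-internal-testing/llama-tokenizer")

def count_tokens(text: str) -> int:
    # Count only the content of the response, without special tokens.
    return len(tokenizer.encode(text, add_special_tokens=False))
```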

tazlin avatar Sep 08 '23 14:09 tazlin

Is that light enough? If anyone wants to add that option to try out, we can go for it.

db0 avatar Sep 08 '23 15:09 db0

There is more than one tokenizer model. Typically most workers are going to be running Llama, but each model could use its own unique tokenizer.

This is why I would prefer to get the token count directly from the model software, as that is the program that knows which tokenizer it is using.

I don't know if any software delivers these; my koboldcpp does not report token counts for either input or output. I know the tokenizer, so I could run it on my end, but I would prefer to get it from the API.

arekku255 avatar Sep 08 '23 15:09 arekku255

The problem with workers self-reporting is that a cunning worker can lie just enough to not be noticed.

db0 avatar Sep 08 '23 15:09 db0

The correct tokenizer can be derived from the model name requested, with a fallback to one that is probably accurate enough.
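A rough sketch of that lookup; the mapping and the default checkpoint are invented for illustration:

```python
from transformers import AutoTokenizer

# Invented mapping for illustration; a real table would cover the model names
# actually advertised by workers on the horde.
TOKENIZER_BY_MODEL_SUBSTRING = {
    "llama": "hf-internal-testing/llama-tokenizer",
    "gpt2": "gpt2",
}
DEFAULT_TOKENIZER = "hf-internal-testing/llama-tokenizer"  # "probably accurate" fallback

def tokenizer_for(model_name: str):
    name = model_name.lower()
    for substring, checkpoint in TOKENIZER_BY_MODEL_SUBSTRING.items():
        if substring in name:
            return AutoTokenizer.from_pretrained(checkpoint)
    return AutoTokenizer.from_pretrained(DEFAULT_TOKENIZER)
```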

tazlin avatar Sep 08 '23 15:09 tazlin

Relying on worker reports for token counts, when those counts are so closely tied to the kudos reward, opens up an avenue for abuse. (Edit: this was composed before seeing db0's response.)

tazlin avatar Sep 08 '23 15:09 tazlin

I agree that relying on worker reports for token counts, when those counts are so closely tied to the kudos reward, opens up an avenue for abuse.

It is, however, not the only avenue, and a cunning worker can still lie just enough to not be noticed by silently generating fewer tokens than requested.

I promise I generated an EOS token after 15 characters, pinkie swear!

arekku255 avatar Sep 08 '23 15:09 arekku255

Of course, which is why a better option here is to count the tokens ourselves. Ideally with the correct tokenizer, but if that's not possible due to load or other reasons, even a slightly different count might be OK.

db0 avatar Sep 08 '23 15:09 db0

I think a solution that looks at the token count and adjusts the reward appropriately to reduce the amount of cheating is the whole point of this issue. Ideally, the solution you're seeking would disincentivize the very behavior you're suggesting.

tazlin avatar Sep 08 '23 15:09 tazlin