graphrag [Feature Request]: Indexing cost

Do you need to file an issue?

[X] I have searched the existing issues and this feature is not already filed.
[X] My model is hosted on OpenAI or Azure. If not, please look at the "model providers" issue and don't file a new one here.
[X] I believe this is a legitimate feature request, not just a question. If this is a question, please use the Discussions area.

Is your feature request related to a problem? Please describe.

I'm able to track the token consumption for querying, but I don't know how to track the cost of the indexing process.

Describe the solution you'd like

Any way to track the indexing cost would help; workarounds would also be fine.

Additional context

No response

Sep 18 '24 14:09 MarkusGutjahr

The file graphrag/llm/base/rate_limiting.py is where the LLM gets invoked, and it provides the input and output tokens used for each call. I summed them up before the _handle_invoke_result call and logged them. Once you have the input and output token totals, you can compute the cost based on the pricing of your particular model.
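
To make the last step concrete, here is a minimal sketch of turning summed token counts into a cost estimate. It is not graphrag code: the TokenUsage class, the estimate_cost function, and the per-million-token prices are all hypothetical placeholders, so substitute the current rates for your own model.

```python
from dataclasses import dataclass


@dataclass
class TokenUsage:
    """Running token totals (hypothetical helper, not part of graphrag)."""
    input_tokens: int = 0
    output_tokens: int = 0

    def add(self, input_tokens: int, output_tokens: int) -> None:
        # Accumulate the counts reported for one LLM call.
        self.input_tokens += input_tokens
        self.output_tokens += output_tokens


def estimate_cost(usage: TokenUsage,
                  input_price_per_1m: float,
                  output_price_per_1m: float) -> float:
    """Return the estimated cost, given prices per one million tokens."""
    total = (usage.input_tokens * input_price_per_1m
             + usage.output_tokens * output_price_per_1m)
    return total / 1_000_000


# Example with made-up numbers: two logged calls and placeholder prices.
usage = TokenUsage()
usage.add(input_tokens=1200, output_tokens=300)
usage.add(input_tokens=800, output_tokens=200)
cost = estimate_cost(usage, input_price_per_1m=2.50, output_price_per_1m=10.00)
print(f"{cost:.4f}")  # prints 0.0100
```

Input and output tokens are kept separate because providers usually price them differently. Embedding calls made during indexing would need their own counter and price.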

Sep 20 '24 08:09 andreiionut1411

The class RateLimitingLLM isn't getting called in the indexing process.

Sep 20 '24 13:09 MarkusGutjahr

@MarkusGutjahr I have implemented a simple openai request tracker: https://github.com/sebastianschramm/openai-cost-tracker

Update: just added a CLI wrapper, so now you can install my openai-cost-tracker and then just run:

track-costs graphrag.index --root foo (for indexing)

or

track-costs graphrag.query --root foo --method local "My query" (for querying)

For in-code usage, call cost_tracker.init_tracker() before running the index script (take a look at the readme for how to do that for the indexing phase: https://github.com/sebastianschramm/openai-cost-tracker/blob/main/README.md#in-code-usage). Once enabled, it logs all OpenAI requests to a file, and the repo offers a "display-costs" command to retrieve the costs per model from all requests recorded in one log file.

Oct 21 '24 16:10 sebastianschramm