
execution_info cost tracking broken for small amounts + missing webpage content tokens

Open gouthamindukuri opened this issue 4 months ago • 3 comments

Describe the bug

execution_info shows total_cost_USD: 0.0 for OpenRouter calls that cost small amounts. Two issues:

  1. Rounding problem: small sub-cent costs get rounded down to 0.0 instead of showing the actual amount
  2. Missing tokens: prompt_tokens only counts the user prompt and completely ignores the webpage content that gets fed to the LLM

To Reproduce

Using OpenRouter with the gpt-4.1 model, a reasonably sized webpage scraping call produces this exec info:

```json
[
  {
    "node_name": "GenerateAnswer",
    "total_tokens": 213,
    "prompt_tokens": 193,
    "completion_tokens": 20,
    "successful_requests": 1,
    "total_cost_USD": 0.0,
    "exec_time": 1.89
  },
  {
    "node_name": "TOTAL RESULT",
    "total_tokens": 213,
    "prompt_tokens": 193,
    "completion_tokens": 20,
    "successful_requests": 1,
    "total_cost_USD": 0.0,
    "exec_time": 8.98
  }
]
```

But checking OpenRouter shows I actually got charged (even if small amounts like $0.007-0.01 per call).

Expected behavior

  • total_cost_USD should show the actual cost, not 0.0
  • prompt_tokens should include webpage content tokens, not just the user prompt (193 tokens seems way too low for a full webpage + prompt)

Additional context

The input token usage issue actually matters for manual cost tracking as a fallback, which is now impossible because the reported prompt_tokens are undercounted.

gouthamindukuri commented on Aug 26 '25 14:08

Hi @gouthamindukuri! I'm Dosu and I'm helping the Scrapegraph-ai team.

For the cost tracking issue: ScrapeGraphAI calculates and accumulates the actual cost as a float, but the display rounds to four decimals (e.g., 0.0000), so any cost below $0.0001 will appear as zero even if nonzero. The underlying value is not lost, just hidden by formatting. You can adjust the display precision if you need to see smaller amounts (source).
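
For illustration, a minimal sketch of the formatting behavior (the exact format string inside ScrapeGraphAI may differ; the four-decimal display is an assumption based on the output above):

```python
# Illustration only: a fixed four-decimal display hides sub-$0.0001 costs.
actual_cost = 0.00007  # hypothetical accumulated cost in USD

print(f"total_cost_USD: {actual_cost:.4f}")  # -> 0.0000 (looks like zero)
print(f"total_cost_USD: {actual_cost:.6f}")  # -> 0.000070 (value was never lost)
```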

For the missing webpage content tokens: ScrapeGraphAI always builds prompts by combining both the user prompt and the webpage content, so the full input is sent to the LLM. However, token usage stats (prompt_tokens) come directly from the LLM provider's API response (via LangChain), and sometimes only count the user prompt if the provider or integration misreports or omits the webpage content tokens. This is an upstream issue with LangChain or the LLM endpoint, not ScrapeGraphAI itself (source).
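
If you want to verify what the provider actually reports, you can call the model directly through LangChain and inspect the normalized usage field, bypassing ScrapeGraphAI entirely. A sketch (the key and prompt are placeholders):

```python
# Sketch: query OpenRouter directly via LangChain and inspect the usage
# the provider reports. If input_tokens here matches the low count in
# exec_info, the undercount is upstream, not in ScrapeGraphAI.
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    api_key="YOUR_OPENROUTER_KEY",           # placeholder
    base_url="https://openrouter.ai/api/v1",
    model="openai/gpt-4.1",
)

response = llm.invoke("user prompt\n\n<full webpage content here>")
print(response.usage_metadata)  # e.g. {'input_tokens': ..., 'output_tokens': ..., 'total_tokens': ...}
```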

If you need manual fallback cost tracking, you'll need the provider or LangChain to return accurate token usage for the entire prompt (including webpage content). No recent changes in ScrapeGraphAI address these issues.

To reply, just mention @dosu.



dosubot[bot] commented on Aug 26 '25 14:08

Update:

I ran some additional tests and I believe the problem is OpenRouter compatibility. When I change the endpoint to OpenAI, I can see the exact cost using prettify_exec_info. The following code does not display the exact cost:

```python
from scrapegraphai.graphs import SmartScraperGraph
from scrapegraphai.utils import prettify_exec_info

graph_config = {
    "llm": {
        "api_key": "",
        "base_url": "https://openrouter.ai/api/v1",
        "model": "openai/gpt-4.1",
    },
    "verbose": True,
    "headless": True,
}

smart_scraper_graph = SmartScraperGraph(
    prompt="Extract the main title, description, and key information from this page",
    source="https://httpbin.org/json",
    config=graph_config,
)

result = smart_scraper_graph.run()

graph_exec_info = smart_scraper_graph.get_execution_info()
print(prettify_exec_info(graph_exec_info))
```

gouthamindukuri commented on Aug 26 '25 17:08

You're right: cost tracking works with OpenAI endpoints because ScrapeGraphAI expects usage metadata and model names that match its internal cost dictionaries. OpenRouter isn't a recognized provider in the codebase, and its models (even if OpenAI-compatible) aren't listed in the cost calculation logic, so costs default to zero even when tokens are used and you are charged by OpenRouter. The callback manager relies on the LLM response to provide token usage and cost info, but if the provider doesn't return this in the expected format, ScrapeGraphAI can't track it accurately (source).
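
In the meantime, one manual fallback is to price the token counts from exec_info yourself. A sketch, with placeholder rates (substitute the current per-token pricing from OpenRouter's model page); note that if prompt_tokens is undercounted as described above, this estimate will be low too:

```python
# Manual fallback: price the exec_info token counts yourself.
PROMPT_RATE_PER_1K = 0.002       # hypothetical USD per 1K prompt tokens
COMPLETION_RATE_PER_1K = 0.008   # hypothetical USD per 1K completion tokens

def estimate_cost(exec_info: list) -> float:
    total = exec_info[-1]  # the "TOTAL RESULT" row
    return (total["prompt_tokens"] / 1000 * PROMPT_RATE_PER_1K
            + total["completion_tokens"] / 1000 * COMPLETION_RATE_PER_1K)

print(f"estimated cost: ${estimate_cost(graph_exec_info):.6f}")
```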

A known workaround is to pass a model instance directly (using LangChain's ChatOpenAI with OpenRouter's base URL and API key) and specify model_tokens in your config, but this only helps if the provider's API returns compatible usage metadata (source). If OpenRouter's API doesn't provide token/cost info in the same way as OpenAI, ScrapeGraphAI will still report zero cost and incomplete token usage.
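
A minimal sketch of that workaround (the API key and model_tokens value are placeholders; check the model's card for its actual context window):

```python
# Sketch: pass a prebuilt LangChain model instance plus model_tokens so
# ScrapeGraphAI doesn't need to recognize the model name itself.
from langchain_openai import ChatOpenAI
from scrapegraphai.graphs import SmartScraperGraph

llm = ChatOpenAI(
    api_key="YOUR_OPENROUTER_KEY",            # placeholder
    base_url="https://openrouter.ai/api/v1",
    model="openai/gpt-4.1",
)

graph_config = {
    "llm": {
        "model_instance": llm,
        "model_tokens": 128000,  # assumed context window; verify for your model
    },
    "verbose": True,
    "headless": True,
}

graph = SmartScraperGraph(
    prompt="Extract the main title, description, and key information from this page",
    source="https://httpbin.org/json",
    config=graph_config,
)
result = graph.run()
```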

For now, accurate cost tracking with OpenRouter isn't supported out of the box. You'd need either upstream changes (OpenRouter/LangChain integration improvements) or to patch ScrapeGraphAI's cost dictionaries and callback logic to recognize OpenRouter models and parse their usage metadata. There are no recent updates or merged PRs addressing this compatibility gap (source).

To reply, just mention @dosu.



dosubot[bot] commented on Aug 26 '25 17:08