spin
spin copied to clipboard
Add counter telemetry
Fixes #2564
I think this captures the metrics of LLM and Key-Value store
Captured the LLM model and prompt information and kv stores get and set key information
Not sure if the key is accessible when querying on Prometheus/Grafana
cc: @calebschoepp
@me-diru is this intended to land in the factors branch, or should it be rebased off main? Currently, merging this would merge a whole bunch of unrelated factors stuff too.
@me-diru is this intended to land in the factors branch, or should it be rebased off main? Currently, merging this would merge a whole bunch of unrelated factors stuff too.
This should land on factors (or main once factors merges in there).
I changed the base to factors, and it should only reflect my code changes. Thanks for checking in @itowlson !
@calebschoepp I am not sure how to test the telemetry metrics in factors. For llm-compute, I think I am capturing it in the right place. However, when I run the build using
4318/v1/traces OTEL_EXPORTER_OTLP_METRICS_ENDPOINT=h
ttp://localhost:4318/v1/metrics ../../../spin/target
/debug/spin up --runtime-config-file ../runtime-conf
ig.toml
it gives me an error of
Eror: unused runtime config key(s): llm_compute
I tried to run the factor_test.rs for the factor-llm, but it gave me another error on
error[E0277]: the trait bound `url::Url: Deserialize<'_>` is not satisfied
--> crates/factor-llm/src/spin.rs:91:17
|
91 | #[derive(Debug, serde::Deserialize)]
| ^^^^^^^^^^^^^^^^^^ the trait `Deserialize<'_>` is not implemented for `url::Url`
|
Which I think satisfies
On the other hand, for factor-key-value, I think it is utilizing the spin-key-value crate to do the set and get functions? So I guess the monotonic counters for key-value should suffice.
Would be great to have your input
@me-diru I still see 3 commits that aren't yours that you'll want to take out of this diff.
@me-diru
the trait
Deserialize<'_>is not implemented forurl::Url
You might need to enable the serde feature for the url crate. https://docs.rs/url/latest/url/#feature-serde
(This is a common pattern in Rust utility crates, where it's useful for people to be able to serialise the types, but they don't want to force a heavyweight serde dependency on people who just want to sling URL or times or whatever.)
You might need to enable the
serdefeature for theurlcrate. https://docs.rs/url/latest/url/#feature-serde
That did the trick, tests passed :D
I am just curious how the tests passed before, though 😅 In the current case of factor-llm, we don't have to deserialize the Url?
@me-diru It will work if any crate in the build turns the feature on. This can be cause surprises when you use your crate in a slightly different build context and the other crate that happened to make things work is no longer there and boom your code stops compiling.
This is a significant pain point for features but I gather there is not much that can be done.
Are you still having errors running it @me-diru? I would need to see the runtime config you're using to help more.
Are you still having errors running it @me-diru? I would need to see the runtime config you're using to help more.
Yes, it's still happening. When I run the same command with the latest spin cli release(2.7), it works fine.
anonymized runtime-config file
[llm_compute]
type = "remote_http"
url = "<URL>"
auth_token = "<AUTH-TOKEN>"
I checked with factors branch spin binary and the same error occurs. I don't think the metrics code is causing this one.
Maybe @lann could give more insight
:point_up: This hopefully fixed it.
@calebschoepp I think it's now capturing the metrics in the new factors code!
Sweet, @me-diru is this ready for a final review?
@calebschoepp Yes! Please review :D
Could someone from @fermyon/spin-core-maintainers enable CI and merge this?
CI didn't run as the base branch is still factors. Perhaps rebase this on main? Then CI should run.
Good catch, @me-diru you'll have to point this onto main.
The plan was to merge it to factors branch capturing metrics in the new codebase. @calebschoepp I wonder if the main branch was updated to point the PR there? 👀
I see all commits from factors branch are merged to main. I will do that, thanks! :D
I have excluded capturing key details in the metrics. However, I was still getting some weird linting issues. It would be great to get more context on them!
I have excluded capturing key details in the metrics. However, I was still getting some weird linting issues. It would be great to get more context on them!
Seems like there aren't any linting errors in CI. Are you just experiencing those linting errors locally? What are they?
Seems like there aren't any linting errors in CI. Are you just experiencing those linting errors locally? What are they?
I think so. This is what I got.
Interesting to see it not being reflected in CI though
@me-diru CI is currently locked to Rust 1.79. Is it possible you are on a more recent version of Rust with shiny new lints?
FWIW the lint warnings there have been fixed in https://github.com/fermyon/spin/pull/2866 so a rebase will get rid of those.
@itowlson I am using Rust 1.81! That makes sense. Thanks for chiming in!
It's good to go now. I will rebase and get it merged @rylev
Interesting to see most of my merge conflicts were version downgrades
There was a timeout error in all-integration-tests under Zig Setup. Rerunning it just to make sure.
Run goto-bus-stop/setup-zig@v2
AggregateError [ETIMEDOUT]:
at internalConnectMultiple (node:net:1117:1[8](https://github.com/fermyon/spin/actions/runs/11154357210/job/31003494593?pr=2741#step:4:9))
at internalConnectMultiple (node:net:1185:5)
at Timeout.internalConnectMultipleTimeout (node:net:1711:5)
at listOnTimeout (node:internal/timers:575:[11](https://github.com/fermyon/spin/actions/runs/11154357210/job/31003494593?pr=2741#step:4:12))
at process.processTimers (node:internal/timers:514:7)
```
@me-diru Would you mind pushing to the branch again to re-trigger CI?
@michelleN thanks for the bump! I re-run it and now looks like all checks are passing 😍
Please let me know if any other changes are required!