spin icon indicating copy to clipboard operation
spin copied to clipboard

Add counter telemetry

Open me-diru opened this issue 1 year ago • 26 comments

Fixes #2564

I think this captures the metrics of LLM and Key-Value store

me-diru avatar Aug 21 '24 19:08 me-diru

Captured the LLM model and prompt information and kv stores get and set key information

Not sure if the key is accessible when querying on Prometheus/Grafana Screenshot from 2024-08-21 11-50-55

cc: @calebschoepp

me-diru avatar Aug 21 '24 19:08 me-diru

@me-diru is this intended to land in the factors branch, or should it be rebased off main? Currently, merging this would merge a whole bunch of unrelated factors stuff too.

itowlson avatar Aug 22 '24 21:08 itowlson

@me-diru is this intended to land in the factors branch, or should it be rebased off main? Currently, merging this would merge a whole bunch of unrelated factors stuff too.

This should land on factors (or main once factors merges in there).

calebschoepp avatar Aug 22 '24 21:08 calebschoepp

I changed the base to factors, and it should only reflect my code changes. Thanks for checking in @itowlson !

me-diru avatar Aug 22 '24 22:08 me-diru

@calebschoepp I am not sure how to test the telemetry metrics in factors. For llm-compute, I think I am capturing it in the right place. However, when I run the build using

4318/v1/traces OTEL_EXPORTER_OTLP_METRICS_ENDPOINT=h
ttp://localhost:4318/v1/metrics ../../../spin/target
/debug/spin up --runtime-config-file ../runtime-conf
ig.toml

it gives me an error of

Eror: unused runtime config key(s): llm_compute

I tried to run the factor_test.rs for the factor-llm, but it gave me another error on

error[E0277]: the trait bound `url::Url: Deserialize<'_>` is not satisfied
  --> crates/factor-llm/src/spin.rs:91:17
   |
91 | #[derive(Debug, serde::Deserialize)]
   |                 ^^^^^^^^^^^^^^^^^^ the trait `Deserialize<'_>` is not implemented for `url::Url`
   |

Which I think satisfies

On the other hand, for factor-key-value, I think it is utilizing the spin-key-value crate to do the set and get functions? So I guess the monotonic counters for key-value should suffice.

Would be great to have your input

me-diru avatar Aug 22 '24 22:08 me-diru

@me-diru I still see 3 commits that aren't yours that you'll want to take out of this diff.

calebschoepp avatar Aug 22 '24 22:08 calebschoepp

@me-diru

the trait Deserialize<'_> is not implemented for url::Url

You might need to enable the serde feature for the url crate. https://docs.rs/url/latest/url/#feature-serde

(This is a common pattern in Rust utility crates, where it's useful for people to be able to serialise the types, but they don't want to force a heavyweight serde dependency on people who just want to sling URL or times or whatever.)

itowlson avatar Aug 22 '24 22:08 itowlson

You might need to enable the serde feature for the url crate. https://docs.rs/url/latest/url/#feature-serde

That did the trick, tests passed :D

I am just curious how the tests passed before, though 😅 In the current case of factor-llm, we don't have to deserialize the Url?

me-diru avatar Aug 22 '24 22:08 me-diru

@me-diru It will work if any crate in the build turns the feature on. This can be cause surprises when you use your crate in a slightly different build context and the other crate that happened to make things work is no longer there and boom your code stops compiling.

This is a significant pain point for features but I gather there is not much that can be done.

itowlson avatar Aug 22 '24 22:08 itowlson

Are you still having errors running it @me-diru? I would need to see the runtime config you're using to help more.

calebschoepp avatar Aug 23 '24 15:08 calebschoepp

Are you still having errors running it @me-diru? I would need to see the runtime config you're using to help more.

Yes, it's still happening. When I run the same command with the latest spin cli release(2.7), it works fine.

anonymized runtime-config file

[llm_compute]
type = "remote_http"
url = "<URL>"
auth_token = "<AUTH-TOKEN>"

I checked with factors branch spin binary and the same error occurs. I don't think the metrics code is causing this one.

Maybe @lann could give more insight

me-diru avatar Aug 23 '24 17:08 me-diru

:point_up: This hopefully fixed it.

lann avatar Aug 26 '24 21:08 lann

@calebschoepp I think it's now capturing the metrics in the new factors code! image

me-diru avatar Sep 05 '24 21:09 me-diru

Sweet, @me-diru is this ready for a final review?

calebschoepp avatar Sep 06 '24 18:09 calebschoepp

@calebschoepp Yes! Please review :D

me-diru avatar Sep 21 '24 22:09 me-diru

Could someone from @fermyon/spin-core-maintainers enable CI and merge this?

CI didn't run as the base branch is still factors. Perhaps rebase this on main? Then CI should run.

vdice avatar Sep 23 '24 19:09 vdice

Good catch, @me-diru you'll have to point this onto main.

calebschoepp avatar Sep 23 '24 20:09 calebschoepp

The plan was to merge it to factors branch capturing metrics in the new codebase. @calebschoepp I wonder if the main branch was updated to point the PR there? 👀

I see all commits from factors branch are merged to main. I will do that, thanks! :D

me-diru avatar Sep 23 '24 23:09 me-diru

I have excluded capturing key details in the metrics. However, I was still getting some weird linting issues. It would be great to get more context on them!

me-diru avatar Sep 24 '24 00:09 me-diru

I have excluded capturing key details in the metrics. However, I was still getting some weird linting issues. It would be great to get more context on them!

Seems like there aren't any linting errors in CI. Are you just experiencing those linting errors locally? What are they?

calebschoepp avatar Sep 24 '24 14:09 calebschoepp

Seems like there aren't any linting errors in CI. Are you just experiencing those linting errors locally? What are they?

I think so. This is what I got.

Screenshot from 2024-09-23 17-15-45

Interesting to see it not being reflected in CI though

me-diru avatar Sep 25 '24 21:09 me-diru

@me-diru CI is currently locked to Rust 1.79. Is it possible you are on a more recent version of Rust with shiny new lints?

itowlson avatar Sep 25 '24 21:09 itowlson

FWIW the lint warnings there have been fixed in https://github.com/fermyon/spin/pull/2866 so a rebase will get rid of those.

rylev avatar Sep 26 '24 12:09 rylev

@itowlson I am using Rust 1.81! That makes sense. Thanks for chiming in!

It's good to go now. I will rebase and get it merged @rylev

me-diru avatar Oct 02 '24 06:10 me-diru

Interesting to see most of my merge conflicts were version downgrades

me-diru avatar Oct 02 '24 06:10 me-diru

There was a timeout error in all-integration-tests under Zig Setup. Rerunning it just to make sure.

Run goto-bus-stop/setup-zig@v2
AggregateError [ETIMEDOUT]: 
    at internalConnectMultiple (node:net:1117:1[8](https://github.com/fermyon/spin/actions/runs/11154357210/job/31003494593?pr=2741#step:4:9))
    at internalConnectMultiple (node:net:1185:5)
    at Timeout.internalConnectMultipleTimeout (node:net:1711:5)
    at listOnTimeout (node:internal/timers:575:[11](https://github.com/fermyon/spin/actions/runs/11154357210/job/31003494593?pr=2741#step:4:12))
    at process.processTimers (node:internal/timers:514:7)
    ```

me-diru avatar Oct 03 '24 08:10 me-diru

@me-diru Would you mind pushing to the branch again to re-trigger CI?

michelleN avatar Dec 02 '24 15:12 michelleN

@michelleN thanks for the bump! I re-run it and now looks like all checks are passing 😍

Please let me know if any other changes are required!

me-diru avatar Dec 02 '24 20:12 me-diru