LLM reporting middleware that tracks token counts for AI APIs
User description
Adds a new middleware that uses tiktoken to count AI message tokens (supports OpenAI and Anthropic)
Description
- Adds `mw_llm_reporter.go` middleware - hooks after transforms and auth
- The middleware is only loaded if the API is tagged with `llm`
- The `LLMReporter` middleware will:
  - decode the outbound body to a base API message format used by both Anthropic and OpenAI (Gemini uses a different content structure)
  - attempt to build a content blob from the message payload
  - attempt to detect the model being used in the request
  - use the tiktoken library to estimate the number of tokens; if it can't get a clear read on the model being used, it will tag the count as estimated
- Sets three values into the context: `ctx.LLMReport_Model`, `ctx.LLMReport_NumTokens`, and `ctx.LLMReport_Estimate`
- These context values can later be used in the `RecordHit` function to store the data in an analytics record (as this is part of Tyk Pump, I did not extend the pump record)
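The model-detection and estimate-tagging flow described above can be sketched with the standard library alone (the real middleware calls `github.com/pkoukk/tiktoken-go` to do the actual counting; `knownModels` and the fallback default here are assumptions for illustration, not the middleware's actual list):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Hypothetical set of models tiktoken can count exactly; anything else
// (e.g. Anthropic models) falls back and the count is flagged as an estimate.
var knownModels = map[string]bool{"gpt-4": true, "gpt-3.5-turbo": true}

const fallbackModel = "gpt-3.5-turbo"

// detectModel pulls the "model" field out of the request body; if it is
// missing or unknown, fall back and flag the eventual count as estimated.
func detectModel(body []byte) (model string, estimated bool) {
	var payload struct {
		Model string `json:"model"`
	}
	if err := json.Unmarshal(body, &payload); err != nil || !knownModels[payload.Model] {
		return fallbackModel, true
	}
	return payload.Model, false
}

func main() {
	m, est := detectModel([]byte(`{"model":"claude-3-opus"}`))
	fmt.Println(m, est) // unknown model: fallback, estimated
	m, est = detectModel([]byte(`{"model":"gpt-4"}`))
	fmt.Println(m, est) // known model: exact count possible
}
```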
Motivation and Context
We've had multiple requests for this kind of reporting, and it is relatively easy to add to our analytics. It is also something our competitors already do, so it will allow us to offer a matching feature.
How This Has Been Tested
- Manual testing directly with dummy requests from OpenAI and Anthropic docs
Types of changes
- [ ] Bug fix (non-breaking change which fixes an issue)
- [x] New feature (non-breaking change which adds functionality)
- [ ] Breaking change (fix or feature that would cause existing functionality to change)
- [ ] Refactoring or add test (improvements in base code or adds test coverage to functionality)
Checklist
- [ ] I ensured that the documentation is up to date
- [ ] I explained why this PR updates go.mod in detail with reasoning why it's required
- [ ] I would like a code coverage CI quality gate exception and have explained why
PR Type
Enhancement, Dependencies
Description
- Added new middleware `LLMReport` to track token counts for AI APIs using the tiktoken library.
- Introduced constants and functions to handle LLM report data in the request context.
- Updated API processing to include the new middleware.
- Modified quickstart API configuration to support LLM reporting.
- Added necessary dependencies for the new middleware.
Changes walkthrough 📝
| Relevant files |
|---|
| Enhancement |
| Dependencies |
💡 PR-Agent usage: Comment `/help` on the PR to get a list of all available PR-Agent tools and their descriptions
API Changes
--- prev.txt 2024-06-11 05:38:00.319166295 +0000
+++ current.txt 2024-06-11 05:37:57.343136072 +0000
@@ -6859,6 +6859,9 @@
GraphQLRequest
GraphQLIsWebSocketUpgrade
OASOperation
+ LLMReport_Model
+ LLMReport_NumRequestTokens
+ LLMReport_Estimate
// CacheOptions holds cache options required for cache writer middleware.
CacheOptions
@@ -8790,6 +8793,16 @@
func (l *LDAPStorageHandler) SetRollingWindow(keyName string, per int64, val string, pipeline bool) (int, []interface{})
+type LLMReport struct {
+ *BaseMiddleware
+}
+
+func (sa *LLMReport) EnabledForSpec() bool
+
+func (sa *LLMReport) Name() string
+
+func (sa *LLMReport) ProcessRequest(w http.ResponseWriter, r *http.Request, _ interface{}) (error, int)
+
type LogMessageEventHandler struct {
Gw *Gateway `json:"-"`
// Has unexported fields.
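The API diff above shows the shape of the new middleware type. A minimal, self-contained sketch of that shape (the `BaseMiddleware` here is a stand-in stub with a hypothetical `Tags` field; in Tyk it carries the API spec, logger, and gateway reference):

```go
package main

import "fmt"

// BaseMiddleware is a stand-in stub for Tyk's embedded base middleware.
type BaseMiddleware struct {
	Tags []string // hypothetical: tags from the API definition
}

// LLMReport mirrors the embedded-pointer shape from the API diff.
type LLMReport struct {
	*BaseMiddleware
}

// EnabledForSpec reports whether this middleware should load for the API:
// per the description, only when the API is tagged "llm".
func (l *LLMReport) EnabledForSpec() bool {
	for _, t := range l.Tags {
		if t == "llm" {
			return true
		}
	}
	return false
}

// Name identifies the middleware in logs and traces.
func (l *LLMReport) Name() string { return "LLMReport" }

func main() {
	mw := &LLMReport{&BaseMiddleware{Tags: []string{"llm"}}}
	fmt.Println(mw.Name(), mw.EnabledForSpec())
}
```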
PR Reviewer Guide 🔍
| ⏱️ Estimated effort to review [1-5] | 4 |
|---|---|
| 🧪 Relevant tests | No |
| 🔒 Security concerns | No |
| ⚡ Key issues to review | Possible Bug; Error Handling; Performance Concern |
PR Code Suggestions ✨
| Category | Suggestion | Score |
|---|---|---|
| Possible bug | **Implement safe type assertions with error handling to prevent runtime panics** — Add error handling for type assertions when extracting values from the context to prevent runtime panics.<br>Why: This suggestion addresses a critical issue by adding error handling for type assertions, which prevents potential runtime panics. This is crucial for the robustness and stability of the application. | 10 |
| Possible bug | **Add handling for missing …** | 9 |
| Enhancement | **Change the middleware name to accurately reflect its purpose** — Use a more descriptive middleware name instead of "StripAuth", which seems unrelated. `gateway/mw_llm_reporter.go` [30]<br>Why: Changing the middleware name to accurately reflect its purpose enhances code readability and maintainability. This is an important improvement, although not as critical as fixing bugs. | 8 |
| Maintainability | **Use a constant for the default model name to improve maintainability** — Replace the hardcoded model name "gpt-3.5-turbo" with a constant to avoid magic strings. `gateway/mw_llm_reporter.go` [72]<br>Why: Using a constant for the model name improves maintainability and readability of the code by avoiding magic strings. However, this is a minor improvement compared to fixing potential bugs. | 7 |
:boom: CI tests failed :see_no_evil:
git-state
diff --git a/gateway/mw_llm_reporter.go b/gateway/mw_llm_reporter.go
index 305ee78..bff0c82 100644
--- a/gateway/mw_llm_reporter.go
+++ b/gateway/mw_llm_reporter.go
@@ -8,8 +8,9 @@ import (
"os"
"strings"
- "github.com/TykTechnologies/tyk/ctx"
"github.com/pkoukk/tiktoken-go"
+
+ "github.com/TykTechnologies/tyk/ctx"
)
type msgObject struct {
Please look at the run or in the Checks tab.
:boom: CI tests failed :see_no_evil:
git-state
diff --git a/gateway/mw_llm_reporter.go b/gateway/mw_llm_reporter.go
index 8336ca2..413f9ce 100644
--- a/gateway/mw_llm_reporter.go
+++ b/gateway/mw_llm_reporter.go
@@ -8,8 +8,9 @@ import (
"os"
"strings"
- "github.com/TykTechnologies/tyk/ctx"
"github.com/pkoukk/tiktoken-go"
+
+ "github.com/TykTechnologies/tyk/ctx"
)
type msgObject struct {
Please look at the run or in the Checks tab.
@lonelycode all LLMs these days return the number of tokens spent on the request and response, so maybe instead of calculating it before the request, calculate it after the response, and then update the rates?