LLM reporting middleware that tracks token counts for AI APIs
User description
Adds a new middleware that uses tiktoken to count AI message tokens (supports OpenAI and Anthropic)
Description
- Adds `mw_llm_reporter.go` middleware - hooks after transforms and auth
- The middleware is only loaded if the API is tagged with `llm`
- The `LLMReporter` middleware will:
  - decode the outbound body to a base API message format used by both Anthropic and OpenAI (Gemini uses a different content structure)
  - attempt to build a content blob from the message payload
  - attempt to detect the model being used in the request
  - use the tiktoken library to estimate the number of tokens; if it can't get a clear read on the model being used, it will tag the count as estimated
- Sets three values into the context: `ctx.LLMReport_Model`, `ctx.LLMReport_NumTokens`, and `ctx.LLMReport_Estimate`
- These context values can later be used in the `RecordHit` function to store the data in an analytics record (as this is part of Tyk Pump, I did not extend the pump record)
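The model-detection and estimate-tagging flow described above can be sketched with the standard library alone (the real middleware calls `github.com/pkoukk/tiktoken-go` to do the actual counting; `knownModels` and the fallback default here are assumptions for illustration, not the middleware's actual list):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Hypothetical set of models tiktoken can count exactly; anything else
// (e.g. Anthropic models) falls back and the count is flagged as an estimate.
var knownModels = map[string]bool{"gpt-4": true, "gpt-3.5-turbo": true}

const fallbackModel = "gpt-3.5-turbo"

// detectModel pulls the "model" field out of the request body; if it is
// missing or unknown, fall back and flag the eventual count as estimated.
func detectModel(body []byte) (model string, estimated bool) {
	var payload struct {
		Model string `json:"model"`
	}
	if err := json.Unmarshal(body, &payload); err != nil || !knownModels[payload.Model] {
		return fallbackModel, true
	}
	return payload.Model, false
}

func main() {
	m, est := detectModel([]byte(`{"model":"claude-3-opus"}`))
	fmt.Println(m, est) // unknown model: fallback, estimated
	m, est = detectModel([]byte(`{"model":"gpt-4"}`))
	fmt.Println(m, est) // known model: exact count possible
}
```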
Motivation and Context
We've had multiple requests for this kind of reporting, and it is relatively easy to add to our analytics. It is also something our competitors already do, so it will allow us to offer a matching feature.
How This Has Been Tested
- Manual testing directly with dummy requests from OpenAI and Anthropic docs
Types of changes
- [ ] Bug fix (non-breaking change which fixes an issue)
- [x] New feature (non-breaking change which adds functionality)
- [ ] Breaking change (fix or feature that would cause existing functionality to change)
- [ ] Refactoring or add test (improvements in base code or adds test coverage to functionality)
Checklist
- [ ] I ensured that the documentation is up to date
- [ ] I explained why this PR updates go.mod in detail with reasoning why it's required
- [ ] I would like a code coverage CI quality gate exception and have explained why
PR Type
Enhancement, Dependencies
Description
- Added new middleware `LLMReport` to track token counts for AI APIs using the tiktoken library.
- Introduced constants and functions to handle LLM report data in the request context.
- Updated API processing to include the new middleware.
- Modified quickstart API configuration to support LLM reporting.
- Added necessary dependencies for the new middleware.
Changes walkthrough 📝
| Relevant files |
|---|
| Enhancement |
| Dependencies |
💡 PR-Agent usage: Comment `/help` on the PR to get a list of all available PR-Agent tools and their descriptions
API Changes
--- prev.txt 2024-06-11 05:38:00.319166295 +0000
+++ current.txt 2024-06-11 05:37:57.343136072 +0000
@@ -6859,6 +6859,9 @@
GraphQLRequest
GraphQLIsWebSocketUpgrade
OASOperation
+ LLMReport_Model
+ LLMReport_NumRequestTokens
+ LLMReport_Estimate
// CacheOptions holds cache options required for cache writer middleware.
CacheOptions
@@ -8790,6 +8793,16 @@
func (l *LDAPStorageHandler) SetRollingWindow(keyName string, per int64, val string, pipeline bool) (int, []interface{})
+type LLMReport struct {
+ *BaseMiddleware
+}
+
+func (sa *LLMReport) EnabledForSpec() bool
+
+func (sa *LLMReport) Name() string
+
+func (sa *LLMReport) ProcessRequest(w http.ResponseWriter, r *http.Request, _ interface{}) (error, int)
+
type LogMessageEventHandler struct {
Gw *Gateway `json:"-"`
// Has unexported fields.
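The API diff above shows the shape of the new middleware type. A minimal, self-contained sketch of that shape (the `BaseMiddleware` here is a stand-in stub with a hypothetical `Tags` field; in Tyk it carries the API spec, logger, and gateway reference):

```go
package main

import "fmt"

// BaseMiddleware is a stand-in stub for Tyk's embedded base middleware.
type BaseMiddleware struct {
	Tags []string // hypothetical: tags from the API definition
}

// LLMReport mirrors the embedded-pointer shape from the API diff.
type LLMReport struct {
	*BaseMiddleware
}

// EnabledForSpec reports whether this middleware should load for the API:
// per the description, only when the API is tagged "llm".
func (l *LLMReport) EnabledForSpec() bool {
	for _, t := range l.Tags {
		if t == "llm" {
			return true
		}
	}
	return false
}

// Name identifies the middleware in logs and traces.
func (l *LLMReport) Name() string { return "LLMReport" }

func main() {
	mw := &LLMReport{&BaseMiddleware{Tags: []string{"llm"}}}
	fmt.Println(mw.Name(), mw.EnabledForSpec())
}
```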
PR Reviewer Guide 🔍
| ⏱️ Estimated effort to review [1-5] | 4 |
|---|---|
| 🧪 Relevant tests | No |
| 🔒 Security concerns | No |
| ⚡ Key issues to review | Possible Bug; Error Handling; Performance Concern |
PR Code Suggestions ✨
| Category | Suggestion | Score |
|---|---|---|
| Possible bug | **Implement safe type assertions with error handling to prevent runtime panics** — Add error handling for type assertions when extracting values from the context to prevent runtime panics.<br>Why: This suggestion addresses a critical issue by adding error handling for type assertions, which prevents potential runtime panics. This is crucial for the robustness and stability of the application. | 10 |
| Possible bug | **Add handling for missing …** | 9 |
| Enhancement | **Change the middleware name to accurately reflect its purpose** — Use a more descriptive middleware name instead of "StripAuth", which seems unrelated. `gateway/mw_llm_reporter.go` [30]<br>Why: Changing the middleware name to accurately reflect its purpose enhances code readability and maintainability. This is an important improvement, although not as critical as fixing bugs. | 8 |
| Maintainability | **Use a constant for the default model name to improve maintainability** — Replace the hardcoded model name "gpt-3.5-turbo" with a constant to avoid magic strings. `gateway/mw_llm_reporter.go` [72]<br>Why: Using a constant for the model name improves maintainability and readability of the code by avoiding magic strings. However, this is a minor improvement compared to fixing potential bugs. | 7 |
:boom: CI tests failed :see_no_evil:
git-state
diff --git a/gateway/mw_llm_reporter.go b/gateway/mw_llm_reporter.go
index 305ee78..bff0c82 100644
--- a/gateway/mw_llm_reporter.go
+++ b/gateway/mw_llm_reporter.go
@@ -8,8 +8,9 @@ import (
"os"
"strings"
- "github.com/TykTechnologies/tyk/ctx"
"github.com/pkoukk/tiktoken-go"
+
+ "github.com/TykTechnologies/tyk/ctx"
)
type msgObject struct {
Please look at the run or in the Checks tab.
:boom: CI tests failed :see_no_evil:
git-state
diff --git a/gateway/mw_llm_reporter.go b/gateway/mw_llm_reporter.go
index 8336ca2..413f9ce 100644
--- a/gateway/mw_llm_reporter.go
+++ b/gateway/mw_llm_reporter.go
@@ -8,8 +8,9 @@ import (
"os"
"strings"
- "github.com/TykTechnologies/tyk/ctx"
"github.com/pkoukk/tiktoken-go"
+
+ "github.com/TykTechnologies/tyk/ctx"
)
type msgObject struct {
Please look at the run or in the Checks tab.
@lonelycode all LLMs these days return the number of tokens spent on the request and response, so maybe instead of calculating it before the request, calculate it after the response, and then update the rates?