                        LLM: Standardized fields for LLM Security and protection [Discussion]
Area(s)
area:gen-ai, llm
Is your change request related to a problem? Please describe.
NOTE: narrowed down the list of fields in https://github.com/open-telemetry/semantic-conventions/issues/1034
To help prevent threats to LLM systems, such as misuse, and to log content-filter activity, we propose standardized fields for secure and safe LLM usage, based on frameworks such as OWASP's LLM Top 10 and MITRE ATLAS.
For example, a user may be using various LLM vendors or their own deployments, and wish to log all of them in a standardized manner. Our team, led by @Mikaayenson, has published a blog proposing standardized fields for LLM Security.
Initially, we wanted to add these fields to ECS (Elastic Common Schema), but since the convergence/donation of ECS to OpenTelemetry, we're following the guidelines and proposing the changes to OTel.
An additional example of our work in LLM Security that leverages fields like the ones proposed: a blog on implementing LLM Security via proxy.
Describe the solution you'd like
Below are the fields we used in our work on standardized fields for LLM Security across vendors, deployments, etc.
The same list is also available as a gist.
| Category | Field | Type | Description | Existing OTel SemConv (as of May 6, 2024) |
|---|---|---|---|---|
| General LLM Interaction Fields | gen_ai.prompt | text | The full text of the user's request to the gen_ai. | |
| | gen_ai.usage.prompt_tokens | integer | Number of tokens in the user's request. | gen_ai.usage.prompt_tokens |
| | gen_ai.completion | text | The full text of the LLM's response. | |
| | gen_ai.usage.completion_tokens | integer | Number of tokens in the LLM's response. | gen_ai.usage.completion_tokens |
| | gen_ai.system | keyword | Name of the LLM foundation model vendor. | gen_ai.system |
| | gen_ai.user.id | keyword | Unique identifier for the user. | |
| | gen_ai.request.id | keyword | Unique identifier for the LLM request. | |
| | gen_ai.response.id | keyword | Unique identifier for the LLM response. | gen_ai.response.id |
| | gen_ai.response.model | | | |
| | gen_ai.response.error_code | keyword | Error code returned in the LLM response. | |
| | gen_ai.response.finish_reasons | keyword array | Reason the LLM response stopped. | gen_ai.response.finish_reasons |
| | gen_ai.request.timestamp | date | Timestamp when the request was made. | |
| | gen_ai.response.timestamp | date | Timestamp when the response was received. | |
| | gen_ai.request.model.id | keyword | ID of the LLM model a request is being made to. | gen_ai.request.model |
| | gen_ai.request.max_tokens | integer | Maximum number of tokens the LLM generates for a request. | gen_ai.request.max_tokens |
| | gen_ai.request.temperature | float | Temperature setting for the LLM request. | gen_ai.request.temperature |
| | gen_ai.request.top_k | float | The top_k sampling setting for the LLM request. | |
| | gen_ai.request.top_p | float | The top_p sampling setting for the LLM request. | gen_ai.request.top_p |
| | gen_ai.request.model_version | keyword | Version of the LLM model used to generate the response. | |
| | gen_ai.request.model.role | keyword | Role of the LLM model in the interaction. | |
| | gen_ai.request.model.type | keyword | Type of LLM model. | |
| | gen_ai.request.model.description | keyword | Description of the LLM model. | |
| | gen_ai.request.model.instructions | text | Custom instructions for the LLM model. | |
| Text Quality and Relevance Metric Fields | gen_ai.text.readability_score | float | Measures the readability level of the text. | |
| | gen_ai.text.complexity_score | float | Evaluates the complexity of the text. | |
| | gen_ai.text.similarity_score | float | Measures the similarity between the prompt and response. | |
| Security Metric Fields | gen_ai.security.regex_pattern_count | integer | Counts occurrences of strings matching user-defined regex patterns. | |
| | gen_ai.security.jailbreak_score | float | Measures similarity to known jailbreak attempts. | |
| | gen_ai.security.prompt_injection_score | float | Measures similarity to known prompt injection attacks. | |
| | gen_ai.security.hallucination_consistency | float | Consistency check between multiple responses. | |
| | gen_ai.security.refusal_score | float | Measures similarity to known LLM refusal responses. | |
| Policy Enforcement Fields | gen_ai.policy.name | keyword | Name of the specific policy that was triggered. | |
| | gen_ai.policy.violation | boolean | Specifies if a security policy was violated. | |
| | gen_ai.policy.action | keyword | Action taken due to a policy violation, such as blocking, alerting, or modifying the content. | |
| | gen_ai.policy.match_detail | nested | Details about what specifically triggered the policy, including matched words, phrases, or patterns. | |
| | gen_ai.policy.confidence | float | Confidence level in the policy match that triggered the action, quantifying how closely the identified content matched the policy criteria. | |
| Threat Analysis Fields | gen_ai.threat.risk_score | float | Numerical score indicating the potential risk associated with the response. | |
| | gen_ai.threat.type | keyword | Type of threat detected in the LLM interaction. | |
| | gen_ai.threat.detected | boolean | Whether a security threat was detected. | |
| | gen_ai.threat.category | keyword | Category of the detected security threat. | |
| | gen_ai.threat.description | text | Description of the detected security threat. | |
| | gen_ai.threat.action | keyword | Recommended action to mitigate the detected security threat. | |
| | gen_ai.threat.source | keyword | Source of the detected security threat. | |
| | gen_ai.threat.signature | keyword | Signature of the detected security threat. | |
| | gen_ai.threat.yara_matches | nested | Stores results from YARA scans, including rule matches and categories. | |
| Compliance Fields | gen_ai.compliance.violation_detected | boolean | Indicates if any compliance violation was detected during the interaction. | |
| | gen_ai.compliance.violation_code | keyword | Code identifying the specific compliance rule that was violated. | |
| | gen_ai.compliance.response_triggered | keyword array | Lists compliance-related filters triggered during processing of the response, such as data privacy filters or regulatory compliance checks. | |
| | gen_ai.compliance.request_triggered | keyword array | Lists compliance-related filters triggered during processing of the request, such as data privacy filters or regulatory compliance checks. | |
| OWASP Top Ten Specific Fields | gen_ai.owasp.id | keyword | Identifier for the OWASP risk addressed. | |
| | gen_ai.owasp.description | text | Description of the OWASP risk triggered. | |
| Security Tools Analysis Fields | gen_ai.analysis.tool_names | keyword array | Names of the security or analysis tools used. | |
| | gen_ai.analysis.function | keyword | Name of the security or analysis function used. | |
| | gen_ai.analysis.findings | nested | Detailed findings from security tools. | |
| | gen_ai.analysis.action_recommended | keyword | Recommended actions based on the analysis. | |
| Sentiment and Toxicity Analysis Fields | gen_ai.sentiment.score | float | Sentiment analysis score. | |
| | gen_ai.sentiment.toxicity_score | float | Toxicity analysis score. | |
| | gen_ai.sentiment.content_inappropriate | boolean | Whether the content was flagged as inappropriate or sensitive. | |
| | gen_ai.sentiment.content_categories | keyword array | Categories of content identified as sensitive or requiring moderation. | |
| Performance Metric Fields | gen_ai.performance.response_time | long | Time taken by the LLM to generate a response, in milliseconds. | |
| | gen_ai.performance.request_size | long | Size of the request payload in bytes. | |
| | gen_ai.performance.start_response_time | long | Time taken by the LLM to send the first response byte, in milliseconds. | |
| | gen_ai.performance.response_size | long | Size of the response payload in bytes. | |
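To make the proposal concrete, here is a minimal sketch of how a proxy or instrumentation layer might attach a few of these attributes to a span via the OpenTelemetry Python API. The span name and attribute values are illustrative, and the `gen_ai.security.*` / `gen_ai.policy.*` names are the fields proposed above, not attributes that exist in SemConv today.

```python
# Minimal sketch: attaching a mix of existing and *proposed* attributes
# to a span with the OpenTelemetry Python API. The gen_ai.security.* and
# gen_ai.policy.* names are the fields proposed above, not part of
# SemConv today; the values here are made up for illustration.
from opentelemetry import trace

tracer = trace.get_tracer("llm-security-demo")

with tracer.start_as_current_span("gen_ai.chat_completion") as span:
    # Existing SemConv attributes
    span.set_attribute("gen_ai.system", "openai")
    span.set_attribute("gen_ai.request.max_tokens", 1024)
    span.set_attribute("gen_ai.usage.prompt_tokens", 42)
    # Proposed security and policy attributes
    span.set_attribute("gen_ai.security.prompt_injection_score", 0.87)
    span.set_attribute("gen_ai.policy.name", "prompt_injection_block")
    span.set_attribute("gen_ai.policy.violation", True)
    span.set_attribute("gen_ai.policy.action", "blocked")
```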
Describe alternatives you've considered
An alternative would be to submit these fields only to ECS, but since the donation of ECS, the standard practice is to discuss and propose changes to OTel.
Additional context
We'd like to open up a discussion; we're happy to discuss the fields and welcome any thoughts and suggestions!
This list illustrates the significant scope of monitoring Gen AI applications! Here's my feedback:
One challenge is the number of topics that require focused discussions for incremental progress. For instance, general attributes necessary for broad application could be discussed separately. Some attributes may be vendor-specific, which is why they have not been included yet, but are planned for a future PR. Other discussions, such as those surrounding model versions, were explored earlier and could be revisited if broken into their own issue or PR.
Then, consider the rest of the categories in two ways. First, examine whether there's a way to generalize some of these attributes to avoid having a distinct set for each security category (it seems you may have already done this analysis). Second, break each of these categories into separate issues or PRs as well, particularly the general evaluation attributes, which may already have an existing issue.
Each of these smaller issues or PRs can focus on detailed discussion, prototyping, and validation, and follow the lifecycle for semantic conventions. The smaller increments will move along faster and be less likely to have one category get stuck behind a debate in another.
How would you break it down? How would you prioritize the subtopics so the most important ones land in the near term?
Thank you for putting the list together and for all the source material, Susan @susan-shu-c!!
Do all of the attributes that end in "score" have a single, consistent, and agreed-upon definition of how such scores are computed?
@drewby thank you for the detailed response! I'm going through some open PRs and do see some fields that would be introduced, for example this PR with .duration.
Great suggestion to split the issues into smaller categories; we've been discussing priorities, hence the slower response, and will update this issue accordingly and create new, smaller ones.
@piotrm0 Good question: if users are creating detection rules on top of the other fields, then they'd be responsible for determining the risk score and populating the .score fields. Otherwise, if they use prebuilt rules from vendors, the scores will have been prepopulated by the vendor according to the vendor's recommendations. Here's an example of a detection rule @mikaayenson built with a score: link
We should defer to the vendors; Azure / GCP / AWS will have their own definitions. An example is AWS Bedrock Guardrails, which defines None | Low | Medium | High behind the scenes. Users can then map those vendor definitions to a numerical definition for their own use case, as sketched below.
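As a rough sketch of what that mapping could look like (the numeric values are our own illustrative choices, not numbers defined by any vendor):

```python
# Illustrative only: one possible mapping from AWS Bedrock Guardrails
# severity labels to a numeric value for a field like
# gen_ai.security.jailbreak_score. The 0.0-1.0 values are an assumption
# for demonstration, not vendor-defined numbers.
SEVERITY_TO_SCORE = {
    "NONE": 0.0,
    "LOW": 0.33,
    "MEDIUM": 0.66,
    "HIGH": 1.0,
}

def severity_to_score(vendor_severity: str) -> float:
    """Map a vendor severity label to a numeric gen_ai.*_score value."""
    return SEVERITY_TO_SCORE.get(vendor_severity.upper(), 0.0)

assert severity_to_score("High") == 1.0
```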
Hi all, I've created a much narrowed-down list of fields based only on detection rules that we've created. Many of the fields used in those detection rules already exist in the SemConv, so I've only included the ones that don't already exist and aren't in this PR: https://github.com/open-telemetry/semantic-conventions/pull/955
Please let us know what you think!
- https://github.com/open-telemetry/semantic-conventions/issues/1034
Closing: Superseded by https://github.com/open-telemetry/semantic-conventions/issues/1034