ASVS
Request for Clarification/Refinement: V1.1 "not stored in an encoded or escaped" Data Storage Principle
Please refer to this discussion: https://github.com/OWASP/ASVS/discussions/3184 . I am adding further input here because I think the current requirement needs more clarity and better practical alignment.
Please see the code references below for Discourse and Ghost, which store encoded or sanitized data in the database depending on architectural needs. Many other popular open-source frameworks do the same; I can share more references if needed.
- Discourse : Code Reference -- Stores both raw and HTML (cooked) versions of user posts.
- Ghost : Code Reference -- Stores Sanitized/Encoded data.
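To make the "raw plus cooked" pattern concrete, here is a minimal sketch in Python. It is not Discourse's or Ghost's actual code: the `posts` table, the `cook` function, and the use of `html.escape` as a stand-in for a real rendering or sanitization pipeline are all assumptions made for illustration.

```python
import html
import sqlite3


def cook(raw: str) -> str:
    """Hypothetical stand-in for a rendering pipeline: HTML-escape the raw text."""
    return html.escape(raw, quote=True)


def save_post(conn: sqlite3.Connection, raw: str) -> int:
    """Store the untouched user input together with its derived HTML form."""
    cur = conn.execute(
        "INSERT INTO posts (raw, cooked) VALUES (?, ?)",
        (raw, cook(raw)),
    )
    conn.commit()
    return cur.lastrowid


def load_post(conn: sqlite3.Connection, post_id: int) -> tuple[str, str]:
    """Return (raw, cooked) for one post."""
    return conn.execute(
        "SELECT raw, cooked FROM posts WHERE id = ?", (post_id,)
    ).fetchone()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE posts (id INTEGER PRIMARY KEY, raw TEXT NOT NULL, cooked TEXT NOT NULL)"
)
post_id = save_post(conn, '<b>Tom & "Jerry"</b>')
raw, cooked = load_post(conn, post_id)
print(raw)     # the value exactly as the user entered it
print(cooked)  # the derived, HTML-encoded form
```

Because the raw value is kept alongside the cooked one, the cooked column can be regenerated at any time, which is arguably different from storing only the encoded form.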
Here is a brief summary of the linked GitHub discussion:
Quoting and responding to a few points raised by @elarlang in his response.
"it is not possible to encode data for HTML before saving it to the database"
Encoding before storage does happen in practice, typically in edge cases, and whether it is appropriate depends on the application's requirements. The code references shared above are enough to call the absolute stance into question.
"It ruins the integrity of the data - e.g., it was not the value that the user entered"
Data Integrity: When HTML sanitization happens, the data is intentionally changed by removing malicious parts (like
"It assumes that the only output encoding is HTML, but if it is to be used it in JSON, CSV, displayed as just text, or whatever other format, it is already in the incorrect format"
If certain fields are guaranteed to only ever be displayed in a single, fixed context such as HTML, would it be acceptable, from an ASVS perspective, to encode once at write time?
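The multi-context concern can be illustrated with a short Python sketch. The helper names (`to_html`, `to_json`, `to_csv`) and the sample value are assumptions for illustration only. It encodes a raw value per output context at read time, then shows what happens when a value that was HTML-encoded at write time is later emitted as JSON.

```python
import csv
import html
import io
import json


def to_html(value: str) -> str:
    """Encode for an HTML text or attribute context."""
    return html.escape(value)


def to_json(value: str) -> str:
    """Encode for a JSON response."""
    return json.dumps({"name": value})


def to_csv(value: str) -> str:
    """Encode as a single-cell CSV row."""
    buf = io.StringIO()
    csv.writer(buf).writerow([value])
    return buf.getvalue()


raw = 'Tom & "Jerry"'

# Raw storage: each output context applies its own encoding on the way out.
print(to_html(raw))
print(to_json(raw))
print(to_csv(raw), end="")

# Write-time HTML encoding: the stored value is already HTML-specific.
stored_encoded = html.escape(raw)
# A JSON consumer now receives HTML entities instead of the original text.
print(json.loads(to_json(stored_encoded))["name"])
```

If the field truly never leaves the HTML context, the last step never happens, which is exactly the case the question above asks about.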
"The problem to solve here is to use the correct caching mechanism."
With a caching mechanism, it is the encoded data that ends up stored in the cache. If encoded data is expected to live in the cache, I would like to understand why the guidance states "never store encoded data". Does V1.1’s “stored” include caches, or is it database-specific?
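One possible reading of the caching suggestion is sketched below in Python. This is an interpretation, not guidance taken from ASVS: the `PostStore` class and its in-memory dictionaries are hypothetical stand-ins for a database and a cache. The raw value remains the source of truth, and the encoded form lives only in a cache that is invalidated on every write.

```python
import html


class PostStore:
    """Hypothetical store: raw values are authoritative, encoded values are cached."""

    def __init__(self) -> None:
        self._raw: dict[int, str] = {}         # stand-in for the database
        self._html_cache: dict[int, str] = {}  # stand-in for a cache layer

    def save(self, post_id: int, raw: str) -> None:
        """Persist the raw value and drop any stale encoded copy."""
        self._raw[post_id] = raw
        self._html_cache.pop(post_id, None)

    def raw(self, post_id: int) -> str:
        """Return the value exactly as it was entered."""
        return self._raw[post_id]

    def html(self, post_id: int) -> str:
        """Return the HTML-encoded form, encoding on first read only."""
        if post_id not in self._html_cache:
            self._html_cache[post_id] = html.escape(self._raw[post_id])
        return self._html_cache[post_id]


store = PostStore()
store.save(1, "a < b")
print(store.html(1))  # encoded on first read, then served from the cache
store.save(1, "a > b")
print(store.html(1))  # cache was invalidated, so the new value is re-encoded
```

Under this reading, the encoded value is "stored" in the cache but never in the database, which is why the scope of the word "stored" in V1.1 matters.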
The guidance currently lacks clarity and usability. The team should consider the following points:
- Principles need context. Blanket statements (like “never store encoded data”) can lead to confusion or overcorrection.
- Clear use cases help avoid mistakes. When developers see good and bad use cases, they better understand the reason behind a rule, and they are more likely to apply it correctly.
- Clarify whether “stored” in the current rule applies to the database only, or also includes caches.
- The phrase “not stored in an encoded or escaped” is too absolute, since there are niche cases where storing encoded or transformed values is justified. Consider instead reinforcing "storing raw" as the default and recommended best practice.