pr-agent

Secret Masking

Open akshaypatidar1999 opened this issue 8 months ago • 4 comments

Feature request

For security, secrets should be masked before any data is sent to AI models.

Motivation

Some repositories may contain sensitive information that should not be shared with AI models, so it should be masked before the data is sent to the AI model.

akshaypatidar1999, Mar 24 '25 08:03

:bulb: Auto-generated documentation-based answer:

This feature request is already addressed in PR-Agent/Qodo Merge through the "ignore" configuration options that allow you to exclude specific files or patterns from analysis.

You can ignore files or folders containing sensitive information using either glob patterns or regex patterns:

  1. Using glob patterns:
     [ignore]
     glob = ['**/secrets/*', '*.env']
  2. Using regex patterns:
     [ignore]
     regex = ['.*secret.*', '.*password.*']

These configurations can be set in your configuration file or passed as command-line arguments. For example:

/review --ignore.glob="['*.env']"

This ensures that sensitive files or files matching specific patterns are excluded from the data sent to AI models, effectively masking secrets from the analysis process.
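To make the behaviour of path-based ignoring concrete, here is a minimal Python sketch of how glob and regex patterns like the ones above could be applied to a list of changed files. This is not PR-Agent's actual implementation; the names `is_ignored` and `filter_files` are hypothetical, and the matching uses Python's standard `fnmatch` and `re` modules as a stand-in.

```python
import re
from fnmatch import fnmatchcase

# Patterns copied from the [ignore] examples above.
IGNORE_GLOBS = ["**/secrets/*", "*.env"]
IGNORE_REGEXES = [r".*secret.*", r".*password.*"]


def is_ignored(path, globs=IGNORE_GLOBS, regexes=IGNORE_REGEXES):
    """Return True if the file path matches any glob or regex pattern."""
    if any(fnmatchcase(path, pattern) for pattern in globs):
        return True
    return any(re.search(pattern, path) for pattern in regexes)


def filter_files(paths):
    """Keep only the paths that no ignore pattern matches."""
    return [path for path in paths if not is_ignored(path)]


if __name__ == "__main__":
    changed = [
        "src/main.py",
        "config/prod.env",
        "app/secrets/key.pem",
        "docs/password_policy.md",
    ]
    print(filter_files(changed))
```

Note that in this sketch the decision is made on the file path alone. A secret hard-coded inside a file whose path matches no pattern (for example `src/main.py`) would still pass through.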

Relevant Sources:

  • https://qodo-merge-docs.qodo.ai//usage-guide/additional_configurations#ignoring-files-from-analysis

Thanks. While this helps, can we also look into integrating tools like LLM Guard? https://github.com/protectai/llm-guard?tab=readme-ov-file
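For readers unfamiliar with the idea, content-level masking could look roughly like the Python sketch below. It is a plain regex stand-in, not LLM Guard's API and not anything PR-Agent does today. The patterns, the `mask_secrets` name, and the `[REDACTED]` placeholder are all assumptions chosen for illustration, and real scanners cover far more secret formats.

```python
import re

# Illustrative patterns only (assumptions, not an exhaustive rule set).
AWS_KEY_ID = re.compile(r"AKIA[0-9A-Z]{16}")  # shape of an AWS access key ID
KEY_VALUE = re.compile(r"(?i)(password|api_key|token)(\s*[=:]\s*)(\S+)")


def mask_secrets(text, placeholder="[REDACTED]"):
    """Replace secret-looking substrings before the text leaves the machine."""
    masked = AWS_KEY_ID.sub(placeholder, text)
    # Keep the key name and separator, replace only the value.
    masked = KEY_VALUE.sub(lambda m: m.group(1) + m.group(2) + placeholder, masked)
    return masked


if __name__ == "__main__":
    diff = "+password = hunter2\n+aws_id AKIAABCDEFGHIJKLMNOP"
    print(mask_secrets(diff))
```

Unlike path-based ignoring, this operates on the diff text itself, so it would also catch a secret inside a file that no ignore pattern matches.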

akshaypatidar1999, Mar 24 '25 08:03

Hi @akshaypatidar1999

Sensitive data should never be committed to git. Never.

For 'chat-gpt-in-ide' tools, I can understand why a masking feature might be needed - you are working in an intermediate state, and you might have local uncommitted files.

But in PRs, secrets should not appear. If they do, PR-Agent should raise an alert. If we start masking the PR content before it reaches the AI, PR-Agent will no longer be able to alert on them, as it should.

In addition, most AI providers today support zero data retention, so the harm of sending "sensitive" data (in the rare cases where that might happen) is low.

mrT23, Mar 29 '25 06:03

Hi @mrT23,

I’ve encountered an issue with the --ignore-glob CLI argument when trying to ignore multiple file patterns. It works correctly with a single pattern (e.g., --ignore-glob="['*.properties']"), but it fails to ignore files when multiple patterns are provided in a list (e.g., --ignore-glob="['*.env', '*.properties']").
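One way to rule out a quoting problem when reporting this kind of issue is to check locally that the quoted value parses as a list and that each pattern matches the intended files. The Python sketch below does that with a hypothetical two-pattern list. How PR-Agent itself parses the argument is not stated in this thread, so `parse_patterns` and `matches_any` are illustrative helpers only.

```python
import ast
from fnmatch import fnmatchcase


def parse_patterns(raw):
    """Parse a quoted list literal such as "['*.env', '*.properties']"."""
    value = ast.literal_eval(raw)
    if isinstance(value, str):
        # A single bare string is treated as a one-element list.
        value = [value]
    return list(value)


def matches_any(path, raw):
    """Return True if the path matches at least one parsed glob pattern."""
    return any(fnmatchcase(path, pattern) for pattern in parse_patterns(raw))


if __name__ == "__main__":
    raw = "['*.env', '*.properties']"
    for path in ["app.properties", ".env", "main.py"]:
        print(path, matches_any(path, raw))
```

If both patterns match here but files are still not ignored by the tool, the problem is more likely in how the argument is handled than in the patterns themselves.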

vishnuprajapati, Apr 09 '25 13:04