Adding keyword search
Description
Please include a summary of the change and the issue it solves.
Fixes #issue_number
Type of change
(Please delete options that are not relevant)
- [ ] 🐛 Bug fix (non-breaking change which fixes an issue)
- [ ] ⚡ New feature (non-breaking change which adds functionality)
- [ ] 📢 Breaking change (fix or feature that would cause existing functionality not to work as expected)
- [ ] 📄 This change requires a documentation update
Verification Process
To ensure the changes are working as expected:
- [ ] Test Location: Specify the URL or path for testing.
- [ ] Verification Steps: Outline the steps or queries needed to validate the change. Include any data, configurations, or actions required to reproduce or see the new functionality.
Additional Media:
- [ ] I have attached a brief loom video or screenshots showcasing the new functionality or change.
Checklist:
- [ ] My code follows the style guidelines(PEP 8) of MindsDB.
- [ ] I have appropriately commented on my code, especially in complex areas.
- [ ] Necessary documentation updates are either made or tracked in issues.
- [ ] Relevant unit and integration tests are updated or added.
Review Summary
🏷️ Draft Comments (4)
Skipped posting 4 drafted comments based on your review threshold. Feel free to update them here.
mindsdb/integrations/handlers/pgvector_handler/pgvector_handler.py (2)
0146-0160:add_full_text_indexdoes not check for existing GIN indexes before creation, causing unnecessary index creation attempts and potential performance degradation on large tables.Scores:
- Production Impact: 2
- Fix Specificity: 3
- Urgency Impact: 2
- Total Score: 7
Reason for filtering: The commitable suggestion is technically flawed and would break the code. The suggestion removes 'IF NOT EXISTS' from the CREATE INDEX statement but then tries to check for existing indexes manually. However, the manual check queries pg_indexes but doesn't handle the case where the table itself doesn't exist yet, which would cause the query to fail. Additionally, the existing code already uses 'CREATE INDEX IF NOT EXISTS' which is the standard PostgreSQL way to handle this scenario safely. The suggestion introduces unnecessary complexity and potential failure points.
Analysis: While the concern about performance is valid, the suggested fix is problematic and the current code already handles the scenario appropriately with IF NOT EXISTS clause.
0155-0158:add_full_text_indexconstructs SQL queries using unsanitizedtable_nameandcolumn_name, allowing SQL injection and unauthorized command execution.Scores:
- Production Impact: 3
- Fix Specificity: 2
- Urgency Impact: 3
- Total Score: 8
Reason for filtering: The commitable suggestion uses
.isidentifier()which is inappropriate for SQL identifiers. Python's.isidentifier()method checks for valid Python variable names, not SQL identifiers. SQL identifiers can contain characters that aren't valid Python identifiers (like hyphens, spaces when quoted, etc.). This would incorrectly reject valid SQL table/column names. Additionally, thetable_namehas already been processed throughself._check_table()which applies proper namespacing and validation. The security concern is valid but the proposed fix is technically incorrect and would break legitimate use cases.Analysis: Security concern is legitimate but the proposed fix using Python's .isidentifier() is inappropriate for SQL context and would break valid SQL identifiers. A proper SQL injection fix would require different validation or parameterized queries.
mindsdb/interfaces/knowledge_base/controller.py (2)
1012-1013:add_full_text_indexis called unconditionally ifkeyword_search_enabledis True, but not all vector store handlers may implement this method, leading to a possible AttributeError at runtime.Scores:
- Production Impact: 4
- Fix Specificity: 5
- Urgency Impact: 3
- Total Score: 12
Reason for filtering: This is a valid technical issue. The code at lines 1012-1013 calls
add_full_text_indexwithout checking if the method exists on the vector store handler. This could cause AttributeError at runtime if a handler doesn't implement this method. The suggested fix usinghasattris appropriate and safe.Analysis: High production impact because AttributeError would crash the knowledge base creation process. The fix is very specific and immediately applicable. Medium urgency since it only occurs when keyword_search_enabled=True.
1012-1013:keyword_search_enabledparameter is not validated or restricted, allowing an attacker to triggeradd_full_text_indexon arbitrary tables, potentially leading to privilege escalation or denial of service if abused.Scores:
- Production Impact: 1
- Fix Specificity: 1
- Urgency Impact: 1
- Total Score: 3
Reason for filtering: This comment makes several incorrect assumptions: 1) It assumes
self.session.user.is_adminexists without evidence in the codebase, 2) It assumes this is a security vulnerability without proper context about the application's security model, 3) The suggested fix would likely break existing functionality by adding authorization checks that may not be appropriate at this layer, 4) Thevector_table_name.isidentifier()check is overly restrictive and could break legitimate table names.Analysis: The commitable suggestion would likely break the code by referencing non-existent attributes and adding inappropriate authorization logic. The security concern is speculative without proper context about the application's threat model and existing security controls.