Specific asset tables
Detect which specific tables were accessed from a ducklake/datatable/db resource when parsing assets.
Deploying windmill with
Cloudflare Pages
| Latest commit: |
c694d49
|
| Status: | ✅ Deploy successful! |
| Preview URL: | https://e6677f37.windmill.pages.dev |
| Branch Preview URL: | https://di-specific-asset-tables-2.windmill.pages.dev |
Claude finished @diegoimbert's task in 3m 43s —— View job
Code Review Complete ✅
This PR introduces specific table tracking for database assets (Ducklake, DataTable, and database resources), enabling Windmill to track not just which databases are accessed, but which specific tables within those databases. This is a significant enhancement to the asset tracking system.
Summary of Changes
Backend Changes
- Asset Parsers Enhanced: SQL, Python, and TypeScript parsers now detect table-level access patterns
- Access Type Tracking: Distinguishes between Read (R), Write (W), and Read-Write (RW) operations
- Database Support: Handles
ATTACH,USE, and dot-notation syntax (e.g.,db.table_name) - Path Format: Assets now use hierarchical paths like
ducklake_name/table_nameoru/user/pg_db/table_name
Frontend Changes
- UI Updates: Components updated to display and manage specific tables
- DB Manager: Opens directly to specific tables when clicked from asset lists
- Asset Display: Improved icons and formatting for table-specific assets
- Path Parsing: Logic added to extract base database and specific table from asset paths
Code Quality Assessment
✅ Strengths
- Comprehensive Testing: All three parsers (SQL, Python, TypeScript) include extensive unit tests covering edge cases
- Consistent Design: Similar patterns used across all three language parsers
- Type Safety: Strong typing in both Rust backend and TypeScript frontend
- Backward Compatibility: Handles assets without specific tables (falls back to database-level tracking)
- Access Type Merging: Smart logic to combine R/W access types when same table accessed multiple times
⚠️ Areas for Improvement
I've left inline comments on specific issues. Key concerns:
1. Path Validation & Security
- No validation that table names don't contain
/characters, which could create malformed paths - String interpolation in SQL construction (Python/TS parsers) should be reviewed for edge cases
- Consider sanitizing user input from parsed code before using in paths
2. Access Type Merging Logic (Priority: Medium)
The current logic in asset_parser.rs:46-52 treats (Some(R), None) → None. This loses information. Consider:
(None, Some(x)) | (Some(x), None) => Some(x), // Preserve known access
3. SQL Parser Stack Management (Priority: High)
The Query statement pushes R twice (lines 210 and 344), which could cause incorrect access type detection in nested queries. This needs verification.
4. Frontend Path Parsing
Multiple locations split asset.path by / without validating the expected format:
ExploreAssetButton.svelte:61-62- assumes 3+ segments for resources- Similar patterns in ducklake/datatable handlers
- Should add defensive validation or use a helper function
5. Missing Asset Types in Frontend
lib.ts:parseAssetFromString() doesn't handle ducklake:// or datatable:// prefixes, though these exist in the backend parser.
6. Performance Considerations
- TypeScript parser clones entire
var_identifiersHashMap on every block scope entry - API query uses subqueries that could be optimized with JOINs for large datasets
- Consider adding database indexes if not present
Security Review
✅ No Critical Issues Found
- Asset paths are validated through
parse_asset_syntax() - Database queries use parameterized queries (sqlx)
- No SQL injection vectors identified
- Proper workspace isolation maintained
💡 Recommendations
- Add input sanitization for table names extracted from user code
- Document path format expectations to prevent malformed paths
- Consider rate limiting asset parsing operations if exposed to untrusted code
Testing Recommendations
The PR includes good unit tests, but consider adding:
- Integration tests for full parse → store → retrieve flow
- Edge case tests:
- Table names with special characters (
table-name,table.name) - Very long paths
- Unicode table names
- Nested queries with multiple access types
- Table names with special characters (
- Performance tests with large codebases
- Frontend E2E tests for the DB Manager specific table navigation
Best Practices Adherence
✅ Follows Windmill Guidelines
- Clean code with single responsibilities
- Avoids duplication (mostly - see inline comments)
- Rust performance optimizations applied correctly
- Proper error handling throughout
⚠️ Minor Deviations
- Some code duplication in frontend path parsing (see inline comments)
- Could extract more helper functions for better testability
- Some SQL query complexity that could be simplified
Reproduction Instructions for Testers
To test the specific asset tables feature, navigate to the script editor and create a new Python, TypeScript, or SQL script. Write code that accesses a database with specific tables - for example, in Python use db = wmill.datatable('mydata') followed by db.query('SELECT * FROM users'), or in SQL use ATTACH 'ducklake://mydl' AS dl; SELECT * FROM dl.customers;. After saving the script, go to the Assets page in the workspace navigation. You should see the specific tables listed (like mydata/users or mydl/customers) rather than just the database names. Each asset should show whether it's being read, written, or both. Click the "Manage" button next to any database asset to open the DB Manager directly to that specific table. In the Flow editor, when you reference a script with database assets, the flow's asset graph should display the specific tables being accessed with appropriate icons and access indicators.
Recommendation
Approve with minor revisions. The implementation is solid and well-tested. The issues identified are mostly minor improvements and edge case handling. The critical issue with SQL parser stack management (Query statement) should be verified, but likely works correctly in practice. Consider addressing the inline comments before merging, particularly:
- Fix the Query statement double-push issue if confirmed
- Add path validation in frontend
- Consider the access type merging logic improvement
- Add missing ducklake/datatable parsing in frontend
Overall, this is a valuable feature that significantly improves asset tracking granularity! 🎉
Claude finished @diegoimbert's task in 3m 21s —— View job
Code Review Complete ✅
This PR introduces specific table tracking for database assets, enabling Windmill to track not just which databases are accessed, but which specific tables within those databases. This is a valuable enhancement to the asset tracking system.
Summary of Changes
Backend Changes
- Asset Parsers Enhanced: SQL, Python, and TypeScript parsers now detect table-level access patterns
- Access Type Tracking: Distinguishes between Read (R), Write (W), and Read-Write (RW) operations
- Database Support: Handles
ATTACH,USE, and dot-notation syntax (e.g.,db.table_name) - Path Format: Assets now use hierarchical paths like
ducklake_name/table_nameoru/user/pg_db/table_name
Frontend Changes
- UI Updates: Components updated to display and manage specific tables
- DB Manager: Opens directly to specific tables when clicked from asset lists
- Asset Display: Improved icons and formatting for table-specific assets
- Path Parsing: Logic added to extract base database and specific table from asset paths
Code Quality Assessment
✅ Strengths
- Comprehensive Testing: All three parsers (SQL, Python, TypeScript) include extensive unit tests covering edge cases
- Consistent Design: Similar patterns used across all three language parsers
- Type Safety: Strong typing in both Rust backend and TypeScript frontend
- Backward Compatibility: Handles assets without specific tables (falls back to database-level tracking)
- Smart Access Type Merging: Logic to combine R/W access types when same table accessed multiple times
⚠️ Areas for Improvement
Building on the previous review, here are additional concerns and observations:
1. Access Type Merging Logic (Priority: HIGH)
The current logic in asset_parser.rs:46-52 treats (Some(R), None) → None, which loses information. This is problematic because:
- If an asset is first detected with
Raccess, then later with ambiguous access (None), we lose the knownR - This makes the tracking less useful than it could be
Recommendation: Change the merge logic to preserve known access types:
(None, None) => None,
(None, Some(x)) | (Some(x), None) => Some(x), // Preserve known access
(Some(R), Some(W)) | (Some(W), Some(R)) => Some(RW),
// ... rest unchanged
2. SQL Parser Stack Management (Priority: HIGH - Needs Verification)
In windmill-parser-sql/src/asset_parser.rs:
- Line 210:
Statement::Query(_)pushesRinpre_visit_statement - Line 344:
pre_visit_queryalso pushesR - This means nested queries have
Rpushed twice but only popped once inpost_visit_query(line 352) - The
post_visit_statement(line 333) pops the first one
This could cause incorrect access type detection in nested queries. For example:
INSERT INTO table1
SELECT * FROM (SELECT * FROM table2) -- nested query
The inner SELECT might incorrectly inherit W access from the INSERT context due to stack mismanagement.
Recommendation: Verify this behavior with nested query tests, or restructure to avoid double-pushing.
3. Path Validation & Security (Priority: MEDIUM)
Several areas lack input validation:
-
Backend: No validation that table names don't contain
/characters before constructing paths like{db}/{table}- Example: If user code references
table/with/slashes, the path becomes malformed - Occurs in:
sql/asset_parser.rs:74,sql/asset_parser.rs:87,py/asset_parser.rs:197,ts/asset_parser.rs:190
- Example: If user code references
-
Frontend: Path splitting assumes specific formats without validation
ExploreAssetButton.svelte:61-62: Assumes resource paths have ≥3 segments- If path is
u/user(only 2 segments),pathParts[3]is undefined - Similar issues in ducklake/datatable handlers (lines 72-78)
Recommendation:
- Backend: Sanitize table names before path construction (replace
/with_or reject) - Frontend: Add defensive checks or extract to a helper function
4. Frontend Code Duplication (Priority: LOW)
The pattern of splitting asset.path and extracting components is repeated in ExploreAssetButton.svelte:
- Lines 61-62 (resources)
- Lines 72-74 (ducklake)
- Lines 76-78 (datatable)
Recommendation: Extract to a helper function:
function parseAssetPath(path: string, minSegments: number = 1): {
basePath: string;
specificTable?: string;
} {
const parts = path.split('/');
return {
basePath: parts.slice(0, minSegments).join('/'),
specificTable: parts[minSegments]
};
}
5. Missing Asset Types in Frontend (Priority: MEDIUM)
lib.ts:parseAssetFromString() doesn't handle ducklake:// or datatable:// prefixes:
- Lines 53-62 only handle
res://,$res:, ands3:// - But backend supports
ducklake://anddatatable://(seeparse_asset_syntaxinasset_parser.rs:89-92)
Recommendation: Add cases for ducklake and datatable parsing.
6. TypeScript Parser Performance (Priority: LOW)
windmill-parser-ts/src/asset_parser.rs:82:
let saved_var_identifiers = self.var_identifiers.clone();
This clones the entire HashMap on every block scope entry. For deeply nested code with many blocks, this could be inefficient.
Recommendation: Consider using a stack-based approach or reference counting if this becomes a performance bottleneck.
7. API Query Performance (Priority: LOW)
windmill-api/src/assets.rs:47-48:
AND (asset.usage_kind <> 'flow' OR asset.usage_path = ANY(SELECT path FROM flow WHERE workspace_id = $1))
AND (asset.usage_kind <> 'script' OR asset.usage_path = ANY(SELECT path FROM script WHERE workspace_id = $1))
Uses subqueries in the WHERE clause. For large datasets, this could be slow.
Recommendation: Consider using JOINs instead:
LEFT JOIN flow ON asset.usage_kind = 'flow' AND asset.usage_path = flow.path AND flow.workspace_id = $1
LEFT JOIN script ON asset.usage_kind = 'script' AND asset.usage_path = script.path AND script.workspace_id = $1
WHERE asset.workspace_id = $1
AND (asset.kind <> 'resource' OR resource.path IS NOT NULL)
AND (asset.usage_kind <> 'flow' OR flow.path IS NOT NULL)
AND (asset.usage_kind <> 'script' OR script.path IS NOT NULL)
Also ensure indexes exist on flow.path and script.path.
Security Review
✅ No Critical Issues Found
- Asset paths are validated through
parse_asset_syntax() - Database queries use parameterized queries (sqlx)
- No SQL injection vectors identified
- Proper workspace isolation maintained
💡 Recommendations
- Add input sanitization for table names extracted from user code (especially for
/characters) - Document path format expectations to prevent malformed paths
- Consider rate limiting asset parsing operations if exposed to untrusted code
Testing Recommendations
The PR includes good unit tests, but consider adding:
- Integration tests for full parse → store → retrieve flow
- Edge case tests:
- Table names with special characters (
table-name,table.name,table/name) - Very long paths (stress test)
- Unicode table names
- Nested queries with multiple access types (to verify stack management)
- Queries that reference same table multiple times with different access patterns
- Table names with special characters (
- Performance tests with large codebases (100+ asset references)
- Frontend E2E tests for the DB Manager specific table navigation
Adherence to Best Practices
✅ Follows Windmill Guidelines
- Clean code with single responsibilities
- Avoids duplication (mostly - some frontend duplication noted)
- Rust performance optimizations applied correctly
- Proper error handling throughout
- No blocking operations in async contexts
⚠️ Minor Deviations
- Some code duplication in frontend path parsing
- Could extract more helper functions for better testability
- SQL query complexity could be simplified
Additional Observations
- Good defensive programming: The parsers handle missing cases gracefully and fall back to unknown access types rather than failing
- Excellent test coverage: Each parser has comprehensive tests demonstrating various scenarios
- Consistent API: The asset format (
kind/path/table) is consistent across all three languages - Developer experience: The feature integrates well with existing UI components (DB Manager opening to specific tables is a nice touch)
Reproduction Instructions for Testers
To test the specific asset tables feature, navigate to the script editor and create a new Python, TypeScript, or SQL script. Write code that accesses a database with specific tables - for example, in Python use db = wmill.datatable('mydata') followed by db.query('SELECT * FROM users'), or in SQL use ATTACH 'ducklake://mydl' AS dl; SELECT * FROM dl.customers;, or in TypeScript use let sql = wmill.datatable('dt') followed by await sql\SELECT * FROM friends`.fetch(). After saving the script, go to the Assets page in the workspace navigation. You should see the specific tables listed (like mydata/usersormydl/customers`) rather than just the database names. Each asset should show whether it's being read, written, or both using R, W, or RW indicators. Click the "Manage" button next to any database asset to open the DB Manager directly to that specific table - the table should be pre-selected in the table list on the left sidebar. In the Flow editor, when you reference a script with database assets, the flow's asset graph should display the specific tables being accessed with appropriate icons and access indicators, distinguishing between read operations (shown with read indicators) and write operations (shown with write indicators).
Recommendation
Approve with minor revisions. The implementation is solid and well-tested. The issues identified are mostly minor improvements and edge case handling. The critical issues should be addressed:
Must Fix Before Merge:
- ✅ Verify SQL parser stack management for nested queries (or add test confirming it works)
- ⚠️ Consider the access type merging logic - current behavior might be intentional but loses information
Should Fix (Can be follow-up):
- Add path validation/sanitization for table names with
/ - Add frontend path parsing validation
- Add missing ducklake/datatable parsing in frontend
parseAssetFromString - Extract frontend path parsing helper
Nice to Have:
- Optimize TypeScript parser HashMap cloning
- Optimize API query with JOINs instead of subqueries
Overall, this is a valuable feature that significantly improves asset tracking granularity! The code quality is high, testing is comprehensive, and the implementation follows established patterns. The concerns raised are mostly about edge cases and potential optimizations rather than fundamental issues. 🎉