windmill Specific asset tables

Detect which specific tables were accessed from a ducklake/datatable/db resource when parsing assets.

Dec 09 '25 16:12 diegoimbert

Deploying windmill with Cloudflare Pages

Latest commit:	`c694d49`
Status:	✅ Deploy successful!
Preview URL:	https://e6677f37.windmill.pages.dev
Branch Preview URL:	https://di-specific-asset-tables-2.windmill.pages.dev

View logs

Dec 09 '25 16:12 cloudflare-workers-and-pages[bot]

Claude finished @diegoimbert's task in 3m 43s —— View job

Code Review Complete ✅

This PR introduces specific table tracking for database assets (Ducklake, DataTable, and database resources), enabling Windmill to track not just which databases are accessed, but which specific tables within those databases. This is a significant enhancement to the asset tracking system.

Summary of Changes

Backend Changes

Asset Parsers Enhanced: SQL, Python, and TypeScript parsers now detect table-level access patterns
Access Type Tracking: Distinguishes between Read (R), Write (W), and Read-Write (RW) operations
Database Support: Handles ATTACH, USE, and dot-notation syntax (e.g., db.table_name)
Path Format: Assets now use hierarchical paths like ducklake_name/table_name or u/user/pg_db/table_name

Frontend Changes

UI Updates: Components updated to display and manage specific tables
DB Manager: Opens directly to specific tables when clicked from asset lists
Asset Display: Improved icons and formatting for table-specific assets
Path Parsing: Logic added to extract base database and specific table from asset paths

Code Quality Assessment

✅ Strengths

Comprehensive Testing: All three parsers (SQL, Python, TypeScript) include extensive unit tests covering edge cases
Consistent Design: Similar patterns used across all three language parsers
Type Safety: Strong typing in both Rust backend and TypeScript frontend
Backward Compatibility: Handles assets without specific tables (falls back to database-level tracking)
Access Type Merging: Smart logic to combine R/W access types when same table accessed multiple times

⚠️ Areas for Improvement

I've left inline comments on specific issues. Key concerns:

1. Path Validation & Security

No validation that table names don't contain / characters, which could create malformed paths
String interpolation in SQL construction (Python/TS parsers) should be reviewed for edge cases
Consider sanitizing user input from parsed code before using in paths

2. Access Type Merging Logic (Priority: Medium)

The current logic in asset_parser.rs:46-52 treats (Some(R), None) → None. This loses information. Consider:

(None, Some(x)) | (Some(x), None) => Some(x),  // Preserve known access

3. SQL Parser Stack Management (Priority: High)

The Query statement pushes R twice (lines 210 and 344), which could cause incorrect access type detection in nested queries. This needs verification.

4. Frontend Path Parsing

Multiple locations split asset.path by / without validating the expected format:

ExploreAssetButton.svelte:61-62 - assumes 3+ segments for resources
Similar patterns in ducklake/datatable handlers
Should add defensive validation or use a helper function

5. Missing Asset Types in Frontend

lib.ts:parseAssetFromString() doesn't handle ducklake:// or datatable:// prefixes, though these exist in the backend parser.

6. Performance Considerations

TypeScript parser clones entire var_identifiers HashMap on every block scope entry
API query uses subqueries that could be optimized with JOINs for large datasets
Consider adding database indexes if not present

Security Review

✅ No Critical Issues Found

Asset paths are validated through parse_asset_syntax()
Database queries use parameterized queries (sqlx)
No SQL injection vectors identified
Proper workspace isolation maintained

💡 Recommendations

Add input sanitization for table names extracted from user code
Document path format expectations to prevent malformed paths
Consider rate limiting asset parsing operations if exposed to untrusted code

Testing Recommendations

The PR includes good unit tests, but consider adding:

Integration tests for full parse → store → retrieve flow
Edge case tests:
- Table names with special characters (table-name, table.name)
- Very long paths
- Unicode table names
- Nested queries with multiple access types
Performance tests with large codebases
Frontend E2E tests for the DB Manager specific table navigation

Best Practices Adherence

✅ Follows Windmill Guidelines

Clean code with single responsibilities
Avoids duplication (mostly - see inline comments)
Rust performance optimizations applied correctly
Proper error handling throughout

⚠️ Minor Deviations

Some code duplication in frontend path parsing (see inline comments)
Could extract more helper functions for better testability
Some SQL query complexity that could be simplified

Reproduction Instructions for Testers

To test the specific asset tables feature, navigate to the script editor and create a new Python, TypeScript, or SQL script. Write code that accesses a database with specific tables - for example, in Python use db = wmill.datatable('mydata') followed by db.query('SELECT * FROM users'), or in SQL use ATTACH 'ducklake://mydl' AS dl; SELECT * FROM dl.customers;. After saving the script, go to the Assets page in the workspace navigation. You should see the specific tables listed (like mydata/users or mydl/customers) rather than just the database names. Each asset should show whether it's being read, written, or both. Click the "Manage" button next to any database asset to open the DB Manager directly to that specific table. In the Flow editor, when you reference a script with database assets, the flow's asset graph should display the specific tables being accessed with appropriate icons and access indicators.

Recommendation

Approve with minor revisions. The implementation is solid and well-tested. The issues identified are mostly minor improvements and edge case handling. The critical issue with SQL parser stack management (Query statement) should be verified, but likely works correctly in practice. Consider addressing the inline comments before merging, particularly:

Fix the Query statement double-push issue if confirmed
Add path validation in frontend
Consider the access type merging logic improvement
Add missing ducklake/datatable parsing in frontend

Overall, this is a valuable feature that significantly improves asset tracking granularity! 🎉

Dec 10 '25 13:12 claude[bot]

Claude finished @diegoimbert's task in 3m 21s —— View job

Code Review Complete ✅

This PR introduces specific table tracking for database assets, enabling Windmill to track not just which databases are accessed, but which specific tables within those databases. This is a valuable enhancement to the asset tracking system.

Summary of Changes

Backend Changes

Asset Parsers Enhanced: SQL, Python, and TypeScript parsers now detect table-level access patterns
Access Type Tracking: Distinguishes between Read (R), Write (W), and Read-Write (RW) operations
Database Support: Handles ATTACH, USE, and dot-notation syntax (e.g., db.table_name)
Path Format: Assets now use hierarchical paths like ducklake_name/table_name or u/user/pg_db/table_name

Frontend Changes

UI Updates: Components updated to display and manage specific tables
DB Manager: Opens directly to specific tables when clicked from asset lists
Asset Display: Improved icons and formatting for table-specific assets
Path Parsing: Logic added to extract base database and specific table from asset paths

Code Quality Assessment

✅ Strengths

Comprehensive Testing: All three parsers (SQL, Python, TypeScript) include extensive unit tests covering edge cases
Consistent Design: Similar patterns used across all three language parsers
Type Safety: Strong typing in both Rust backend and TypeScript frontend
Backward Compatibility: Handles assets without specific tables (falls back to database-level tracking)
Smart Access Type Merging: Logic to combine R/W access types when same table accessed multiple times

⚠️ Areas for Improvement

Building on the previous review, here are additional concerns and observations:

1. Access Type Merging Logic (Priority: HIGH)

The current logic in asset_parser.rs:46-52 treats (Some(R), None) → None, which loses information. This is problematic because:

If an asset is first detected with R access, then later with ambiguous access (None), we lose the known R
This makes the tracking less useful than it could be

Recommendation: Change the merge logic to preserve known access types:

(None, None) => None,
(None, Some(x)) | (Some(x), None) => Some(x),  // Preserve known access
(Some(R), Some(W)) | (Some(W), Some(R)) => Some(RW),
// ... rest unchanged

2. SQL Parser Stack Management (Priority: HIGH - Needs Verification)

In windmill-parser-sql/src/asset_parser.rs:

Line 210: Statement::Query(_) pushes R in pre_visit_statement
Line 344: pre_visit_query also pushes R
This means nested queries have R pushed twice but only popped once in post_visit_query (line 352)
The post_visit_statement (line 333) pops the first one

This could cause incorrect access type detection in nested queries. For example:

INSERT INTO table1 
SELECT * FROM (SELECT * FROM table2)  -- nested query

The inner SELECT might incorrectly inherit W access from the INSERT context due to stack mismanagement.

Recommendation: Verify this behavior with nested query tests, or restructure to avoid double-pushing.

3. Path Validation & Security (Priority: MEDIUM)

Several areas lack input validation:

Backend: No validation that table names don't contain / characters before constructing paths like {db}/{table}
- Example: If user code references table/with/slashes, the path becomes malformed
- Occurs in: sql/asset_parser.rs:74, sql/asset_parser.rs:87, py/asset_parser.rs:197, ts/asset_parser.rs:190
Frontend: Path splitting assumes specific formats without validation
- ExploreAssetButton.svelte:61-62: Assumes resource paths have ≥3 segments
- If path is u/user (only 2 segments), pathParts[3] is undefined
- Similar issues in ducklake/datatable handlers (lines 72-78)

Recommendation:

Backend: Sanitize table names before path construction (replace / with _ or reject)
Frontend: Add defensive checks or extract to a helper function

4. Frontend Code Duplication (Priority: LOW)

The pattern of splitting asset.path and extracting components is repeated in ExploreAssetButton.svelte:

Lines 61-62 (resources)
Lines 72-74 (ducklake)
Lines 76-78 (datatable)

Recommendation: Extract to a helper function:

function parseAssetPath(path: string, minSegments: number = 1): {
    basePath: string;
    specificTable?: string;
} {
    const parts = path.split('/');
    return {
        basePath: parts.slice(0, minSegments).join('/'),
        specificTable: parts[minSegments]
    };
}

5. Missing Asset Types in Frontend (Priority: MEDIUM)

lib.ts:parseAssetFromString() doesn't handle ducklake:// or datatable:// prefixes:

Lines 53-62 only handle res://, $res:, and s3://
But backend supports ducklake:// and datatable:// (see parse_asset_syntax in asset_parser.rs:89-92)

Recommendation: Add cases for ducklake and datatable parsing.

6. TypeScript Parser Performance (Priority: LOW)

windmill-parser-ts/src/asset_parser.rs:82:

let saved_var_identifiers = self.var_identifiers.clone();

This clones the entire HashMap on every block scope entry. For deeply nested code with many blocks, this could be inefficient.

Recommendation: Consider using a stack-based approach or reference counting if this becomes a performance bottleneck.

7. API Query Performance (Priority: LOW)

windmill-api/src/assets.rs:47-48:

AND (asset.usage_kind <> 'flow' OR asset.usage_path = ANY(SELECT path FROM flow WHERE workspace_id = $1))
AND (asset.usage_kind <> 'script' OR asset.usage_path = ANY(SELECT path FROM script WHERE workspace_id = $1))

Uses subqueries in the WHERE clause. For large datasets, this could be slow.

Recommendation: Consider using JOINs instead:

LEFT JOIN flow ON asset.usage_kind = 'flow' AND asset.usage_path = flow.path AND flow.workspace_id = $1
LEFT JOIN script ON asset.usage_kind = 'script' AND asset.usage_path = script.path AND script.workspace_id = $1
WHERE asset.workspace_id = $1
  AND (asset.kind <> 'resource' OR resource.path IS NOT NULL)
  AND (asset.usage_kind <> 'flow' OR flow.path IS NOT NULL)
  AND (asset.usage_kind <> 'script' OR script.path IS NOT NULL)

Also ensure indexes exist on flow.path and script.path.

Security Review

✅ No Critical Issues Found

Asset paths are validated through parse_asset_syntax()
Database queries use parameterized queries (sqlx)
No SQL injection vectors identified
Proper workspace isolation maintained

💡 Recommendations

Add input sanitization for table names extracted from user code (especially for / characters)
Document path format expectations to prevent malformed paths
Consider rate limiting asset parsing operations if exposed to untrusted code

Testing Recommendations

The PR includes good unit tests, but consider adding:

Integration tests for full parse → store → retrieve flow
Edge case tests:
- Table names with special characters (table-name, table.name, table/name)
- Very long paths (stress test)
- Unicode table names
- Nested queries with multiple access types (to verify stack management)
- Queries that reference same table multiple times with different access patterns
Performance tests with large codebases (100+ asset references)
Frontend E2E tests for the DB Manager specific table navigation

Adherence to Best Practices

✅ Follows Windmill Guidelines

Clean code with single responsibilities
Avoids duplication (mostly - some frontend duplication noted)
Rust performance optimizations applied correctly
Proper error handling throughout
No blocking operations in async contexts

⚠️ Minor Deviations

Some code duplication in frontend path parsing
Could extract more helper functions for better testability
SQL query complexity could be simplified

Additional Observations

Good defensive programming: The parsers handle missing cases gracefully and fall back to unknown access types rather than failing
Excellent test coverage: Each parser has comprehensive tests demonstrating various scenarios
Consistent API: The asset format (kind/path/table) is consistent across all three languages
Developer experience: The feature integrates well with existing UI components (DB Manager opening to specific tables is a nice touch)

Reproduction Instructions for Testers

To test the specific asset tables feature, navigate to the script editor and create a new Python, TypeScript, or SQL script. Write code that accesses a database with specific tables - for example, in Python use db = wmill.datatable('mydata') followed by db.query('SELECT * FROM users'), or in SQL use ATTACH 'ducklake://mydl' AS dl; SELECT * FROM dl.customers;, or in TypeScript use let sql = wmill.datatable('dt') followed by await sql\SELECT * FROM friends`.fetch(). After saving the script, go to the Assets page in the workspace navigation. You should see the specific tables listed (like mydata/usersormydl/customers`) rather than just the database names. Each asset should show whether it's being read, written, or both using R, W, or RW indicators. Click the "Manage" button next to any database asset to open the DB Manager directly to that specific table - the table should be pre-selected in the table list on the left sidebar. In the Flow editor, when you reference a script with database assets, the flow's asset graph should display the specific tables being accessed with appropriate icons and access indicators, distinguishing between read operations (shown with read indicators) and write operations (shown with write indicators).

Recommendation

Approve with minor revisions. The implementation is solid and well-tested. The issues identified are mostly minor improvements and edge case handling. The critical issues should be addressed:

Must Fix Before Merge:

✅ Verify SQL parser stack management for nested queries (or add test confirming it works)
⚠️ Consider the access type merging logic - current behavior might be intentional but loses information

Should Fix (Can be follow-up):

Add path validation/sanitization for table names with /
Add frontend path parsing validation
Add missing ducklake/datatable parsing in frontend parseAssetFromString
Extract frontend path parsing helper

Nice to Have:

Optimize TypeScript parser HashMap cloning
Optimize API query with JOINs instead of subqueries

Overall, this is a valuable feature that significantly improves asset tracking granularity! The code quality is high, testing is comprehensive, and the implementation follows established patterns. The concerns raised are mostly about edge cases and potential optimizations rather than fundamental issues. 🎉

Dec 10 '25 15:12 claude[bot]