mosaic
mosaic copied to clipboard
Query that filters across two tables
Hi! I am building a custom MosaicClient
for my own visualization components. I have two Selection
instances using crossfilter - one for table1 and another for table2. Each Selection
can have few histograms with own conditions. I'm wondering if it's possible to create a filter that queries across both Selections
on the different tables?
The case is that table2 has ids and corresponding values, while table1 has links between the ids in table2 and values for each link - so table1.source and table1.target columns are the same type as table2.id.
I need to do something like this query below but Query
has no join
method and I have no idea how to tell Query
how it should compare table2.id with table1.source and table1.source using existing methods.
SELECT
table2.value,
table2.id,
FROM table2 JOIN table1 ON table2.id = table1.source OR table2.id = table1.target
WHERE
table2.value >= 46 AND table2.value <= 50
...another crossfiltered conditions from table2 Selection
AND
table1.value >= 0 AND table1.value <= 10
...another crossfiltered conditions from table1 Selection
I've also tried to construct my own query class that extends Query
with a .join()
method. However, I could not construct this query completely using only Query
methods, there are still parts that require string construction:
My extended Query
class is:
export class SuperQuery extends Query {
constructor () {
super()
// Initialize the join array in the new class
this.query.join = []
}
join (table, condition) {
this.query.join.push({ table, condition })
return this
}
toString () {
// First, build the SELECT and FROM clauses using the parent toString method
const selectFromSql = super.toString().split(' WHERE ')[0]
// Then, build the JOIN clauses if any
let joinSql = ''
if (this.query.join.length) {
joinSql = this.query.join.map(({ table, condition }) => `JOIN ${table} ON ${condition}`).join(' ')
}
// Next, build the WHERE clause if any
const whereClause = super.toString().split(' WHERE ')[1]
let whereSql = ''
if (whereClause) {
whereSql = `WHERE ${whereClause}`
}
// Combine the SELECT, FROM, JOIN, and WHERE clauses in the correct order
return `${selectFromSql} ${joinSql} ${whereSql}`.trim()
}
}
And its usage in MosaicClient
:
// filter here is a Selection instance of table2
query (filter: string[] = []): Query {
const table2Filter = filter.join(' AND ')
const table1Filter = table1Selection.clauses.map(c => c.predicate).join(' AND ')
return SuperQuery
.select(fields)
.from(table2)
.join(table1, `table2.id = table1.source OR table2.id = table1.target`)
.where(`${table1Filter} ${filter.length ? `${!table1Selection.clauses.length ? '' : ' AND'} ${table2Filter}` : ''}`)
}
It generates a correct query and works on small tables (~30,000 rows), but on tables with 100,000+ rows, it does not work at all, no reaction on selection and no errors in the console. This is kind of strange, I suspect there may be an indexing issue caused by my string constructions because they can't be optimized by Mosaic. And I don't use .or()
and .and()
of Query
here because they add quotes like "table2.id" for exact column name but this produces a broken query as there is no exact "table2.id" column in the context of comparing between few tables.
How can I solve this problem?
Mosaic does not currently have dedicated support for filtering across multiple tables. While optimizations such as query caching and query consolidation (grouping queries over the same table with the same groupby values) apply generally, our data cube indexes apply only to selections with clients that all target the same base table.
When feasible, I would recommend performing any joins as a pre-processing step: create a new table (or view) that joins the data together, and then use that as your base for visualization and interaction. The table or view could be created as part of your initial data loading; we do this in the unemployment map example.
Otherwise, it would be great to learn more about your use case needs. We'd like to provide richer support for cross-table linking, but we're still in a "requirements gathering" phase of design. (And I believe @domoritz and collaborators may also be looking into similar issues?)
As for this part: "It generates a correct query and works on small tables (~30,000 rows), but on tables with 100,000+ rows, it does not work at all, no reaction on selection and no errors in the console."
, I don't know what is going on. Indeed it seems strange that the scale difference alone would fundamentally change things beyond performance (unless the joined results are so large they overrun browser memory limits). I'd start by looking at DuckDB logs to see if the queries are executing successfully, and of course you can also log your custom client's queryResult
and queryError
methods to see if a response is arriving.
CC @willeppy
@jheer Thank you for the answer!
I'm working on a graph visualization. It has one raw data table and two additional tables: one for unique nodes and one for links between them. The graph relies on these generated nodes and links tables. The nodes and links have aggregated values calculated from the raw data, visualized in histograms for each value. If the aggregated value is by link, the histogram will select both nodes involved in the link within the selected value range. So selecting nodes in the graph by histogram ranges, crossfiltered by nodes and links, is a clear use case in the graph visualization context.
Crossfiltering between only nodes or only link values works super fast, I'll try to manage crossfiltering between nodes and links in the code, not SQL.
P.S. I've debugged my join queries on large tables and found that they execute successfully, but run very slowly - up to 2-10 minutes, even though the result set may only contain a few rows.