data-diff
Compare tables within or across databases
Hi all, I'm very interested in this project; it looks fantastic to me! I see a significant and probably quite common use case here where developers and the business will want to...
Hello guys! First of all, let me thank you for this incredible package you've achieved. I find myself with this recurrent problem of trying to compare two "mirror tables" or...
Currently if you run the benchmarking scripts (see README and https://github.com/datafold/data-diff/pull/135) it's _very_ slow against the cloud databases. It would be better to use CSV imports for the cloud databases...
Currently the benchmarking introduced in https://github.com/datafold/data-diff/pull/135 checks two tables that are equal. We'd love to add some tests where we delete/change an increasing % of rows, starting at just 1...
Snowflake: allow auth by PKCS8 key (password auth disabled); user private key specification; mapping bigint to Integer. Presto: allow verifying the HTTP session with a certificate; allow using...
Right now we only support `md5` for hashing columns. However, for some databases that might not be feasible. For example, in `mssql` it's too slow, and it sounds like Spanner only...
You might have a column called `money` in one database, and `amount` in another. Today there is no way for the columns to have different names across the two...
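To make the request concrete, here is a minimal sketch of what a column-name mapping could look like. It is not `data-diff`'s API: the `column_map` argument, the helper names, and the table names are all hypothetical, and SQLite stands in for both databases.

```python
import sqlite3

def fetch_rows(conn, table, key, columns):
    """Select the key plus the given columns, ordered by key."""
    cols = ", ".join(f'"{c}"' for c in [key, *columns])
    return conn.execute(f'SELECT {cols} FROM "{table}" ORDER BY "{key}"').fetchall()

def diff_with_mapping(conn_a, table_a, conn_b, table_b, key, column_map):
    """Compare two tables whose columns are named differently.

    column_map is a hypothetical {name_in_a: name_in_b} dict. Rows are
    compared positionally, so the mapping decides which columns line up.
    """
    rows_a = fetch_rows(conn_a, table_a, key, list(column_map.keys()))
    rows_b = fetch_rows(conn_b, table_b, key, list(column_map.values()))
    # Rows present on only one side, as (key, value, ...) tuples.
    return sorted(set(rows_a) ^ set(rows_b))

# Demo: `money` on one side, `amount` on the other.
db_a = sqlite3.connect(":memory:")
db_a.execute("CREATE TABLE payments (id INTEGER PRIMARY KEY, money INTEGER)")
db_a.executemany("INSERT INTO payments VALUES (?, ?)", [(1, 100), (2, 200)])

db_b = sqlite3.connect(":memory:")
db_b.execute("CREATE TABLE payments (id INTEGER PRIMARY KEY, amount INTEGER)")
db_b.executemany("INSERT INTO payments VALUES (?, ?)", [(1, 100), (2, 250)])

changed = diff_with_mapping(db_a, "payments", db_b, "payments", "id", {"money": "amount"})
```

Here `changed` holds both versions of the row with `id` 2, since its value differs between the two sides.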
For row-based databases if indexes are missing on the columns queried, the queries will be substantially slower on large tables (unless you're querying all columns). Today, you can run with...
Today, one of the caveats of `data-diff` is that it's going to be significantly slower if you have _a lot_ of differences, because we'll be checksumming so many segments repeatedly...
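The cost described above can be illustrated with a toy version of segment checksumming. This is a simplified in-memory sketch, not the project's implementation: the `threshold` and `factor` parameters, the dict-based tables, and the `stats` counter are assumptions made for illustration.

```python
import hashlib

def checksum(rows):
    """MD5 over (key, value) rows; stands in for a database-side checksum."""
    h = hashlib.md5()
    for key, value in rows:
        h.update(f"{key}:{value};".encode())
    return h.hexdigest()

def diff_segments(a, b, lo, hi, stats, threshold=4, factor=2):
    """Return the keys in [lo, hi) whose rows differ between dicts a and b."""
    rows_a = [(k, a[k]) for k in range(lo, hi) if k in a]
    rows_b = [(k, b[k]) for k in range(lo, hi) if k in b]
    stats["checksums"] += 1  # one checksum round per segment visited
    if checksum(rows_a) == checksum(rows_b):
        return []  # identical segment: nothing below it needs checking
    if hi - lo <= threshold:
        # Small enough: compare the rows directly instead of splitting again.
        return [k for k in range(lo, hi) if a.get(k) != b.get(k)]
    step = -(-(hi - lo) // factor)  # ceiling division
    differing = []
    for start in range(lo, hi, step):
        end = min(start + step, hi)
        differing += diff_segments(a, b, start, end, stats, threshold, factor)
    return differing

source = {k: k * 10 for k in range(64)}

one_change = dict(source)
one_change[5] = -1
few = {"checksums": 0}
keys_few = diff_segments(source, one_change, 0, 64, few)

all_changed = {k: v + 1 for k, v in source.items()}
many = {"checksums": 0}
keys_many = diff_segments(source, all_changed, 0, 64, many)
```

With a single changed row, most segments match and are pruned early; when every row differs, every segment at every level is checksummed before the rows are finally compared, so `many["checksums"]` ends up larger than `few["checksums"]`.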
We might want to add to the database drivers the ability to use them as output for the diff, so we can create `diff_{src_table_name}_{dst_table_name}` with all the differing rows. This...
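As a rough sketch of the idea, the snippet below writes differing rows into a table named after the `diff_{src_table_name}_{dst_table_name}` pattern. Everything else is an assumption for illustration: the `write_diff_table` helper, the `(sign, id, value)` schema, the `+`/`-` markers, and the use of SQLite as the target.

```python
import sqlite3

def write_diff_table(conn, src_table, dst_table, diff_rows):
    """Store differing rows in diff_{src_table}_{dst_table} and return its name.

    diff_rows is an iterable of (sign, id, value) tuples; this schema is
    hypothetical and only serves the example.
    """
    name = f"diff_{src_table}_{dst_table}"
    conn.execute(f'DROP TABLE IF EXISTS "{name}"')
    conn.execute(f'CREATE TABLE "{name}" (sign TEXT, id INTEGER, value TEXT)')
    conn.executemany(f'INSERT INTO "{name}" VALUES (?, ?, ?)', diff_rows)
    conn.commit()
    return name

conn = sqlite3.connect(":memory:")
table = write_diff_table(
    conn, "orders", "orders_copy", [("-", 2, "200"), ("+", 2, "250")]
)
stored = conn.execute(f'SELECT sign, id, value FROM "{table}"').fetchall()
```

Keeping the result in a table rather than on stdout would let the differing rows be queried or joined like any other data.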