Add support for Microsoft SQL Server 2016+

Open DVAlexHiggs opened this issue 1 year ago • 6 comments

Hi All,

Very interested in this project; it looks fantastic to me! I see a significant and probably quite common use case here: developers and the business will want to be reassured that a migration from an old legacy system to a cloud platform such as Snowflake was successful.

As a consultant, I see a huge number of clients on MS SQL Server, and this is the source of the data for their migration from on-prem to the cloud.

I'm excited to use this tool, but unfortunately it does not cover most of the use cases I have in mind, due to the lack of MS SQL support. To this end, I would like to suggest adding support for this platform.

I'm also more than happy to contribute to this. Could anyone point me in the right direction? I have seen the developer environment guide in the README, but I'm looking for more specific information about creating a new adapter or adding support for a new database.

Thanks!

DVAlexHiggs avatar Jul 17 '22 17:07 DVAlexHiggs

Hello. Thanks for reaching out!

MS SQL is one of the databases on the near-future roadmap. It has a few performance-related challenges though.

To make a new driver, take a look at the existing ones here: https://github.com/datafold/data-diff/tree/master/data_diff/databases and follow the same pattern. Then run the tests, as documented in [CONTRIBUTING.md](https://github.com/datafold/data-diff/blob/master/CONTRIBUTING.md).

If you have any questions, feel free to ask.

nolar avatar Jul 18 '22 07:07 nolar

Thanks for your response!

I see there is a commented-out mssql module. Is there anything I should know about it?

DVAlexHiggs avatar Jul 18 '22 07:07 DVAlexHiggs

@DVAlexHiggs You can see more details on that in #51.

nolar avatar Jul 18 '22 07:07 nolar

Not sure how this got closed. I think I mis-clicked.

DVAlexHiggs avatar Jul 18 '22 08:07 DVAlexHiggs

@DVAlexHiggs Any update on this? I also could help contribute, but don't want to pick it up if you are already actively working on it.

cacondie avatar Sep 09 '22 20:09 cacondie

@cacondie,

We are currently not pursuing this avenue, but we will be happy to accept contributions. However, be aware that the solution isn't likely to be simple.

SQL Server does support MD5 (which is what we used for hashing), but it is approx. 100 times slower than in PostgreSQL, which makes it unusable for practical purposes.
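For readers unfamiliar with hash-based diffing, here is a minimal Python sketch of the general idea of an MD5-based segment checksum. It is an illustration under stated assumptions, not the project's actual implementation: the column separator, the number of hex digits kept, and the names `row_md5_int` and `segment_checksum` are all hypothetical.

```python
import hashlib


def row_md5_int(values, digits=15):
    """Hash one row: join the columns, take the MD5, keep the low hex digits as an int.

    The "|" separator and the 15-digit truncation are assumptions for this sketch.
    """
    joined = "|".join("" if v is None else str(v) for v in values)
    hexdigest = hashlib.md5(joined.encode("utf-8")).hexdigest()
    return int(hexdigest[-digits:], 16)


def segment_checksum(rows):
    """Sum the per-row hashes, so the result does not depend on row order."""
    return sum(row_md5_int(row) for row in rows)


source = [(1, "alice"), (2, "bob")]
target = [(2, "bob"), (1, "alice")]      # same rows, different order
changed = [(1, "alice"), (2, "bobby")]   # one value differs

print(segment_checksum(source) == segment_checksum(target))
print(segment_checksum(source) == segment_checksum(changed))
```

The cost concern above applies to the per-row hash: it runs once for every row on every database being compared, so a slow MD5 implementation dominates the total runtime.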

I think the solution would have to involve implementing our own checksum, probably using only simple arithmetic operations, since SQL isn't capable of much more.

Keep in mind that whatever checksum is used for SQL Server has to be supported by all the other databases, so that comparisons are possible. That probably means implementing this new checksum for each one (or at least the major ones).
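To make the idea concrete, here is a rough Python sketch of a checksum built only from multiplication, addition, and modulo, the kind of operations every SQL dialect offers. It is a hypothetical illustration, not a proposal from this thread: the constants `BASE` and `MODULUS`, the separator value, and the function names are assumptions, and a polynomial hash like this one is far weaker against collisions than MD5.

```python
MODULUS = 2_147_483_647  # 2**31 - 1, a prime; keeps intermediate values within 64-bit integers
BASE = 31                # hypothetical multiplier
SEPARATOR = 0x1F         # mixed in after each column so ("ab", "c") differs from ("a", "bc")


def row_checksum(values):
    """Polynomial rolling hash over the UTF-8 bytes of each column."""
    acc = 0
    for value in values:
        text = "" if value is None else str(value)
        for byte in text.encode("utf-8"):
            acc = (acc * BASE + byte) % MODULUS
        acc = (acc * BASE + SEPARATOR) % MODULUS
    return acc


def table_checksum(rows):
    """Add the row checksums modulo MODULUS; the result is independent of row order."""
    total = 0
    for row in rows:
        total = (total + row_checksum(row)) % MODULUS
    return total


source = [(1, "alice"), (2, "bob")]
target = [(2, "bob"), (1, "alice")]      # same rows, different order
changed = [(1, "alice"), (2, "bobby")]   # one value differs

print(table_checksum(source) == table_checksum(target))
print(table_checksum(source) == table_checksum(changed))
```

For comparisons to work, every database would have to produce exactly the same integer for the same row, which means agreeing on the string rendering of each type, the text encoding, and the integer width before the arithmetic even starts.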

If you can think of a new creative solution, we'll be happy to consider it.

erezsh avatar Sep 10 '22 07:09 erezsh

Would using a CLR stored procedure to compute MD5 be an option?

icosahedron avatar Dec 08 '22 02:12 icosahedron

Probably not, as many SQL Server implementations, including just about anything on cloud storage, disallow CLR procs.

masonwheeler avatar Dec 08 '22 02:12 masonwheeler

I proposed a solution to this issue in #51.

erezsh avatar Dec 13 '22 22:12 erezsh

This issue has been marked as stale because it has been open for 60 days with no activity. If you would like the issue to remain open, please comment on the issue and it will be added to the triage queue. Otherwise, it will be closed in 7 days.

github-actions[bot] avatar May 26 '23 06:05 github-actions[bot]

Although we are closing this issue as stale, it's not gone forever. Issues can be reopened if there is renewed community interest. Just add a comment and it will be reopened for triage.

github-actions[bot] avatar Jun 02 '23 06:06 github-actions[bot]