Long processing times when handling very large files

Open hammzj opened this issue 4 years ago • 3 comments

Hello,

I've been here before and I'm back 😊 This gem has become a cornerstone of one of the projects I've developed. In most cases it works very well, apart from some configuration options we need to customize for each scenario, but so it goes.

Now we are working with larger files, around 1.5 million rows. In some cases, a diff seems to take hours. I've previously tested files of between 500,000 and 1,000,000 rows, and the gem took around 15 minutes or more to fully process the diffs. We can deal with that even though it's not lovely, but anything longer than that is detrimental.

I am not sure whether this is an issue with how we provide key_fields or something similar, so I am mainly writing this issue as a question: what experiences have people had with comparing large files? Is this a gem constraint, our own CSVDiff configuration, or something else?

What timings have you recorded when working with files of one million or more rows and up to 100 columns?
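
One way to narrow this down might be to time a plain keyed comparison built only on Ruby's standard library and use it as a baseline against the gem's timings. The sketch below is not how csv-diff works internally; the helpers `index_by_key` and `keyed_diff`, the `id` key column, and the sample data are all assumptions made for illustration.

```ruby
require 'csv'
require 'benchmark'

# Hypothetical helper: index the rows of a CSV string by the given key fields.
def index_by_key(csv_text, key_fields)
  index = {}
  CSV.parse(csv_text, headers: true).each do |row|
    index[row.values_at(*key_fields)] = row.to_h
  end
  index
end

# Hypothetical helper: a plain hash-based diff, roughly one lookup per row.
def keyed_diff(left_text, right_text, key_fields)
  left = index_by_key(left_text, key_fields)
  right = index_by_key(right_text, key_fields)
  {
    adds: right.keys - left.keys,       # keys only present on the right
    deletes: left.keys - right.keys,    # keys only present on the left
    updates: left.keys.select { |k| right.key?(k) && right[k] != left[k] }
  }
end

# Tiny made-up sample data; substitute real file contents to get timings.
left_csv = "id,name\n1,a\n2,b\n3,c\n"
right_csv = "id,name\n1,a\n2,B\n4,d\n"

result = nil
elapsed = Benchmark.realtime do
  result = keyed_diff(left_csv, right_csv, ['id'])
end

puts result.inspect
puts format('baseline took %.4fs', elapsed)
```

If a baseline like this finishes in minutes on the real files while the gem takes hours, that would point towards the gem or its configuration rather than CSV parsing itself. For files this large, reading with `CSV.foreach` instead of parsing one big string would likely keep memory use lower.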

hammzj, May 05 '20 18:05