
Measuring Loss of Data when cutting an edge to remove a cycle

Open · raresboza opened this issue 3 years ago · 1 comment

Greetings, I was reading the following article on subsetting: https://www.tonic.ai/blog/condenser-a-database-subsetting-tool

I don't exactly understand what the drawbacks are of dropping a cycle from a database. Of course, one loses data when doing so, but is the same amount of data lost irrespective of where you cut the cycle? How could one measure that? What are some of the criteria that affect it?

raresboza, Aug 19 '21 14:08

Hi @raresboza, I'm not sure I understand your question. Condenser is set up to break dependencies wherever it makes the most sense, but realistically all it does is shove NULLs into the column in question. Without a break, a cycle could in some cases cause all of the data in the tables in the cycle to be grabbed, and we certainly wouldn't be able to perform a true topological sort.
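To illustrate the topological sort point, here is a minimal sketch using Python's standard `graphlib` module. It is not condenser's actual code: the table names and the `deps` mapping are hypothetical, and the edge to drop is picked by hand.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical schema: each table maps to the set of tables it references
# through foreign keys. employees <-> departments form a cycle.
deps = {
    "employees": {"departments"},    # employees.department_id -> departments.id
    "departments": {"employees"},    # departments.manager_id  -> employees.id
    "projects": {"departments"},     # projects.department_id  -> departments.id
}


def sort_tables(deps):
    """Return tables ordered so that referenced tables come first."""
    return list(TopologicalSorter(deps).static_order())


# With the cycle in place, no valid ordering exists.
try:
    sort_tables(deps)
    cycle_found = False
except CycleError:
    cycle_found = True

# Break the cycle by dropping one edge. In the subset, this corresponds to
# writing NULLs into departments.manager_id.
broken = {table: set(refs) for table, refs in deps.items()}
broken["departments"].discard("employees")

order = sort_tables(broken)
```

Once the edge is removed, `departments` no longer depends on anything and sorts ahead of the tables that reference it.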

Ultimately, where to make the break comes down to which column you find less valuable.
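One rough way to put a number on "less valuable" is to count how many non-NULL references each candidate column holds, since those are the values that would be replaced with NULLs. The sketch below assumes a hypothetical two-table schema in an in-memory SQLite database. It is not part of condenser, and a raw count is only a proxy: a sparsely populated column can still matter more to your use case.

```python
import sqlite3

# Hypothetical schema with a two-table cycle:
#   employees.department_id -> departments.id
#   departments.manager_id  -> employees.id
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employees (id INTEGER PRIMARY KEY, department_id INTEGER);
CREATE TABLE departments (id INTEGER PRIMARY KEY, manager_id INTEGER);
INSERT INTO departments VALUES (1, 10), (2, NULL), (3, NULL);
INSERT INTO employees VALUES (10, 1), (11, 1), (12, 2), (13, 3);
""")

# Candidate edges in the cycle, as (table, foreign-key column) pairs.
candidates = [
    ("employees", "department_id"),
    ("departments", "manager_id"),
]


def references_lost(conn, table, column):
    """Count the non-NULL values that would be overwritten with NULL."""
    # Identifiers come from the hard-coded list above, not from user input.
    query = f'SELECT COUNT("{column}") FROM "{table}"'
    return conn.execute(query).fetchone()[0]


loss = {edge: references_lost(conn, *edge) for edge in candidates}

# The edge whose removal discards the fewest references.
cheapest = min(loss, key=loss.get)
```

In this made-up data, cutting `departments.manager_id` discards one reference while cutting `employees.department_id` discards four, so the amount lost does depend on where the cycle is cut.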

theaeolianmachine, Nov 04 '21 18:11