
Python verify_tables method?

Open hyanwong opened this issue 3 years ago • 6 comments

A good number of times I have made tables that aren't valid tree sequences. When trying to convert them, I get errors such as

_tskit.LibraryError: time[parent] must be greater than time[child]

It would be really helpful to know which nodes or edges are causing the problem. It's easy in this case, as I can run through the edges, but for some other tree sequence properties it's less obvious. I assume we have Python code in the test suite that can help debug this. I wonder whether a separate (slow) routine, tables.validate() or similar, that outputs more specific error messages would be helpful here?
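For the error above, "running through the edges" might look roughly like the sketch below. It is pure Python and works on plain sequences; the function name `find_bad_edge_times` and the toy data are made up for illustration and are not part of tskit.

```python
def find_bad_edge_times(node_time, edge_parent, edge_child):
    """Return (edge_id, parent, child) for every edge whose parent
    is not strictly older than its child.

    Hypothetical helper for illustration only, not a tskit function.
    """
    bad = []
    for edge_id, (parent, child) in enumerate(zip(edge_parent, edge_child)):
        # The library requires time[parent] > time[child] for every edge.
        if not node_time[parent] > node_time[child]:
            bad.append((edge_id, parent, child))
    return bad


# Toy data: node 2 is the oldest node. Edge 1 wrongly uses node 0
# (time 0.0) as the parent of node 1 (also time 0.0).
node_time = [0.0, 0.0, 1.0]
edge_parent = [2, 0]
edge_child = [0, 1]

bad_edges = find_bad_edge_times(node_time, edge_parent, edge_child)
for edge_id, parent, child in bad_edges:
    print(f"edge {edge_id}: time[{parent}] is not greater than time[{child}]")
```

With real tables, the three arguments would presumably be the `tables.nodes.time`, `tables.edges.parent` and `tables.edges.child` columns.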

hyanwong avatar Mar 11 '22 20:03 hyanwong

I agree, it would be very helpful but is a bit of a slog to actually do.

Also, do we have an issue for this already?

jeromekelleher avatar Mar 13 '22 17:03 jeromekelleher

> Also, do we have an issue for this already?

Hmm, I didn't find one, but didn't look hard either.

hyanwong avatar Mar 13 '22 17:03 hyanwong

No, I can't find one either. Must be something we discussed in one of the long threads.

jeromekelleher avatar Mar 14 '22 09:03 jeromekelleher

What we probably want is something like assert_tables_equal where we first call the C function to make the quick check, and then if that fails we slog through the various ways it could fail.

So, Tables.assert_valid() or something?
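A minimal sketch of that "quick check first, slow diagnosis on failure" pattern, written in plain Python so that it does not depend on any tskit internals. The names `assert_valid`, `fast_check` and `slow_checks` are assumptions for illustration. In tskit the quick check could be something like calling `tables.tree_sequence()`, which raises the `LibraryError` quoted at the top of this issue.

```python
def assert_valid(fast_check, slow_checks):
    """Run the quick check; only if it fails, run the slow diagnostics.

    fast_check: callable that raises an exception on invalid input.
    slow_checks: callables that each return a list of problem descriptions.
    Hypothetical helper for illustration only, not a tskit API.
    """
    try:
        fast_check()
    except Exception as fast_error:
        messages = []
        for check in slow_checks:
            messages.extend(check())
        if not messages:
            # No diagnostic found anything more specific: re-raise the original error.
            raise
        raise AssertionError("\n".join(messages)) from fast_error


# Toy stand-ins for the C-level check and one Python-level diagnostic.
def failing_fast_check():
    raise ValueError("time[parent] must be greater than time[child]")


def check_edge_times():
    return ["edge 1: time[parent=0] is not greater than time[child=1]"]


try:
    assert_valid(failing_fast_check, [check_edge_times])
except AssertionError as err:
    report = str(err)
    print(report)
```

Valid input only pays for the quick check, and the slow diagnostics run only when there is already an error to explain.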

jeromekelleher avatar Mar 14 '22 09:03 jeromekelleher

Great idea, would be quite a nice thing for a contributor to do as it is pure Python and the C routines can be used as a reference. I don't see myself getting to this anytime soon.

benjeffery avatar Mar 14 '22 09:03 benjeffery

Agreed - marked as "help wanted"

jeromekelleher avatar Mar 14 '22 09:03 jeromekelleher