snakebids

Method to find entity combinations "missing" from dataset

Open pvandyken opened this issue 2 years ago • 1 comment

Most of the new filtering APIs (#209) focus on removing any entity combinations that are missing in one or more components. For example, if component B is missing subject 2, then subject 2 is completely excluded from .expand().

It would be helpful for QC to have a method to print all of these missing entities. This would let developers and users quickly query missing parts of their datasets. Technically, this is equal to:

# pseudocode
product(*dataset.entities.values()) - dataset.zip_lists

In other words, the maximal zip list minus the actual zip list.
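A minimal runnable sketch of that set difference, using a plain dict in place of a real snakebids object. The function name `missing_combinations` and the input shape (entity name mapped to a list of values, read column-wise) are assumptions for illustration, not part of the snakebids API:

```python
from itertools import product


def missing_combinations(zip_list):
    """Return the entity combinations absent from a zip list.

    `zip_list` (hypothetical input shape) maps each entity name to a list
    of values; the i-th items of all lists form one existing combination.
    """
    entities = list(zip_list)
    # Existing combinations, one tuple per column position.
    present = set(zip(*(zip_list[e] for e in entities)))
    # Unique values per entity, kept in first-seen order.
    unique = [list(dict.fromkeys(zip_list[e])) for e in entities]
    # Maximal zip list (full product) minus the actual zip list.
    return [combo for combo in product(*unique) if combo not in present]


zip_list = {
    "subject": ["1", "1", "2"],
    "session": ["a", "b", "a"],
}
print(missing_combinations(zip_list))  # [('2', 'b')]
```

The result is ordered by the product, so it is stable across runs for the same input.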

My idea is to have a base method that returns a zip_list-like representation of all missing groupings, and possibly another convenience method to print the list in a nice table. I still need to think about the exact API, but if anyone has ideas please share!
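One possible shape for those two pieces, sketched with plain Python data. Both helper names (`to_zip_list`, `format_table`) are hypothetical, and the table layout is just one option:

```python
def to_zip_list(entities, combos):
    """Transpose combination tuples into a zip_list-like dict of columns."""
    return {e: [c[i] for c in combos] for i, e in enumerate(entities)}


def format_table(zip_list):
    """Render a zip_list-like dict as plain text, one row per combination."""
    headers = list(zip_list)
    rows = list(zip(*zip_list.values()))
    # Each column is as wide as its longest cell (header included).
    widths = [
        max([len(h)] + [len(r[i]) for r in rows])
        for i, h in enumerate(headers)
    ]
    lines = ["  ".join(h.ljust(w) for h, w in zip(headers, widths))]
    lines += ["  ".join(v.ljust(w) for v, w in zip(r, widths)) for r in rows]
    return "\n".join(line.rstrip() for line in lines)


# Hypothetical missing groupings: (subject, session) pairs.
missing = to_zip_list(["subject", "session"], [("2", "b"), ("3", "a")])
print(format_table(missing))
```

Keeping the base representation as a dict of columns means the table printer is only a thin layer on top, and the same data could be handed to other tooling unchanged.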

pvandyken Feb 16 '23 17:02

Signature proposal: Dataset.missing_entities() -> dict[str, dict[str, list[str]]]

  • Components with no missing entities should be present in the dictionary, with every entity in its dictionary having the value of an empty list.
    • e.g. {'component_a': {'sub': [], 'ses': []}}
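A rough sketch of how that return shape could be produced. The proposal does not say what the inner lists hold, so this assumes an entity value counts as "missing" from a component when some other component has it. The standalone function and its input shape (component name mapped to entity values) are also assumptions, not the snakebids API:

```python
def missing_entities(components):
    """Map each component to the entity values it lacks.

    `components` (hypothetical input shape) maps a component name to a
    dict of entity name -> list of values found in that component.
    """
    # Pool every value seen for each entity across all components.
    all_values = {}
    for entities in components.values():
        for entity, values in entities.items():
            all_values.setdefault(entity, set()).update(values)
    # For each component, list the pooled values it does not contain.
    return {
        name: {
            entity: sorted(all_values[entity] - set(values))
            for entity, values in entities.items()
        }
        for name, entities in components.items()
    }


components = {
    "component_a": {"sub": ["1", "2"], "ses": ["a"]},
    "component_b": {"sub": ["1"], "ses": ["a"]},
}
result = missing_entities(components)
print(result)
# {'component_a': {'sub': [], 'ses': []}, 'component_b': {'sub': ['2'], 'ses': []}}
```

Note that this reports missing values per entity rather than missing combinations, so it loses which subject/session pairs go together.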

tkkuehn Feb 17 '23 17:02