cgpm icon indicating copy to clipboard operation
cgpm copied to clipboard

patch to loomcat

Open Schaechtle opened this issue 7 years ago • 1 comments

This patch is needed since the column names I am getting from loom while calling _update_state (and _retrieve_featureid_to_cgpm, specifically) don't follow the assumptions made in the code. In fact, the column names I am getting from loom are the actual natural language column names which seems correct (thus patching cgpm).

I have a test case for this patch which depends on loom and bql -- I initially planned to include the test case here (i.e the cgpm repo). Looking at the test suite however, I am not sure anymore if this makes sense (not much loom and BQL around in cgpm/test); I am now planning to include this test in another repo focussed only on exporting models from loom. Open for suggestions here.

Tested using the probcomp dev image as folllows:

 2018-08-13 14:47:24 ⌚  ulli-notebook in ~/cgpm
○ → pytest tests/

Result:

454 passed, 8 skipped, 2 xfailed in 363.28 seconds

Schaechtle avatar Aug 13 '18 14:08 Schaechtle

The loomcat conversion is assuming that the loom project on disk was created using loomcat.initialize(state), in which case the loom column names are of the form c%05d. This encoding scheme is needed since cgpm does not have a notion of natural language column names, only column numbers.

When the loom project on disk was created using BayesDB's loom_backend class instead of cgpm's loomcat.initialize method, the loom column names are the natural language column names since these are known to BayesDB.

The assumptions in the code should be further loosened by:

  • [ ] Making colname_colno_mapping a required argument of loomcat._retrieve_featureid_to_cgpm and loomcat._update_state.
  • [ ] Updating loomcat.transition and loomcat.transition_engine to use the output mapping given by _generate_column_names and pass this mapping into _update_state.

fsaad avatar Aug 13 '18 15:08 fsaad