
Support using (wrapped?) Gen distributions in PClean models?

Open marcoct opened this issue 4 years ago • 1 comments

Also related to https://github.com/probcomp/GenDistributions.jl and https://github.com/probcomp/Gen.jl/issues/362

marcoct avatar Mar 05 '21 23:03 marcoct

It does seem useful to discuss whether it's really a different set of modeling primitives that are intended to be used in PClean versus Gen. In some cases, there could be the same primitives, but with different -- and less jargon-y -- names. But I could also imagine that most PClean users won't need to model the low-level numerical data types that most Gen distributions are based on.

This came up because I saw a date field in my data set. My initial reaction was "PClean probably needs a Date type" with D/M/Y integers. But then I thought, well -- aren't dates basically integers counted from some day 0? So I just need a distribution on integers, and Gen has that. But I think for dates, and most other data appearing in PClean data sets, there are many possible representations, each of which could be optimized for expressing a different type of knowledge.

A key question is whether to (i) encourage lower-level logic like constructing dates to take place in user code for now (e.g. if I wanted to model dates, I could use Ints and then write the manual String conversion code in the model, I think) -- an approach that would make it natural to overlap with Gen's distributions, or (ii) stay with the current pattern of adding distributions for higher-level data types that have more specialized semantics, like 'Date'.
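To make option (i) concrete, here is a minimal sketch of the user-side conversion code, using only Julia's `Dates` standard library. The names `DAY_ZERO`, `offset_to_date`, `date_to_offset`, and `format_dmy` are hypothetical helpers made up for illustration, not part of PClean or Gen, and the choice of day 0 is arbitrary. The integer offset is the quantity that an integer-valued distribution would model; how that distribution would be declared inside a PClean model is left out, since that is exactly the open question here.

```julia
using Dates

# Hypothetical "day 0" for the integer encoding; any fixed date would do.
const DAY_ZERO = Date(1970, 1, 1)

# Map an integer day offset (what an integer distribution would produce) to a Date.
offset_to_date(n::Integer) = DAY_ZERO + Day(n)

# Inverse map: recover the integer offset from an observed Date.
date_to_offset(d::Date) = Dates.value(d - DAY_ZERO)

# Render an offset as a D/M/Y string, the "manual String conversion" step.
format_dmy(n::Integer) = Dates.format(offset_to_date(n), dateformat"dd/mm/yyyy")

# Parse an observed D/M/Y string back into an offset.
parse_dmy(s::AbstractString) = date_to_offset(Date(s, dateformat"dd/mm/yyyy"))
```

With these helpers, an observed string such as `"05/03/2021"` can be turned into an offset with `parse_dmy` and back again with `format_dmy`. The sketch also shows the limitation of this route: the integer encoding makes "days apart" easy to express, but knowledge such as "the day and month were swapped" is awkward to state at this level, which is the argument for option (ii).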

It also seems like a user might be able to get pretty far with the string distributions provided. It's not obvious to me when more distributions and other primitives need to be added, or what the process for adding them could be.

marcoct avatar Mar 05 '21 23:03 marcoct