A domain-specific probabilistic programming language for scalable Bayesian data cleaning

21 PClean issues

Read CSV string columns as `String` rather than as fixed-size strings

Added Japanese support for the `StringPrior` distribution (DigitalGarageLab, collaborating with MIT ProbComp)

A good prior distribution on person names (first names, last names, etc.), and on many other types of names such as place names, seems important for cases when it is...

The goal is to allow parameters (possibly from different classes) to be transformed before they are used as arguments to distributions. For example, linear combinations of normally distributed parameters can still...

The performance of the Flights model suffers without the subproblem block at https://github.com/probcomp/PClean/commit/f51c9489dda76a6dbfd7c64fc166a5c94b13db7a#diff-2a3b7234fcda10bae8f2e3e677e2add7dc29ea841a266f8a13708c4e57ac069bR14, but it is unclear to me why this should be the case, since the flight ID is always observed.

bug

This includes:
* Runtime + accuracy-over-time measurements against baseline inference algorithms (Figure 6)
* Configuration for baseline systems (HoloClean + NADEEF)
* Uncertainty-aware analysis of Rents dataset

This text implies that `ProposalDummyValue`s are only used for distributions that have infinite (and discrete) support: https://github.com/probcomp/PClean/blob/master/src/distributions/distributions.jl#L10-L14. But `StringPrior` has finite support (there is a maximum length), and it implements...

documentation

Also related to https://github.com/probcomp/GenDistributions.jl and https://github.com/probcomp/Gen.jl/issues/362

Here are a bunch of not particularly organized notes I had lying around about this...

Existing approaches to split-merge in the literature: A split-merge algorithm is made up...

research