FactoredInference is missing in main

Open Bain-OS opened this issue 10 months ago • 4 comments

Hey, cool project :)

I was trying to follow along with the tutorial here, but the current main branch is missing the FactoredInference class.

Looks like the correct commit to install from is https://github.com/ryan112358/private-pgm/tree/99f287e6fac2d9f7027f28f9c378a9c024b9e653

Bain-OS avatar Feb 12 '25 01:02 Bain-OS

Here is a partial refactor to make it work with main

FYI, this is a percent-format notebook; you can convert it to ipynb with jupytext --to ipynb synthetic_data_example.py

MixtureInference and PublicInference still seem to rely on the old API, and I'm not sure how to port them, so the final two cells are still broken

# %%
# SELECT the marginals we'd like to measure
cliques = [('marital-status', 'sex'),
            ('education-num', 'race'),
            ('sex', 'hours-per-week'),
            ('workclass',),
            ('marital-status', 'occupation', 'income>50K')]

# %%
# MEASURE the marginals and log the noisy answers
sigma = 50
measurements = []
for cl in cliques:
    x = data.project(cl).datavector()  # true marginal counts for this clique
    y = x + np.random.normal(loc=0, scale=sigma, size=x.shape)  # add Gaussian noise
    # Arguments: noisy vector, clique, noise scale
    measurements.append(LinearMeasurement(y, cl, sigma))

# %%
# GENERATE synthetic data using Private-PGM
model = estimation.mirror_descent(data.domain, measurements)
synth = model.synthetic_data()

# %%
noisy_error = []
synth_error = []

for measure in measurements:
  y = measure.noisy_measurement
  z = synth.project(measure.clique).datavector()
  x = data.project(measure.clique).datavector()
  noisy_error.append(np.linalg.norm(x-y,1)/data.records)
  synth_error.append(np.linalg.norm(x-z,1)/data.records)
  print(measure.clique, np.linalg.norm(x-y,1)/data.records, np.linalg.norm(x-z,1)/data.records)

import pandas as pd
df = pd.DataFrame({'Noisy Marginals' : noisy_error, 'Synthetic Data': synth_error })
df.index = cliques
df.plot.barh()
plt.legend(fontsize='x-large')
plt.xlabel('$L_1$ Error', fontsize='x-large')  # barh puts the error values on the x-axis

# %%
synth_error = []
cliques2 = [('sex','income>50K'), ('sex', 'workclass'), ('relationship',), ('education-num', 'occupation')]
for cl in cliques2:
  x = data.project(cl).datavector()
  y = synth.project(cl).datavector()
  synth_error.append(np.linalg.norm(x-y,1)/data.records)
  print(cl, np.linalg.norm(x-y,1)/data.records)

df = pd.DataFrame({'Synthetic Data' : synth_error})
df.index = cliques2
df.plot.barh(color='#ff7f0e')
plt.xlim(0,1)
plt.legend(fontsize='x-large')
plt.xlabel('$L_1$ Error', fontsize='x-large')  # barh puts the error values on the x-axis

# %%
data.domain

# %%
# Evaluate the quality of the synthetic data on 2-way marginals
# Try modifying cliques above to see if you can reduce error!
import itertools
import pandas as pd

def score(synth):
  errors = {}
  for cl in itertools.combinations(data.domain, 2):
    true_marginal = data.project(cl).datavector()
    est_marginal = synth.project(cl).datavector()
    errors[cl] = np.linalg.norm(true_marginal-est_marginal, 1) / data.records

  errors = pd.Series(errors).sort_values()

  print('Average Error', errors.mean(), '\n')
  return errors

score(synth)

# %%
public_data = Dataset.synthetic(data.domain, 10000)
# public_data = data # this would clearly be cheating, but try it to see what happens!
engine = PublicInference(public_data)
model = engine.estimate(measurements)
score(model)

# %%
engine = MixtureInference(data.domain, components=100)
model = engine.estimate(measurements)
score(model)
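
For anyone reading along without mbi installed, the measure-then-compare steps in the cells above can be sketched in plain Python. This is only an illustration, not mbi's implementation: the helpers marginal and l1_error, the toy domain, and the four records are all hypothetical, and the cell ordering here is not guaranteed to match what datavector() returns.

```python
import random
from collections import Counter
from itertools import product


def marginal(records, columns, domain):
    """Count how often each combination of values occurs in the given columns."""
    counts = Counter(tuple(row[c] for c in columns) for row in records)
    cells = product(*(domain[c] for c in columns))
    return [counts[cell] for cell in cells]


def l1_error(true_counts, est_counts, n_records):
    """L1 distance between two count vectors, divided by the record count."""
    return sum(abs(t - e) for t, e in zip(true_counts, est_counts)) / n_records


# Hypothetical toy data: four records over two small categorical columns.
domain = {"sex": ["F", "M"], "income>50K": [0, 1]}
records = [
    {"sex": "F", "income>50K": 0},
    {"sex": "F", "income>50K": 1},
    {"sex": "M", "income>50K": 0},
    {"sex": "M", "income>50K": 0},
]

rng = random.Random(0)  # seeded so the sketch is repeatable
sigma = 1.0
clique = ("sex", "income>50K")

x = marginal(records, clique, domain)      # true counts: [1, 1, 2, 0]
y = [v + rng.gauss(0, sigma) for v in x]   # noisy measurement of the same marginal
print(clique, l1_error(x, y, len(records)))
```

The printed number plays the same role as the 'Noisy Marginals' bar in the plot above: the normalized L1 distance between the true marginal and its noisy measurement.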

Bain-OS avatar Feb 12 '25 01:02 Bain-OS

Huh, I thought I had updated that Colab notebook; maybe it wasn't saved. Thanks for flagging, I'll try to look into it

ryan112358 avatar Mar 01 '25 00:03 ryan112358

I hit the same issue with the smartnoise-synth 1.0.5 library. Installing mbi from the recommended commit hash still fails, because this import no longer exists.

/usr/local/bin/python3 /Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/snsynth/aim/aim.py
Please install mbi with:
   pip install git+https://github.com/ryan112358/private-pgm.git@01f02f17eba440f4e76c1d06fa5ee9eed0bd2bca
Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/snsynth/aim/aim.py", line 11, in <module>
    from mbi import Dataset, FactoredInference, Domain
ImportError: cannot import name 'FactoredInference' from 'mbi' (/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/mbi/__init__.py)
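
A quick way to check which mbi is actually installed, and whether it still exposes the old class, is a small standard-library probe. This is a rough sketch: describe_package is a hypothetical helper, and it assumes the installed distribution is registered under the name mbi, which may not hold for every install method.

```python
import importlib
from importlib import metadata


def describe_package(dist_name, module_name, attr):
    """Return (installed version or None, whether the module exposes attr)."""
    try:
        version = metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        version = None  # distribution not registered under this name
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return version, False  # module cannot be imported at all
    return version, hasattr(module, attr)


# Assumption: the distribution and the module are both named "mbi".
version, has_legacy = describe_package("mbi", "mbi", "FactoredInference")
print("mbi version:", version, "| FactoredInference available:", has_legacy)
```

If the second value is False while a version is reported, the installed mbi is one of the releases without FactoredInference, which matches the ImportError above.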

uyen9vba avatar Aug 07 '25 09:08 uyen9vba

Thanks for opening this issue. I'm not sure I can offer much direct help, but if you find a fix, please post it here or update the smartnoise-synth documentation. This is the last commit before I started making backwards-incompatible changes:

https://github.com/ryan112358/mbi/commit/41b5346afd0bbad2656bf505b1006508f5a232ec

The repository is finally in what I consider a stable state, and can now be installed directly from PyPI: pip install mbi==1.0.0. The best long-term solution is probably to update smartnoise-synth to use the new APIs, but that would of course require a bit more work.

ryan112358 avatar Aug 08 '25 04:08 ryan112358