FactoredInference is missing in main
Hey, cool project :)
I was trying to follow along the tutorial here
But the current main branch is missing the FactoredInference class.
Looks like the correct commit to install from is https://github.com/ryan112358/private-pgm/tree/99f287e6fac2d9f7027f28f9c378a9c024b9e653
Here is a partial refactor to make it work with main
FYI this is a percent-format notebook; you can convert it to ipynb with jupytext: jupytext --to ipynb synthetic_data_example.py
It seems that MixtureInference and PublicInference still rely on the old API, and I'm not sure how to handle those bits, so the final two cells are still broken
# %%
# SELECT the marginals we'd like to measure
cliques = [('marital-status', 'sex'),
           ('education-num', 'race'),
           ('sex', 'hours-per-week'),
           ('workclass',),
           ('marital-status', 'occupation', 'income>50K')]
# %%
# MEASURE the marginals and log the noisy answers
sigma = 50
measurements = []
for cl in cliques:
    # True marginal counts for this clique, flattened to a vector
    x = data.project(cl).datavector()
    # Add i.i.d. Gaussian noise with standard deviation sigma
    y = x + np.random.normal(loc=0, scale=sigma, size=x.shape)
    measurements.append(LinearMeasurement(y, cl, sigma))
# %%
# GENERATE synthetic data using Private-PGM
model = estimation.mirror_descent(data.domain, measurements)
synth = model.synthetic_data()
# %%
noisy_error = []
synth_error = []
for measure in measurements:
    y = measure.noisy_measurement
    z = synth.project(measure.clique).datavector()
    x = data.project(measure.clique).datavector()
    noisy_error.append(np.linalg.norm(x-y,1)/data.records)
    synth_error.append(np.linalg.norm(x-z,1)/data.records)
    # Print the clique for this measurement (not the stale loop variable `cl`)
    print(measure.clique, np.linalg.norm(x-y,1)/data.records, np.linalg.norm(x-z,1)/data.records)
import pandas as pd
df = pd.DataFrame({'Noisy Marginals' : noisy_error, 'Synthetic Data': synth_error })
df.index = cliques
df.plot.barh()
plt.legend(fontsize='x-large')
plt.xlabel('$L_1$ Error', fontsize='x-large')  # barh puts the values on the x-axis
# %%
synth_error = []
cliques2 = [('sex','income>50K'), ('sex', 'workclass'), ('relationship',), ('education-num', 'occupation')]
for cl in cliques2:
    x = data.project(cl).datavector()
    y = synth.project(cl).datavector()
    synth_error.append(np.linalg.norm(x-y,1)/data.records)
    print(cl, np.linalg.norm(x-y,1)/data.records)
df = pd.DataFrame({'Synthetic Data' : synth_error})
df.index = cliques2
df.plot.barh(color='#ff7f0e')
plt.xlim(0,1)
plt.legend(fontsize='x-large')
plt.xlabel('$L_1$ Error', fontsize='x-large')  # barh puts the values on the x-axis
# %%
data.domain
# %%
# Evaluate the quality of the synthetic data on 2-way marginals
# Try modifying cliques above to see if you can reduce error!
import itertools
import pandas as pd
def score(synth):
    errors = {}
    for cl in itertools.combinations(data.domain, 2):
        true_marginal = data.project(cl).datavector()
        est_marginal = synth.project(cl).datavector()
        errors[cl] = np.linalg.norm(true_marginal-est_marginal, 1) / data.records
    errors = pd.Series(errors).sort_values()
    print('Average Error', errors.mean(), '\n')
    return errors
score(synth)
# %%
public_data = Dataset.synthetic(data.domain, 10000)
# public_data = data # this would clearly be cheating, but try it to see what happens!
engine = PublicInference(public_data)
model = engine.estimate(measurements)
score(model)
# %%
engine = MixtureInference(data.domain, components=100)
model = engine.estimate(measurements)
score(model)
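For anyone who wants to check what the MEASURE step and the error metric in the cells above compute without installing mbi, here is a minimal numpy-only sketch. The function names `measure_marginal` and `normalized_l1_error` and the example counts are hypothetical, made up for illustration; they are not part of mbi.

```python
import numpy as np


def measure_marginal(counts, sigma, rng):
    """Return the marginal counts with i.i.d. Gaussian noise of scale sigma added."""
    counts = np.asarray(counts, dtype=float)
    return counts + rng.normal(loc=0.0, scale=sigma, size=counts.shape)


def normalized_l1_error(true_counts, est_counts, n_records):
    """L1 distance between two marginal vectors, divided by the record count."""
    diff = np.asarray(true_counts, dtype=float) - np.asarray(est_counts, dtype=float)
    return np.linalg.norm(diff, 1) / n_records


rng = np.random.default_rng(0)
# Hypothetical 1-way marginal over three categories (1000 records in total)
true_counts = np.array([400.0, 250.0, 350.0])
noisy = measure_marginal(true_counts, sigma=50, rng=rng)
error = normalized_l1_error(true_counts, noisy, n_records=true_counts.sum())
print("normalized L1 error of the noisy marginal:", error)
```

This mirrors the `x + np.random.normal(...)` and `np.linalg.norm(x-y,1)/data.records` lines in the notebook, with a seeded generator so repeated runs give the same noise.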
Huh, I thought I had updated that Colab notebook; maybe it wasn't saved. Thanks for flagging, I'll try to look into it
Same issue using the smartnoise-synth 1.0.5 library. Installing from the recommended commit hash still fails, because this import no longer exists.
/usr/local/bin/python3 /Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/snsynth/aim/aim.py
Please install mbi with:
pip install git+https://github.com/ryan112358/private-pgm.git@01f02f17eba440f4e76c1d06fa5ee9eed0bd2bca
Traceback (most recent call last):
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/snsynth/aim/aim.py", line 11, in
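One way to see which API an installed copy exposes before running code that depends on it is to probe the module for the names in question. This is only a sketch: the helper `probe_api` is hypothetical, and it assumes the package is importable under the name `mbi`. The names checked are the ones mentioned in this thread.

```python
import importlib


def probe_api(module_name, names):
    """Report which of the given attribute names a module exposes.

    Returns None if the module cannot be imported at all.
    """
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    return {name: hasattr(module, name) for name in names}


# None means mbi is not installed; otherwise a dict of name -> bool
print(probe_api("mbi", ["FactoredInference", "LinearMeasurement", "estimation"]))
```

If `FactoredInference` comes back `False`, the installed version is presumably past the backwards-incompatible changes discussed here, and code written against the old API will fail at import time.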
Thanks for opening this issue. I'm not sure I can offer much direct help, but if you find a fix, please post it here or update the smartnoise-synth documentation. This is the last commit before I started making backwards-incompatible changes:
https://github.com/ryan112358/mbi/commit/41b5346afd0bbad2656bf505b1006508f5a232ec
The repository is finally in what I consider a stable state, and can now be installed directly from PyPI: pip install mbi==1.0.0. The best long-term solution is probably to update smartnoise-synth to use the new APIs, but that would of course require a bit more work
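After installing from PyPI, it can help to confirm which version actually ended up in the environment, especially if an older commit-pinned install is still around. A small sketch using only the standard library (the helper name `installed_version` is hypothetical):

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Prints None if no distribution named "mbi" is installed
print("mbi version:", installed_version("mbi"))
```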