zingg
febrl model match fails in docker
Running match for febrl on the 0.3.4 release gives the error z_sim18 not found. I suspect the Python model configuration differs from the one in config.json, which leads to this error. Needs further investigation.
@Akash-R-7 can you please check this?
@sonalgoyal, the problem happens only with the Docker image, not with the local repo. It gives the same error even after using similar MATCHTYPE configurations in the Python file and config.json.
Bump on this: I just tried the default test run as described in the README, and it doesn't work.
I was able to run it using the following steps:
Go to the folder /zingg/docker/mac (which contains the Dockerfile)
docker image build -t zingg/vikas .
=> this builds the Docker image zingg/vikas using the tar location specified in the Dockerfile; the image then shows up in Docker Desktop
Now go to /tmp
docker run -v /tmp:/tmp -it zingg/vikas bash
./scripts/zingg.sh --phase findTrainingData --conf examples/febrl/config.json --zinggDir /tmp/z_docker
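The commands above can be scripted so the different runs in this thread are easy to repeat. This is a minimal Python sketch. It assumes the image tag zingg/vikas from the build step and uses only the zingg.sh flags that appear in this thread (--phase, --conf, --zinggDir, --run). The helper name zingg_docker_cmd is hypothetical and not part of Zingg. The helper only builds the argument list, so you can inspect the command before running it.

```python
# Hypothetical helper: builds (does not run) the docker commands used in this thread.
# Assumes image tag "zingg/vikas" and only the zingg.sh flags seen above.
import shlex
from typing import List, Optional

IMAGE = "zingg/vikas"  # tag from: docker image build -t zingg/vikas .


def zingg_docker_cmd(phase: Optional[str] = None,
                     conf: Optional[str] = None,
                     run_script: Optional[str] = None,
                     zingg_dir: Optional[str] = None,
                     image: str = IMAGE) -> List[str]:
    """Return argv for running zingg.sh inside the container with /tmp mounted."""
    cmd = ["docker", "run", "-v", "/tmp:/tmp", "-it", image, "bash", "./scripts/zingg.sh"]
    if run_script is not None:
        # Python API route, e.g. examples/febrl/FebrlExample.py
        cmd += ["--run", run_script]
    else:
        if phase is None or conf is None:
            raise ValueError("phase and conf are required unless run_script is given")
        # JSON config route, e.g. --phase match --conf examples/febrl/config.json
        cmd += ["--phase", phase, "--conf", conf]
        if zingg_dir is not None:
            cmd += ["--zinggDir", zingg_dir]
    return cmd


if __name__ == "__main__":
    # Print the command that reproduced the z_sim18 error.
    repro = zingg_docker_cmd(phase="match", conf="examples/febrl/config.json")
    print(shlex.join(repro))
    # To execute it: import subprocess; subprocess.run(repro, check=True)
```

Because of -it, run the resulting command from an interactive terminal.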
Tried the following: docker run -v /tmp:/tmp -it zingg/vikas bash ./scripts/zingg.sh --run examples/febrl/FebrlExample.py
The error did not occur (by default, FebrlExample.py runs trainMatch).
I also ran the phases one by one by modifying FebrlExample.py (after deleting models/100); the issue did not reproduce.
I also tried a combination, i.e. findTrainingData, label and train using the JSON config and match using FebrlExample.py, again after deleting models/100. The issue did not reproduce.
Finally, I reproduced it using the following:
docker run -v /tmp:/tmp -it zingg/vikas bash ./scripts/zingg.sh --phase match --conf examples/febrl/config.json
. Available: z_z_zid, z_zid, fname, lname, stNo, add1, add2, city, areacode, state, dob, ssn, z_source, z_fname, z_lname, z_stNo, z_add1, z_add2, z_city, z_areacode, z_state, z_dob, z_ssn, z_z_source, z_sim0, z_sim1, z_sim2, z_sim3, z_sim4, z_sim5, z_sim6, z_sim7, z_sim8, z_sim9, z_sim10, z_sim11, z_sim12, z_sim13, z_sim14, z_sim15, z_sim16, z_sim17
=> this issue doesn't occur if FebrlExample.py is run in match mode
=> this indicates an inconsistency between the Python and JSON configurations of the example model shipped with Docker: match works when run via Python but not via JSON, so the two configs likely differ. The Available list above stops at z_sim17, while the model looks for z_sim18.
=> an easy workaround is to run trainMatch instead of match
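The mismatch can be checked directly from the error text. This is a minimal sketch that assumes the model needs every column from z_sim0 up to z_sim18, the one reported missing. The function names sim_indices and missing_sim_columns are hypothetical helpers, not Zingg APIs.

```python
import re
from typing import List

# "Available:" column list copied from the error output above.
AVAILABLE = (
    "z_z_zid, z_zid, fname, lname, stNo, add1, add2, city, areacode, state, "
    "dob, ssn, z_source, z_fname, z_lname, z_stNo, z_add1, z_add2, z_city, "
    "z_areacode, z_state, z_dob, z_ssn, z_z_source, z_sim0, z_sim1, z_sim2, "
    "z_sim3, z_sim4, z_sim5, z_sim6, z_sim7, z_sim8, z_sim9, z_sim10, "
    "z_sim11, z_sim12, z_sim13, z_sim14, z_sim15, z_sim16, z_sim17"
)


def sim_indices(available: str) -> List[int]:
    """Sorted N values of every z_simN column named in the list."""
    return sorted(int(n) for n in re.findall(r"\bz_sim(\d+)\b", available))


def missing_sim_columns(available: str, highest_expected: int) -> List[str]:
    """Names z_sim0..z_sim{highest_expected} that are absent from the list."""
    present = set(sim_indices(available))
    return [f"z_sim{i}" for i in range(highest_expected + 1) if i not in present]


if __name__ == "__main__":
    print(max(sim_indices(AVAILABLE)))         # 17
    print(missing_sim_columns(AVAILABLE, 18))  # ['z_sim18']
```

The JSON-driven match run produces one fewer similarity column than the model expects. That result fits the suspicion that the JSON config differs from the config the model was trained with.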
In FebrlExample.py all fields are fuzzy, while in config.json stNo and areacode are exact. Changing these two to fuzzy made it work.
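Differences like this can be caught by diffing the per-field match types of the two configs. This is a minimal sketch that assumes config.json lists fields under fieldDefinition, each with a fieldName and a single string matchType. The Python example's match types are typed in by hand as a dict instead of being parsed from FebrlExample.py. The sample data is trimmed to three fields for illustration. Only the stNo/areacode exact-vs-fuzzy difference comes from this thread.

```python
import json
from typing import Dict, List, Tuple


def json_match_types(config_text: str) -> Dict[str, str]:
    """fieldName -> matchType from a JSON config (assumed fieldDefinition schema)."""
    config = json.loads(config_text)
    return {f["fieldName"]: f["matchType"].lower() for f in config["fieldDefinition"]}


def diff_match_types(json_types: Dict[str, str],
                     py_types: Dict[str, str]) -> List[Tuple[str, str, str]]:
    """(field, json matchType, python matchType) for every field that differs."""
    diffs = []
    for name in sorted(set(json_types) | set(py_types)):
        j = json_types.get(name, "<missing>")
        p = py_types.get(name, "<missing>")
        if j != p:
            diffs.append((name, j, p))
    return diffs


if __name__ == "__main__":
    # Trimmed, illustrative excerpt of a config.json fieldDefinition block.
    config_json = json.dumps({"fieldDefinition": [
        {"fieldName": "fname", "matchType": "fuzzy"},
        {"fieldName": "stNo", "matchType": "exact"},
        {"fieldName": "areacode", "matchType": "exact"},
    ]})
    # Match types as set in FebrlExample.py (all fuzzy), entered by hand here.
    python_types = {"fname": "fuzzy", "stNo": "fuzzy", "areacode": "fuzzy"}
    for field, in_json, in_python in diff_match_types(json_match_types(config_json), python_types):
        print(f"{field}: json={in_json} python={in_python}")
    # areacode: json=exact python=fuzzy
    # stNo: json=exact python=fuzzy
```

Fields present in only one config show up as "<missing>". This catches a dropped field as well as a changed match type.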
Fixed in commit 687cef2, pull request #543.
Regenerated the model and changed exact to fuzzy in the JSON wherever the two configs differed.