
Febrl model match fails in Docker

Open: sonalgoyal opened this issue.

Running match for Febrl on the 0.3.4 release gives an error: z_sim18 not found. I suspect that the Python model configuration differs from the one in config.json, which leads to this error. Needs further investigation.

sonalgoyal, Aug 08 '22 16:08

@Akash-R-7 can you please check this?

sonalgoyal, Aug 08 '22 17:08

@sonalgoyal, the problem happens only on the Docker image, not in the local repo. It gives the same error even after setting similar MATCHTYPE configurations in the Python file and config.json.

Akash-R-7, Aug 11 '22 08:08

Bump on this: I just tried the default test run as specified in the README and it doesn't work.

UsAndRufus, Dec 06 '22 22:12

I was able to run it using the following steps:

Go to the folder /zingg/docker/mac (which contains the Dockerfile)

docker image build -t zingg/vikas .

=> the Docker image zingg/vikas is built with the tar location specified in the Dockerfile; it can be seen in Docker Desktop

Now go to /tmp

docker run -v /tmp:/tmp -it zingg/vikas bash

./scripts/zingg.sh --phase findTrainingData --conf examples/febrl/config.json --zinggDir /tmp/z_docker
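The steps above can be scripted so they are easy to repeat. This is a minimal sketch, assuming Docker is on the PATH and the image was tagged zingg/vikas as in the build step; the helper names zingg_docker_cmd and run are hypothetical. Rather than opening an interactive bash session first, it passes the zingg.sh invocation straight to docker run, as the later reproduction command in this thread does. It only prints the commands unless dry_run is set to False.

```python
import shlex
import subprocess
from typing import List, Optional

# Values taken from the steps above; change them to match your setup.
IMAGE = "zingg/vikas"
HOST_DIR = "/tmp"
FEBRL_CONF = "examples/febrl/config.json"


def zingg_docker_cmd(phase: str, conf: str = FEBRL_CONF,
                     zingg_dir: Optional[str] = None) -> List[str]:
    """Build the argv for `docker run ... bash ./scripts/zingg.sh --phase <phase>`."""
    cmd = [
        "docker", "run", "-v", f"{HOST_DIR}:{HOST_DIR}", "-it", IMAGE,
        "bash", "./scripts/zingg.sh", "--phase", phase, "--conf", conf,
    ]
    if zingg_dir is not None:
        cmd += ["--zinggDir", zingg_dir]
    return cmd


def run(cmd: List[str], dry_run: bool = True) -> None:
    """Print the command as a shell line; execute it only when dry_run is False."""
    print(shlex.join(cmd))
    if not dry_run:
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Build the image; run this from the folder that holds the Dockerfile (docker/mac).
    run(["docker", "image", "build", "-t", IMAGE, "."])
    # Same phase as the last step above, passed directly to the container.
    run(zingg_docker_cmd("findTrainingData", zingg_dir="/tmp/z_docker"))
```

The -it flag expects an interactive terminal, so with dry_run=False this should be run from a shell session rather than a background job. The same helper covers the other phases mentioned in the thread (label, train, match); label is interactive.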

vikasgupta78, Mar 23 '23 12:03

Tried the following: docker run -v /tmp:/tmp -it zingg/vikas bash ./scripts/zingg.sh --run examples/febrl/FebrlExample.py

The error didn't occur (by default, FebrlExample.py runs trainMatch).

vikasgupta78, Mar 24 '23 06:03

I also ran the phases one by one by modifying FebrlExample.py (after deleting models/100); the issue was not reproduced.

vikasgupta78, Mar 24 '23 07:03

I tried a combination, i.e. findTrainingData, label and train using the JSON config and match using FebrlExample.py, after deleting models/100. The issue was not reproduced.

vikasgupta78, Mar 24 '23 07:03

Finally reproduced it using the following:

docker run -v /tmp:/tmp -it zingg/vikas bash ./scripts/zingg.sh --phase match --conf examples/febrl/config.json

. Available: z_z_zid, z_zid, fname, lname, stNo, add1, add2, city, areacode, state, dob, ssn, z_source, z_fname, z_lname, z_stNo, z_add1, z_add2, z_city, z_areacode, z_state, z_dob, z_ssn, z_z_source, z_sim0, z_sim1, z_sim2, z_sim3, z_sim4, z_sim5, z_sim6, z_sim7, z_sim8, z_sim9, z_sim10, z_sim11, z_sim12, z_sim13, z_sim14, z_sim15, z_sim16, z_sim17

=> this issue doesn't occur if FebrlExample.py is run in match mode

=> this indicates an inconsistency between the Python and JSON configurations of the example model shipped with Docker: match works via Python but not via JSON, so the two configs probably differ

=> an easy workaround is to run trainMatch instead of match
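The error above can be checked mechanically. This is a minimal sketch, assuming the comma-separated list after "Available:" is the full set of columns handed to the model at match time; sim_indices and missing_sim_columns are hypothetical helpers, and z_sim18 is the only required column actually named in the thread. It shows that the match input carries 18 similarity columns (z_sim0 to z_sim17) while the saved model asks for at least one more, which fits the suspicion that the model and config.json were built from different field definitions.

```python
import re
from typing import List

# Column list copied from the truncated error message above (text after "Available:").
AVAILABLE = (
    "z_z_zid, z_zid, fname, lname, stNo, add1, add2, city, areacode, state, dob, ssn, "
    "z_source, z_fname, z_lname, z_stNo, z_add1, z_add2, z_city, z_areacode, z_state, "
    "z_dob, z_ssn, z_z_source, z_sim0, z_sim1, z_sim2, z_sim3, z_sim4, z_sim5, z_sim6, "
    "z_sim7, z_sim8, z_sim9, z_sim10, z_sim11, z_sim12, z_sim13, z_sim14, z_sim15, "
    "z_sim16, z_sim17"
)

SIM_COL = re.compile(r"z_sim(\d+)")


def sim_indices(available: str) -> List[int]:
    """Return the sorted N values of every z_simN column in a comma-separated list."""
    indices = []
    for col in available.split(","):
        m = SIM_COL.fullmatch(col.strip())
        if m:
            indices.append(int(m.group(1)))
    return sorted(indices)


def missing_sim_columns(available: str, required: List[str]) -> List[str]:
    """Return the required columns that are absent from the available z_simN columns."""
    present = {f"z_sim{i}" for i in sim_indices(available)}
    return [col for col in required if col not in present]


idx = sim_indices(AVAILABLE)
print(len(idx), min(idx), max(idx))                 # 18 0 17
print(missing_sim_columns(AVAILABLE, ["z_sim18"]))  # ['z_sim18']
```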

vikasgupta78, Mar 24 '23 07:03

In FebrlExample.py all fields are fuzzy, while in config.json stNo and areacode are exact. Changing these two to fuzzy made it work.
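A mismatch like this can be spotted by diffing match types field by field. This is a minimal sketch, assuming config.json lists fields under a "fieldDefinition" array whose entries carry "fieldName" and "matchType" keys; the excerpt is trimmed to four fields for illustration, and PYTHON_MATCH_TYPES is a hand-written stand-in for the field definitions in FebrlExample.py rather than something read from the script.

```python
import json
from typing import Dict, Optional, Tuple

# Illustrative excerpt shaped like a config.json "fieldDefinition" section (keys assumed).
CONFIG_JSON = """
{"fieldDefinition": [
  {"fieldName": "fname",    "matchType": "fuzzy"},
  {"fieldName": "stNo",     "matchType": "exact"},
  {"fieldName": "areacode", "matchType": "exact"},
  {"fieldName": "city",     "matchType": "fuzzy"}
]}
"""

# Stand-in for the match types set in FebrlExample.py (all fuzzy, per the comment above).
PYTHON_MATCH_TYPES = {"fname": "fuzzy", "stNo": "fuzzy", "areacode": "fuzzy", "city": "fuzzy"}


def json_match_types(config_text: str) -> Dict[str, str]:
    """Map fieldName -> lower-cased matchType from a config.json-style document."""
    config = json.loads(config_text)
    return {f["fieldName"]: f["matchType"].lower() for f in config["fieldDefinition"]}


def diff_match_types(a: Dict[str, str],
                     b: Dict[str, str]) -> Dict[str, Tuple[Optional[str], Optional[str]]]:
    """Return {field: (type_in_a, type_in_b)} for every field whose types differ.

    A field present on only one side shows up with None on the other side.
    """
    return {
        field: (a.get(field), b.get(field))
        for field in sorted(a.keys() | b.keys())
        if a.get(field) != b.get(field)
    }


print(diff_match_types(PYTHON_MATCH_TYPES, json_match_types(CONFIG_JSON)))
# {'areacode': ('fuzzy', 'exact'), 'stNo': ('fuzzy', 'exact')}
```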

vikasgupta78, Mar 24 '23 08:03

Fixed in commit 687cef2, pull request #543.

Generated the model again and changed exact to fuzzy in the JSON config wherever it differed.
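The JSON side of this fix can be applied with a small script. This is a minimal sketch under the same assumption about the "fieldDefinition", "fieldName" and "matchType" keys; set_match_type is a hypothetical helper and the excerpt is illustrative, not the shipped config. It only edits the config; as the comment says, the model was also regenerated.

```python
import json
from typing import List, Set

# Illustrative excerpt of a config.json "fieldDefinition" section (keys assumed).
CONFIG_JSON = """
{"fieldDefinition": [
  {"fieldName": "fname",    "matchType": "fuzzy"},
  {"fieldName": "stNo",     "matchType": "exact"},
  {"fieldName": "areacode", "matchType": "exact"},
  {"fieldName": "city",     "matchType": "fuzzy"}
]}
"""


def set_match_type(config: dict, fields: Set[str], match_type: str = "fuzzy") -> List[str]:
    """Set matchType in place for the named fields; return the fields that changed."""
    changed = []
    for field_def in config["fieldDefinition"]:
        if field_def["fieldName"] in fields and field_def["matchType"] != match_type:
            field_def["matchType"] = match_type
            changed.append(field_def["fieldName"])
    return changed


config = json.loads(CONFIG_JSON)
print(set_match_type(config, {"stNo", "areacode"}))  # ['stNo', 'areacode']
print(json.dumps(config, indent=2))
```

To update a real file, load it with json.load, call set_match_type, and write it back with json.dump; this rewrites the file's original formatting.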

vikasgupta78, Mar 24 '23 10:03