serenata-de-amor
Analyses regarding Duplicate Reimbursements
Detecting duplicate Reimbursements using dhash.
The last commit adds the notebook that detects duplicate reimbursements. It uses a difference hash (dHash) and the Hamming distance between hashes.
The other files relate to a future implementation, such as a CFMT (Compact Fourier Mellin Transform) block, to make the detection more precise.
It is related to issue: #32
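For readers who have not used this technique, here is a minimal sketch of how dHash and Hamming distance can flag near-identical images. It is not the notebook's code: it assumes each receipt image is already loaded as a 2D grid of grayscale values, and the function names (`resize_nearest`, `dhash`, `hamming_distance`, `is_probable_duplicate`) and the threshold of 5 bits are illustrative assumptions.

```python
def resize_nearest(pixels, width, height):
    """Downsample a 2D grayscale grid (list of rows) with nearest-neighbor sampling."""
    src_h, src_w = len(pixels), len(pixels[0])
    return [
        [pixels[y * src_h // height][x * src_w // width] for x in range(width)]
        for y in range(height)
    ]


def dhash(pixels, hash_size=8):
    """Return a difference hash of hash_size * hash_size bits as an int.

    Each bit records whether a pixel is brighter than its right-hand neighbor
    in a (hash_size + 1) x hash_size thumbnail.
    """
    small = resize_nearest(pixels, hash_size + 1, hash_size)
    bits = 0
    for row in small:
        for left, right in zip(row, row[1:]):
            bits = (bits << 1) | (1 if left > right else 0)
    return bits


def hamming_distance(a, b):
    """Count the bit positions in which two hashes differ."""
    return bin(a ^ b).count("1")


def is_probable_duplicate(img_a, img_b, threshold=5):
    """Treat two images as duplicates when their hashes differ in few bits.

    The threshold is an assumed value and would need tuning on real data.
    """
    return hamming_distance(dhash(img_a), dhash(img_b)) <= threshold
```

A small Hamming distance means the two thumbnails share almost the same brightness gradients, which is why rescanned or re-uploaded copies of the same receipt tend to land close together even when their bytes differ.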
Hi @silviodc, thanks for the contribution!
What I did to test this PR:
- Clone the project:
$ git clone git@github.com:datasciencebr/serenata-de-amor.git
- Change to serenata's folder:
$ cd serenata-de-amor
- Change to @silviodc's branch:
$ git checkout -b silviodc-silvio-cardoso master
$ git pull https://github.com/silviodc/serenata-de-amor.git silvio-cardoso
- The steps to run the project:
$ conda update conda
$ conda create --name serenata_de_amor python=3
$ source activate serenata_de_amor
$ ./setup
- Open the jupyter notebook from the project:
$ jupyter notebook
- Access
http://localhost:8888/notebooks/research/develop/2017-05-05-silvio-Detecting-duplicates.ipynb
I really liked your work on it, it looks really impressive!
There is only one thing that I'll ask you, and then for me we can merge it!
Is there anything more that you are aiming to do?
No. I guess I finished with these analyses.