
Analyses regarding Duplicate Reimbursements

Open silviodc opened this issue 7 years ago • 2 comments

Detecting duplicate Reimbursements using dhash.

The last commit adds the notebook that detects duplicate reimbursements. It uses dhash and the Hamming distance between hashes.
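For readers unfamiliar with the approach, here is a minimal sketch of dhash plus Hamming distance. It is not the code from the notebook: the function names, the nearest-neighbour downsampling (real implementations usually resize with a proper image library), and the `threshold=5` cutoff are all assumptions made for illustration.

```python
import numpy as np


def dhash(gray, hash_size=8):
    """Difference hash of a 2-D grayscale array, as a flat boolean array.

    Assumes the image has at least hash_size rows and hash_size + 1 columns.
    """
    gray = np.asarray(gray, dtype=float)
    rows, cols = gray.shape
    # Crude nearest-neighbour downsampling to hash_size x (hash_size + 1);
    # this stands in for a real image resize and is a simplification.
    row_idx = np.arange(hash_size) * rows // hash_size
    col_idx = np.arange(hash_size + 1) * cols // (hash_size + 1)
    small = gray[np.ix_(row_idx, col_idx)]
    # Each bit records whether a pixel is brighter than its right-hand neighbour.
    return (small[:, :-1] > small[:, 1:]).ravel()


def hamming_distance(hash_a, hash_b):
    """Number of bit positions in which the two hashes differ."""
    return int(np.count_nonzero(hash_a != hash_b))


def is_probable_duplicate(img_a, img_b, threshold=5):
    """Flag two images as likely duplicates (threshold is a hypothetical cutoff)."""
    return hamming_distance(dhash(img_a), dhash(img_b)) <= threshold
```

Because each bit only compares neighbouring pixels, a uniform brightness change leaves the hash untouched, which is why a small Hamming distance is a reasonable signal that two scanned receipts show the same document.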

The other files concern a future implementation, such as a CFMT (Compact Fourier Mellin Transform) block, to make the detection more precise.

It is related to issue: #32

silviodc, Oct 06 '17 10:10

Hi @silviodc, thanks for the contribution!

What I did to test this PR:

  1. Clone the project:
$ git clone git@github.com:datasciencebr/serenata-de-amor.git
  2. Change to serenata's folder:
$ cd serenata-de-amor
  3. Change to @silviodc's branch:
$ git checkout -b silviodc-silvio-cardoso master
$ git pull https://github.com/silviodc/serenata-de-amor.git silvio-cardoso
  4. Set up the environment for the project:
$ conda update conda
$ conda create --name serenata_de_amor python=3
$ source activate serenata_de_amor
$ ./setup
  5. Open the Jupyter notebook server from the project:
$ jupyter notebook
  6. Open http://localhost:8888/notebooks/research/develop/2017-05-05-silvio-Detecting-duplicates.ipynb

I really liked your work on it; it looks really impressive! Is there something that you are aiming to do more?

There is only one thing that I'll ask of you, and then, as far as I'm concerned, we can merge it!

anaschwendler, Oct 11 '17 21:10

Is there something that you are aiming to do more?

No, I guess I have finished with these analyses.

silviodc, Oct 11 '17 22:10