leakdb
Search getting 0 results
Describe the bug
I'm trying to match 3M lines of emails against a dataset of 700M lines (roughly 50GB). Everything appears to run smoothly, but after a bunch of tests I can't get a single match returned, even for emails/users that I know for sure are in my dataset. All the processes are running on an AWS instance (so I followed the server deployment steps). I tried both building from source and using the released version, and I tried splitting my data into smaller files, but still no results. I also tried launching the server version and querying it over HTTP.
The really weird thing is that when I run a search on the test folder you provide, it works properly with your provided indexes. But when I regenerate the indexes for small.txt following the docs on the wiki, I get no results, and when I diff my generated index against the one you provide, they differ. So I'm guessing it has something to do with how the index is generated/sorted.
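To narrow down where a regenerated index diverges from the provided one, a byte-level comparison may help. The following is a minimal sketch that assumes nothing about the leakdb index format; the function name `first_difference` and any paths passed to it are hypothetical.

```python
from typing import Optional


def first_difference(path_a: str, path_b: str, chunk_size: int = 1 << 20) -> Optional[int]:
    """Return the byte offset of the first difference, or None if identical.

    If one file is a strict prefix of the other, the returned offset is the
    length of the shorter file.
    """
    offset = 0
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while True:
            a = fa.read(chunk_size)
            b = fb.read(chunk_size)
            if a != b:
                # Locate the first mismatching byte inside this chunk.
                for i, (x, y) in enumerate(zip(a, b)):
                    if x != y:
                        return offset + i
                # No mismatch in the overlap: one chunk is shorter.
                return offset + min(len(a), len(b))
            if not a:
                # Both files are exhausted and every chunk matched.
                return None
            offset += len(a)
```

A difference at the very start of the file would hint at a header or format change, while a difference deep in the file would be more consistent with an ordering problem, though that interpretation is only a guess without knowing the index layout.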
To Reproduce
Steps to reproduce the behavior:
- ./leakdb-curator --format colon-newline --recursive --target ./large-folder-containing-all --output normalized.json
- ./leakdb-curator --json normalized.json
- ./leakdb-curator search -i leakdb/email.idx -j leakdb/bloomed.json -v "[email protected]"
  Response: Found 0 results ..
- grep -F "[email protected]" bloomed.json
  Response: {"email": "xxx", "user": "xxx", "domain": "gmail.com", "password": "xxx"}
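Since `grep -F` matches a substring anywhere on the line, a hit does not prove that the `email` field equals the query exactly (for instance, it could differ in surrounding whitespace or be part of a longer address). The sketch below checks for an exact field match. It assumes bloomed.json holds one JSON object per line, as the grep output suggests; the function name `find_exact_email` is hypothetical and not part of leakdb.

```python
import json
from typing import Iterator


def find_exact_email(json_lines_path: str, email: str) -> Iterator[dict]:
    """Yield records whose "email" field equals `email` exactly."""
    with open(json_lines_path, "r", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            try:
                record = json.loads(line)
            except json.JSONDecodeError:
                # Skip malformed lines rather than abort the scan.
                continue
            if isinstance(record, dict) and record.get("email") == email:
                yield record
```

If this yields nothing for an address that grep does find, the stored value probably differs from the query in some small way, which would be worth ruling out before looking at the index itself.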
I really wish I could get this to work because it looks amazing. I'm at your disposal for any questions or tests you want me to run.
Enzyro
Hey, just wondering if there are any updates on this issue. Just making my way through the code to see if anything jumps out at me too.
Sorry, I haven't had much time to dig into it, been very busy. Lmk if you find something!
@enzyro @flyingdan Did either of you ever find a solution to this? I am running into the same issue using the latest Linux release.