check-if-email-exists
HaveIBeenPawned?
Add a field smtp.have_i_been_pawned: true/false, which makes an API call to https://haveibeenpwned.com/
There is a small problem: haveibeenpwned's API costs $3.50/month. Maybe consider scraping or a similar free API?
Ah, I wasn't aware it was paid. So maybe not, I don't think it's super high priority (and people can always make a separate API call for that).
I recall the author was open-sourcing it. Will it still be paid after?
On the API key page they link to a blog post, which says:
> Clearly not everyone will be happy with this so let me spend a bit of time here explaining the rationale. This fee is first and foremost to stop abuse of the API.
So I think we should not expect the API to become free soon.
Hello, I made my own API. It's free forever! And it works the same as haveibeenpwned.com. I'll try to make a PR soon. Edit: I am not a Rust dev 😅
@DigitalGreyHat Can you give some/more information about your API?
hi, any news? @DigitalGreyHat
Hello, I am currently working on this.
Here are my thoughts:
- the easy way to use the HIBP API is to pay for it,
- we can use a headless browser to bypass Cloudflare protection and call https://haveibeenpwned.com/unifiedsearch/[email protected]
The problem with the Cloudflare bypass is that we have to rely on a stealth browser; otherwise Cloudflare's protection is triggered. https://github.com/ultrafunkamsterdam/undetected-chromedriver seems to be the one with the biggest community. I did a PoC and the results are not reliable: it works ~70% of the time (the other 30% is a crash or no response). Another problem with the stealth browser is that it brings in a lot of new dependencies, each with its own maintenance burden.
To my mind, implementing the paid API is the way to go. Otherwise we can find another reliable and free API.
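To put the ~70% figure in perspective, here is a rough sketch of what retrying would buy. It assumes each attempt fails independently, which may not hold if Cloudflare starts blocking the client outright. `withRetries` and `successProbability` are hypothetical helper names, not part of any library mentioned in this thread.

```javascript
// Hypothetical helper: retry an unreliable async operation up to maxAttempts times.
async function withRetries(operation, maxAttempts) {
  let lastError;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await operation(attempt);
    } catch (err) {
      lastError = err; // remember the failure and try again
    }
  }
  throw lastError; // every attempt failed
}

// Chance that at least one of `attempts` tries succeeds,
// ASSUMING the attempts are independent of each other.
function successProbability(perAttemptSuccess, attempts) {
  return 1 - Math.pow(1 - perAttemptSuccess, attempts);
}
```

Under that independence assumption, three attempts at 70% each give roughly a 97% chance of getting a response, at the cost of up to three browser launches per lookup.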
Let's go with the paid API. @sylvain-reynaud would you like to create a PR?
I think the way to go is:
- add an env variable RCH_HIBP_API_KEY; if it's set to something non-empty, then make an API call
- put the result in misc.have_i_been_pawned: Option<bool>
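A minimal sketch of that flow, written in JavaScript to match the other snippets in this thread rather than the Rust the repo would actually use. It assumes the paid HIBP v3 `breachedaccount` endpoint, its `hibp-api-key` header, and its convention of answering 200 for a breached account and 404 for an account not found in any breach. `checkHibp`, the injected `fetchFn`, and the user-agent string are hypothetical names chosen for illustration; `null` stands in for `None`.

```javascript
// Sketch only: checkHibp and fetchFn are hypothetical, not the project's code.
// Returns true (breached), false (not breached), or null (unknown or disabled),
// mirroring the proposed Option<bool>.
async function checkHibp(email, env, fetchFn) {
  const apiKey = env.RCH_HIBP_API_KEY || '';
  if (apiKey === '') {
    return null; // no key configured: skip the API call entirely
  }
  const url =
    'https://haveibeenpwned.com/api/v3/breachedaccount/' + encodeURIComponent(email);
  try {
    const response = await fetchFn(url, {
      // The user-agent value is a placeholder chosen for this sketch.
      headers: { 'hibp-api-key': apiKey, 'user-agent': 'check-if-email-exists' },
    });
    if (response.status === 200) return true; // at least one breach listed
    if (response.status === 404) return false; // account not found in any breach
    return null; // unauthorized, rate limited, or anything else: unknown
  } catch (err) {
    return null; // network failure: report "unknown" instead of failing the check
  }
}
```

On a Node.js version with a global `fetch`, this could be called as `checkHibp(email, process.env, fetch)`. Passing the fetch function in keeps the sketch testable without network access.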
> Otherwise we can find another reliable and free API.
Do people know of other free APIs? Ideally open-source. We can always add misc.<other_api> = true/false, and make those extra API calls configurable.
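One way the configurable `misc.<other_api>` idea could look, as a rough JavaScript sketch. `runExtraChecks`, the `checkers` registry, and the `enabled` list are all hypothetical names invented for this example. The assumption is that each checker is an async function returning true, false, or null.

```javascript
// Hypothetical registry-based runner: each key in `checkers` becomes a misc.<name> field.
async function runExtraChecks(email, checkers, enabled) {
  const misc = {};
  for (const name of Object.keys(checkers)) {
    if (!enabled.includes(name)) continue; // only run checks the user opted into
    try {
      misc[name] = await checkers[name](email);
    } catch (err) {
      misc[name] = null; // a failing third-party API should not break the main result
    }
  }
  return misc;
}
```

Checks that are not enabled simply leave no field in the output, so adding a new API later would not change results for existing users.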
According to https://github.com/khast3x/h8mail#apis there are 3 free(mium) APIs:
- https://breachdirectory.org/ (50 / month free)
- https://emailrep.io/ (250 free queries per month, up to 10 queries/day)
- https://hunter.io/pricing (25 free monthly searches)
For information, there is Fingerprint Suite with Playwright. It's OK with Antibot. I didn't test with Cloudflare.
> It's OK with Antibot. I didn't test with Cloudflare.
```js
const { chromium } = require('playwright');
const { FingerprintGenerator } = require('fingerprint-generator');
const { FingerprintInjector } = require('fingerprint-injector');

(async () => {
  // Generate a desktop Chrome fingerprint together with matching headers
  const fingerprintGenerator = new FingerprintGenerator();
  const browserFingerprintWithHeaders = fingerprintGenerator.getFingerprint({
    devices: ['desktop'],
    browsers: ['chrome'],
  });
  const fingerprintInjector = new FingerprintInjector();
  const { fingerprint } = browserFingerprintWithHeaders;

  // Launch with a visible window (headless: false)
  const browser = await chromium.launch({ headless: false });

  // With certain properties, we need to inject the props into the context initialization
  const context = await browser.newContext({
    userAgent: fingerprint.userAgent,
    locale: fingerprint.navigator.language,
    viewport: fingerprint.screen,
  });

  // Attach the rest of the fingerprint
  await fingerprintInjector.attachFingerprintToPlaywright(context, browserFingerprintWithHeaders);

  const page = await context.newPage();
  await page.goto('https://haveibeenpwned.com/unifiedsearch/[email protected]');

  // wait for the page to load
  await page.waitForTimeout(20000);

  // log the page content
  console.log(await page.content());

  // screenshot the page
  await page.screenshot({ path: 'proof.png' });

  // close the browser so the script can exit
  await browser.close();
})();
```
If it runs headless it is blocked; if it runs with the browser window visible it is not blocked. You can check it with the code above.
I'll implement the paid API first.
It seems OK in Firefox headless mode with this:
```js
import path from 'path';
import { fileURLToPath } from 'url';
import { firefox } from 'playwright';
import { FingerprintGenerator } from 'fingerprint-generator';
import { FingerprintInjector } from 'fingerprint-injector';

(async () => {
  const fingerprintGenerator = new FingerprintGenerator();
  const browserFingerprintWithHeaders = fingerprintGenerator.getFingerprint({
    devices: ['desktop'],
    browsers: ['firefox'],
  });
  const fingerprintInjector = new FingerprintInjector();
  const { fingerprint } = browserFingerprintWithHeaders;

  const browser = await firefox.launch({
    headless: true,
  });

  // With certain properties, we need to inject the props into the context initialization
  const context = await browser.newContext({
    userAgent: fingerprint.userAgent,
    locale: fingerprint.navigator.language,
    viewport: fingerprint.screen,
  });

  // Attach the rest of the fingerprint
  await fingerprintInjector.attachFingerprintToPlaywright(context, browserFingerprintWithHeaders);

  const page = await context.newPage();
  await page.goto('https://haveibeenpwned.com/unifiedsearch/[email protected]');

  // Save the screenshot next to this script
  await page.screenshot({ path: path.join(path.dirname(fileURLToPath(import.meta.url)), 'playwright_test_headless.png') });
  await browser.close();
})();
```
Yep! It's OK with got-scraping
The got-scraping library usually has better success than other libraries due to header generation, HTTP/2, and browser ciphers.
```js
import { gotScraping } from 'got-scraping';

(async () => {
  const response = await gotScraping({
    url: 'https://haveibeenpwned.com/unifiedsearch/[email protected]',
    headerGeneratorOptions: {
      browsers: ['firefox'],
      devices: ['desktop'],
    },
  });
  console.log(response.body);
  const result = JSON.parse(response.body);
  console.log(`Response headers: ${JSON.stringify(response.headers)}`);
})();
```
@LeMoussel wow I didn't know about this package, thanks :100:
So I'm working on adding the feature by calling this URL https://haveibeenpwned.com/unifiedsearch/[email protected]
Hello, my PR is ready to be reviewed :)
What is this API?
On Wed, Jan 11, 2023, 9:19 PM Sylvain Reynaud @.***> wrote:
Hello, my PR is ready to be reviewed :)
I fixed the formatting and removed code that might break if a field is added to the API response.
@beshoo it uses the haveibeenpwned API. The endpoint used is the one used by the front-end haveibeenpwned.com.
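The point about not breaking when a field is added can be illustrated with a small sketch: read only the one field needed and ignore everything else. This is not the code from the PR. `isBreached` is a hypothetical helper, and the `Breaches` array field is an assumption about the shape of the front-end endpoint's JSON body.

```javascript
// Hypothetical helper: derive a boolean from the response body while ignoring unknown fields.
// ASSUMPTION: the body is a JSON object with a `Breaches` array.
function isBreached(bodyText) {
  let parsed;
  try {
    parsed = JSON.parse(bodyText);
  } catch (err) {
    return null; // not JSON (for example an HTML challenge page): unknown
  }
  if (parsed === null || typeof parsed !== 'object') return null;
  const breaches = parsed.Breaches;
  if (!Array.isArray(breaches)) return null; // expected field missing: unknown
  return breaches.length > 0; // any extra fields in the body are simply ignored
}
```

Because nothing else in the body is inspected, a new field on the server side would not change the result, while an unexpected shape degrades to null rather than throwing.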
The Node.js libraries are probably more battle-tested, but I would like to keep this repo pure Rust.
Also, I'm reluctant to use a headless browser for HIBP. It seems there's a risk that it'll become flaky/blocked one day, and the maintenance burden will likely fall on me. I propose to start with the paid API, as described in https://github.com/reacherhq/check-if-email-exists/issues/289#issuecomment-1346149442. I'll gladly purchase the paid API and make it available on https://reacher.email 's SaaS plan.
Implemented in #1253, closing, thanks @sylvain-reynaud