check-if-email-exists

HaveIBeenPawned?

Open amaury1093 opened this issue 4 years ago • 6 comments

Add a field smtp.have_i_been_pawned: true/false, populated by an API call to https://haveibeenpwned.com/

amaury1093 avatar Apr 17 '20 08:04 amaury1093

There is a small problem: haveibeenpwned's API costs $3.50/month. Maybe consider scraping or a similar free API?

NChechulin avatar Nov 25 '20 10:11 NChechulin

Ah, I wasn't aware it was paid. So maybe not, I don't think it's super high priority (and people can always make a separate API call for that).

I recall the author was open-sourcing it. Will it still be paid after?

amaury1093 avatar Nov 25 '20 10:11 amaury1093

On the API key page, they link to a blog post, which says:

Clearly not everyone will be happy with this so let me spend a bit of time here explaining the rationale. This fee is first and foremost to stop abuse of the API.

So I don't think we should expect the API to become free anytime soon.

NChechulin avatar Nov 25 '20 10:11 NChechulin

Hello, I made my own API. It's free forever! And it works the same as haveibeenpwned.com. I'll try to make a PR soon. Edit: I am not a Rust dev😅

DigitalGreyHat avatar Nov 17 '21 08:11 DigitalGreyHat

@DigitalGreyHat Can you give some/more information about your API?

LeMoussel avatar Dec 07 '21 09:12 LeMoussel

hi, any news? @DigitalGreyHat

olivermontes avatar Mar 10 '22 08:03 olivermontes

Hello, I am currently working on this.

Here are my thoughts:

  • The easy way to use the HIBP API is to pay for it.
  • Alternatively, we could use a headless browser to bypass Cloudflare's protection and call https://haveibeenpwned.com/unifiedsearch/[email protected]

The problem with the Cloudflare bypass is that we have to rely on a stealth browser; otherwise Cloudflare's protection is triggered. https://github.com/ultrafunkamsterdam/undetected-chromedriver seems to be the one with the biggest community. I did a PoC and the results are not reliable: it works ~70% of the time, and the other ~30% of runs crash or get no response. Another problem with the stealth browser is that it brings in a lot of new dependencies, each of which has to be maintained.

In my opinion, implementing the paid API is the way to go. Otherwise, we would need to find another reliable, free API.

sylvain-reynaud avatar Dec 12 '22 08:12 sylvain-reynaud

Let's go with the paid API. @sylvain-reynaud would you like to create a PR?

I think the way to go is:

  • add an env variable RCH_HIBP_API_KEY; if it's set to a non-empty value, make an API call
  • put the result in misc.have_i_been_pawned: Option<bool>
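
A minimal Rust sketch of the two steps above, under stated assumptions: the names `normalize_api_key`, `hibp_api_key_from_env`, `have_i_been_pwned_from_status`, `check_hibp` and `MiscDetails` are hypothetical, and `fetch_status` stands in for the real HTTP call so the snippet needs no external crates. The status mapping assumes the paid HIBP v3 `breachedaccount` endpoint, which answers 200 when the account appears in at least one breach and 404 when it does not; any other outcome is reported as `None`.

```rust
use std::env;

/// Name of the environment variable proposed in this thread.
const HIBP_API_KEY_VAR: &str = "RCH_HIBP_API_KEY";

/// Hypothetical helper: a missing or blank value means "no key configured".
fn normalize_api_key(raw: Option<String>) -> Option<String> {
    raw.map(|v| v.trim().to_string()).filter(|v| !v.is_empty())
}

/// Hypothetical helper: reads the key from the process environment.
fn hibp_api_key_from_env() -> Option<String> {
    normalize_api_key(env::var(HIBP_API_KEY_VAR).ok())
}

/// Maps an HTTP status code to the proposed `Option<bool>` value.
/// Assumption: HIBP v3 `breachedaccount` semantics
/// (200 = found in at least one breach, 404 = not found).
fn have_i_been_pwned_from_status(status: u16) -> Option<bool> {
    match status {
        200 => Some(true),
        404 => Some(false),
        // 401 (bad key), 429 (rate limited), 5xx, ...: result unknown.
        _ => None,
    }
}

/// Hypothetical shape of the `misc` section of the output.
#[derive(Debug, Default, PartialEq)]
struct MiscDetails {
    have_i_been_pawned: Option<bool>,
}

/// Hypothetical glue code. `fetch_status(api_key, email)` stands in for the
/// HTTP request and returns the response status, or `None` on network failure.
fn check_hibp<F>(api_key: Option<&str>, email: &str, fetch_status: F) -> MiscDetails
where
    F: FnOnce(&str, &str) -> Option<u16>,
{
    let have_i_been_pawned = api_key
        .and_then(|key| fetch_status(key, email))
        .and_then(have_i_been_pwned_from_status);
    MiscDetails { have_i_been_pawned }
}
```

With no key configured, `check_hibp` never invokes the HTTP closure, so users who leave `RCH_HIBP_API_KEY` unset make no extra request and simply get `None`.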

amaury1093 avatar Dec 12 '22 09:12 amaury1093

Otherwise we can find another reliable and free API.

Do people know of other free APIs? Ideally open-source. We can always add misc.<other_api> = true/false, and make those extra API calls configurable.
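
One way the configurable extra calls could look, as a rough sketch only: `ExtraCheck` and `run_extra_checks` are hypothetical names, and each `lookup` is a stand-in function pointer rather than a real HTTP client. Disabled checks are skipped entirely, and `None` means the provider could not give an answer.

```rust
use std::collections::BTreeMap;

/// Hypothetical description of one optional third-party lookup.
struct ExtraCheck {
    /// Key under `misc`, e.g. "have_i_been_pawned".
    name: &'static str,
    /// Whether the user enabled this lookup in their configuration.
    enabled: bool,
    /// Stand-in for the real API call; `None` means "could not determine".
    lookup: fn(&str) -> Option<bool>,
}

/// Runs only the enabled checks and collects their results by name.
fn run_extra_checks(
    email: &str,
    checks: &[ExtraCheck],
) -> BTreeMap<&'static str, Option<bool>> {
    checks
        .iter()
        .filter(|c| c.enabled)
        .map(|c| (c.name, (c.lookup)(email)))
        .collect()
}
```

Keeping disabled providers out of the map, instead of storing `None` for them, lets a caller distinguish "not configured" from "configured but no answer".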

amaury1093 avatar Dec 12 '22 09:12 amaury1093

According to https://github.com/khast3x/h8mail#apis there are 3 free(mium) APIs:

  • https://breachdirectory.org/ (50 / month free)
  • https://emailrep.io/ (250 free queries per month, up to 10 queries/day)
  • https://hunter.io/pricing (25 free monthly searches)

sylvain-reynaud avatar Dec 12 '22 09:12 sylvain-reynaud

For information, there is Fingerprint Suite with Playwright. It's OK with Antibot. I didn't test with Cloudflare.

LeMoussel avatar Dec 12 '22 09:12 LeMoussel

It's OK with Antibot. I didn't test with Cloudflare.

const { chromium } = require('playwright');
const { FingerprintGenerator } = require('fingerprint-generator');
const { FingerprintInjector }  = require('fingerprint-injector');

(async () => {
	const fingerprintGenerator = new FingerprintGenerator();

	const browserFingerprintWithHeaders = fingerprintGenerator.getFingerprint({
		devices: ['desktop'],
		browsers: ['chrome'],
	});

	const fingerprintInjector = new FingerprintInjector();
	const { fingerprint } = browserFingerprintWithHeaders;

	const browser = await chromium.launch({ headless: false });

	// With certain properties, we need to inject the props into the context initialization
	const context = await browser.newContext({
		userAgent: fingerprint.userAgent,
		locale: fingerprint.navigator.language,
		viewport: fingerprint.screen,
	});

	// Attach the rest of the fingerprint
	await fingerprintInjector.attachFingerprintToPlaywright(context, browserFingerprintWithHeaders);

	const page = await context.newPage();

	await page.goto('https://haveibeenpwned.com/unifiedsearch/[email protected]');

	// wait for the page to load
	await page.waitForTimeout(20000);
	// log the page content
	console.log(await page.content());
	// screenshot the page
	await page.screenshot({ path: 'proof.png' });
	// close the browser so the script can exit
	await browser.close();
})();

If it runs in headless mode, it is blocked; if it runs with the browser window visible, it is not. You can check this with the code above.

I'll implement the paid API first.

sylvain-reynaud avatar Dec 12 '22 12:12 sylvain-reynaud

It seems OK in Firefox headless mode with this:

import path from 'path';
import { fileURLToPath } from 'url';

import { firefox } from 'playwright';
import { FingerprintGenerator } from 'fingerprint-generator';
import { FingerprintInjector } from 'fingerprint-injector';

(async () => {
    const fingerprintGenerator = new FingerprintGenerator();

    const browserFingerprintWithHeaders = fingerprintGenerator.getFingerprint({
        devices: ['desktop'],
        browsers: ['firefox'],
    });

    const fingerprintInjector = new FingerprintInjector();
    const { fingerprint } = browserFingerprintWithHeaders;

    const browser = await firefox.launch({
        headless: true
    });

    // With certain properties, we need to inject the props into the context initialization
    const context = await browser.newContext({
        userAgent: fingerprint.userAgent,
        locale: fingerprint.navigator.language,
        viewport: fingerprint.screen,
    });

    // Attach the rest of the fingerprint
    await fingerprintInjector.attachFingerprintToPlaywright(context, browserFingerprintWithHeaders);

    const page = await context.newPage();

    await page.goto('https://haveibeenpwned.com/unifiedsearch/[email protected]');

    await page.screenshot({ path: path.join(path.dirname(fileURLToPath(import.meta.url)), 'playwright_test_headless.png') });

    await browser.close();
})();

LeMoussel avatar Dec 12 '22 14:12 LeMoussel

Yep! It's OK with got-scraping. The got-scraping library usually has better success than other libraries thanks to its header generation, HTTP/2 support, and browser-like ciphers.

import { gotScraping } from 'got-scraping';

(async () => {
    const response = await gotScraping({
        url: 'https://haveibeenpwned.com/unifiedsearch/[email protected]',
        headerGeneratorOptions:{
            browsers: ['firefox'],
            devices: ['desktop'],
        }
    });
    // Parse the JSON body and log the parsed result
    const result = JSON.parse(response.body);
    console.log(result);
    console.log(`Response headers: ${JSON.stringify(response.headers)}`);
})();

LeMoussel avatar Dec 12 '22 15:12 LeMoussel

@LeMoussel wow, I didn't know about this package, thanks :100:

So I'm working on adding the feature by calling this URL https://haveibeenpwned.com/unifiedsearch/[email protected]

sylvain-reynaud avatar Dec 12 '22 22:12 sylvain-reynaud

Hello, my PR is ready to be reviewed :)

sylvain-reynaud avatar Jan 11 '23 19:01 sylvain-reynaud

What is this API?

beshoo avatar Jan 11 '23 19:01 beshoo

I fixed the formatting and removed code that might break if a field is added to the API response.

@beshoo it uses the haveibeenpwned API. The endpoint used is the one used by the front-end haveibeenpwned.com.

sylvain-reynaud avatar Jan 13 '23 10:01 sylvain-reynaud

The node.js libraries are probably more battle-tested, but I would like to keep this repo as pure Rust.

Also, I'm reluctant to use a headless browser for HIBP. It seems there's a risk that it'll become flaky/blocked one day, and the maintenance burden will likely fall on me. I propose to start with the paid API, as described in https://github.com/reacherhq/check-if-email-exists/issues/289#issuecomment-1346149442. I'll gladly purchase the paid API and make it available on https://reacher.email 's SaaS plan.

amaury1093 avatar Jan 16 '23 10:01 amaury1093

Implemented in #1253, closing, thanks @sylvain-reynaud

amaury1093 avatar Oct 07 '23 11:10 amaury1093