Regarding feat: implement unthrottled concurrency using task queue

Open wumpus opened this issue 1 year ago • 10 comments

Can you stop attacking the Common Crawl CDX API?

wumpus, Jun 09 '24

I’m not? This is an open-source tool to find archived URLs for a given domain…

lc, Jun 09 '24

Yes, and because it isn't throttled, use of this package harms the target, which is me.

wumpus, Jun 09 '24

Any progress? I was hoping for rate limiting, honoring 503 and 429 status codes, and exponential backoff.

And not just "unthrottled concurrency".

wumpus, Jun 11 '24

It’s open source, so PRs are welcome.

It is going to be a busy month with some life changes for me, but I will put this on my TODO list. Unfortunately, it will likely not get done until late June or early July.

lc, Jun 11 '24

Accidentally closed this issue while commenting.

lc, Jun 11 '24

Thanks for adding to your TODO list, I appreciate it!

Here's an example of making a single query in Athena that's much more efficient than gau: https://positive.security/blog/ransack-data-exfiltration#common-crawl

wumpus, Jun 13 '24

Thanks for the reference, and sorry this is taking so long to implement. I'm getting hitched!

lc, Jun 13 '24

Congratulations!

wumpus, Jun 16 '24

Any update on this?

mr-pmillz, Dec 29 '24

I have pretty vicious rate limits on the API now, so I expect that this software is broken.

wumpus, Dec 29 '24