elasticsearch-river-web

How to index secured pages (via Forms authentication) using Elasticsearch

Open srinivasv2 opened this issue 11 years ago • 2 comments

Hi geeks,

I need to index pages secured with Forms authentication using Elasticsearch. I tried the BASIC authentication feature provided by this plugin, but it didn't work for me. Any suggestions would be appreciated.

Thanks, Srinivas V

srinivasv2, May 06 '14 13:05

Supporting Form authentication would, I think, require a different approach. If you cannot bypass the authentication, one option is to put an authenticating reverse proxy in front of the site, such as HP IceWall SSO (note that it is not an OSS product). The reverse proxy logs in to the Form-authenticated site automatically and then passes the content to the crawler.
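For illustration only, here is a minimal sketch of the "log in automatically, then fetch" step that such a proxy would perform, written with the Python standard library. It is not part of this plugin or of IceWall. The function names, the form field names `username` and `password`, and any URLs passed in are assumptions; a real login form may use different fields and may also require a hidden CSRF token.

```python
import urllib.parse
import urllib.request
from http.cookiejar import CookieJar


def build_login_request(login_url, username, password,
                        user_field="username", pass_field="password"):
    """Build (but do not send) a POST that submits a login form.

    The field names are hypothetical defaults; check the real form's HTML.
    """
    form = urllib.parse.urlencode({user_field: username, pass_field: password})
    return urllib.request.Request(
        login_url,
        data=form.encode("utf-8"),
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


def login_and_fetch(login_url, page_url, username, password):
    """Log in once, then fetch a protected page with the session cookies."""
    jar = CookieJar()
    # The opener stores cookies set by the login response and sends them back
    # on every later request made through the same opener.
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    with opener.open(build_login_request(login_url, username, password)) as resp:
        resp.read()
    with opener.open(page_url) as resp:
        return resp.read()
```

`login_and_fetch` needs a reachable site, so it is only defined here, not called. The session cookies collected in `jar` are what a proxy would have to attach to the crawler's requests.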

marevol, May 06 '14 14:05

Eventually I will put this in a public repository, but if you're still looking for a solution, I've made a gist with a small Python script that uses mitmproxy to establish a login session and attach the appropriate cookies to all requests going through it. Right now I'm only using it to crawl our internal Confluence server, but I plan to expand it to work with multiple hostnames and rotating session IDs: https://gist.github.com/Fapiko/d3ecfbd58ab156541da9

If the site you're crawling is served over SSL, you'll need to add the mitmproxy CA cert to your Java cacerts keystore.
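As a rough sketch of the cookie-injection idea (not the code from the gist), the core step is merging a known session cookie into each outgoing request's `Cookie` header. The class below follows the addon shape used by recent mitmproxy versions, where a `request(flow)` hook receives each request and `flow.request.headers` behaves like a dict; the names `merge_cookie_header` and `SessionCookieInjector` and the `JSESSIONID` placeholder value are assumptions made for illustration.

```python
def merge_cookie_header(existing, session_cookies):
    """Return a Cookie header value in which session cookies override
    any cookies of the same name already present on the request."""
    cookies = {}
    for part in (existing or "").split(";"):
        name, sep, value = part.strip().partition("=")
        if sep and name:
            cookies[name] = value
    cookies.update(session_cookies)
    return "; ".join(f"{k}={v}" for k, v in cookies.items())


class SessionCookieInjector:
    """Hypothetical addon: attaches fixed session cookies to every request."""

    def __init__(self, session_cookies):
        self.session_cookies = dict(session_cookies)

    def request(self, flow):
        # Assumed hook: called once per client request passing through the proxy.
        headers = flow.request.headers
        headers["Cookie"] = merge_cookie_header(
            headers.get("Cookie"), self.session_cookies
        )


# Placeholder value; a real session ID would come from a prior login.
addons = [SessionCookieInjector({"JSESSIONID": "replace-me"})]
```

Assuming the file is saved as `inject_cookies.py`, it could be loaded with `mitmdump -s inject_cookies.py`, with the crawler configured to use that proxy. Refreshing an expired session is left out of this sketch.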

Fapiko, Aug 01 '14 10:08