
Git-dumper doesn't work in some cases when the git output has an HTML content-type

Open DEMON1A opened this issue 3 years ago • 8 comments

  • I found a public git folder on a website, but when I used git-dumper to dump the code from it, I got these errors:
[-] Testing https://example.com/.git/HEAD [200]
[-] https://example.com//.git/HEAD responded with HTML
  • I checked the website manually and can clearly see that the git folder's content is leaked, but git-dumper refuses to dump it because the data is served with an HTML content-type. This prevents git-dumper from working in some cases.

DEMON1A avatar May 15 '21 03:05 DEMON1A

I think originally I was only checking whether the content contains "<html>", but people had issues with that; see https://github.com/arthaud/git-dumper/pull/13. @DashLt, do you know what the issue with the original check was? In the meantime, you can replace line 33 of git_dumper.py with a return False.

arthaud avatar May 15 '21 15:05 arthaud

Yeah, I had already edited that line, but the issue was still there. Then I noticed there's a second layer of validation on line 73 that does the same thing as line 33. I edited that too, and now it's working for me.

DEMON1A avatar May 15 '21 15:05 DEMON1A

Not every site has a <html> tag verbatim. Many have attributes inside the tag, e.g.:

<html class="rwd geo-override no-js vis no-rtl headerfooter-menu3 " lang="en">

It's weird that whatever webserver in the site you're attacking isn't using the application/octet-stream content-type, but it exists so it's definitely an edge case that has to be handled. As a quick and dirty thing you could check for the existence of <html, but even then that tag isn't necessarily required. I think maybe some sort of HEAD file validation is in order?
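
A minimal sketch of what such a HEAD file validation could look like, assuming the response body is already available as bytes. The function name `looks_like_git_head` and the `max_len` limit are hypothetical and not part of git-dumper; the check relies only on the fact that a HEAD file holds either a symbolic ref or a bare commit hash.

```python
import re

# HEAD contains either a symbolic ref ("ref: refs/heads/master") or a
# detached commit hash (40 hex chars for SHA-1, 64 for SHA-256).
_HEAD_RE = re.compile(r"(?:ref: refs/\S+|[0-9a-f]{40}|[0-9a-f]{64})\s*")


def looks_like_git_head(body: bytes, max_len: int = 1024) -> bool:
    """Return True if body plausibly is the content of a .git/HEAD file.

    Hypothetical helper: max_len is an arbitrary cap so that a large
    HTML page is rejected before any regex work is done.
    """
    if len(body) > max_len:
        return False
    try:
        text = body.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return _HEAD_RE.fullmatch(text) is not None
```

Because the check inspects the body itself, it would not matter which content-type the server sends.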

DashLt avatar May 15 '21 17:05 DashLt

That's also my conclusion. We would need a reference syntax checker, or we could just skip the verification on that file and fail later when we parse the object files (which need to be compressed with zlib, so that rules out HTML).
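
A rough sketch of the "fail later" idea, assuming a loose object has been downloaded as raw bytes. The helper name `is_loose_git_object` is hypothetical; it uses the loose-object layout, which is a zlib stream that decompresses to `<type> <size>\0<content>`.

```python
import zlib

# Object types that can appear in the header of a loose git object.
_OBJECT_TYPES = (b"blob", b"tree", b"commit", b"tag")


def is_loose_git_object(raw: bytes) -> bool:
    """Return True if raw decompresses to a well-formed loose git object.

    Hypothetical helper: an HTML error page is not a valid zlib stream,
    so it is rejected at the decompress step.
    """
    try:
        data = zlib.decompress(raw)
    except zlib.error:
        return False
    header, sep, content = data.partition(b"\x00")
    if not sep:
        return False
    kind, _, size = header.partition(b" ")
    return kind in _OBJECT_TYPES and size.isdigit() and int(size) == len(content)
```

Usage example: `is_loose_git_object(zlib.compress(b"blob 6\x00hello\n"))` is accepted, while an HTML body such as `b"<html>Not found</html>"` is rejected.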

arthaud avatar May 16 '21 04:05 arthaud

Not every site has a <html> tag verbatim. Many have attributes inside the tag, e.g.:

You can solve this with regex, Pattern: \<html(|.*)\>

DEMON1A avatar May 16 '21 06:05 DEMON1A

If you're going to accept the regex solution, I can make the fixes in a PR if you'd like.

DEMON1A avatar May 16 '21 08:05 DEMON1A

You can solve this with regex, Pattern: \<html(|.*)\>

https://stackoverflow.com/a/1732454

(In all seriousness, running a regex that matches that much could cause serious slowdowns on pages that can easily reach the hundreds of KB or even MB. You would also be able to send git-dumper back a very large page and make it hang as well. It's in general just a very hacky solution.)

DashLt avatar May 16 '21 13:05 DashLt

You seem to be right, but I guess in this case we don't really need the HTML content-type validation if we already know the response contains content from the git folder. For example, checking for a string in /.git/config would be more than enough to keep fetching other files without caring about the content-type.
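
As an illustration of that idea, here is a minimal sketch that looks for a `[core]` section header in the body fetched from /.git/config. The function name `looks_like_git_config` is hypothetical, and treating `[core]` as the marker string is an assumption: it is a heuristic based on what `git init` normally writes, not a guarantee.

```python
def looks_like_git_config(body: bytes) -> bool:
    """Heuristic: True if some line of body is exactly a [core] header.

    Hypothetical helper. Section names in git config files are
    case-insensitive, so the comparison is done in lowercase.
    """
    text = body.decode("utf-8", errors="replace")
    return any(line.strip().lower() == "[core]" for line in text.splitlines())
```

A caller could run this on the downloaded config and, if it passes, keep fetching the remaining files regardless of the content-type header.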

DEMON1A avatar May 16 '21 15:05 DEMON1A