git-dumper
Git-dumper doesn't work in some cases when the git output has an HTML content-type
- I found a public git folder on a website, but while using git-dumper to dump the code from the git folder I got these errors:
[-] Testing https://example.com/.git/HEAD [200]
[-] https://example.com//.git/HEAD responded with HTML
- I checked the website manually and can clearly see that the git folder content is leaked, but git-dumper refuses to dump it because the data coming out of it has an HTML content-type. That prevents git-dumper from dumping in cases like this.
I think originally I was only checking whether the content contains "" but people had issues with that, see https://github.com/arthaud/git-dumper/pull/13
@DashLt do you know what the issue with the original check was?
In the meantime you can replace line 33 of git_dumper.py with a return False.
Yeah, I had already edited that line, but the issue was still there. Then I noticed there's a second layer of validation on line 73 that does the same thing as line 33. I edited it too and now it's working for me.
Not every site has a <html> tag verbatim. Many have attributes inside the tag, e.g.:
<html class="rwd geo-override no-js vis no-rtl headerfooter-menu3 " lang="en">
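Matching on the prefix <html instead of the literal <html> would also catch tags with attributes like the one above. A minimal sketch, assuming a hypothetical helper `looks_like_html` (not git-dumper's actual code) and an arbitrary 4096-byte scan limit so large pages stay cheap to check:

```python
def looks_like_html(data: bytes, limit: int = 4096) -> bool:
    """Heuristic: does the start of the body contain an opening <html tag?

    Only the first `limit` bytes are inspected (the limit is an assumption).
    """
    head = data[:limit].lower()  # HTML tag names are case-insensitive
    idx = head.find(b"<html")
    while idx != -1:
        # Require ">" or whitespace after "<html", so "<htmlfoo" is not a match
        # but <html class="..." lang="en"> is.
        nxt = head[idx + 5:idx + 6]
        if nxt == b">" or nxt.isspace():
            return True
        idx = head.find(b"<html", idx + 1)
    return False
```

This is still only a heuristic: a page that omits the <html> tag entirely would not be detected.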
It's weird that whatever web server the site you're attacking runs isn't using the application/octet-stream content-type, but the case exists, so it's definitely an edge case that has to be handled. As a quick and dirty fix you could check for the existence of <html, but even then that tag isn't necessarily required. I think maybe some sort of HEAD file validation is in order?
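HEAD validation could look roughly like this. A .git/HEAD file is either a symbolic ref (e.g. "ref: refs/heads/master") or, for a detached HEAD, a bare object id: 40 hex characters for SHA-1, 64 for SHA-256 repositories. The helper name `is_valid_head` is hypothetical; this is a sketch, not git-dumper's implementation:

```python
import re

# Symbolic ref ("ref: refs/heads/master") or a detached 40-hex SHA-1 /
# 64-hex SHA-256 object id, with an optional trailing newline.
_HEAD_RE = re.compile(rb"(ref: refs/\S+|[0-9a-f]{40}|[0-9a-f]{64})\n?")


def is_valid_head(data: bytes) -> bool:
    """Return True if `data` looks like the contents of a .git/HEAD file."""
    # fullmatch rejects any extra content, so an HTML page fails at its first byte.
    return _HEAD_RE.fullmatch(data) is not None
```

Because the whole body must match a tiny grammar, this avoids guessing about HTML altogether, whatever the content-type header says.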
That's also my conclusion. We would need a reference syntax checker, or we could just skip the verification on that file and fail later when we parse the object files (which have to be zlib-compressed, so that rules out HTML).
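The "fail later" approach could be sketched like this for loose objects (files under .git/objects/xx/), which are zlib-compressed "<type> <size>\0<content>" data. Pack files use a different format and are not covered. The helper `is_loose_object` is hypothetical:

```python
import zlib


def is_loose_object(data: bytes) -> bool:
    """Return True if `data` decompresses to a well-formed loose object."""
    try:
        raw = zlib.decompress(data)
    except zlib.error:
        # Not a zlib stream, e.g. an HTML error page served with status 200.
        return False
    header, sep, body = raw.partition(b"\0")
    if not sep:
        return False
    obj_type, _, size = header.partition(b" ")
    # The header names one of the four object types and the exact body length.
    return (
        obj_type in (b"blob", b"tree", b"commit", b"tag")
        and size.isdigit()
        and int(size) == len(body)
    )
```

An HTML body fails the zlib header check immediately, so no HTML detection is needed for these files.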
You can solve this with a regex. Pattern: \<html(|.*)\>
If you're going to accept the regex solution, I can make the fixes in a PR if you'd like.
https://stackoverflow.com/a/1732454
(In all seriousness, running a regex that matches that much could cause serious slowdowns on pages that can easily reach hundreds of KB or even MB. A server could also send git-dumper a very large page and make it hang. In general it's just a very hacky solution.)
You seem to be right, but I guess in this case we don't really need the HTML content-type validation if we already know the response contains content from the git folder. For example, checking for a string in /.git/config would be more than enough to keep fetching everything else without caring about the content-type.
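One way to do that string check, as a sketch: a .git/config created by git init normally contains a [core] section, so its presence is a reasonable signal that the directory is real. The helper name `looks_like_git_config` and the choice of [core] as the marker are assumptions, not git-dumper behavior:

```python
def looks_like_git_config(data: bytes) -> bool:
    """Heuristic: a repository's .git/config normally has a [core] section."""
    try:
        text = data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    # Git section names are case-insensitive, so compare in lowercase.
    return any(line.strip().lower() == "[core]" for line in text.splitlines())
```

If this returns True, the dumper could skip content-type checks for the remaining files and rely on format checks (HEAD syntax, zlib) instead.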