anonymous_github

Cloning from anonymous github

Open suyash67 opened this issue 4 years ago • 35 comments

How do we clone a repository from anonymous github? Please help me out.

suyash67 avatar Aug 31 '19 19:08 suyash67

It is not possible, sorry.

tdurieux avatar Sep 01 '19 05:09 tdurieux

@tdurieux Thank you for the amazing work.

Wondering if there is any interest in supporting cloning from an anonymous repository? One use case is where we want to upload our code in zip format after anonymization.

TheShadow29 avatar Nov 21 '19 01:11 TheShadow29

I'm considering it, but it would require a complete reimplementation of the service. I need time and I don’t have a lot :/

tdurieux avatar Nov 21 '19 03:11 tdurieux

Hello @TheShadow29, @tdurieux.

I have finished a simple project to download the repository from anonymous4open. https://github.com/ShoufaChen/clone-anonymous4open

This implementation is not very elegant yet, but it works properly. I hope it will be helpful.

ShoufaChen avatar Nov 25 '19 01:11 ShoufaChen

@tdurieux Thank you for the amazing work.

Wondering if there is any interest in supporting cloning from an anonymous repository? One use case is where we want to upload our code in zip format after anonymization.

Do you mean "download" instead of "upload"? This would really be a useful thing. For now, a simple workaround is to add a zip file to the repo itself for submission. But this won't have the "XXX" feature of course.

real-or-random avatar Jan 20 '20 10:01 real-or-random

Thank you for the great tool! Unfortunately without any possibility for reviewers to download files, in most cases it cannot be used for paper submissions.

johentsch avatar Jun 10 '20 12:06 johentsch

I understand. I am working on a new version that will allow this, but it will have some restrictions on project size (I don't have infinite storage on my server).

tdurieux avatar Jun 10 '20 12:06 tdurieux

If you are unable to provide any kind of git clone or archive download due to lack of time, please at least don't go out of your way to detect and block downloading entirely!

 $ wget -m 'https://anonymous.4open.science/r/5c1049ba-109f-4a12-9d74-9a4a5130ce97/'
--2020-11-17 12:46:23--  https://anonymous.4open.science/r/5c1049ba-109f-4a12-9d74-9a4a5130ce97/
Resolving anonymous.4open.science (anonymous.4open.science)... 104.31.90.163, 172.67.183.76, 104.31.91.163, ...
Connecting to anonymous.4open.science (anonymous.4open.science)|104.31.90.163|:443... connected.
HTTP request sent, awaiting response... 403 Forbidden
2020-11-17 12:46:23 ERROR 403: Forbidden.

gwern avatar Nov 17 '20 17:11 gwern

I did nothing about that.

tdurieux avatar Nov 17 '20 17:11 tdurieux

It is your server blocking requests, and even if you did not explicitly enable it, that does users no good. How exactly are we supposed to use the anonymous repos to do any kind of peer review or test things out when there is no download functionality and your server blocks the usual last resorts of crawling/scraping? I did not expect to have to fight DRM and pull out my old notes on cookies and HTTP headers just to download a repo.

gwern avatar Nov 17 '20 18:11 gwern

This is not the intended use of this service. It is meant to be a service to visualize the anonymized GitHub repo, not to download it. I am sorry that you would like other features, but with the current implementation it is not possible, for several technical reasons.

Feel free to create your own anonymization service, then pay for it and maintain it for a year. I would be happy to use it.

tdurieux avatar Nov 17 '20 18:11 tdurieux

I wouldn't because anonymized peer review is deeply flawed and has been repeatedly experimentally shown to be highly unreliable and largely useless compared to much better methodologies like pre-registered reports (and that's without adding in silliness like blocking downloads and making excuses for it).

For everyone else, the best approach I have found so far is using https://github.com/ShoufaChen/clone-anonymous4open to download repos and bypass the web interface. It has some bugs which I've filed fixes on (specifically, too-large files & spaces in filenames).

gwern avatar Nov 17 '20 18:11 gwern

It is your server blocking requests,

No, the server does not block anything. But maybe Cloudflare does so.
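One way to narrow this down is to look at the headers of the 403 response. The sketch below is a minimal heuristic, assuming that responses passing through Cloudflare carry a `Server: cloudflare` header and a `CF-RAY` request id. The function name `looks_like_cloudflare` is made up for illustration. A match only shows that Cloudflare sits in front of the server; it does not prove which layer decided to reject the request.

```python
from typing import Mapping


def looks_like_cloudflare(headers: Mapping[str, str]) -> bool:
    """Return True if the response headers suggest the reply passed through Cloudflare.

    Heuristic only (assumption): Cloudflare usually sets "Server: cloudflare"
    and adds a "CF-RAY" header to the responses it serves or proxies.
    """
    # HTTP header names are case-insensitive, so normalize them first.
    lowered = {name.lower(): value for name, value in headers.items()}
    server = lowered.get("server", "").lower()
    return "cloudflare" in server or "cf-ray" in lowered


if __name__ == "__main__":
    # Headers would normally come from the failing request,
    # for example the header dump printed by `wget -S` or `curl -I`.
    sample = {"Server": "cloudflare", "CF-RAY": "example-id"}
    print(looks_like_cloudflare(sample))
```

If the check is positive, the block is more likely a bot-protection rule at the proxy than something in the application itself, which would be consistent with both statements above.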

https://github.com/ShoufaChen/clone-anonymous4open

Interesting, thanks for sharing.

How exactly are we supposed to use the anonymous repos to do any kind of peer review

I'd suggest asking your PC chair about a better solution.

monperrus avatar Nov 17 '20 20:11 monperrus

It is your server blocking requests, and even if you did not explicitly enable it, that does users no good. How exactly are we supposed to use the anonymous repos to do any kind of peer review or test things out when there is no download functionality and your server blocks the usual last resorts of crawling/scraping? I did not expect to have to fight DRM and pull out my old notes on cookies and HTTP headers just to download a repo.

The reproachful tone is out of place in the context of open source, where developers give away work they did in their free time. If you have to blind review the repo, ask the editor to provide the anonymized data as a ZIP because you require it to properly review, and that's that.

johentsch avatar Nov 18 '20 12:11 johentsch

A reproachful tone is perfectly appropriate when dealing with a maintainer whose response to a serious bug is to deflect and give the most narrowly technically-correct (surely the best possible way of being correct...) answer of "I did nothing about that", without investigating what the problem is, which does nothing to solve this problem and serves only to deflect blame onto the (many) users who are having this problem of being unable to download repos, as you also are doing in your comment by coming up with reasons for everyone but anonymous_github to do things to work around its problems.

gwern avatar Nov 18 '20 15:11 gwern

You can be angry at anyone you want. It won't necessarily convince anyone to help you. Personally, I wouldn't move a finger for someone sporting such a tone when they are actually the one who wants something.

johentsch avatar Nov 18 '20 15:11 johentsch

Further user-blaming and tone-policing to shift the blame to anyone but oneself. The bug remains.

gwern avatar Nov 18 '20 16:11 gwern

This is still an issue. I would appreciate a workaround, if not a solution. I'm having difficulty downloading large files through the web interface, even without clone tools. I think this is somewhat in conflict with the mission of open reproducibility.

davidgabriel42 avatar Mar 16 '21 21:03 davidgabriel42

Once the paper is published, it will contain the GitHub link. Open reproducibility is therefore not impacted. I agree that it would be nice to be able to download the anonymized repository; unfortunately, it would not be sustainable for me. It would drastically increase the cost of the service. I recently added a donation button as an experiment to see whether donations could cover the cost of this feature, but I received 0 donations. Since I'm not ready to spend more of my personal money, this feature will unfortunately not be available.

tdurieux avatar Mar 17 '21 07:03 tdurieux

can you imagine judging a cooking contest by only reading the ingredients? 😋

the anonymous_github idea initially seems useful but without the ability to access a copy of the code, it's less useful than just distributing a zip. you should put a disclaimer indicating that reviewers will not be able to download the code

thanks for your consideration

kumavis avatar May 23 '21 09:05 kumavis

Like I said in the issue, it is just a question of money: this service already costs me $2000 overall. I am not willing to spend more money per month on this service. Adding this feature will increase the storage and compute required. Donate $250 and you will sponsor this feature for a year. I will be happy to support it, but not financially.

It is not just streaming the zip from GitHub... You need to anonymize it, and there are GitHub API rate limits, so I have to cache everything. That is not a problem for small repositories, but it is for repositories that have gigabytes of data. Those repositories are more frequent than you think.
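To illustrate why caching and storage are tied together here, the following is a minimal sketch of a cache with a fixed byte budget. It is not the service's actual implementation: `ByteBudgetCache` and the `fetch` callable (standing in for a rate-limited API call) are hypothetical names. Every miss costs one upstream request, and a repository with gigabytes of data either blows the budget or evicts everything else.

```python
from collections import OrderedDict
from typing import Callable


class ByteBudgetCache:
    """LRU cache that evicts the least recently used entries past a byte budget."""

    def __init__(self, max_bytes: int, fetch: Callable[[str], bytes]) -> None:
        self.max_bytes = max_bytes
        self.fetch = fetch  # hypothetical: one call = one rate-limited API request
        self.used_bytes = 0
        self._entries: "OrderedDict[str, bytes]" = OrderedDict()

    def get(self, key: str) -> bytes:
        if key in self._entries:
            # Cache hit: mark as most recently used, no upstream request needed.
            self._entries.move_to_end(key)
            return self._entries[key]
        data = self.fetch(key)
        if len(data) > self.max_bytes:
            # Larger than the whole budget: serve it, but do not store it.
            return data
        self._entries[key] = data
        self.used_bytes += len(data)
        # Evict the oldest entries until the cache fits the budget again.
        while self.used_bytes > self.max_bytes:
            _, evicted = self._entries.popitem(last=False)
            self.used_bytes -= len(evicted)
        return data
```

With a small budget, a few large files push out many small ones, so the same files have to be fetched again and count against the rate limit again.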

Also, it is clearly written in the FAQ that the repos are not downloadable.

tdurieux avatar May 23 '21 09:05 tdurieux

Hi @tdurieux. Thank you for this great JOB!

I really appreciate your effort in maintaining this project and I totally agree with you regarding @gwern's behavior.

I read all the documentation and FAQ to make sure I fully understand the purpose of the project.

I'm not sure if I was looking for it right, but I didn't find any licenses in the documentation. Maybe including a GNU license is a good idea to avoid some dissatisfaction from some comrades.

To be honest, it would be great to have this feature, but I understand all the issues in implementing it.

Best regards

elir0d avatar May 29 '21 21:05 elir0d

Thanks @vanpyre.

I'm not sure if I was looking for it right, but I didn't find any licenses in the documentation. Maybe including a GNU license is a good idea to avoid some dissatisfaction from some comrades.

Thanks for the reminder, I forgot about it. I was planning to use MIT, but GNU is probably a better option. I will do that right now.

To be honest, it would be great to have this feature, but I understand all the issues in implementing it.

Implementing the feature is relatively easy, 2-3h. My issue is really what the impact of this feature would be. I will probably implement it soon, but I will not activate it on this instance. If someone wants to deploy a new instance, the feature would be available there. I really hope that some conferences could deploy their own instance.

tdurieux avatar May 30 '21 15:05 tdurieux

Hi, adding a zip file shows "The file type is not supported. Anonymous GitHub cannot handle it." Is there a workaround for this?

Matheus-Garbelini avatar Jun 09 '21 09:06 Matheus-Garbelini

Zip is a binary format, so I cannot anonymize it. I am not sure what I can do.
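The limitation can be sketched as follows. This is only an illustration, assuming the anonymization works by replacing a list of terms with a mask such as "XXX" in text files; the function names and the binary check are hypothetical and not taken from the service's code. In a compressed archive the terms generally do not appear as plain bytes, so a text substitution cannot find them without unpacking, rewriting, and repacking the archive.

```python
import re
from typing import Iterable, Optional


def is_probably_binary(data: bytes) -> bool:
    """Heuristic (assumption): NUL bytes or invalid UTF-8 mean 'not text'."""
    if b"\x00" in data:
        return True
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return True
    return False


def anonymize(data: bytes, terms: Iterable[str], mask: str = "XXX") -> Optional[bytes]:
    """Replace every term (case-insensitively) with the mask.

    Returns None for content that looks binary, since a text
    substitution cannot be applied to it safely.
    """
    if is_probably_binary(data):
        return None
    text = data.decode("utf-8")
    for term in terms:
        if term:  # skip empty terms, which would match everywhere
            # A function replacement keeps the mask literal (no backslash escapes).
            text = re.sub(re.escape(term), lambda _match: mask, text, flags=re.IGNORECASE)
    return text.encode("utf-8")
```

For example, `anonymize(b"Author: Jane Doe", ["Jane Doe"])` yields the masked text, while the first bytes of a zip file are rejected by the binary check.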

tdurieux avatar Jun 09 '21 09:06 tdurieux

This is such a useful service. Thank you for developing it. I do think that the ability to easily download anonymized repositories is the only critical feature missing. For blind review, simply zipping the current commit would be totally adequate. If it's a resources issue, I wonder if one of the large conferences that uses double-blind review (e.g. CogSci) would be up for funding it. Have you talked to organizations about sponsoring this fantastic project?

daeh avatar Feb 10 '22 17:02 daeh

It is now possible to download repositories that are smaller than 150kb. You will find a button at the top right of the page.

tdurieux avatar Oct 04 '22 11:10 tdurieux

It's totally possible to download a repository as a zip archive with the API, and I've created a simple Python script to do this: https://gist.github.com/12f23eddde/afaf4f6af4608f42b3721cef5e26260a I guess downloading and compression could also be done in the browser with libraries like zip.js?
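The general shape of such a script can be sketched like this. It does not reproduce the linked gist or the service's real API: `fetch` is a hypothetical callable that returns the bytes of one file, and the list of paths is assumed to be known already. The pause between requests is there to keep the load on the server low.

```python
import io
import time
import zipfile
from typing import Callable, Iterable


def build_zip(
    paths: Iterable[str],
    fetch: Callable[[str], bytes],
    delay_seconds: float = 1.0,
) -> bytes:
    """Fetch each file one by one and pack the results into an in-memory zip."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for index, path in enumerate(paths):
            # Wait between requests so the download does not hammer the server.
            if index > 0 and delay_seconds > 0:
                time.sleep(delay_seconds)
            archive.writestr(path, fetch(path))
    return buffer.getvalue()


if __name__ == "__main__":
    # Stand-in for real downloads: file contents come from a local dict.
    fake_files = {"README.md": b"# demo\n", "src/main.py": b"print('hi')\n"}
    data = build_zip(fake_files, fake_files.__getitem__, delay_seconds=0)
    with open("repo.zip", "wb") as handle:
        handle.write(data)
```

Fetching files sequentially with a delay is slower than parallel downloads, but it is the more considerate choice for a service that is paid for by its maintainer.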

hrz6976 avatar Oct 06 '22 06:10 hrz6976

It is possible, but it costs me a lot, so using your script will put the website's finances (my pocket money) in trouble. Don't abuse it, otherwise I will need to take measures...

tdurieux avatar Oct 06 '22 06:10 tdurieux

@tdurieux You wrote above that it is now possible to download repositories of up to 150kb. Does that mean the feature is in general implemented and can be activated also for larger repositories, if we deploy the service on our own server?

Also: Thanks a lot for implementing :).

alexandertornede avatar Oct 18 '22 06:10 alexandertornede