Using Athens as Checksum DB (not just a proxy) for private repos?

Open jhollowayj opened this issue 4 years ago • 4 comments

The title suggests that this is a feature request, but maybe it's more of a documentation item stemming from me not understanding something...

I've been looking into setting up an Athens proxy for both public and private repos, but I was a little confused about the proper use of ATHENS_GONOSUM_PATTERNS and GONOSUMDB in your documentation on proxying checksum requests.

After reading through that documentation, here are the main questions I have:

  1. Is there a need for checksums on public repos that I have control over?
  2. Is there a need for checksums on private repos?
    a. What are the dangers of not checksumming the public repos under my control?
    b. When is there a need for checksums on a private repo?
    c. Are there any problems that could stem from two parties using the same tag but having different checksums because one had turned off checksums?
  3. When I have a mix of public and private repos under the same namespace (i.e. github.com/company/*), is there a good way to (1) allow checksums in public repos and (2) disallow checksums for the private repos without managing a list of private/public repos?
    a. I don't want to have to (1) manage that list across Athens and developers' machines, (2) update Athens every time I add a new private repo (because I'll forget), (3) field questions from individuals about why they are getting 410 errors when they want to get the new private repo.
  4. Would it make sense to use Athens as a private checksum database for my own private repos?
    a. Are there consequences with not being in sync with the global checksum db if I accidentally configure it to also checksum public repos?

Action items

I see this becoming either a feature request to add checksum database functionality to Athens for private repos (and all the problems that may come with that) OR a documentation update that explains the "ideal" way to handle checksums with public/private stuff, the trade-offs of various configurations, etc. Maybe have something explaining how to manage the list of private repos in an efficient/scalable manner.

jhollowayj avatar Mar 14 '20 20:03 jhollowayj

Thanks for your questions @jhollowayj! I'll answer them inline:

Is there a need for checksums on public repos that I have control over?

There isn't a strict need. In fact, there's no strict need for checksums on any public repo. The idea is to prove that nothing has been changed since the checksum server first saw it. For your own public repos, you have control over what gets changed.

Is there a need for checksums on private repos?

Another matter of opinion, but you most likely have tighter control over who can push code to (or otherwise change) your private repos, so for me, there's no need.

What are the dangers of not checksumming the public repos under my control?

I think I answered this one. Let me know if I didn't and I'll elaborate.

When is there a need for checksums on a private repo?

Same here, happy to elaborate.

Are there any problems that could stem from two parties using the same tag but having different checksums because one had turned off checksums?

Let me start this answer with some context. "Turning off checksums" is a confusing term; it should probably be called "turning off checksum verification". We'll need to fix that in the docs.

Anyway, turning off verification means that the Go tool won't compare the checksum it generates with a global checksum database. This database defaults to sum.golang.org. The Go tool only consults that database if the module@tag isn't already in the go.sum file, in other words when you're adding a new module@tag to your project.
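To make that rule concrete, here is a minimal sketch of the decision in Go. It is not the Go tool's actual code: `parseGoSum` and `consultsChecksumDB` are hypothetical helpers, the module paths and the hash are placeholders, and `verificationOff` stands in for whatever setting disables verification for that module.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// parseGoSum reads go.sum content into a map keyed by "module@version".
// Each go.sum line has three fields: module path, version (optionally with a
// "/go.mod" suffix), and hash.
func parseGoSum(content string) map[string]string {
	entries := make(map[string]string)
	sc := bufio.NewScanner(strings.NewReader(content))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		if len(fields) != 3 {
			continue // skip blank or malformed lines
		}
		entries[fields[0]+"@"+fields[1]] = fields[2]
	}
	return entries
}

// consultsChecksumDB is a hypothetical helper that mirrors the rule described
// above: the checksum database is only asked about a module@version that is
// not yet in go.sum, and only while verification is on for that module.
func consultsChecksumDB(goSum map[string]string, modVersion string, verificationOff bool) bool {
	if _, known := goSum[modVersion]; known {
		return false // already recorded; the hash is compared against go.sum instead
	}
	return !verificationOff
}

func main() {
	// Placeholder go.sum content; the hash is not a real checksum.
	goSum := parseGoSum("example.com/mymodule v1.0.0 h1:placeholderhash=\n")

	fmt.Println(consultsChecksumDB(goSum, "example.com/mymodule@v1.0.0", false)) // false
	fmt.Println(consultsChecksumDB(goSum, "example.com/other@v1.2.0", false))    // true
	fmt.Println(consultsChecksumDB(goSum, "example.com/other@v1.2.0", true))     // false
}
```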

With that in mind, whichever party introduces mymodule@v1, for example, will write that checksum to go.sum, even if they have checksums turned off. If party 2 comes along and needs to download the code again (because they didn't have it on their local machine), that could fail their build if the code had changed since party 1 first added it. That is a problem solved by Athens and other proxies though. It's not directly related to the checksum database.
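The two-party scenario can be sketched as follows. `hashFiles` is a hypothetical, simplified stand-in for the `h1:` directory hash that the Go tool records in go.sum (a SHA-256 over a sorted list of per-file SHA-256 sums); the file names and contents are made up for illustration, and real module hashes cover every file in the module zip.

```go
package main

import (
	"crypto/sha256"
	"encoding/base64"
	"fmt"
	"sort"
)

// hashFiles computes a simplified "h1:"-style hash over a set of files,
// given as a map from file name to file content.
func hashFiles(files map[string]string) string {
	names := make([]string, 0, len(files))
	for name := range files {
		names = append(names, name)
	}
	sort.Strings(names) // the hash must not depend on map iteration order

	summary := sha256.New()
	for _, name := range names {
		fileSum := sha256.Sum256([]byte(files[name]))
		// One summary line per file: hex digest, two spaces, file name.
		fmt.Fprintf(summary, "%x  %s\n", fileSum, name)
	}
	return "h1:" + base64.StdEncoding.EncodeToString(summary.Sum(nil))
}

func main() {
	// Party 1 adds mymodule@v1.0.0 and records its hash in go.sum.
	original := map[string]string{
		"example.com/mymodule@v1.0.0/go.mod": "module example.com/mymodule\n",
		"example.com/mymodule@v1.0.0/lib.go": "package mymodule\n",
	}
	recorded := hashFiles(original)

	// The tag is later moved, so party 2 downloads different code for the
	// same module@version.
	changed := map[string]string{
		"example.com/mymodule@v1.0.0/go.mod": "module example.com/mymodule\n",
		"example.com/mymodule@v1.0.0/lib.go": "package mymodule\n\nfunc New() {}\n",
	}
	downloaded := hashFiles(changed)

	if downloaded != recorded {
		fmt.Println("checksum mismatch: party 2's build fails")
	}
}
```

A proxy that keeps serving the bytes it first stored avoids this mismatch, because party 2 receives the same files that party 1 hashed.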

When I have a mix of public and private repos under the same name space (i.e. github.com/company/*), is there a good way to (1) allow checksums in public repos and (2) disallow checksums for the private repos without managing a list of private/public repos?

I don't want to have to (1) manage that list across Athens and developers' machines, (2) update Athens every time I add a new private repo (because I'll forget), (3) field questions from individuals about why they are getting 410 errors when they want to get the new private repo.

On the Athens side, unfortunately not unless you turn off checksum verification for *. Again, this means that Athens will not verify checksums in the global checksum database when it downloads a new module for the first time. After that, whatever it downloads is in its storage backend and Athens serves it to your developer team forever.

Given that behavior, it's ok to tell them to turn off checksum verification if they all can trust the Athens server and you trust that Athens has downloaded the "right" code ("right" depends on your tolerance for code changing since sum.golang.org first saw it). They most likely can if you're serving over https and within a private network, which you probably are since you're serving private code.
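As a rough illustration of why a pattern cannot separate public from private repos under one namespace, here is a sketch of prefix-glob matching in Go. It approximates the matching the Go command documents for GONOSUMDB (comma-separated globs in `path.Match` syntax, applied to a prefix of the module path). `matchesPrefixPatterns` and the repo names are hypothetical, and it is an assumption that Athens treats ATHENS_GONOSUM_PATTERNS the same way.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// matchesPrefixPatterns reports whether any glob in the comma-separated list
// matches a leading portion of modulePath. A glob with N path elements is
// compared against the first N elements of the module path.
func matchesPrefixPatterns(patterns, modulePath string) bool {
	elems := strings.Split(modulePath, "/")
	for _, glob := range strings.Split(patterns, ",") {
		glob = strings.TrimSpace(glob)
		if glob == "" {
			continue
		}
		n := strings.Count(glob, "/") + 1
		if len(elems) < n {
			continue // module path is shorter than the pattern
		}
		prefix := strings.Join(elems[:n], "/")
		if ok, _ := path.Match(glob, prefix); ok {
			return true
		}
	}
	return false
}

func main() {
	pattern := "github.com/company/*"

	// Both a public and a private repo under the same namespace match, so
	// the pattern alone cannot tell them apart.
	fmt.Println(matchesPrefixPatterns(pattern, "github.com/company/public-lib"))                // true
	fmt.Println(matchesPrefixPatterns(pattern, "github.com/company/private-svc/internal/auth")) // true
	fmt.Println(matchesPrefixPatterns(pattern, "github.com/other/lib"))                         // false

	// Under this matching rule, "*" covers every module path.
	fmt.Println(matchesPrefixPatterns("*", "golang.org/x/mod")) // true
}
```

Distinguishing the two kinds of repo would take either an explicit list or a naming convention that a glob can express, for example a dedicated prefix for private repos.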

Would it make sense to use Athens as a private checksum database for my own private repos?

There would be no way to interleave Athens-generated checksums for your private modules with sum.golang.org's checksums for public modules. The other option would be to have Athens store and serve its own complete checksum database, which it doesn't do right now. We haven't implemented that because we assume that you'll trust all the checksums that come from your private modules, since you know all the developers who changed the code.

Are there consequences with not being in sync with the global checksum db if I accidentally configure it to also checksum public repos?

Since we could not combine private checksums with sum.golang.org-generated checksums, being out of sync with the global checksum database is impossible. It is either turned on or off in Athens.

I see this becoming either a feature request to add checksum database functionality to Athens for private repos (and all the problems that may come with that) OR a documentation update that explains the "ideal" way to handle checksums with public/private stuff, the trade-offs of various configurations, etc. Maybe have something explaining how to manage the list of private repos in an efficient/scalable manner.

Unfortunately, implementing a checksum DB in Athens means doing it for public modules too. That adds a lot of complexity in code, Athens configuration, and on developer machines (more than now). We currently assume that folks don't need checksum verification for private modules, so we haven't done the implementation.

I hope I've answered your questions well enough! If so, I'll add them to the FAQ and try to improve the docs as you'd suggested.

arschles avatar Mar 26 '20 21:03 arschles

@arschles, thank you so much for the detailed response. I'll run this response by one of my coworkers and come back with any questions they might have as well.

jhollowayj avatar Mar 31 '20 03:03 jhollowayj

@jhollowayj you got it! I also want to convey a quick point that @marwan-at-work told me on chat. He gave a really good analogy of what a proxy that also hosts a checksum DB would be. Paraphrased:

A checksum DB is a watchdog over a proxy server. If Athens were to also host a checksum DB, that would be like asking a thief to be a police officer as well.

I thought this was a really, really good way to describe the situation. I hope it helps. And thanks @marwan-at-work !

arschles avatar Apr 01 '20 22:04 arschles

Sorry for the delay @arschles. 2020, am I right? :D

I think we've cleared up misunderstandings on our side. Sounds like using Athens as a Checksum DB does not make sense. So I'm happy for this issue to be closed. Is there any need to keep this issue open on your end? Should there be a new issue created for updating documentation to include any of the things explained here?

I'm happy to push responsibility of closing this issue and creating any other issues off to you, but please let me know if there is anything you would like me to do to help in that process.

jhollowayj avatar Jun 19 '20 23:06 jhollowayj