thanks
thanks copied to clipboard
update-commit-db _might_ miss commits
It only looks at the first page of commits. If we get very busy, it might not catch all of them.
This should be alleviated once a release happens, but master will be missing in the meantime. So ideally it should go back enough to not miss anything.
I'll pick this up. Idea is to loop through pages grabbing commits until you either run out of pages or hit the SHA of the latest commit in the latest release for a project currently in the database. Quick question though, what is the purpose of the visible
field in the Releases table?
Idea is to loop through pages grabbing commits until you either run out of pages or hit the SHA of the latest commit in the latest release for a project currently in the database
Exactly.
Quick question though, what is the purpose of the visible field in the Releases table?
In order to write the release blog post, I need to know how many people have contributed. In order to do that, I have to create the release. But then it's up before the actual announcement. So, basically, a new release will sit as false
for a few days, and then set to true
on release day.
Given that this process is only done with the master
release, which will always be true
, you shouldn't have to worry about it :)
Having hacked on it a little, there's two ways I can see implementing this. The first assumes that commits are inserted in order which is a hacky assumption BUT requires very little change to the current code. The second would add a date column to the commits table in order to set a sort order for commits but is a much more involved change. Thoughts? Sorry for the terrible formatting
On Apr 3, 2017 10:21 AM, "Steve Klabnik" [email protected] wrote:
Idea is to loop through pages grabbing commits until you either run out of pages or hit the SHA of the latest commit in the latest release for a project currently in the database
Exactly.
Quick question though, what is the purpose of the visible field in the Releases table?
In order to write the release blog post, I need to know how many people have contributed. In order to do that, I have to create the release. But then it's up before the actual announcement. So, basically, a new release will sit as false for a few days, and then set to true on release day.
Given that this process is only done with the master release, which will always be true, you shouldn't have to worry about it :)
— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/rust-lang-nursery/thanks/issues/15#issuecomment-291157954, or mute the thread https://github.com/notifications/unsubscribe-auth/ACwkvoSsPdyahjv6R1HqHODi8R7SKIASks5rsQBngaJpZM4LiSko .
Thoughts?
Can you elaborate a bit more at the high level of how this works? That is, I'm not sure what adding the dates would get you, exactly.
We need a way to say "what was the last commit saved for the latest non master release". We could assume that commits are inserted in order which means we wouldn't need dates, or we could use the date_approved field. Unless there is some other mechanism for ordering commits that I'm not aware of
On Apr 3, 2017 11:31 AM, "Steve Klabnik" [email protected] wrote:
Thoughts?
Can you elaborate a bit more at the high level of how this works? That is, I'm not sure what adding the dates would get you, exactly.
— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/rust-lang-nursery/thanks/issues/15#issuecomment-291179369, or mute the thread https://github.com/notifications/unsubscribe-auth/ACwkvqckUSg-aZrJ42L6pk-EfUnjSBlPks5rsRC3gaJpZM4LiSko .
We need a way to say "what was the last commit saved for the latest non master release"
Ah, I see. Right. Hm.
Honestly, with stuff like https://github.com/rust-lang-nursery/thanks/pull/97, I'm starting to wonder if the right approach here is to just always dynamically calculate things. Originally I chose not to because I wasn't sure about heroku; however, crates.io is hosted there, and it seems to work? I'm not sure how that works though; maybe @carols10cents could help enlighten us here.
Sorry, I don't have much context here about what you'd like to know... what about heroku are you not sure? The performance of database queries? The performance of web requests? Are you on free or paid dynos, free or paid database? What are you currently caching and considering dynamically calculating? How often do you need to update the data and query the data?
Crates.io does have some values we've tried to cache, and have had problems invalidating the cache at the right time in the right ways. We're now dynamically calculating the max_version
of a crate and that seems to be performing okay.
Version downloads, however, are still being aggregated by a background worker script-- version downloads get updated in one table within a download request, then the background worker rolls up those counts to a downloads
field on the versions
table, on the crates
table, and finally on a total_downloads
field in the metadata
table. I'm a little nervous about getting rid of this script because of postgres locking, since download requests are what happen most often.
Is any of that helpful...?
@steveklabnik Do you mean calculating commits, contributions, etc. in memory from the Github API responses? I think it could be pretty feasible, especially if you just cache it all in Redis. Would probably make setting up the dev environment simpler as well
@carols10cents sorry, I should have written more, sigh.
The way that thanks currently works is basically by mirroring a git db as a postgres db. i was thinking maybe instead, keeping a local git repo on the dyno would work instead.
Given crates.io has to deal with the index, I thought maybe it had it locally, but that could be wrong...
Do you mean calculating commits, contributions, etc. in memory from the Github API responses?
or from a local git repo.
Aha, i see. Yes, crates.io does have a local checkout of the git repo on heroku -- this code sets it up. GIT_REPO_CHECKOUT is set to /app/tmp/index-checkout
on crates.io and staging, and I have it set to ./tmp/index-co
on my mirror that also runs on heroku, and both seem to be working fine.
As long as you don't expect that directory to be the same across dyno instances or through restarts (that code checks out the git repo on server start if it's not there), then it should be fine.
As long as you don't expect that directory to be the same across dyno instances or through restarts (that code checks out the git repo on server start if it's not there), then it should be fine.
We will never have more than one instance, and even then, can make it work regardless 👍
Thank you! ❤️
So, @boxtown , I think we can do it this way. Do you have any interest in helping out? This is a much larger PR than this ticket originally imagined...
Actually, @boxtown let me give some stuff a try first, and then we can decide what to do here 😄
👍
On Apr 5, 2017 3:18 PM, "Steve Klabnik" [email protected] wrote:
Actually, @boxtown https://github.com/boxtown let me give some stuff a try first, and then we can decide what to do here 😄
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/rust-lang-nursery/thanks/issues/15#issuecomment-291967583, or mute the thread https://github.com/notifications/unsubscribe-auth/ACwkvmQZ_xo9a8OTr8GRznmXjnd67Lctks5rs-kbgaJpZM4LiSko .
@boxtown so yeah https://github.com/rust-lang-nursery/thanks/pull/99
basically, after this merges, this specific issue can be closed; https://github.com/rust-lang-nursery/thanks/issues/13 would always get all the commits, since we're doing it from git natively.
i'm not closing this yet because i haven't actually landed that PR.