thanks icon indicating copy to clipboard operation
thanks copied to clipboard

update-commit-db _might_ miss commits

Open steveklabnik opened this issue 7 years ago • 15 comments

It only looks at the first page of commits. If we get very busy, it might not catch all of them.

This should be alleviated once a release happens, but master will be missing in the meantime. So ideally it should go back enough to not miss anything.

steveklabnik avatar Jan 12 '17 22:01 steveklabnik

I'll pick this up. Idea is to loop through pages grabbing commits until you either run out of pages or hit the SHA of the latest commit in the latest release for a project currently in the database. Quick question though, what is the purpose of the visible field in the Releases table?

boxtown avatar Apr 02 '17 17:04 boxtown

Idea is to loop through pages grabbing commits until you either run out of pages or hit the SHA of the latest commit in the latest release for a project currently in the database

Exactly.

Quick question though, what is the purpose of the visible field in the Releases table?

In order to write the release blog post, I need to know how many people have contributed. In order to do that, I have to create the release. But then it's up before the actual announcement. So, basically, a new release will sit as false for a few days, and then set to true on release day.

Given that this process is only done with the master release, which will always be true, you shouldn't have to worry about it :)

steveklabnik avatar Apr 03 '17 14:04 steveklabnik

Having hacked on it a little, there's two ways I can see implementing this. The first assumes that commits are inserted in order which is a hacky assumption BUT requires very little change to the current code. The second would add a date column to the commits table in order to set a sort order for commits but is a much more involved change. Thoughts? Sorry for the terrible formatting

On Apr 3, 2017 10:21 AM, "Steve Klabnik" [email protected] wrote:

Idea is to loop through pages grabbing commits until you either run out of pages or hit the SHA of the latest commit in the latest release for a project currently in the database

Exactly.

Quick question though, what is the purpose of the visible field in the Releases table?

In order to write the release blog post, I need to know how many people have contributed. In order to do that, I have to create the release. But then it's up before the actual announcement. So, basically, a new release will sit as false for a few days, and then set to true on release day.

Given that this process is only done with the master release, which will always be true, you shouldn't have to worry about it :)

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/rust-lang-nursery/thanks/issues/15#issuecomment-291157954, or mute the thread https://github.com/notifications/unsubscribe-auth/ACwkvoSsPdyahjv6R1HqHODi8R7SKIASks5rsQBngaJpZM4LiSko .

boxtown avatar Apr 03 '17 14:04 boxtown

Thoughts?

Can you elaborate a bit more at the high level of how this works? That is, I'm not sure what adding the dates would get you, exactly.

steveklabnik avatar Apr 03 '17 15:04 steveklabnik

We need a way to say "what was the last commit saved for the latest non master release". We could assume that commits are inserted in order which means we wouldn't need dates, or we could use the date_approved field. Unless there is some other mechanism for ordering commits that I'm not aware of

On Apr 3, 2017 11:31 AM, "Steve Klabnik" [email protected] wrote:

Thoughts?

Can you elaborate a bit more at the high level of how this works? That is, I'm not sure what adding the dates would get you, exactly.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/rust-lang-nursery/thanks/issues/15#issuecomment-291179369, or mute the thread https://github.com/notifications/unsubscribe-auth/ACwkvqckUSg-aZrJ42L6pk-EfUnjSBlPks5rsRC3gaJpZM4LiSko .

boxtown avatar Apr 03 '17 16:04 boxtown

We need a way to say "what was the last commit saved for the latest non master release"

Ah, I see. Right. Hm.

Honestly, with stuff like https://github.com/rust-lang-nursery/thanks/pull/97, I'm starting to wonder if the right approach here is to just always dynamically calculate things. Originally I chose not to because I wasn't sure about heroku; however, crates.io is hosted there, and it seems to work? I'm not sure how that works though; maybe @carols10cents could help enlighten us here.

steveklabnik avatar Apr 03 '17 16:04 steveklabnik

Sorry, I don't have much context here about what you'd like to know... what about heroku are you not sure? The performance of database queries? The performance of web requests? Are you on free or paid dynos, free or paid database? What are you currently caching and considering dynamically calculating? How often do you need to update the data and query the data?

Crates.io does have some values we've tried to cache, and have had problems invalidating the cache at the right time in the right ways. We're now dynamically calculating the max_version of a crate and that seems to be performing okay.

Version downloads, however, are still being aggregated by a background worker script-- version downloads get updated in one table within a download request, then the background worker rolls up those counts to a downloads field on the versions table, on the crates table, and finally on a total_downloads field in the metadata table. I'm a little nervous about getting rid of this script because of postgres locking, since download requests are what happen most often.

Is any of that helpful...?

carols10cents avatar Apr 03 '17 23:04 carols10cents

@steveklabnik Do you mean calculating commits, contributions, etc. in memory from the Github API responses? I think it could be pretty feasible, especially if you just cache it all in Redis. Would probably make setting up the dev environment simpler as well

boxtown avatar Apr 04 '17 20:04 boxtown

@carols10cents sorry, I should have written more, sigh.

The way that thanks currently works is basically by mirroring a git db as a postgres db. i was thinking maybe instead, keeping a local git repo on the dyno would work instead.

Given crates.io has to deal with the index, I thought maybe it had it locally, but that could be wrong...

Do you mean calculating commits, contributions, etc. in memory from the Github API responses?

or from a local git repo.

steveklabnik avatar Apr 04 '17 21:04 steveklabnik

Aha, i see. Yes, crates.io does have a local checkout of the git repo on heroku -- this code sets it up. GIT_REPO_CHECKOUT is set to /app/tmp/index-checkout on crates.io and staging, and I have it set to ./tmp/index-co on my mirror that also runs on heroku, and both seem to be working fine.

As long as you don't expect that directory to be the same across dyno instances or through restarts (that code checks out the git repo on server start if it's not there), then it should be fine.

carols10cents avatar Apr 05 '17 01:04 carols10cents

As long as you don't expect that directory to be the same across dyno instances or through restarts (that code checks out the git repo on server start if it's not there), then it should be fine.

We will never have more than one instance, and even then, can make it work regardless 👍

Thank you! ❤️

So, @boxtown , I think we can do it this way. Do you have any interest in helping out? This is a much larger PR than this ticket originally imagined...

steveklabnik avatar Apr 05 '17 14:04 steveklabnik

Actually, @boxtown let me give some stuff a try first, and then we can decide what to do here 😄

steveklabnik avatar Apr 05 '17 19:04 steveklabnik

👍

On Apr 5, 2017 3:18 PM, "Steve Klabnik" [email protected] wrote:

Actually, @boxtown https://github.com/boxtown let me give some stuff a try first, and then we can decide what to do here 😄

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/rust-lang-nursery/thanks/issues/15#issuecomment-291967583, or mute the thread https://github.com/notifications/unsubscribe-auth/ACwkvmQZ_xo9a8OTr8GRznmXjnd67Lctks5rs-kbgaJpZM4LiSko .

boxtown avatar Apr 05 '17 19:04 boxtown

@boxtown so yeah https://github.com/rust-lang-nursery/thanks/pull/99

basically, after this merges, this specific issue can be closed; https://github.com/rust-lang-nursery/thanks/issues/13 would always get all the commits, since we're doing it from git natively.

i'm not closing this yet because i haven't actually landed that PR.

steveklabnik avatar Apr 06 '17 17:04 steveklabnik