bevy-website
Generate extra metadata from external sources
## Objective
The new asset cards (#394) need more metadata for the license and compatible Bevy versions.
## Solution
- For assets with a link to GitHub or GitLab:
  - Get the Cargo.toml file using their APIs, parse it, and read the Bevy version and license fields (see the sketch below this list).
  - This is done on a best-effort basis; the information isn't always available, for example when the Cargo.toml is not at the top level of the repository.
- For assets with crates.io links:
  - Use the crates.io db dump to get the required data.
- If the metadata is entered manually directly in bevy_assets, it is used instead.
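A minimal sketch of the Cargo.toml step, assuming the `reqwest` (blocking) and `toml` crates. This is not the generator's actual code: it just reads the manifest from the raw-content endpoint of the default branch and pulls out the two fields we care about.

```rust
use std::error::Error;

/// Fetch a repository's top-level Cargo.toml and extract the `license` field
/// and the `bevy` dependency requirement, when they are present.
fn fetch_cargo_metadata(
    owner: &str,
    repo: &str,
) -> Result<(Option<String>, Option<String>), Box<dyn Error>> {
    let url = format!("https://raw.githubusercontent.com/{owner}/{repo}/HEAD/Cargo.toml");
    let body = reqwest::blocking::get(&url)?.error_for_status()?.text()?;
    let manifest: toml::Value = body.parse()?;

    // `package.license` is a plain string when present.
    let license = manifest
        .get("package")
        .and_then(|package| package.get("license"))
        .and_then(|license| license.as_str())
        .map(str::to_owned);

    // The bevy dependency is either `bevy = "0.6"` or a table with a `version` key.
    let bevy_version = manifest
        .get("dependencies")
        .and_then(|deps| deps.get("bevy"))
        .and_then(|bevy| {
            bevy.as_str()
                .or_else(|| bevy.get("version").and_then(|version| version.as_str()))
        })
        .map(str::to_owned);

    Ok((license, bevy_version))
}

fn main() -> Result<(), Box<dyn Error>> {
    let (license, bevy) = fetch_cargo_metadata("bevyengine", "bevy")?;
    println!("license: {license:?}, bevy version: {bevy:?}");
    Ok(())
}
```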
The GitHub client relies on a GITHUB_TOKEN env var existing in CI or locally. It should already be there since we use GitHub Actions. The rate limit in GitHub Actions is 1000 requests per hour (https://docs.github.com/en/rest/overview/resources-in-the-rest-api#requests-from-github-actions), so we should be fine for a little while.
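For illustration, a hedged sketch of how the token could be attached to requests and the remaining rate limit observed; the function, client setup, and user-agent string are placeholders, not the generator's real code.

```rust
use reqwest::blocking::Client;

/// Send an authenticated GET to the GitHub API when GITHUB_TOKEN is set,
/// and log how much of the rate limit is left.
fn github_get(client: &Client, url: &str) -> reqwest::Result<reqwest::blocking::Response> {
    let mut request = client
        .get(url)
        .header("User-Agent", "bevy-website-generator");

    // Fall back to unauthenticated requests when no token is available locally.
    if let Ok(token) = std::env::var("GITHUB_TOKEN") {
        request = request.bearer_auth(token);
    }

    let response = request.send()?;
    if let Some(remaining) = response.headers().get("x-ratelimit-remaining") {
        eprintln!("GitHub rate limit remaining: {remaining:?}");
    }
    Ok(response)
}
```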
The GitLab client implementation ignores the token because we have so few crates there that it doesn't matter.
## Notes
Querying the crates.io dump is pretty slow, and the dependency tree it pulls in is pretty big. I think we should consider eventually rolling our own way to do this, but it works well enough right now.
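As a rough idea of what "rolling our own" could look like, here is a hedged sketch that streams crates.csv from an extracted dump using the `csv` and `serde` crates. The column names follow the public crates.io database dump, but the struct, path, and fields kept are assumptions for illustration.

```rust
use std::collections::HashMap;
use std::error::Error;

/// A subset of the columns in the dump's crates.csv; extra columns are ignored
/// by the csv deserializer.
#[derive(Debug, serde::Deserialize)]
struct CrateRow {
    id: u64,
    name: String,
    repository: Option<String>,
}

/// Build a name -> crate index from the extracted dump without loading the
/// whole database into memory at once.
fn index_crates(csv_path: &str) -> Result<HashMap<String, CrateRow>, Box<dyn Error>> {
    let mut reader = csv::Reader::from_path(csv_path)?;
    let mut by_name = HashMap::new();
    for row in reader.deserialize::<CrateRow>() {
        let row = row?;
        by_name.insert(row.name.clone(), row);
    }
    Ok(by_name)
}
```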
This is going to be pretty slow to execute in the job once it is merged, but it's pretty easy to parallelize all the generate-* jobs, so I did that in #401.
I added support for querying the crates db, but I'm not sure how it will work in CI. Right now it downloads the dump whenever we generate. I made sure not to run it on validate, though.
Also, surprisingly, this is way slower than the network calls to GitHub. I would have assumed that, since it's a local db, it would be pretty quick. I guess there are just too many crates.
I implemented CI caching for the crates db, so it only fetches the zip after 24 hours, which matches the schedule on which the zip file gets updated with new data: https://github.com/BlackPhlox/bevy-website/blob/8bbe21286035b60547e6ba630241896c8fb1741f/.github/workflows/ci.yml#L16-L24
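The same 24-hour idea could also apply to local runs. A small hypothetical helper (the dump path and file name would be whatever the generator uses) that skips the download when the cached archive is less than a day old:

```rust
use std::fs;
use std::time::{Duration, SystemTime};

/// Returns true when the cached dump archive exists and was modified less
/// than 24 hours ago, in which case re-downloading can be skipped.
fn dump_is_fresh(path: &str) -> bool {
    fs::metadata(path)
        .and_then(|meta| meta.modified())
        .map(|modified| {
            SystemTime::now()
                .duration_since(modified)
                .map(|age| age < Duration::from_secs(24 * 60 * 60))
                .unwrap_or(false)
        })
        .unwrap_or(false)
}
```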
> Also, surprisingly, this is way slower than the network calls to GitHub. I would have assumed that, since it's a local db, it would be pretty quick. I guess there are just too many crates.
The current bottleneck that takes the longest for the local db is the reverse dependencies lookup.
Ah, nice, I'll add the ci caching.
> The current bottleneck that takes the longest for the local db is the reverse dependencies lookup.
Oh right, I completely forgot to update that call to use the same approach as you did, which doesn't do a reverse lookup, I think?
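For anyone following along, a simplified illustration of why the reverse lookup is the expensive part: the dump's dependencies table maps each published version to the crates it depends on, so answering "which versions depend on bevy?" means scanning every row, while forward lookups can be served from an index built once. This is a sketch with assumed field names, not the actual query code.

```rust
use std::collections::HashMap;

/// One row of the dump's dependencies table (simplified).
struct DependencyRow {
    version_id: u64, // the version that declares the dependency
    crate_id: u64,   // the crate being depended on
}

/// Reverse lookup: a full scan over every dependency row in the dump.
fn reverse_dependencies(rows: &[DependencyRow], target_crate_id: u64) -> Vec<u64> {
    rows.iter()
        .filter(|row| row.crate_id == target_crate_id)
        .map(|row| row.version_id)
        .collect()
}

/// Forward lookup: cheap once the rows are grouped by the depending version.
fn index_by_version(rows: &[DependencyRow]) -> HashMap<u64, Vec<u64>> {
    let mut index: HashMap<u64, Vec<u64>> = HashMap::new();
    for row in rows {
        index.entry(row.version_id).or_default().push(row.crate_id);
    }
    index
}
```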
I think it's probably a good idea to have a notice about the db dump size and instructions for users who want to run generate-assets locally, something like https://github.com/IceSentry/bevy-website/pull/3
The caching should also be added to .github/workflows/deploy.yml, though I don't know if the two workflows will share the cache; ideally, they should.
I tried not using rev_dependency and it wasn't noticeably faster 😢
That's because get_crate uses rev_dependency 😅 get_crate combines all the queries to get the full crate and its reverse dependencies, as you get on crates.io. Also, I think my naming of rev_dependency is not very good; I should have just called it dependency_lookup or the like.
The GITHUB_TOKEN secret is supposed to be configured by default in any GitHub Actions job, and for now I don't use a GitLab token, so I don't think it needs any manual steps.
No manual steps, just approval, since @cart expressed a preference to be aware of token use.
Oh, makes sense, it's definitely going to use that token a lot more now.
Also, a small note on that: GitHub has a way to query multiple repos with a single call. It would require a big re-architecture to process multiple crates at the same time, but if we ever hit the rate limit it would be possible to work around it.
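For reference, GitHub's GraphQL endpoint lets one request fetch several repositories by aliasing repeated `repository(...)` fields, which is roughly what batching would look like. This sketch assumes `reqwest` (blocking, with the json feature) and `serde_json`; the repo names and query shape are placeholders, and the generator does not work this way today.

```rust
use std::error::Error;

/// Fetch the license of several repositories in a single GraphQL call.
fn fetch_batch(token: &str) -> Result<String, Box<dyn Error>> {
    let query = r#"
    {
      repo0: repository(owner: "bevyengine", name: "bevy") {
        licenseInfo { spdxId }
      }
      repo1: repository(owner: "bevyengine", name: "bevy-website") {
        licenseInfo { spdxId }
      }
    }
    "#;

    let body = serde_json::json!({ "query": query });
    let response = reqwest::blocking::Client::new()
        .post("https://api.github.com/graphql")
        .header("User-Agent", "bevy-website-generator")
        .bearer_auth(token)
        .json(&body)
        .send()?
        .error_for_status()?
        .text()?;
    Ok(response)
}
```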
bors r+
Pull request successfully merged into master.
Build succeeded: