bevy-website
Generate extra metadata from external sources
## Objective
The new asset cards (#394) need more metadata for the license and compatible Bevy versions.
## Solution
- For assets with a link to GitHub or GitLab:
  - Get the Cargo.toml file using their APIs, parse it, and read the Bevy version and license fields (see the sketch below this list).
  - This is done on a best-effort basis; the information isn't always available, for example when the Cargo.toml is not at the top level of the repository.
- For assets with crates.io links:
  - Use the crates.io db dump to get the required data.
- If the metadata is entered manually directly in bevy_assets, it is used instead.
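A minimal sketch of the Cargo.toml step, assuming the `reqwest` (blocking) and `toml` crates. This is not the generator's actual code: it just reads the manifest from the raw-content endpoint of the default branch and pulls out the two fields we care about.

```rust
use std::error::Error;

/// Fetch a repository's top-level Cargo.toml and extract the `license` field
/// and the `bevy` dependency requirement, when they are present.
fn fetch_cargo_metadata(
    owner: &str,
    repo: &str,
) -> Result<(Option<String>, Option<String>), Box<dyn Error>> {
    let url = format!("https://raw.githubusercontent.com/{owner}/{repo}/HEAD/Cargo.toml");
    let body = reqwest::blocking::get(&url)?.error_for_status()?.text()?;
    let manifest: toml::Value = body.parse()?;

    // `package.license` is a plain string when present.
    let license = manifest
        .get("package")
        .and_then(|package| package.get("license"))
        .and_then(|license| license.as_str())
        .map(str::to_owned);

    // The bevy dependency is either `bevy = "0.6"` or a table with a `version` key.
    let bevy_version = manifest
        .get("dependencies")
        .and_then(|deps| deps.get("bevy"))
        .and_then(|bevy| {
            bevy.as_str()
                .or_else(|| bevy.get("version").and_then(|version| version.as_str()))
        })
        .map(str::to_owned);

    Ok((license, bevy_version))
}

fn main() -> Result<(), Box<dyn Error>> {
    let (license, bevy) = fetch_cargo_metadata("bevyengine", "bevy")?;
    println!("license: {license:?}, bevy version: {bevy:?}");
    Ok(())
}
```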
The GitHub client relies on a GITHUB_TOKEN env var existing in CI or locally. It should already be there since we use GitHub Actions. The rate limit in GitHub Actions is 1000 requests per hour (https://docs.github.com/en/rest/overview/resources-in-the-rest-api#requests-from-github-actions), so we should be fine for a little while.
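For illustration, a hedged sketch of how the token could be attached to requests and the remaining rate limit observed; the function, client setup, and user-agent string are placeholders, not the generator's real code.

```rust
use reqwest::blocking::Client;

/// Send an authenticated GET to the GitHub API when GITHUB_TOKEN is set,
/// and log how much of the rate limit is left.
fn github_get(client: &Client, url: &str) -> reqwest::Result<reqwest::blocking::Response> {
    let mut request = client
        .get(url)
        .header("User-Agent", "bevy-website-generator");

    // Fall back to unauthenticated requests when no token is available locally.
    if let Ok(token) = std::env::var("GITHUB_TOKEN") {
        request = request.bearer_auth(token);
    }

    let response = request.send()?;
    if let Some(remaining) = response.headers().get("x-ratelimit-remaining") {
        eprintln!("GitHub rate limit remaining: {remaining:?}");
    }
    Ok(response)
}
```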
The GitLab client implementation ignores the token because we have so few crates there that it doesn't matter.
## Notes
Querying the crates.io dump is pretty slow, and the dependency tree it pulls in is pretty big. I think we should consider eventually rolling our own way to do this, but it works well enough right now.
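As a rough idea of what "rolling our own" could look like, here is a hedged sketch that streams crates.csv from an extracted dump using the `csv` and `serde` crates. The column names follow the public crates.io database dump, but the struct, path, and fields kept are assumptions for illustration.

```rust
use std::collections::HashMap;
use std::error::Error;

/// A subset of the columns in the dump's crates.csv; extra columns are ignored
/// by the csv deserializer.
#[derive(Debug, serde::Deserialize)]
struct CrateRow {
    id: u64,
    name: String,
    repository: Option<String>,
}

/// Build a name -> crate index from the extracted dump without loading the
/// whole database into memory at once.
fn index_crates(csv_path: &str) -> Result<HashMap<String, CrateRow>, Box<dyn Error>> {
    let mut reader = csv::Reader::from_path(csv_path)?;
    let mut by_name = HashMap::new();
    for row in reader.deserialize::<CrateRow>() {
        let row = row?;
        by_name.insert(row.name.clone(), row);
    }
    Ok(by_name)
}
```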
This is going to be pretty slow to execute in the job once it is merged, but it's pretty easy to parallelize all the generate-* jobs, so I did that in #401.
I added support for querying the crates db, but I'm not sure how it will work in CI. Right now it downloads the dump whenever we generate. I made sure not to run it on validate, though.
Also, surprisingly, this is way slower than the network calls to GitHub. I would have assumed that, since it's a local db, it would be pretty quick. I guess there are just too many crates.
I implemented CI caching for the crates db, so it only fetches the zip after 24 hours, which matches the schedule on which the zip file gets updated with new data: https://github.com/BlackPhlox/bevy-website/blob/8bbe21286035b60547e6ba630241896c8fb1741f/.github/workflows/ci.yml#L16-L24
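The same 24-hour idea could also apply to local runs. A small hypothetical helper (the dump path and file name would be whatever the generator uses) that skips the download when the cached archive is less than a day old:

```rust
use std::fs;
use std::time::{Duration, SystemTime};

/// Returns true when the cached dump archive exists and was modified less
/// than 24 hours ago, in which case re-downloading can be skipped.
fn dump_is_fresh(path: &str) -> bool {
    fs::metadata(path)
        .and_then(|meta| meta.modified())
        .map(|modified| {
            SystemTime::now()
                .duration_since(modified)
                .map(|age| age < Duration::from_secs(24 * 60 * 60))
                .unwrap_or(false)
        })
        .unwrap_or(false)
}
```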
> Also, surprisingly, this is way slower than the network calls to GitHub. I would have assumed that, since it's a local db, it would be pretty quick. I guess there are just too many crates.
The current bottleneck that takes the longest for the local db is the reverse dependencies lookup.
Ah, nice, I'll add the ci caching.
> The current bottleneck that takes the longest for the local db is the reverse dependencies lookup.
Oh right, I completely forgot to update that call to use the same approach as you did, which doesn't do a reverse lookup, I think?
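For anyone following along, a simplified illustration of why the reverse lookup is the expensive part: the dump's dependencies table maps each published version to the crates it depends on, so answering "which versions depend on bevy?" means scanning every row, while forward lookups can be served from an index built once. This is a sketch with assumed field names, not the actual query code.

```rust
use std::collections::HashMap;

/// One row of the dump's dependencies table (simplified).
struct DependencyRow {
    version_id: u64, // the version that declares the dependency
    crate_id: u64,   // the crate being depended on
}

/// Reverse lookup: a full scan over every dependency row in the dump.
fn reverse_dependencies(rows: &[DependencyRow], target_crate_id: u64) -> Vec<u64> {
    rows.iter()
        .filter(|row| row.crate_id == target_crate_id)
        .map(|row| row.version_id)
        .collect()
}

/// Forward lookup: cheap once the rows are grouped by the depending version.
fn index_by_version(rows: &[DependencyRow]) -> HashMap<u64, Vec<u64>> {
    let mut index: HashMap<u64, Vec<u64>> = HashMap::new();
    for row in rows {
        index.entry(row.version_id).or_default().push(row.crate_id);
    }
    index
}
```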
I think it's probably a good idea to have a notice about the db dump size and instructions for users who want to run generate-assets locally, something like https://github.com/IceSentry/bevy-website/pull/3
The caching should also be added to .github/workflows/deploy.yml, though I don't know if the two workflows will share the cache; ideally, they should.
I tried not using rev_dependency and it wasn't noticeably faster 😢
That's because get_crate uses rev_dependency 😅 get_crate combines all the queries to get the full crate and its reverse dependencies, as you get on crates.io. Also, I think my naming of rev_dependency is not very good; I should have just called it dependency_lookup or the like.
The GITHUB_TOKEN secret is supposed to be configured by default in any GitHub Actions job, and for now I don't use a GitLab token, so I don't think it needs any manual steps.
No manual steps, just approval, since @cart expressed a preference to be aware of token use.
Oh, makes sense, it's definitely going to use that token a lot more now.
Also, a small note on that: GitHub has a way to query multiple repos with a single call. It would require a big re-architecture to process multiple crates at the same time, but if we ever hit the rate limit it would be possible to work around it.
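For reference, GitHub's GraphQL endpoint lets one request fetch several repositories by aliasing repeated `repository(...)` fields, which is roughly what batching would look like. This sketch assumes `reqwest` (blocking, with the json feature) and `serde_json`; the repo names and query shape are placeholders, and the generator does not work this way today.

```rust
use std::error::Error;

/// Fetch the license of several repositories in a single GraphQL call.
fn fetch_batch(token: &str) -> Result<String, Box<dyn Error>> {
    let query = r#"
    {
      repo0: repository(owner: "bevyengine", name: "bevy") {
        licenseInfo { spdxId }
      }
      repo1: repository(owner: "bevyengine", name: "bevy-website") {
        licenseInfo { spdxId }
      }
    }
    "#;

    let body = serde_json::json!({ "query": query });
    let response = reqwest::blocking::Client::new()
        .post("https://api.github.com/graphql")
        .header("User-Agent", "bevy-website-generator")
        .bearer_auth(token)
        .json(&body)
        .send()?
        .error_for_status()?
        .text()?;
    Ok(response)
}
```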
bors r+
Pull request successfully merged into master.
Build succeeded: