rust-playground
Consider using downloads in the last N days, instead of all time.
TL;DR: We use the top 100 crates based on all time downloads to avoid being arbiters of what crates are available. If you think that a "popular" crate should be present on the playground, consider enhancing crates.io to have an API that would prioritize crates downloaded in a more recent interval.
Rationale
See discussion started in #82 —
I'm hesitant to start cherry-picking crates for inclusion based on my own whims. If I were to do that, I'd certainly blacklist xml-rs in favor of my own XML parser 😈
Additionally, I think about how I'd explain to someone why a given crate is available or not. If it's "someone submitted a PR", then I fear we'd be compiling 1000s of crates every night. If it's "the maintainer picked it", then I fear the backlash directed towards me on public (and private?) forums.
And #113 —
We've also toyed around with the idea of having "sponsored crates", where someone contributes money towards server costs (which are paid out of our own pocket at the moment), but strangely no one seems excited to pay for it ;-) If you are, I'm sure we can get in contact.
All I'm saying is... If you made libsqlite3 available in the sandbox, we would find it super helpful for bug reports. :)
Will the "last N days" selection help libsqlite3? How would one check that?
Only somewhat related to this issue and the discussion in #82, but it seems kind of strange (if I'm understanding this correctly) that updating the top crates list could cause code that previously used to work to no longer work. i.e. the 100th crate in the list may drop to 101, the service is redeployed, and now any code snippets that relied on that crate no longer work. That being said, I don't know what the best solution would be.
100th crate in the list may drop to 101
Even easier, a crate can simply release a new backwards-incompatible version. Since we always use the more-or-less-latest version, this is quite possible.
any code snippets that relied on that crate no longer work
This is a good point, but I don't see any good solution unless some benefactor decides to provide financial contribution ;-).
The Big Solution I can think of is to allow arbitrary versions of crates (and we might as well throw in arbitrary Rust versions at that point). Then we are one step away from being a poor facsimile of a web-based IDE.
no longer work
Beneficially, any code is still available, even if it's not executable.
code is still available
Which is part of the reason that I avoid adding the shortlink button and prefer the Gist button — I'm betting that GitHub and Gists will last a long time!
Looks like the last 90 days right now would be
id | name | downloads
------+-------------------------+-----------
463 | serde | 1387484
795 | libc | 880648
793 | bitflags | 713088
363 | lazy_static | 541543
524 | rustc-serialize | 536186
429 | winapi | 519486
547 | log | 500864
2164 | winapi-build | 487439
1339 | rand | 487066
811 | kernel32-sys | 486698
35 | gcc | 451679
4746 | num-traits | 430125
4697 | thread_local | 427619
2233 | regex-syntax | 423146
4446 | thread-id | 413968
2365 | aho-corasick | 412091
2364 | memchr | 403324
1592 | num_cpus | 400180
545 | regex | 399375
3231 | utf8-ranges | 391706
90 | time | 368111
7 | semver | 352012
2782 | serde_json | 348980
109 | url | 331922
11 | toml | 312885
5486 | itoa | 310237
5512 | dtoa | 307166
1368 | strsim | 306820
456 | matches | 295491
1873 | unicode-normalization | 293407
1306 | env_logger | 291379
1964 | unicode-xid | 288507
34 | pkg-config | 284810
1343 | byteorder | 276103
2361 | unicode-bidi | 272374
4747 | num-integer | 270981
6274 | syn | 268109
4577 | idna | 267905
4749 | num-iter | 259747
1369 | void | 259128
1498 | clap | 251366
6224 | quote | 248422
2906 | rustc_version | 246079
9 | num | 243416
1872 | unicode-segmentation | 241820
5544 | serde_codegen_internals | 227500
399 | ansi_term | 227338
2016 | vec_map | 226511
5535 | term_size | 216202
2556 | cfg-if | 216098
1761 | traitobject | 213957
1869 | unicode-width | 212723
2291 | unreachable | 209161
6169 | serde_derive | 208523
41 | openssl-sys | 196498
327 | hyper | 194419
1432 | httparse | 186600
121 | mime | 185423
657 | term | 179560
749 | unicase | 173259
836 | language-tags | 171383
13 | typeable | 165404
120 | chrono | 163677
8587 | synom | 160192
1447 | tempdir | 156973
231 | openssl | 151621
546 | getopts | 148962
835 | syntex_syntax | 141851
2725 | cmake | 134404
4751 | num-rational | 130499
10 | uuid | 128982
5571 | syntex_pos | 127034
5572 | syntex_errors | 126549
3019 | quick-error | 116168
2418 | net2 | 115828
4748 | num-complex | 115238
4750 | num-bigint | 114431
39 | libz-sys | 113267
2934 | crossbeam | 112450
2028 | filetime | 110147
2392 | slab | 110006
37 | flate2 | 109621
314 | itertools | 108840
8 | glob | 107591
56 | nix | 106651
62 | mio | 105288
7118 | redox_syscall | 104466
36 | miniz-sys | 103141
2303 | pulldown-cmark | 103108
174 | docopt | 102082
326 | cookie | 100972
3608 | rayon | 99444
534 | deque | 97841
4891 | error-chain | 95444
3119 | walkdir | 95152
2572 | atty | 93841
5045 | semver-parser | 86274
538 | curl-sys | 84881
2349 | backtrace-sys | 84820
5060 | rustc-demangle | 83222
(100 rows)
I'm a little nervous about using "last N days" as the factor to decide which crates to make available, because it will favour recently-updated crates. Of course, incentivizing actively-developed libraries is good, but I'm worried that stable, rock-solid crates that are used frequently will drop off, which is :(
I'm assuming that the underlying goal is to make the "100 most popular/used libraries" available. However, a crate can be used by a ton of projects, but if it isn't updated, then each project using the dependency won't download it again until the next release. However, a frequently-updated project will more quickly rack up downloads, as devs will be re-downloading the package to get the new version.
For example, I consider itertools to be a useful, useful-for-many-projects dependency (even if I'm bad at using it 😉), but due to its slower release cycle, it could easily fall off the crates.io list.
(At the same time, perhaps incentivizing frequently-updated crates will encourage new libraries to be developed and increase competition, but I'm not sure the Rust playground should be the platform to push for that type of objective)
Maybe it makes sense to allow the use of the union of "top crates of all time" and "top crates from the past 90 days"? That way you can cover both crates that are very popular but aren't updated frequently, and allow people to experiment with and share code from the newest, hottest crates. And there's going to be a lot of overlap, so you won't be hosting too many crates.
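As a rough sketch of the combined list suggested above (every crate that appears in either the all-time list or the recent list), assuming both lists are already available as plain crate names. The function name `combined_crates` and the shortened sample lists are hypothetical, not part of the playground's code.

```rust
use std::collections::BTreeSet;

/// Combine two "top N" lists, keeping every crate that appears in either one.
fn combined_crates(all_time: &[&str], recent: &[&str]) -> BTreeSet<String> {
    all_time
        .iter()
        .chain(recent.iter())
        .map(|name| name.to_string())
        .collect()
}

fn main() {
    // Hypothetical, shortened lists; the real lists would have 100 entries each.
    let all_time = ["libc", "rand", "itertools", "serde"];
    let recent = ["serde", "libc", "syn", "quote"];

    let combined = combined_crates(&all_time, &recent);
    // Crates present in both lists are only stored once.
    println!("{} crates: {:?}", combined.len(), combined);
}
```

Because overlapping crates collapse into one entry, the combined set has at most 200 entries and usually far fewer.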
Downloads will always be an imperfect metric that over-emphasizes crates which are likely to be dependencies of other crates, regardless of whether it's all-time or recent.
For example, Rocket is easily one of the most popular crates out there -- or at least one of the most talked about. Nearly a third of the front page posts on /r/rust mention it in the comments at least once. However, it is only ranked 511 on all time downloads, and 443 in recent downloads.
Or to put it from another perspective -- Diesel has a handful of dependencies (only one is actually mandatory, but in practice anybody relying on Diesel will have at least 5). Let's assume for the moment that all users of Diesel have the chrono feature enabled (which is actually mostly true). Since all users of Diesel will also be using chrono, no matter how many people are using Diesel, it will never be able to beat chrono on the download rankings (even if some people are only using chrono because of Diesel). So at best it can rank #2. However, it doesn't stop there. Chrono has 3 dependencies. It will never rank higher than any of those three crates. Of those 3 second level dependencies, there are a total of 6 third level dependencies. I stopped counting there, but Cargo.lock says it's 18 crates total for the one crate.
My point being, if a crate simply chooses to allow integration with chrono (or even just uses it internally for something), that crate will now at absolute best be ranked 19th in downloads, regardless of whether it's ranked by recent or all time.
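To make the rank ceiling concrete, here is a minimal sketch that walks a dependency graph and counts everything a crate pulls in. It assumes, as the argument above does, that each dependency is downloaded at least as often as the crate that needs it. The graph and crate names are placeholders, not real dependency data, and `transitive_deps` is a hypothetical helper.

```rust
use std::collections::{HashMap, HashSet};

/// Collect every crate reachable from `root` through the dependency graph.
fn transitive_deps<'a>(graph: &HashMap<&'a str, Vec<&'a str>>, root: &str) -> HashSet<&'a str> {
    let mut seen = HashSet::new();
    let mut stack: Vec<&'a str> = graph.get(root).cloned().unwrap_or_default();
    while let Some(name) = stack.pop() {
        // Only expand a crate the first time it is reached.
        if seen.insert(name) {
            if let Some(deps) = graph.get(name) {
                stack.extend(deps.iter().copied());
            }
        }
    }
    // A crate is not its own dependency, even if the graph has a cycle.
    seen.remove(root);
    seen
}

fn main() {
    // Hypothetical graph; the names are placeholders.
    let mut graph = HashMap::new();
    graph.insert("app_crate", vec!["dates"]);
    graph.insert("dates", vec!["numbers", "clock"]);
    graph.insert("numbers", vec!["traits"]);
    graph.insert("clock", vec!["sys"]);

    let deps = transitive_deps(&graph, "app_crate");
    // Under the stated assumption, every dependency ranks above the crate,
    // so its best possible rank is one below all of them.
    println!("{} dependencies, best rank {}", deps.len(), deps.len() + 1);
}
```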
I was checking to see if nom was supported. It looks like it has enough downloads to be in that top 100 list above, but I guess that list above is out-of-date now?
It also seems like a pretty well-known crate for it to be excluded.
I'd wonder if there's a way of working categories into this - maybe the top 10 crates in each category should be included - although then it raises the issue of what the categories are, and who defines them. :) That way you get an even spread instead of favoring.
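A small sketch of the per-category idea, assuming each crate already carries a single category label. The `CrateInfo` type and the category names are made up for illustration; the download counts are copied from the 90-day table earlier in the thread.

```rust
use std::collections::BTreeMap;

/// One crate with a category label and a download count.
struct CrateInfo {
    name: &'static str,
    category: &'static str,
    downloads: u64,
}

/// Keep the `n` most-downloaded crates from each category.
fn top_per_category(crates: &[CrateInfo], n: usize) -> BTreeMap<&'static str, Vec<&'static str>> {
    let mut by_category: BTreeMap<&'static str, Vec<&CrateInfo>> = BTreeMap::new();
    for info in crates {
        by_category.entry(info.category).or_default().push(info);
    }
    by_category
        .into_iter()
        .map(|(category, mut members)| {
            // Sort descending by downloads, then keep the first `n`.
            members.sort_by(|a, b| b.downloads.cmp(&a.downloads));
            let names: Vec<&'static str> = members.iter().take(n).map(|c| c.name).collect();
            (category, names)
        })
        .collect()
}

fn main() {
    // Category labels are hypothetical; who defines them is the open question.
    let crates = [
        CrateInfo { name: "serde", category: "encoding", downloads: 1_387_484 },
        CrateInfo { name: "serde_json", category: "encoding", downloads: 348_980 },
        CrateInfo { name: "toml", category: "encoding", downloads: 312_885 },
        CrateInfo { name: "clap", category: "cli", downloads: 251_366 },
        CrateInfo { name: "getopts", category: "cli", downloads: 148_962 },
    ];
    for (category, names) in top_per_category(&crates, 2) {
        println!("{category}: {names:?}");
    }
}
```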
Another metric I could see factoring into this that would be harder to track - you could track how often somebody tries to include a crate and fails, or how often a crate actually gets included. This would then over time start to favor stuff that people often try to use the playground for. You couldn't directly compare failures and successes (once a crate started to work, it would likely start getting used more often) but maybe there's a multiplier that could be used to approximate a ranking of all currently-included/currently-excluded crates.
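One way the multiplier could work, purely as a sketch: failed attempts on excluded crates are scaled up before being compared with successful uses of included crates. The `Attempts` type, the counters, and the multiplier value are all hypothetical; the playground is not known to record any of this.

```rust
/// Usage counters the playground would have to record (hypothetical).
struct Attempts {
    name: &'static str,
    /// Whether the crate is currently available on the playground.
    included: bool,
    /// Successful uses if included, failed attempts if excluded.
    uses: u64,
}

/// Scale failed attempts so they can be compared with successful uses.
fn demand_score(a: &Attempts, excluded_multiplier: f64) -> f64 {
    if a.included {
        a.uses as f64
    } else {
        a.uses as f64 * excluded_multiplier
    }
}

/// Rank all crates, included or not, by their estimated demand.
fn rank_by_demand(attempts: &[Attempts], excluded_multiplier: f64) -> Vec<&'static str> {
    let mut scored: Vec<(&'static str, f64)> = attempts
        .iter()
        .map(|a| (a.name, demand_score(a, excluded_multiplier)))
        .collect();
    // Highest estimated demand first.
    scored.sort_by(|a, b| b.1.total_cmp(&a.1));
    scored.into_iter().map(|(name, _)| name).collect()
}

fn main() {
    // Placeholder crates and counts.
    let attempts = [
        Attempts { name: "alpha", included: true, uses: 900 },
        Attempts { name: "beta", included: false, uses: 200 },
        Attempts { name: "gamma", included: true, uses: 400 },
    ];
    println!("{:?}", rank_by_demand(&attempts, 5.0));
}
```

Picking the multiplier is the hard part; the value 5.0 here is arbitrary.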
Downloads will always be an imperfect metric that over-emphasizes crates which are likely to be dependencies of other crates, regardless of whether it's all-time or recent.
Hmm... If we assume that every time a crate is downloaded, all of its recursive dependencies are downloaded with it (including optional dependencies?), could there be a metric where they are subtracted to avoid double-counting?
@sollyucko I'm not weighing in on if that would be better or worse, but it's not as straight-forward as stated due to changing dependencies over time. For example, if crate A v1 depends on B v1 then A v2 drops the dependency, you need to account for that.
assume that every time a crate is downloaded, all of its recursive dependencies are downloaded
It's unclear if it matters, but this assumption isn't true. If crates A and B depend on Z, then when I install A I'll download (A, Z), and when I install B I'll only download (B). Removing the counts of both A and B from Z would lead to double subtracting.
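The A/B/Z example can be simulated in a few lines. This is a simplified model that assumes a single user whose local cache never expires and ignores crate versions; `count_downloads` is a hypothetical helper, not how crates.io counts downloads.

```rust
use std::collections::{HashMap, HashSet};

/// Simulate one user installing crates in order, with a local cache:
/// a crate is only downloaded (and counted) the first time it is needed.
fn count_downloads<'a>(
    deps: &HashMap<&'a str, Vec<&'a str>>,
    installs: &[&'a str],
) -> HashMap<&'a str, i64> {
    let mut cache: HashSet<&'a str> = HashSet::new();
    let mut downloads: HashMap<&'a str, i64> = HashMap::new();
    for &root in installs {
        let mut stack = vec![root];
        while let Some(name) = stack.pop() {
            // `insert` returns false when the crate is already cached.
            if cache.insert(name) {
                *downloads.entry(name).or_insert(0) += 1;
                if let Some(children) = deps.get(name) {
                    stack.extend(children.iter().copied());
                }
            }
        }
    }
    downloads
}

fn main() {
    // A and B both depend on Z.
    let mut deps = HashMap::new();
    deps.insert("A", vec!["Z"]);
    deps.insert("B", vec!["Z"]);

    let downloads = count_downloads(&deps, &["A", "B"]);
    // Naive adjustment: subtract every dependent's downloads from Z.
    let adjusted_z = downloads["Z"] - downloads["A"] - downloads["B"];
    println!("Z downloads: {}, adjusted: {}", downloads["Z"], adjusted_z);
}
```

Z is downloaded once but has two downloads subtracted from it, so the adjusted count goes negative.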
I have very little background here but ... I recently ran into an issue where all the examples in these docs for wasm-bindgen no longer run. If I understand correctly, the issue is that at one point wasm-bindgen was in this list and was then automatically removed when it fell off the list. That process in general seems undesirable as then any page using the playground could break at any time it updates.
If an algorithm for keeping things on the list forever is unacceptable, maybe it would be better if the rust-playground didn't include any crates, period, but the docs were updated to explain how to add a list of crates for a specific deployment. Then, each user of the playground could include the crates they need with their build (maybe this is already documented? This is my first brush with rust-playground)
update
I forgot that most people are relying on this one global server setup, which only has one set of crates. That complicates the issue. Still, currently you write some code that requires a crate and tomorrow that code no longer runs because the crate fell off the list. That seems like a problem.
I'm pretty sure this is the first time I've heard the argument that since sometimes old code won't work, we should remove all crates for everyone, forever. Feels a lot like Solomon cutting the baby in half.
There's nothing stopping people from running their own deployments of the playground, but it's certainly not the case we've optimized for.
However, your point doesn't really have much to do with this issue; the issue is about changing the algorithm, while your suggestion is to completely abandon the algorithm. It'd be better to discuss it in a separate issue.
My point was that the current algorithm is guaranteed to break stuff. People post on the playground and reference it, then others go to look and the sample no longer runs. That's super confusing and not good for Rust's reputation relative to other languages' playgrounds.
Either a redistributable that people can self-host or a warning seem like better solutions.
- Steven