metacpan-api

Add recommendation support

Open yanick opened this issue 12 years ago • 50 comments

Issue placeholder for discussions related to http://babyl.dyndns.org/techblog/entry/metacpan-recommendations.

yanick avatar Mar 10 '13 22:03 yanick

I think there's a lot more value to recommendations if people can justify the recommendation.

To avoid it becoming yet another reviews site though, perhaps instead of putting a note by recommendations, allow people to tag their recommendation with pre-defined categories, such as: "more features", "better API", "tiny", "in core", etc.

For example, on the LWP::UserAgent page I might want to recommend both WWW::Mechanize ("more features") and also HTTP::Tiny ("in core", "tiny").

tobyink avatar Mar 10 '13 22:03 tobyink

Also, a category which would need to be displayed separately: recommend modules which aren't an alternative to the current module, but are good partners for it. For example on the List::Util page, recommend List::MoreUtils.

tobyink avatar Mar 10 '13 22:03 tobyink

Let's keep it simple for the first release. We can still over-engineer later on :)

monken avatar Mar 10 '13 22:03 monken

@yanick here is my take on how to implement this: I like the idea of bundling the recommendation with a ++. In fact, we could extend the /favorite endpoint to include the inferior modules in the favorite document.

Example: if user A recommends Moo over Moose and Mo, the resulting favorite entry would be:

{
  user: 1,
  distribution: "Moo",
  instead_of: ["Moose", "Mo"]
}

Pretty easy to implement and easy to query, too. Another thing to keep in mind: in my opinion we should use distribution names instead of module names. When people look at Moose::Role, they still want to see the recommendation for Moo, although the user didn't explicitly recommend Moo::Role over Moose::Role.
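A rough sketch of how such a lookup could behave, assuming favorite documents shaped like the one above. The sample data, the `module_to_dist` mapping, and the function name are hypothetical and not part of the MetaCPAN API; the sketch only illustrates that a recommendation stored per distribution would surface on every module of the inferior distribution.

```python
from collections import Counter

# Hypothetical favorite documents, following the shape proposed above.
favorites = [
    {"user": 1, "distribution": "Moo", "instead_of": ["Moose", "Mo"]},
    {"user": 2, "distribution": "Moo", "instead_of": ["Moose"]},
    {"user": 3, "distribution": "Moose", "instead_of": []},
]

# Assumed module -> distribution mapping (illustrative only).
module_to_dist = {"Moose": "Moose", "Moose::Role": "Moose", "Moo::Role": "Moo"}


def recommended_instead(module):
    """Count, per distribution, how many users recommend it over the
    distribution that contains `module`."""
    dist = module_to_dist[module]
    counts = Counter()
    for fav in favorites:
        if dist in fav.get("instead_of", []):
            counts[fav["distribution"]] += 1
    return counts


print(recommended_instead("Moose::Role"))
```

Looking up Moose::Role here reports Moo, even though nobody recommended Moo::Role over Moose::Role explicitly.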

monken avatar Mar 10 '13 22:03 monken

My vote is for simple to start with as well. I think we should also take into account the ideas presented by @timbunce here http://blog.timbunce.org/2013/03/10/suggested-alternatives-as-a-metacpan-feature/. I would say for this thread we could limit discussion of Tim's blog post to the points where it overlaps with what @yanick has proposed. We really need a road map moving forward here. My inclination would be to break it up into small, simple, deployable chunks with a view to expanding the functionality down the road if it turns out to be as useful as we think/hope it will be.

I like the UI which Tim has proposed for the recommendations and I tend to agree that modules make more sense than distributions for the suggested alternatives. However, having ++ refer to dists and alternatives referring to modules could lead to some confusion. At the very least, I think it's a conversation worth having and maybe getting some wider input on.

The way @mo has laid out the ++ entry looks really clean and easy to use, but what if I want to recommend Mojo::UserAgent as an alternative for LWP::UserAgent? Now we're talking about

{
  user: 1,
  distribution: "Mojolicious",
  instead_of: ["libwww-perl"]
}

oalders avatar Mar 11 '13 00:03 oalders

[incorporate Tim's ideas here as well] Absolutely.

[road map] I agree that many small steps is the way to go. The sooner we have a core to play with, the sooner we can have a snowball effect and hundreds, nay, thousands, of hackers poring over the feature and submitting patches. ... okay, I might be harbouring too much hope here, but you know what I mean. ;-)

[modules versus distributions] I see the point for modules. For most modules/distributions, it won't make a lot of difference as, typically, one dist == one functionality == one main module. What I see playing against a per-module recommendation is that there will be dilution/confusion: anything that recommends Moose::Util, Moose::Meta::Class, or Moose::Role really boils down to recommending Moose. Now, it's true that the flip side is that for distributions that are an umbrella for many functionalities (@oalders's example of Mojolicious is a good one), the recommendation might look odd, but I think that's still better than having a more diffuse module selection.

yanick avatar Mar 11 '13 00:03 yanick

[road map] I agree that many small steps is the way to go, but picking the right direction is important! :)

[modules versus distributions] Firstly, I would argue that the current placement of the ++ is misleading. Rather than:

Gisle Aas / libwww-perl-6.04 / LWP::UserAgent [47 ++]

I'd suggest:

Gisle Aas / libwww-perl-6.04 [47 ++] – LWP::UserAgent

To take your example @yanick, it's unlikely that anyone would recommend Moose::Util, Moose::Meta::Class, or Moose::Role specifically unless they had good reason. They'd simply refer to Moose instead. If they did have a good reason then they wouldn't be able to express it clearly if they had to do so at the distro level.

Also, consider the case of a large distribution with many modules (Moose, DBIx::Class, etc.) where someone has developed a separate distribution containing a single module that improves on the functionality of just one of the bundled modules. Clearly that distro isn't a "suggested alternative" for the original distro, but the module is a "suggested alternative" for a specific module in that distro.

(I can see an argument for calling the new distro a "complementary distro". So if you choose to implement at the distro level then the implementation should support different relationship types from the start.)

[API] I'm nervous of having this functionality ride piggy-back on favourites, but I don't know the API well enough to know how valid that concern is. Clearly it's only appropriate if you choose to implement this at the distro level.

There'll need to be API support for the other side of the relationship as well, i.e. the "Suggested as the alternative to X other modules by Y people" and "Suggested as complementary with X other modules by Y people".

[Naming] Either "suggested alternative" and "suggested addition" or "recommended alternative" and "recommended addition". Umm, "alternative" seems clear but "addition" doesn't seem quite right; "extra" is a bit vague and "complementary" is a bit of a mouthful. I'll let you bike-shed that one :)

timbunce avatar Mar 11 '13 11:03 timbunce

[placement of ++] @timbunce Please open a separate ticket for that. :+1:

[module vs distro] libwww-perl and Mojo are the exception and don't really follow the idea of CPAN where each dist tackles a certain problem or functionality. If we do it on a dist level, that will also motivate people to split up their large dists. One might argue that the perl dist has many modules that are candidates for recommendations. My argument against that is that most of these modules are dual-lived and have their own dist that we would recommend instead of the perl dist. Again, let's keep it simple; I feel like having the recommendation on a module level would cause us a lot of headaches.

Worst case scenario: a user looks at LWP::UserAgent and recommends Mojo::UserAgent, which will result in

{
  user: 1,
  distribution: "Mojolicious",
  instead_of: ["libwww-perl"]
}

Looking at any of the libwww-perl modules will result in showing a recommendation for Mojolicious (instead of Mojo::UserAgent). I'm totally fine with that. But others might disagree.

[API] Both queries you suggested should be supported. I like putting it in the favorite table because it relates and makes the implementation easier (in my mind).

[Naming] I vote for "recommended alternative", but it's quite a mouthful for an API key, so I still vouch for instead_of.

monken avatar Mar 11 '13 14:03 monken

Please, please, please, use module names. They are stable and reliable. Dist names are not. Dist names aren't even unique unless paired with uploader name (AUTHOR/Foo-Bar-1.23.tar.gz), so how do you track recommendations as different maintainers release? You're going to be in a heuristic pickle. (You might be there already with "previous dist".)

Modules are also precise. Forget the Mojolicious example, how about Scalar::Util and List::Util? Different alternatives will apply to each.

You can always roll it up on a distribution page and show other suggested distributions based on modules contained.
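A minimal sketch of that roll-up, under stated assumptions: recommendations are stored as (user, module, alternative) triples, and a `module_to_dist` index says which distribution currently ships each module. The sample recommendations and all names are illustrative, not an existing MetaCPAN structure.

```python
from collections import defaultdict

# Hypothetical module-level recommendations: (user, module, alternative).
recommendations = [
    ("alice", "List::Util", "List::MoreUtils"),
    ("bob", "List::Util", "List::MoreUtils"),
    ("carol", "Scalar::Util", "Params::Util"),
]

# Assumed index of which distribution currently ships each module.
module_to_dist = {
    "List::Util": "Scalar-List-Utils",
    "Scalar::Util": "Scalar-List-Utils",
    "List::MoreUtils": "List-MoreUtils",
    "Params::Util": "Params-Util",
}


def rollup_by_distribution(recs, index):
    """Aggregate module-level recommendations per containing distribution."""
    rollup = defaultdict(lambda: defaultdict(set))
    for user, module, alternative in recs:
        rollup[index[module]][index[alternative]].add(user)
    # Turn the sets of users into counts for display.
    return {
        dist: {alt: len(users) for alt, users in alts.items()}
        for dist, alts in rollup.items()
    }


print(rollup_by_distribution(recommendations, module_to_dist))
```

Because the distribution is looked up at aggregation time, a module that moves to another distribution takes its recommendations with it: only the index changes, not the stored triples.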

dagolden avatar Mar 12 '13 00:03 dagolden

[naming]

I suggest using see also -- this is softer than "recommended alternative" and could eventually be enhanced with comments or tagging. It would allow for "recommendation" or "alternative" or "for use with" semantics.

It solves the discoverability problem without making direct value judgments. And it's general enough that you can use it to weight search rankings.

dagolden avatar Mar 12 '13 01:03 dagolden

[naming]

see also is, imho, too soft. The razor's edge we are walking, I think, is to have something that won't degenerate into bloodbaths, but still provides a venue to recommend solution X over solution Y.

[modules versus distributions]

Question for MetaCPAN peeps: assuming that we go with module names, is it easy or costly to have aggregation of those results done for the distribution? I'm asking because I think that we have to show some form of results at the dist level (I do not want to click through the many modules of DBIx::Class to know what peeps recommend for DBIx::Class as a whole). If per-dist aggregation of the module results is costly, then that would be a strong argument against the per-module recommendations. If not... then the fight can go on. :-)

yanick avatar Mar 12 '13 01:03 yanick

[naming]

I had also thought "see also" would be a nice, succinct way of naming this. The argument I had against it is that "See Also" seems to be a fairly common header in module documentation and the meaning given there is probably wider in scope. It seems to be, "if you like this, then you might just want to look at these", with no implied judgement. However, I could live with our version of "See Also" having a narrower definition.

oalders avatar Mar 12 '13 02:03 oalders

[naming]

I think it's good to start soft and general, because you can always make it harder with more specific annotations or tagging. On the other hand, if you try to create the right ontology first and get it wrong, then you're sort of locked in.

Go back to @tobyink 's comment about comments -- I think starting with soft will allow greater insight into how it's being used, then more rigor can be figured out on the basis of actual usage.

dagolden avatar Mar 12 '13 04:03 dagolden

All @dagolden's points are strong ones and I agree with them.

Re @yanick's query on cost of calculating the distro level recommendations, that could be done async as a batch job. As mentioned before, most recommendations will be made to the 'root' module of a distro, and the module level recommendations on that page would be updated immediately. I don't see a problem with the distro page not getting the change till later.

timbunce avatar Mar 12 '13 09:03 timbunce

@dagolden when I talk about distributions I'm talking about Foo-Bar and not AUTHOR/Foo-Bar-1.23.tar.gz, which is a release in my mind. So the issue of tracking releases by different authors doesn't really apply.

Someone might recommend List::MoreUtils over List::Util. There is no harm if that also shows up for Scalar::Util. It's one distribution, it's one bucket of modules that try to solve a common issue: provide utilities to Perl data structures.

@timbunce

"most recommendations will be made to the 'root' module of a distro"

I agree and that's why we could just stick with recommending distributions because in all those cases, the distribution matches the module name. I guess I still have trouble understanding why we should recommend based on module names when we collapse on a dist-level anyway.

monken avatar Mar 12 '13 13:03 monken

@monken "distributions" in the way you describe it don't exist as far as PAUSE is concerned. They don't exist as far as users are concerned because they can't be installed. They are a fiction invented by search.cpan.org and mimicked by metacpan.org. Perpetuating that design mistake would be unfortunate.

dagolden avatar Mar 12 '13 13:03 dagolden

"recommendations" is wrong. For a start, now what do you call the relationship going the other way?

SEE ALSO is exactly the CPAN tradition for this since it doesn't mean "if you like this module", it means "if you're looking at this module you should also look at".

Generally in a deprecation situation, optimally you'd create the relationship going both ways.

Plus many recommendations would be conditional on some factor - there's no one universal best practice or we wouldn't be having this discussion, we'd just be picking one best module for each task and moving on.

As an example - I might add a link on DBIx::Class saying "see also DBIx::Lite if you don't need objects" and they might add a link back for the 'do need objects' case. Those are both conditional recommendations.

We can't assume a concept of 'obsoletes' and 'obsoleted by' - the Mojo/LWP example is good there, since while the Mojo API is a lot nicer for a lot of cases sri told me he doesn't want it to become 'the' HTTP API because he doesn't want to take on the backcompat requirements, so it's not a straight replacement even at the user agent level.

Another example would be PAR and App::FatPacker. fatpacker is way nicer to deal with than PAR for the cases it supports ... because by refusing to handle XS I managed to avoid 90% of the complications. So I'd like to think that it obsoletes PAR for most pure perl packing, but I still recommend PAR when you need XS support.

So I think calling it 'module relationships', displaying it as 'see also', and letting people put 'obsoletes' or 'obsoleted by' in the tags is probably the sensible way forwards. We can't capture a lot of the useful information otherwise, and it leaves us providing mechanism and then seeing what the users shake out in terms of policy.

shadowcat-mst avatar Mar 12 '13 13:03 shadowcat-mst

You rate distributions, you file bugs against distributions, cpan testers is organized by distributions. I think that term and fiction is well established in the Perl community and ecosystem. I understand that PAUSE follows a different approach, but I don't think it's practical to think in terms of modules for many use cases.

monken avatar Mar 12 '13 14:03 monken

@monken CPAN works the way dagolden describes, not the way you describe. rt.cpan.org creates a queue based on the name of the first tarball to contain a new module, then uses that module's permissions to determine the maintainer, and the result is that bugs have to be re-opened when modules are split out. Not actually a feature, just a historical thing.

A see also attached to Sub::Quote pointing to Eval::Closure should not stay with Moo if I split the module out. A see also on Path::Router pointing to Web::Dispatch should not stay pointing at Web::Simple if I split the module out.

Making links to Mojolicious would be completely futile if it was dist level only, too.

So for this use case it evidently isn't practical to think in terms of distributions alone. So the remaining question is whether we initially support only modules, or whether we need distributions as well. Can you provide a concrete example of a case where distributions work and modules don't?

shadowcat-mst avatar Mar 12 '13 14:03 shadowcat-mst

I'm actually curious what happens on RT if there's an identically named distribution. If I were more evil, I would upload Moose-3.000.tar.gz containing a legal, unindexed module (NotReallyMoose.pm) and see what blows up as a result.

Since Metabase started, internally, all reports are full AUTHOR/DIST-VERSION.SUFFIX. It's only the display stuff that hasn't been updated.

Regardless of that, I think @shadowcat-mst makes the stronger case -- as modules move between distributions, recommendations/see-also should follow them.

I can't make you rewind the clock and stop metacpan.org using "distribution" the way you are. But I do encourage you not to hang any more stuff off a non-unique key.

dagolden avatar Mar 12 '13 15:03 dagolden

[naming]

How about Related Modules? Not the "See Also" we're used to seeing; establishes that there is a relationship; but is generic to support future distinctions.

dagolden avatar Mar 12 '13 16:03 dagolden

[naming]

I like "Related Modules", but I also do think "See Also" is the most succinct, even if I have some reservations about it.

[modules versus distributions]

I feel like at this point we've settled on modules and can move on from this. We can always add a dist recommendation system if needed, but I can't see a use case for that just yet. Correct me if I'm wrong.

oalders avatar Mar 13 '13 20:03 oalders

As per http://blogs.perl.org/users/neilb/2013/03/whats-wrong-with-cpan.html#comment-405091 it would be nice if an author's own recommendations could be handled specially.

OK, so authors already get to put whatever recommendations they like in the pod, but the recommendation system would be machine-queryable.

tobyink avatar Mar 13 '13 20:03 tobyink

I think there are some overlapping concepts that are possibly getting mixed up here, including at least:

  • identifying all modules in a group. Eg all modules related to defining constants. It's a set, not an ordered list, but obviously might be presented in some order based on metadata, including the next point. I think of SEE ALSO as this list, though as someone above pointed out, some SEE ALSOs suggest specific (other) modules to use in certain situations.
  • 'suggested alternate' module(s). Eg "if you're using Readonly I think you should look at Const::Fast (or ...) instead".

I think the first type could be solved by tagging: Const::Fast might be tagged with "constants", and "immutable variables". These could be displayed next to every module (that has them), and clicking on them would list all modules in that group (ie tagged with that tag). So if someone knows they're thinking about immutable variables, they'll click on that, and get a shorter list.

I think the two concepts can be tied together by making the alternate modules model be "I think module::A is better/worse/equivalent than/to module::B [for tag]".
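One possible way to model that statement, as a sketch only: the `Opinion` record, its field names, and the helper functions are assumptions made for illustration, not an agreed schema. Each opinion reads "module_a is better/worse/equivalent than/to module_b [for tag]", and the tag doubles as the grouping key from the first concept.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Opinion:
    """'module_a is <relation> than/to module_b [for tag]' (hypothetical schema)."""
    user: str
    module_a: str
    relation: str  # "better", "worse", or "equivalent"
    module_b: str
    tag: Optional[str] = None


INVERSE = {"better": "worse", "worse": "better", "equivalent": "equivalent"}


def inverse(op):
    """The same opinion, read from module_b's side."""
    return Opinion(op.user, op.module_b, INVERSE[op.relation], op.module_a, op.tag)


def modules_for_tag(opinions, tag):
    """All modules mentioned in opinions carrying `tag` (the 'group' view)."""
    found = set()
    for op in opinions:
        if op.tag == tag:
            found.update((op.module_a, op.module_b))
    return sorted(found)


opinions = [Opinion("user1", "Const::Fast", "better", "Readonly", "constants")]
print(inverse(opinions[0]))
print(modules_for_tag(opinions, "constants"))
```

Storing one record and deriving its inverse keeps both sides of the relationship queryable without asking users to enter it twice.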

neilb avatar Mar 14 '13 00:03 neilb

Just a quick word to say that I'm chugging along with the UI part at https://github.com/yanick/metacpan-web/tree/recommendations. I have stubbed in a MetaCPAN::Web::Controller::Account::Recommend, and I should be in a position to hook into the ElasticSearch backend as soon as I have an hour or two more to sink into this project. So... if any of you metacpaners feel like carving me a REST URI for that, that might come in handy rrrrreal soon. :-)

yanick avatar Mar 27 '13 02:03 yanick

Because I'm obviously bonkers, I began to look at the cpan-api side of things. Result: https://github.com/yanick/cpan-api/tree/recommendations. I have absolutely no idea what I'm doing... but I have tests that are passing for the creation and removal of recommendations.

Anyway, that part is far from being done, but I just wanted to give everybody a fair warning that the fox is in the henhouse. It's not too late to grab a shovel and come give it a good whack before it does too much damage. :-)

yanick avatar Mar 30 '13 00:03 yanick

@yanick I don't know the metacpan code enough to do a review - just wanted to say keep it up even if you are damaging the henhouse - I'm sure someone will help patch it up after :)

ranguard avatar Mar 30 '13 19:03 ranguard

@ranguard That's the plan. :-) If nothing else, it gives me an excuse to learn ElasticSearch, which I wanted to do for some time now.

I'll push my latest code in a few moments. But it seems that I can push changes to the db just fine (yay!). Now remains the more thorny question of how ES does its searching.

yanick avatar Mar 30 '13 20:03 yanick

I... I think I have a working prototype. https://github.com/yanick/metacpan-web/tree/recommendations and https://github.com/yanick/cpan-api/tree/recommendations

In the database, I have Recommendation documents that have a user / module / alternative triplet, which can be pushed via

/recommendation/[user]/[module]/alternative/[better module]

In metacpan-web, the lesser and better alternatives to the current module are gathered in, respectively, 'instead_of' and 'supplanted_by'.

And that's pretty much it. Oh, and I put in the restriction that a user can only give one alternative for each module (to keep things simple).
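For readers following along, the behaviour described above can be approximated with an in-memory stand-in. This is a sketch under assumptions, not the code in the branches: the class and method names are invented, and having a second recommendation replace the first (rather than being rejected) is a guess at how the one-alternative restriction plays out.

```python
class RecommendationStore:
    """In-memory stand-in for the user / module / alternative documents."""

    def __init__(self):
        # (user, module) -> alternative: one alternative per user and module.
        self._recs = {}

    def recommend(self, user, module, alternative):
        # Assumption: a new recommendation replaces the user's earlier one.
        self._recs[(user, module)] = alternative

    def remove(self, user, module):
        self._recs.pop((user, module), None)

    def supplanted_by(self, module):
        """Better alternatives: modules recommended over `module`."""
        return sorted({alt for (_, mod), alt in self._recs.items() if mod == module})

    def instead_of(self, module):
        """Lesser alternatives: modules that `module` is recommended over."""
        return sorted({mod for (_, mod), alt in self._recs.items() if alt == module})


def recommendation_path(user, module, alternative):
    """Build a path following the URL pattern quoted above."""
    return f"/recommendation/{user}/{module}/alternative/{alternative}"


store = RecommendationStore()
store.recommend("user1", "Moose", "Moo")
store.recommend("user2", "Mo", "Moo")
print(store.instead_of("Moo"))
print(store.supplanted_by("Moose"))
print(recommendation_path("user1", "Moose", "Moo"))
```

Keying the store on (user, module) is what enforces the single-alternative rule; relaxing it later would mean changing the key, not the queries.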

yanick avatar Mar 31 '13 01:03 yanick

I think that I went as far as I could go. For the next step, I'd need somebody from the MetaCPAN team to look at what I did, and provide feedback for the, uh, well-meant atrocities I did to the model. Not to mention that I also need feedback on the UI: placement / nomenclature / etc.

yanick avatar Apr 08 '13 13:04 yanick