mlpack icon indicating copy to clipboard operation
mlpack copied to clipboard

Revamp Markdown bindings for overhauled website

Open rcurtin opened this issue 1 year ago • 4 comments
trafficstars

This is a "paired PR" along with #61.

In recent months, I have been writing Markdown documentation (like this) for each individual C++ class that mlpack provides. This is easy to compile into HTML, and there is even a script to do it, in scripts/build-docs.sh. However, this handwritten Markdown documentation did not integrate with the automatically-built Markdown documentation for our bindings.

I spent a few weeks trying to figure out how to put it all together, and I was able to do it, although it required a significant refactoring of how the Markdown bindings work. Here is a working version of the new mlpack website (also with changes from the PR referenced above):

https://www.ratml.org/misc/mlpack_new/

and, the documentation for Python is relatively unchanged, but now fits in with the themeing and formatting of the rest of the documentation:

https://www.ratml.org/misc/mlpack_new/doc/user/bindings/python.html

There are a number of important changes to the functionality here that I'll list below:

  • Instead of generating one gigantic mlpack.md file that contains documentation for all languages, the generate_markdown program now takes two parameters: the first specifies the language to print documentation for, and the second specifies whether to generate the documentation as a Markdown file, or the sidebar for the documentation as HTML. This means that all the crazy sed/awk crap that the website build script had to do to split mlpack.md into one file for each language is no longer needed (phew!).

  • The generated Markdown documentation is actually committed to the repository, in doc/user/bindings/. This allows someone to browse the documentation without needing to do make markdown---in fact, they can browse it directly on Github (although Github's Markdown reader doesn't render everything perfectly). It also allows website building to be done without actually compiling mlpack, which is significant time savings: building the website from scratch before would take several hours, because we had to build each version of the library, and then build the Markdown bindings, etc., etc.

  • To ensure that the Markdown bindings that are committed in the repository don't go out of date, I added scripts/check-markdown-docs.sh, and I'll set up a CI job on Jenkins that just runs it. The job will fail if the bindings have been updated in a way that the generated Markdown changes, but those changes were not committed to doc/user/bindings/.

  • build-docs.sh now supports custom sidebars for documentation pages. I wanted the bindings to display a different sidebar---if the user is browsing Python documentation, the sidebar should not be full of links to C++-specific documentation. Example Python page, example regular documentation page

  • All of the URLs in the SEE_ALSO()s for each binding are now checked by the build-docs.sh script. In fact, all links in the generated Markdown bindings are now ensured to be valid.

  • HISTORY.md is now also compiled into HISTORY.html by scripts/build-docs.sh, so that it can be referenced from the website. I had to make a handful of changes there so it would render reasonably.

I know the diff is large... any feedback is much appreciated. :smile:

rcurtin avatar May 02 '24 22:05 rcurtin

@rcurtin It's really nice to see this come together. A few comments and suggestions based on the current state at https://www.ratml.org/misc/mlpack_new/

Main landing page:

  • In the short description under the mlpack name in top center, suggest to omit "C++"; maybe change the description to "extensive library of flexible machine learning algorithms"; the idea is to avoid being pigeonholed into C++-only land right off the bat, as mlpack has bindings to other languages
  • Suggest to change the two blurbs under the logo (separated by a vertical line) into a simple bullet point list without the vertical divider
  • Suggest to add a third bullet point, explicitly stating that there are bindings to Python etc
  • For the citation section, suggest to use only one paper (the latest JOSS article from 2023), and remove the older ones; they're distracting and redundant
  • Suggest to remove or simplify the Benchmarks section; the table is too dense and has too many details; at the very least reduce the table down to 3 or 4 rows maximum; suggest to also remove the Shogun and WEKA columns -- these projects are pretty much dead and including them is just distracting
  • Suggest to change the "Download mlpack" button to link to the "Quickstart" page, instead of directly pointing to the .tar.gz archive; also remove mlpack version details, in order to reduce the maintenance burden; more on that below

Quickstart page:

  • Suggest to rename this section to "Installation" or "Download"
  • Suggest to have a separate section with "Latest Stable Release", linking to mlpack-4.3.0.tar.gz. This is instead of stating "mlpack is available in most package managers, or the source can be downloaded directly"
  • Suggest to add a note along these lines: "Pre-built mlpack packages are provided for various operating systems; these packages may not be the latest version; if you're encountering problems, use the official stable release provided here"

Documentation page:

  • Several sections appear empty, so assuming these are under construction: "Clustering algorithms", "Geometric algorithms", "Preprocessing utilities"

conradsnicta avatar May 03 '24 02:05 conradsnicta

Actually I will mark this as a draft for now so that I can handle all the changes, and then mark it as ready.

rcurtin avatar May 03 '24 12:05 rcurtin

I was traveling this past week, so it took me a while to get back to, but I much appreciate the comments @shrit and @conradsnicta. The example website has been updated with all the changes below.

I also made some significant changes to the sidebar---instead of having different sidebars for binding documentation and for C++ documentation, now the same sidebar is used for both. I think that it's a lot more consistent and easy to navigate now.

Some responses to each point and changes that I made. (Sorry that it got really long...)

In the short description under the mlpack name in top center, suggest to omit "C++"; maybe change the description to "extensive library of flexible machine learning algorithms"; the idea is to avoid being pigeonholed into C++-only land right off the bat, as mlpack has bindings to other languages

C++ should not be the centre, as in some cases could be pushing people away, even though I agree that in my opinion, the ultimate objective of mlpack is to focus on places where other library fails to deliver, especially when it comes to embedded system C/C++

Yes, I agree with this. I thought about it for a little while---I want to keep it clear that we are header-only, but if you remove the C++ then the term doesn't make sense... so I instead added a second line to the subtitle: with bindings to Python, Julia, R, Go, and the command-line. Hopefully that should cover the bases here.

Suggest to change the two blurbs under the logo (separated by a vertical line) into a simple bullet point list without the vertical divider

Yeah, maybe less visually interesting but certainly easier to read. Done.

For the citation section, suggest to use only one paper (the latest JOSS article from 2023), and remove the older ones; they're distracting and redundant

Yeah, fair point, removed.

Suggest to remove or simplify the Benchmarks section; the table is too dense and has too many details; at the very least reduce the table down to 3 or 4 rows maximum; suggest to also remove the Shogun and WEKA columns -- these projects are pretty much dead and including them is just distracting

I think we should move this to a different place, and Yes I agree on removing Weka and Shogun, I never heard of mlpy or used it. We should replace these with TensorFlow or TensorFlow Lite, Pytorch, or any other modern maintained framework. Also, we should mention some benchmarks regarding the binary footprint.

Well, I agree with both of those things, but I don't have any other benchmark numbers handy and want to avoid diving into getting benchmarking set up again... I also want to have some kind of proof point for "fast" on the homepage, even with the clear problems that have been pointed out and even though the numbers are almost old enough that they could get a driver's license in the US. Anyway, I want to update these for sure, but I'd prefer to leave the cleanup for a follow-up and come back to it.

Suggest to change the "Download mlpack" button to link to the "Quickstart" page, instead of directly pointing to the .tar.gz archive; also remove mlpack version details, in order to reduce the maintenance burden; more on that below

Ah, thanks! I had meant to do that but apparently forgot to actually make the change. Done now.

Quickstart page: Suggest to rename this section to "Installation" or "Download"

Yes, good point, that will help prevent confusion between the "quickstarts" which are documentation and this page. I went with "Download".

Suggest to have a separate section with "Latest Stable Release", linking to mlpack-4.3.0.tar.gz. This is instead of stating "mlpack is available in most package managers, or the source can be downloaded directly" Suggest to add a note along these lines: "Pre-built mlpack packages are provided for various operating systems; these packages may not be the latest version; if you're encountering problems, use the official stable release provided here"

Thanks, I did my best to refactor the download page to mention these.

I would remove the FAQ page because everything on that page has been already said before, and it does not add any value to the library, only maintenance burden.

All of the questions are obvious to us who know the project already, yeah, but a lot of people may be unfamiliar with these questions. I do get a decent number of emails from people who do not appear to have much expertise and would benefit from these pages. It doesn't hurt to keep the page there; I think none of the answers have been updated in a very long time anyway. ensmallen, Armadillo, and Bandicoot have similar pages.

If It is up to me I would remove the entire contributor section from the community, it is not well-updated, and there are so many names, that I doubt anyone would come to see what this is about.

The only thing that kept me from doing this is that some contributors from very long ago never had their contributions linked with a Github account. But I guess that maybe they are called out in COPYRIGHT.txt or something like this. Anyway, I removed it.

I would simplify the page community, there is so much information, that it is hard to find the needed one

With the contributors list gone, it's a lot shorter now. Really the thing that most people will be interested in is the chat channels and maybe video call; the rest of the stuff is not particularly important but does need to go somewhere. That contribution guide is really helpful too---I point lots of people towards it and it saves me time writing the same thing over and over again. :)

I would remove the word Developer from in front of the names, it feels too much academic, even all of us have left academia long time ago 😄

Yeah, I think the NSF wanted it written like that many years ago. I dropped the "Developer" part, even though I do still dip my toes in academia from time to time. :)

We should remove the tutorial button from the front page because it is leading to some very old thing. We can bring it back with modern tutorials.

Where do you see the link to the tutorials? Maybe I am looking at the wrong page? I don't see one on the new homepage. I thought I had removed all links to the old tutorials.

We should add a page that talks about the Ecosystem of mlpack, (for example adding link to ensmallen, bandicoot, armadillo, and other libraries, that are using mlpack. Not only this, we can add old and current users, and what are they doing with the library, something like use cases, which is more relevant to people and attractive as well.

Not a bad idea, but I would prefer to save it for later, or if you want to do it, feel free.

I really like Pytorch website, I think we should get some inspiration from there, regarding Edge dedicated page, and adding a Blog page, since we are going to add these in the future.

Definitely we will need more direct links to the application notes and tutorials we will write, but I'm hesitant about adding a blog. Unless you have a team of people dedicated to writing blog content, it will always go out of date and then users who come look at the blog say "huh, it hasn't been updated in 3 years, I guess the project must have no momentum". I finally removed the old GSoC blog we used to use for this reason.

I really like the About page in Pytorch as well, including the installation

Actually the current version of the mlpack homepage has an installation selector kind of like PyTorch does, but I have found it hard to maintain and a little bit complicated. So I just went with something simple. If someone wanted to replace it with something better or more elaborate in the future, I'm not opposed, but I would prefer not to (I enjoy writing C++ not HTML/JS :)).

Finally, we do not need a link to the Home page with the home button, everyone is clicking on the logo, it is the easiest and the fastest way

I see what you mean, but it doesn't hurt anything to leave it there. I personally prefer the explicit callout to the home page.

rcurtin avatar May 13 '24 14:05 rcurtin

further comments in https://github.com/mlpack/mlpack.org/pull/61#issuecomment-2109102456

conradsnicta avatar May 14 '24 01:05 conradsnicta

@shrit - may need to approve this PR as well, since it's part of https://github.com/mlpack/mlpack.org/pull/61

conradsnicta avatar May 20 '24 07:05 conradsnicta

The documentation build fails, but only because the new website is not yet deployed. Otherwise all the links seem ok.

rcurtin avatar May 26 '24 11:05 rcurtin