Buildable and testable documentation and tutorials

Open rcurtin opened this issue 6 years ago • 23 comments

Here are a couple of problems that we currently have:

  • Our documentation and tutorials often go out of date because parameters change and the tutorials don't get updated.
  • Our documentation doesn't have complete programs that a reader can just paste into a terminal, or Python interpreter, or .cpp file and run.
  • Our documentation doesn't have a good introduction to the kinds of problems mlpack can solve, followed by a "flowchart" directing users to the kinds of algorithms they might want to use to solve a certain problem.
  • There's no guarantee that our code examples actually give the output that we claim they give.
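
One way to address the last point, sketched here in Python under the assumption that examples are written as interpreter sessions: the standard-library doctest module runs each >>> line and compares the result with the output written beneath it. The scale function is a hypothetical stand-in, not an mlpack API.

```python
import doctest


def scale(values, factor):
    """Multiply every element of `values` by `factor`.

    Hypothetical stand-in for a documented function; not an mlpack API.

    >>> scale([1, 2, 3], 2)
    [2, 4, 6]
    """
    return [v * factor for v in values]


def check_examples():
    """Run the docstring example above and return (failed, attempted)."""
    parser = doctest.DocTestParser()
    # Build a test from the docstring, giving it access to `scale`.
    test = parser.get_doctest(scale.__doc__, {"scale": scale}, "scale", None, 0)
    runner = doctest.DocTestRunner(verbose=False)
    results = runner.run(test)
    return results.failed, results.attempted


if __name__ == "__main__":
    failed, attempted = check_examples()
    print(f"{attempted} example(s) run, {failed} failed")
```

If the documented output drifts from what the code returns, the failed count becomes non-zero, so a build could refuse to publish the page.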

I am planning to resolve this, so I wanted to open an issue for discussion in case anyone has any opinions. (If not I'll just do it! ;)) I'd like to use the automatic bindings system so that we can write tutorials containing generic code snippets for calls to the command-line programs, the Python bindings, or other bindings. These snippets:

  • will be written in C++, using the CLI::GetParam() function to set parameters
  • will define some tests (hopefully using the Boost Unit Test Framework) for the output
  • will have the ability to provide actual output values that can be substituted into the compiled documentation
  • will generate code usage examples for all binding languages that mlpack supports, like the .NET documentation: https://docs.microsoft.com/en-us/dotnet/api/microsoft.win32.commondialog?view=netframework-4.7.2 (you can select the language in the upper right-hand corner and it updates the code)
  • will be transformed from snippet to code by a CMake build step that produces a program that outputs each of the snippets
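
The per-language rendering described in the list above could look roughly like this. It is a minimal Python sketch, not the design that was eventually built: the Snippet class, the renderer functions, the binding name example_binding, and both output formats are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Snippet:
    """A language-neutral description of one call (hypothetical format)."""
    binding: str
    params: dict = field(default_factory=dict)


def render_cli(snippet):
    """Render the snippet as a command-line invocation."""
    args = " ".join(f"--{k} {v}" for k, v in snippet.params.items())
    return f"{snippet.binding} {args}".rstrip()


def render_python(snippet):
    """Render the snippet as a Python function call."""
    args = ", ".join(f"{k}={v!r}" for k, v in snippet.params.items())
    return f"output = {snippet.binding}({args})"


# One renderer per binding language; adding a language means adding an entry.
RENDERERS = {"cli": render_cli, "python": render_python}


def render_all(snippet):
    """Return one usage example per supported binding language."""
    return {lang: render(snippet) for lang, render in RENDERERS.items()}


if __name__ == "__main__":
    snippet = Snippet("example_binding", {"input": "data.csv", "iterations": 100})
    for lang, text in render_all(snippet).items():
        print(f"[{lang}] {text}")
```

Keeping the snippet as data and the renderers as a table means a parameter rename only has to happen in one place, which is the out-of-date problem the list is trying to avoid.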

I'm currently in the process of designing what the actual documentation will look like (in its "raw" form). There will also need to be some support for C++-only examples. When I have some examples, I'll post them here for discussion. I'm hopeful that some better documentation like this will (a) help people understand what mlpack does and how it can help them, (b) ensure that users don't get confused by out-of-date documentation, and (c) drive more traffic to mlpack since the documentation and tutorials contain ready-to-use examples that someone can just copy-paste and use (and then maybe modify a little bit).

It might be worth thinking about notebooks too at some point so someone can play with mlpack entirely in a browser without downloading anything or setting it up, but I want to keep the scope somewhat limited so this doesn't take me a year to do nicely. Maybe that can happen later. :)

If anyone's interested in helping (whether that's helping design, writing part of the code, or writing the revised documentation), I am sure that we can collaborate and work together on it! :)

rcurtin avatar Sep 20 '18 02:09 rcurtin

This is also related to #175.

rcurtin avatar Sep 20 '18 02:09 rcurtin

In my experience the mlpack documentation is too byzantine, i.e. overly intricate and spread out over too many pages.

I believe having one "flat page" with internal links would go a long way to solving that issue. Having a single reference point (which is "keyword rich") is super useful for search engines, which increases incoming traffic.

It would also allow easy manual searching (e.g. via control-f) and casual perusal, which is very useful for discovering the available algorithms, functions, classes, etc. This in turn would give users a far better idea of what mlpack is capable of.
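
As a rough illustration of the flat-page idea, the sketch below builds a single page with an index of internal links followed by every section. It is written in Python and emits Markdown with explicit HTML anchors; the entry names, the slugify rule, and the page layout are all hypothetical.

```python
def slugify(name):
    """Turn a heading into a lowercase anchor id (simplified, assumed rule)."""
    return "".join(c if c.isalnum() else "-" for c in name.lower()).strip("-")


def build_flat_page(entries):
    """Build one page: a linked index followed by every section.

    `entries` is a list of (heading, body) pairs.
    """
    index = [f"- [{name}](#{slugify(name)})" for name, _ in entries]
    sections = [
        f'<a id="{slugify(name)}"></a>\n## {name}\n\n{body}'
        for name, body in entries
    ]
    return "\n".join(index) + "\n\n" + "\n\n".join(sections) + "\n"


if __name__ == "__main__":
    page = build_flat_page([
        ("first_method()", "Description of the first method."),
        ("Second Method", "Description of the second method."),
    ])
    print(page)
```

Because everything lives on one page, control-f and search engines see every heading and description at once.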

conradsnicta avatar Sep 20 '18 06:09 conradsnicta

I can agree that the current documentation is too byzantine and too spread out. But I don't know if one "flat page" would work in the same way it does for Armadillo, since the mlpack techniques are a lot more generic and flexible, and often take a lot of explanation to fully describe what they do and what can be done with them. But I do like the idea at least of one "main page" that collects everything that mlpack does (whether that includes just the bindings to other languages or the C++ classes). Right now this is not the worst in the world but it is not so great for discoverability:

http://mlpack.org/docs/mlpack-3.0.3/python.html

rcurtin avatar Sep 20 '18 16:09 rcurtin

I like the flat page idea as well; in fact, I'm constantly using ctrl-f on the Armadillo page. A stripped-down version of the current CLI output could work quite well to get something similar, and I think nothing prevents us from either hiding certain information by default or linking to another page with more information.

zoq avatar Sep 20 '18 17:09 zoq

Hm, ok. Let's see what I come up with. I'll see what I can fit on a flat page.

rcurtin avatar Sep 20 '18 18:09 rcurtin

The first mockup of what I've come up with is this:

http://www.mlpack.org/docs/experimental/python.html

Let me know what you think. I've only formatted the adaboost() documentation. I think for the bindings for other languages, a flat page is definitely easier for a user to use---and makes it easier to find the different things that mlpack does. My intention is that in the "see also" links, we can direct users to the C++ documentation and also to the tutorials and examples that this issue was originally about. :)

rcurtin avatar Sep 26 '18 03:09 rcurtin

Do you think the >>> in the example code is something we should keep? Some users probably just copy/paste, which doesn't work out of the box, since you have to remove the prefix first. I really like the design behind the page.

zoq avatar Sep 26 '18 19:09 zoq

I could use the CSS user-select property to make the >>> non-selectable---that might be a good way to do it. It's not guaranteed to work everywhere, but I think it'll work in the vast majority of browsers and situations. Maybe we can assume that people using totally non-standard browsers realize they need to remove that bit ;)
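
A minimal sketch of that approach, assuming the page is generated by a script: each prompt goes into its own span, and a CSS rule turns off selection for that span. The class name prompt and the render_session function are hypothetical, not part of the actual mockup.

```python
import html

# CSS rule the generated page is assumed to include (class name is hypothetical).
PROMPT_CSS = ".prompt { user-select: none; }"


def render_session(lines):
    """Wrap each line's >>> prompt in a span that CSS can make non-selectable."""
    rendered = []
    for line in lines:
        code = html.escape(line)  # escape <, >, & and quotes in the example code
        rendered.append(f'<span class="prompt">&gt;&gt;&gt; </span>{code}')
    return "<pre>" + "\n".join(rendered) + "</pre>"


if __name__ == "__main__":
    print(f"<style>{PROMPT_CSS}</style>")
    print(render_session(["x = 1 < 2", "print(x)"]))
```

With the prompt kept out of the selectable text, a reader who drags across the block copies only the code.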

rcurtin avatar Sep 26 '18 22:09 rcurtin

Could you please write a little about how developers can configure it for their PC (Linux, Mac, or Windows), or add a link to such instructions on http://www.mlpack.org/docs/experimental/python.html? I ran into a problem building mlpack with Visual Studio on Windows while following http://www.mlpack.org/docs/mlpack-3.0.3/doxygen/build_windows.html, and I don't want others to face it. Thank you.

atulim avatar Sep 26 '18 22:09 atulim

@rcurtin - The mockup design looks good. It would be good to use a similar approach for the C++ docs.

The "nitty gritty" stuff with super-fine detail (i.e. generated by doxygen?) could be linked separately from the C++ docs. In my experience the doxygen-generated docs are at best only useful to the library developers, not actual users.

conradsnicta avatar Sep 27 '18 07:09 conradsnicta

Hi @atulim, let's keep the discussion for your issue in #1514 please. It's a little out of scope for us to provide basic documentation on how to set up your build environment---that can be found elsewhere.

@conradsnicta--- I think it would be hard to use a similar approach for C++, but I'll think about it. There are a lot more moving components and complexities in C++. Definitely the low-level doxygen details are hard to parse, even with the CSS styling I've done to try and make the templates a bit easier.

In any case, the next thing I'll do towards this end is work to auto-generate and finish the mockup that I built above and open a PR for that. Then we can start figuring out the rest...

rcurtin avatar Sep 27 '18 14:09 rcurtin

Looks great! A minor suggestion (feel free to ignore it): the high-contrast use of white is somewhat hard to read. In particular, the code section has a small font size that requires too much squinting. Maybe it's my overworked eyes, but I just wanted to let you know.

ajtejankar avatar Sep 30 '18 19:09 ajtejankar

Please find my input below:

  1. I would completely rework the look and feel. I would choose something similar to this https://github.com/mlpack/mlpack/wiki/UsingCMake. It should be easy to read, easy to print and easy to generate as a PDF. At the moment none of these three is true, in my personal opinion. It is also important that the code examples can be easily copied, so only the necessary code should be in the box and nothing else.

  2. Examples with good, to-the-point explanations are very important. I am using MS Windows, and for this platform there should be one C++ template (console) example project which can be used to easily compile and run any example. Each example should be only one cpp file (no project file, no scripts, nothing else). The documentation can contain the example itself in a code box. If the code contains a lot of explanation (as is the case with most of the mlpack code) then the documentation will be easier to write (and maybe can be generated automatically). I would not use the CLI. If the parameters are explained for each CLI app then anybody can set those parameters according to the cpp examples.

zsogitbe avatar Oct 03 '18 12:10 zsogitbe

Hi, I'm not sure this is the right place for this, but I think the tutorials would be more understandable if the code also showed the include files needed to run the examples. Very often one just wants to copy-paste the source code, try the example and then modify it, but it is not explicitly stated which header files are needed to do that.

GLmontanari avatar Oct 11 '18 09:10 GLmontanari

Thanks for the suggestions everyone. I can reduce the contrast a bit by using a nicer color combination than #fff/#000, that's easy enough. Also, the "new" tutorials will have the full code needed (including includes) since they need to be "buildable", so that should make things easier.

rcurtin avatar Oct 11 '18 13:10 rcurtin

I would like to improve your documentation, as I am currently reading your tutorials, several of which are out of date or slightly off, which is causing me difficulty. How can I improve the documentation or certain parts of the code?

atulim avatar Nov 18 '18 17:11 atulim

I just opened #1653, which can automatically generate something that looks like my mockup. Here it is in action: http://www.ratml.org/mlpack_test/docs.html. (ignore the ensmallen header at the top, I didn't bother mocking that up)

rcurtin avatar Jan 12 '19 03:01 rcurtin

This issue has been automatically marked as stale because it has not had any recent activity. It will be closed in 7 days if no further activity occurs. Thank you for your contributions! :+1:

mlpack-bot[bot] avatar Feb 18 '19 19:02 mlpack-bot[bot]

Hi! Is work still needed here? If so, I'm willing to help you improve the documentation. I recently stumbled onto mlpack through GSoC, and I like the idea and essence that the project carries.

What problems do we still have? What have the updates been since Feb 19, 2019? Where can I help? What should I know before contributing? I've read the CONTRIBUTING.md - is there anything else on the technical side that I should know about?

Thank you!

Rubix982 avatar Apr 09 '20 14:04 Rubix982

Hey, would you be interested in plain .md documentation like this?

  • https://github.com/honkit/honkit
  • https://honkit.netlify.app/

rodonguyen avatar Jan 25 '23 03:01 rodonguyen

Yeah, it's likely that plain Markdown is the best way to go here. I am experimenting in PR #3350 towards this end, but there is still a lot of work to do. I'll get back to it when I have a chance. :+1:

rcurtin avatar Jan 26 '23 01:01 rcurtin

@rodonguyen feel free to collaborate with @rcurtin on this. @rcurtin what do you think?

shrit avatar Feb 06 '23 07:02 shrit

@rcurtin are you open to collaboration on this issue?

Rubix982 avatar Feb 06 '23 09:02 Rubix982

I don't mind collaborating; the only thing is that my focus is elsewhere right now. So it could be a while until I get back to #3350 and this! :)

rcurtin avatar Feb 08 '23 21:02 rcurtin