BetaML.jl icon indicating copy to clipboard operation
BetaML.jl copied to clipboard

Add MLJ-compliant document strings

Open ablaom opened this issue 3 years ago • 4 comments

We are currently implementing detailed docstrings for all MLJ models, following a standard we have developed. See this issue: https://github.com/alan-turing-institute/MLJ.jl/issues/913

@sylvaticus If it is helpful to you, @josephsdavid, who is helping us this summer as GSoD technical writer can prepare PRs for you to review. David is a working data scientist with some Julia knowledge. You will need to let me know soon if you would like this.

ablaom avatar Aug 29 '22 22:08 ablaom

That would be awesome, thanks. Also, it comes at a good time, as I did several changes to the package, and wrapped several other models to MLJ.

Just one thing... why you don't use DocStringExtensions ? It also has a nice template feature that looks great to make documentation homogeneous...

sylvaticus avatar Aug 30 '22 07:08 sylvaticus

Hi @ablaom and @sylvaticus,

The MLJ interface and implementation for clustering models is a bit rough. Some models like KMeans inherit from MLJ.Probabilistic and I can take the mode(output) as the final result. Other models like GMMClusterer do not inherit from MLJ.Probabilistic and I cannot know beforehand that they are probabilistic to call the mode(output).

What is the correct generic code to perform clustering with all available models in MLJ? How to make the final assignment of integer labels to samples in a model-agnostic manner?

juliohm avatar Sep 28 '22 18:09 juliohm

haven't look properly, but isn't it the opposite regarding kmeans and gmmclusterer?

sylvaticus avatar Sep 28 '22 18:09 sylvaticus

Yes, sorry for misplaced the models. KMeans is the deterministic one.

juliohm avatar Sep 28 '22 18:09 juliohm

I have now implemented a standard docstring for all MLJ models. Please feel free to correct and amend it.

sylvaticus avatar Oct 28 '22 21:10 sylvaticus

Thanks for that! It seems some MLJ-specific elements (and particular, Examples) are missing, but this is great improvement. (The full spec is here). Unfortunately, as BetaML uses DocStringExtensions.jl, this may be a bit awkward to integrate... Probably, we just append the missing stuff in the standard way. The ordering of elements will be a bit different, but I don't think that matters too much.

If @JosephsDavid get's time, he can take a look at it.

ablaom avatar Oct 31 '22 01:10 ablaom

Hy @ablaom , how can I check how the standard doc "renders" in the user case you have in mind ? It's not a problem to manually implement the docstring without DocStringExtensions... Conversely, I am not sure about the examples.. at the end they all follow the same API, so perhaps the examples should be in the API description rather than in each individual model.. I mean, I don't find a great added value in it..

sylvaticus avatar Oct 31 '22 08:10 sylvaticus

Hy @ablaom , how can I check how the standard doc "renders" in the user case you have in mind

It should suffice to query the ordinary doc string for the model, as in julia> @doc BetaML.Trees.RandomForestRegressor. It's possible the string won't get correctly recorded in the MLJ model registry, but only if you overload the trait MLJModelInterface.docstring (which has descr as alias). But I checked your code and I don't think that is the case.

The ultimate check is to do using MLJModels; doc("RandomForestRegressor", pkg="BetaML") but that won't work until the MLJ model registry is updated. After that the docstring is available to the MLJ user without loading code (eg, in a search such as models("Clusterer") to find all models with "Clusterer" in their docstring).

Conversely, I am not sure about the examples.. at the end they all follow the same API, so perhaps the examples should be in the API description rather than in each individual model.. I mean, I don't find a great added value in it..

I beg to differ. I think beginners really like an example for whatever model they decided to try out. Many of them don't have the depth of understanding required to generalize one example to the broader context. And some models have specific features worth demonstrating. It's okay to autogenerate these examples when they are basically the same each time (we did somewhere...). And the example can be pretty basic. You might find Generating synthetic data section of the manual helpful. And there are built-in datasets loaded with @load_iris, @load_boston, @load_crabs (binary) and @load_reduced_ames. There are now quite of few packages with MLJ compliant docstrings now you can mimic.

ablaom avatar Nov 01 '22 02:11 ablaom

Hello, I have added an Example session to all MLJ models.

sylvaticus avatar Nov 23 '22 09:11 sylvaticus

Wow. That's great. Ping me when you have a new release tagged with the changes and I'll update the MLJ model registry (and this issue can be closed).

ablaom avatar Nov 23 '22 21:11 ablaom

done it (v0.9.3).. The MLJ interfaced models documentation appears also in the BetaML documentation, e.g. here still.. if @juliohm has the time to review the documentation, would be great...

sylvaticus avatar Nov 24 '22 12:11 sylvaticus