
KDocs fixes for `distinct` and `distinctBy`

Open Allex-Nik opened this issue 2 weeks ago • 3 comments

Fixes #1434

The documentation for the `distinct` overloads that take column parameters described the behavior of `distinctBy`: it said that `distinct` removes duplicate rows, but it omitted the fact that `distinct` also selects the specified columns, so the result contains only those columns.
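To make the difference concrete, here is a minimal plain-Kotlin model of the two behaviors described above. It does not use the DataFrame API: rows are modeled as `Map<String, Any?>`, and `distinctModel` and `distinctByModel` are hypothetical helpers. The assumption that `distinctBy` keeps the first row for each distinct key is mine, mirroring Kotlin's own `distinctBy`.

```kotlin
// Hypothetical model: a row is a map from column name to value (not the DataFrame API).
typealias Row = Map<String, Any?>

// Models distinct(columns): keep only the given columns, then drop duplicate rows.
fun distinctModel(rows: List<Row>, columns: List<String>): List<Row> =
    rows.map { row -> columns.associateWith { row[it] } }.distinct()

// Models distinctBy(columns): keep whole rows, the first one for each distinct key.
fun distinctByModel(rows: List<Row>, columns: List<String>): List<Row> =
    rows.distinctBy { row -> columns.map { row[it] } }

fun main() {
    val rows: List<Row> = listOf(
        mapOf("name" to "Alice", "city" to "Berlin", "age" to 30),
        mapOf("name" to "Bob", "city" to "Berlin", "age" to 25),
        mapOf("name" to "Carol", "city" to "Paris", "age" to 30),
    )
    println(distinctModel(rows, listOf("city")))   // [{city=Berlin}, {city=Paris}]
    println(distinctByModel(rows, listOf("city"))) // [{name=Alice, city=Berlin, age=30}, {name=Carol, city=Paris, age=30}]
}
```

The only difference is whether the non-key columns survive, which is exactly the detail the old `distinct` docs left out.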

In this PR I:

  • Fixed this by rewriting the descriptions of `distinct`
  • Added `distinctBy` to `DocumentationUrls`
  • Made some other minor fixes

Allex-Nik · Dec 08 '25 19:12

@Allex-Nik To avoid this problem with [columns], you can write [\columns] instead. We use this trick in many places.

I'd even say it's better to always write [\columns] in such places.

AndreiKingsley · Dec 09 '25 09:12

Specified the parameter explicitly for every function to avoid incorrect resolution of the `columns` parameter mentioned in `DistinctDocs`.

TL;DR: write it like [columns\]

Long explanation:

Whenever a KDoc is `@include`d, all references in that doc are expanded to their fully qualified paths, if possible. This solves the problem that a reference [a] in one doc is not necessarily resolvable as [a] in another doc, while it may still be resolvable as [a][path.to.a].

If a reference cannot be found, it's left unchanged as [a].
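A rough sketch of this expansion step, under a simplified model where `scope` maps the short names visible from the source doc to fully qualified paths. `expandReferences` is a hypothetical helper, not KoDEx's actual code, and it only handles plain [name] references.

```kotlin
// Hypothetical model of reference expansion on @include (not KoDEx's real implementation).
// Matches a bare [name] that is neither the target half of [a][b] nor already followed by a target.
fun expandReferences(doc: String, scope: Map<String, String>): String =
    Regex("""(?<!\])\[([A-Za-z_][\w.]*)\](?!\[)""").replace(doc) { match ->
        val name = match.groupValues[1]
        val fqName = scope[name]
        // Resolvable: expand to [name][fully.qualified.path]; otherwise leave [name] unchanged.
        if (fqName != null) "[$name][$fqName]" else match.value
    }

fun main() {
    val scope = mapOf("columns" to "org.jetbrains.kotlinx.dataframe.columns")
    println(expandReferences("@param [columns] The names of the columns.", scope))
    // @param [columns][org.jetbrains.kotlinx.dataframe.columns] The names of the columns.
    println(expandReferences("See [unknownThing].", scope)) // See [unknownThing].
}
```

With a `columns` entry in scope, this reproduces the kind of unwanted expansion described for `DistinctDocs`.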

Unfortunately, this system is not perfect (resolving symbols by path in Kotlin myself is hard XD), so when you write

```kotlin
/** @param [columns] The names of the columns to consider for evaluating distinct rows. */
interface DistinctDocs

/** @include [DistinctDocs] */
public fun <T> DataFrame<T>.distinctBy...
```

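KoDEx tries to find [columns] in the scope of `DistinctDocs`, finds it, and expands it to [columns][org.jetbrains.kotlinx.dataframe.columns] before including it at `distinctBy()`. That path points to the `columns` package rather than to the `columns` parameter of `distinctBy`, so the included reference resolves to the wrong symbol.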

Luckily, we have the \ escape character :), which lets us stop KoDEx from doing "clever" things. Escape characters are removed from the KDocs in the last phase. They allow you to 'break' tags like \@include X, or \$something, or references like \[columns] so they aren't processed anymore :). (Actually, I believe you can put the \ anywhere in the reference.)
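A hypothetical sketch of how such an escape could work: expansion skips any reference containing a backslash, and a final phase strips the backslashes. This only models references (not tags like \@include), the helper names are made up, and the final phase here naively removes every backslash, which a real implementation would not do.

```kotlin
// Hypothetical model (not KoDEx's actual code): expand [name] references unless escaped.
fun expandUnescaped(doc: String, scope: Map<String, String>): String =
    Regex("""(?<!\])\[([^\[\]]+)\](?!\[)""").replace(doc) { match ->
        val name = match.groupValues[1]
        val fqName = scope[name]
        when {
            '\\' in name -> match.value          // escaped, e.g. [\columns] or [columns\]: leave alone
            fqName != null -> "[$name][$fqName]" // resolvable: expand to the fully qualified path
            else -> match.value                  // unresolved: leave as-is
        }
    }

// Final phase (simplified): drop the escape characters so the rendered KDoc reads [columns] again.
fun removeEscapes(doc: String): String = doc.replace("\\", "")

fun main() {
    val scope = mapOf("columns" to "org.jetbrains.kotlinx.dataframe.columns")
    val included = expandUnescaped("@param [columns\\] The names of the columns.", scope)
    println(included)                // @param [columns\] The names of the columns.
    println(removeEscapes(included)) // @param [columns] The names of the columns.
}
```

Because the backslash survives until the last phase, the reference stays untouched through every @include and only becomes a normal [columns] link in the final KDoc, where it resolves against the function's own parameter.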

Jolanrensen · Dec 11 '25 11:12


@Jolanrensen, thank you! I understand :)

Allex-Nik · Dec 15 '25 10:12