clickhouse-docs icon indicating copy to clipboard operation
clickhouse-docs copied to clipboard

Document all ClickHouse functions.

Open johnnymatthews opened this issue 1 year ago • 4 comments

Problem

There are many functions in ClickHouse, but not all are documented. See https://github.com/ClickHouse/clickhouse-docs/issues/1682 https://github.com/ClickHouse/clickhouse-docs/issues/1512 as examples. This issue serves as the main holding-issue for many child/sub-issues. Essentially, if you want to know which functions have recently had documentation added, you should treat this issue as the source-of-truth.

Solution

Document 100% of the functions in ClickHouse.

Process

  1. List all the functions that exist in ClickHouse.
  2. List all the functions that have docs.
  3. Create a list of functions that don't have docs.
  4. Loop through this new list to write a description and syntax example for each function.
    1. To get this info, check the tests that each function was written with.
    2. Check in with the author of each function to get a background on why it was created.
  5. Repeat step 4 until all the functions have docs.

Scope

List each function, what it does, and a syntax example.

Example

Here is a markdown example of what a function should look like the following (a copy of this is available in the clickhouse/clickhouse repository at /docs/en/sql-reference/functions/_function-template.md):

## Example

This function serves as an example of what function documentation should look like.

**Syntax**

```sql
Example(input, secondary_input)
```

**Parameters**

- `input`: This is an example input. [String literal](../syntax#syntax-string-literal)
- `secondary_input`: This is another example input. [Expression](../syntax#syntax-expressions)

**Returned value**

This function doesn't exists. But if it did, they it would return a bool.

**Implementation details**

Any additional information pertaining to this function that users might need to know about. Things like known-bugs or edge cases would fit into this section. For example, if a function used _bankers rounding_ instead of standard rounding.

**Example**

Query:

```sql
CREATE TABLE example_table
(
    id UUID,
    name String,
) ENGINE = MergeTree
ORDER BY id;

INSERT INTO example_table VALUES (generateUUIDv4(), Example());
SELECT * FROM example_table;
```

```response
9e22db8c-d343-4bc5-8a95-501f635c0fc4	Example
```

Things to note:

  1. The arguments must end in a valid type and link to that type within the docs.
  2. The example query must be valid and complete; a user should be able to copy and paste the query directly into fiddle.clickhouse.com and have the query run without any modifications.
  3. The example result must be similar to what the user can expect to see when running it in fiddle.clickhouse.com. Some functions, such as generateUUIDv4(), will return a random string; as such, you shouldn't try to have your example response match what the user might see.

Out of scope

We will not:

  • Create a fancy new way of searching through functions.
  • Expand on or update existing function docs.
  • Create in-depth guides or tutorials for each function.
  • Add docs directly into clickhouse/clickhouse code.
  • Showcase specific functions.

These items are valid points to improve the docs, but they're entirely separate projects that require planning and setup of their own.

Timeline

This project should be completed in Q1 2024. So, by the end of March 2024.

johnnymatthews avatar Jan 09 '24 15:01 johnnymatthews

After going through all the available ClickHouse functions, and searching through the docs for appropriate documentation, I've concluded that we have 449 functions without adequate documentation. See this Google Sheet. Check the Info tab for details on how the existing docs were judged.

449 function is a lot, especially given the size of the docs team. Without significant crowd-sourcing methods, it's unlikely that we'll get all of these missing functions documented by the end of March 2024.

johnnymatthews avatar Jan 23 '24 16:01 johnnymatthews

Here's a complete list of the missing functions:

  • [x] FROM_UNIXTIME

See #1919.


  • [x] JSON_QUERY
  • [x] JSON_VALUE

See #1920.


  • [x] L2SquaredNorm

See #1921.


  • [x] TimeDiff

See #1922.


  • [x] UTC_timestamp

See #1923.


  • [ ] VAR_POP
  • [ ] VAR_SAMP

See #1924.


  • [ ] YEAR

See #1925.


  • [x] __bitBoolMaskAnd
  • [x] __bitBoolMaskOr
  • [x] __bitSwapLastTwo
  • [x] __bitWrapperFunc

See #1926. Won't be documented, these functions are only for internal usage.


  • [ ] __getScalar

See #1927.


  • [ ] addTupleOfIntervals

See #1928.


  • [x] aggThrow

See #1929.


  • [x] analysisOfVariance

See #1930.


  • [ ] and

See #1931.


  • [x] anova

See #1932.


  • [x] anyLast_respect_nulls
  • [x] any_respect_nulls
  • [x] any_value
  • [x] any_value_respect_nulls

See #1933.


  • [x] array
  • [x] arrayDotProduct
  • [x] arrayEnumerateDenseRanked
  • [x] arrayEnumerateUniqRanked
  • [x] arrayFirstOrNull
  • [x] arrayFlatten
  • [x] arrayLastOrNull
  • [x] arrayPartialShuffle
  • [x] arrayShuffle

See #1934.


  • [ ] caseWithExpr
  • [ ] caseWithExpression
  • [ ] caseWithoutExpr
  • [ ] caseWithoutExpression

See #1935.


  • [x] connectionId
  • [x] connection_id

See #1936.


  • [x] corrMatrix
  • [x] corrStable

See #1937.


  • [x] countMatchesCaseInsensitive
  • [x] countSubstringsCaseInsensitive
  • [x] countSubstringsCaseInsensitiveUTF8

See #1938.


  • [ ] covarPopMatrix
  • [ ] covarPopStable
  • [ ] covarSampMatrix
  • [ ] covarSampStable

  • [x] cramersVBiasCorrected

See #1940.


  • [ ] currentSchemas
  • [ ] current_schemas

See #1941.


  • [x] cutToFirstSignificantSubdomainCustomRFC
  • [x] cutToFirstSignificantSubdomainCustomWithWWWRFC
  • [x] cutToFirstSignificantSubdomainRFC
  • [x] cutToFirstSignificantSubdomainWithWWWRFC

See #1942.


  • [ ] dense_rank

See #1943.


  • [x] detectProgrammingLanguage
  • [x] detectTonality

See #1944.


  • [x] dictGet
  • [x] dictGetDate
  • [x] dictGetDateOrDefault
  • [x] dictGetDateTime
  • [x] dictGetDateTimeOrDefault
  • [x] dictGetFloat32
  • [x] dictGetFloat32OrDefault
  • [x] dictGetFloat64
  • [x] dictGetFloat64OrDefault
  • [x] dictGetIPv4
  • [x] dictGetIPv4OrDefault
  • [x] dictGetIPv6
  • [x] dictGetIPv6OrDefault
  • [x] dictGetInt16
  • [x] dictGetInt16OrDefault
  • [x] dictGetInt32
  • [x] dictGetInt32OrDefault
  • [x] dictGetInt64
  • [x] dictGetInt64OrDefault
  • [x] dictGetInt8
  • [x] dictGetInt8OrDefault
  • [x] dictGetString
  • [x] dictGetStringOrDefault
  • [x] dictGetUInt16
  • [x] dictGetUInt16OrDefault
  • [x] dictGetUInt32
  • [x] dictGetUInt32OrDefault
  • [x] dictGetUInt64
  • [x] dictGetUInt64OrDefault
  • [x] dictGetUInt8
  • [x] dictGetUInt8OrDefault
  • [x] dictGetUUID
  • [x] dictGetUUIDOrDefault

See #1918.


  • [x] displayName

See #1945.


  • [x] domainRFC
  • [x] domainWithoutWWWRFC

See #1946.


  • [x] e

See #1947.


  • [x] emptyArrayFloat32
  • [x] emptyArrayFloat64
  • [x] emptyArrayInt16
  • [x] emptyArrayInt32
  • [x] emptyArrayInt64
  • [x] emptyArrayInt8
  • [x] emptyArrayString
  • [x] emptyArrayToSingle
  • [x] emptyArrayUInt16
  • [x] emptyArrayUInt32
  • [x] emptyArrayUInt64
  • [x] emptyArrayUInt8

See #1948.


  • [x] exp
  • [x] exponentialTimeDecayedAvg
  • [x] exponentialTimeDecayedCount
  • [x] exponentialTimeDecayedMax
  • [x] exponentialTimeDecayedSum

See #1949.


  • [ ] extract
  • [ ] extractGroups

See #1950.


  • [ ] filesystemUnreserved

See #1951.


  • [x] firstSignificantSubdomainCustomRFC
  • [x] firstSignificantSubdomainRFC
  • [x] first_value_respect_nulls

See #1953.


  • [x] flameGraph

See #1954.


  • [x] flattenTuple

See #1955.


  • [x] formatQueryOrNull
  • [x] formatQuerySingleLineOrNull

See #1956.


  • [x] fromUnixTimestamp64Micro
  • [x] fromUnixTimestamp64Milli

See #1957.


  • [x] fullHostName

See #1958.


  • [ ] getSubcolumn
  • [ ] getTypeSerializationStreams

See #1959.


  • [ ] globalIn
  • [ ] globalInIgnoreSet
  • [ ] globalNotIn
  • [ ] globalNotInIgnoreSet
  • [ ] globalNotNullIn
  • [ ] globalNotNullInIgnoreSet
  • [ ] globalNullIn
  • [ ] globalNullInIgnoreSet
  • [ ] globalVariable

See #1960.


  • [ ] greater

See #1961.


  • [x] hasColumnInTable
  • [x] hasSubsequenceCaseInsensitive
  • [x] hasSubsequenceCaseInsensitiveUTF8
  • [x] hasSubsequenceUTF8
  • [x] hasThreadFuzzer
  • [x] hasToken
  • [x] hasTokenCaseInsensitive
  • [x] hasTokenCaseInsensitiveOrNull
  • [x] hasTokenOrNull

See #1962.


  • [ ] ignore

See #1963.


  • [ ] in
  • [ ] inIgnoreSet

See #1964.


  • [ ] initcap

See #1965.


  • [ ] intHash32
  • [ ] intHash64

See #1966.


  • [x] isNotDistinctFrom
  • [x] isNullable

See #1967.


  • [ ] joinGetOrNull

See #1968.


  • [x] kostikConsistentHash

See #1969.


  • [ ] kql_array_sort_asc
  • [ ] kql_array_sort_desc

See #1970.


  • [ ] lagInFrame

See #1971.


  • [ ] last_value_respect_nulls

See #1972.


  • [x] lcase

See #1973.


  • [ ] leadInFrame

See #1974.


  • [x] left
  • [x] leftUTF8

See #1975.


  • [x] lengthUTF8

See #1976.


  • [ ] less
  • [ ] lessOrEquals

See #1977.


  • [x] lowCardinalityIndices
  • [x] lowCardinalityKeys

See #1978.


  • [x] makeDate32
  • [x] makeDateTime64

See #1979.


  • [ ] mapPartialReverseSort
  • [ ] mapPartialSort

See #1980.


  • [ ] materialize

See #1981.


  • [ ] maxMappedArrays
  • [ ] minMappedArrays

See #1982.


  • [ ] moduloLegacy

See #1983.


  • [x] mortonDecode
  • [x] mortonEncode

See #1984.


  • [x] multiSearchAllPositionsCaseInsensitive
  • [x] multiSearchAllPositionsCaseInsensitiveUTF8
  • [x] multiSearchAllPositionsUTF8
  • [x] multiSearchAnyCaseInsensitive
  • [x] multiSearchAnyCaseInsensitiveUTF8
  • [x] multiSearchAnyUTF8
  • [x] multiSearchFirstIndexCaseInsensitive
  • [x] multiSearchFirstIndexCaseInsensitiveUTF8
  • [x] multiSearchFirstIndexUTF8
  • [x] multiSearchFirstPositionCaseInsensitive
  • [x] multiSearchFirstPositionCaseInsensitiveUTF8
  • [x] multiSearchFirstPositionUTF8

See #1985.


  • [x] ngramDistanceCaseInsensitive
  • [x] ngramDistanceCaseInsensitiveUTF8
  • [x] ngramDistanceUTF8
  • [x] ngramSearchCaseInsensitive
  • [x] ngramSearchCaseInsensitiveUTF8
  • [x] ngramSearchUTF8

See #1986.


  • [ ] nonNegativeDerivative

See #1987.


  • [x] normL2Squared

See #1988.


  • [ ] normalizeQueryKeepNames
  • [ ] normalizedQueryHashKeepNames

See #1989.


  • [ ] notInIgnoreSet
  • [ ] notNullIn
  • [ ] notNullInIgnoreSet

See #1990.


  • [ ] nth_value

See #1991.


  • [ ] nullIn
  • [ ] nullInIgnoreSet

See #1992.


  • [ ] or

See #1993.


  • [ ] partitionId

See #1994.


  • [x] portRFC

See #1995.


  • [x] proportionsZTest

See #1996.


  • [x] protocol

See #1997.


  • [x] rand
  • [x] rand32
  • [x] rand64
  • [x] randCanonical

See #1998.


  • [ ] rank

See #1999.


  • [x] readWKTPoint
  • [x] readWKTRing

See #2000.


  • [x] regionHierarchy
  • [x] regionIn
  • [x] regionToArea
  • [x] regionToCity
  • [x] regionToContinent
  • [x] regionToCountry
  • [x] regionToDistrict
  • [x] regionToName
  • [x] regionToPopulation
  • [x] regionToTopContinent

See #2001.


  • [x] reinterpretAsDate
  • [x] reinterpretAsDateTime
  • [x] reinterpretAsFixedString
  • [x] reinterpretAsFloat32
  • [x] reinterpretAsFloat64
  • [x] reinterpretAsInt128
  • [x] reinterpretAsInt16
  • [x] reinterpretAsInt256
  • [x] reinterpretAsInt32
  • [x] reinterpretAsInt64
  • [x] reinterpretAsInt8
  • [x] reinterpretAsString
  • [x] reinterpretAsUInt128
  • [x] reinterpretAsUInt16
  • [x] reinterpretAsUInt256
  • [x] reinterpretAsUInt32
  • [x] reinterpretAsUInt64
  • [x] reinterpretAsUInt8
  • [x] reinterpretAsUUID

See #2002.


  • [x] reverseDNSQuery

See #2003.


  • [x] revision

See #2004.


  • [x] right
  • [x] rightUTF8

See #2005.


  • [x] roundAge
  • [x] roundDown
  • [x] roundDuration
  • [x] roundToExp2

See #2006.


  • [x] rowNumberInAllBlocks
  • [x] rowNumberInBlock

See #2007.


  • [ ] row_number

See #2008.


  • [x] showCertificate

See #2009.


  • [x] sigmoid

See #2010.


  • [x] simpleJSONExtractBool
  • [x] simpleJSONExtractFloat
  • [x] simpleJSONExtractInt
  • [x] simpleJSONExtractRaw
  • [x] simpleJSONExtractString
  • [x] simpleJSONExtractUInt
  • [x] simpleJSONHas

See #2011.


  • [x] sin

See #2012.


  • [x] singleValueOrNull

See #2013.


  • [x] sleep
  • [x] sleepEachRow

See #2014.


  • [x] stddevPop
  • [x] stddevPopStable
  • [x] stddevSamp
  • [x] stddevSampStable

See #2015.


  • [x] substringIndexUTF8
  • [x] substringUTF8

See #2016.


  • [x] subtractDays
  • [x] subtractHours
  • [x] subtractInterval
  • [x] subtractMicroseconds
  • [x] subtractMilliseconds
  • [x] subtractMinutes
  • [x] subtractMonths
  • [x] subtractNanoseconds
  • [x] subtractQuarters
  • [x] subtractSeconds
  • [x] subtractTupleOfIntervals
  • [x] subtractWeeks
  • [x] subtractYears

See #2017.


  • [x] sum
  • [x] sumMapFiltered
  • [x] sumMapFilteredWithOverflow
  • [x] sumMapWithOverflow
  • [x] sumWithOverflow

See #2018.


  • [x] svg

See #2019.


  • [x] tanh

See #2020.


  • [x] timeSlot
  • [x] timeSlots

See #2021.


  • [ ] toBool
  • [ ] toDateTime64OrDefault
  • [ ] toDateTime64OrNull
  • [ ] toDateTime64OrZero
  • [ ] toDecimal128
  • [ ] toDecimal128OrDefault
  • [ ] toDecimal128OrNull
  • [ ] toDecimal128OrZero
  • [ ] toDecimal256
  • [ ] toDecimal256OrDefault
  • [ ] toDecimal256OrNull
  • [ ] toDecimal256OrZero
  • [ ] toDecimal32
  • [ ] toDecimal32OrDefault
  • [ ] toDecimal32OrNull
  • [ ] toDecimal32OrZero
  • [ ] toDecimal64
  • [ ] toDecimal64OrDefault
  • [ ] toDecimal64OrNull
  • [ ] toDecimal64OrZero
  • [ ] toDecimalString
  • [ ] toFixedString
  • [ ] toFloat32
  • [ ] toFloat32OrDefault
  • [ ] toFloat32OrNull
  • [ ] toFloat32OrZero
  • [ ] toFloat64
  • [ ] toFloat64OrDefault
  • [ ] toFloat64OrNull
  • [ ] toFloat64OrZero
  • [ ] toIPv4OrDefault
  • [ ] toIPv4OrNull
  • [ ] toIPv4OrZero
  • [ ] toIPv6OrDefault
  • [ ] toIPv6OrNull
  • [ ] toIPv6OrZero
  • [ ] toISOWeek
  • [ ] toISOYear
  • [ ] toInt128
  • [ ] toInt128OrDefault
  • [ ] toInt128OrNull
  • [ ] toInt128OrZero
  • [ ] toInt16
  • [ ] toInt16OrDefault
  • [ ] toInt16OrNull
  • [ ] toInt16OrZero
  • [ ] toInt256
  • [ ] toInt256OrDefault
  • [ ] toInt256OrNull
  • [ ] toInt256OrZero
  • [ ] toInt32
  • [ ] toInt32OrDefault
  • [ ] toInt32OrNull
  • [ ] toInt32OrZero
  • [ ] toInt64
  • [ ] toInt64OrDefault
  • [ ] toInt64OrNull
  • [ ] toInt64OrZero
  • [ ] toInt8
  • [ ] toInt8OrDefault
  • [ ] toInt8OrNull
  • [ ] toInt8OrZero
  • [ ] toIntervalDay
  • [ ] toIntervalHour
  • [ ] toIntervalMicrosecond
  • [ ] toIntervalMillisecond
  • [ ] toIntervalMinute
  • [ ] toIntervalMonth
  • [ ] toIntervalNanosecond
  • [ ] toIntervalQuarter
  • [ ] toIntervalSecond
  • [ ] toIntervalWeek
  • [ ] toIntervalYear
  • [ ] toRelativeDayNum
  • [ ] toRelativeHourNum
  • [ ] toRelativeMinuteNum
  • [ ] toRelativeMonthNum
  • [ ] toRelativeQuarterNum
  • [ ] toRelativeSecondNum
  • [ ] toRelativeWeekNum
  • [ ] toRelativeYearNum
  • [ ] toStartOfMicrosecond
  • [ ] toStartOfMillisecond
  • [ ] toStartOfNanosecond
  • [ ] toTime
  • [ ] toTimezone
  • [ ] toTypeName
  • [ ] toUInt128
  • [ ] toUInt128OrNull
  • [ ] toUInt128OrZero
  • [ ] toUInt16
  • [ ] toUInt16OrDefault
  • [ ] toUInt16OrNull
  • [ ] toUInt16OrZero
  • [ ] toUInt256
  • [ ] toUInt256OrDefault
  • [ ] toUInt256OrNull
  • [ ] toUInt256OrZero
  • [ ] toUInt32
  • [ ] toUInt32OrDefault
  • [ ] toUInt32OrNull
  • [ ] toUInt32OrZero
  • [ ] toUInt64
  • [ ] toUInt64OrDefault
  • [ ] toUInt64OrNull
  • [ ] toUInt64OrZero
  • [ ] toUInt8
  • [ ] toUInt8OrDefault
  • [ ] toUInt8OrNull
  • [ ] toUInt8OrZero
  • [ ] toUnixTimestamp64Micro
  • [ ] toUnixTimestamp64Milli

See #2022.


  • [x] today

See #2023.


  • [x] topLevelDomainRFC

See #2024.


  • [x] transactionID
  • [x] transactionLatestSnapshot
  • [x] transactionOldestSnapshot

See #2025.


  • [x] translateUTF8

See #2026.


  • [x] trunc
  • [x] truncate

See #2027.


  • [x] tryBase58Decode
  • [x] tryBase64Decode

See #2028.


  • [ ] tumbleEnd
  • [ ] tumbleStart

See #2029.


  • [x] tupleIntDiv
  • [x] tupleIntDivByNumber
  • [x] tupleIntDivOrZero
  • [x] tupleIntDivOrZeroByNumber
  • [x] tupleModulo
  • [x] tupleModuloByNumber

See #2030.


  • [x] ucase

See #2031.


  • [x] uniqCombined64

See #2032.


  • [x] upper
  • [x] upperUTF8

See #2033.


  • [x] uptime

See #2034.


  • [x] validateNestedArraySizes

See #2035.


  • [x] varPopStable
  • [x] varSampStable

See #2036.


  • [x] version

See #2037.


  • [x] visitParamExtractBool
  • [x] visitParamExtractFloat
  • [x] visitParamExtractInt
  • [x] visitParamExtractRaw
  • [x] visitParamExtractString
  • [x] visitParamExtractUInt
  • [x] visitParamHas

See #2038.


  • [ ] windowID

See #2039.


  • [x] wkt

See #2040.


  • [x] wyHash64

See #2041.


  • [x] yandexConsistentHash

See #2042.

johnnymatthews avatar Jan 23 '24 16:01 johnnymatthews

@johnnymatthews I'd like to help out with this in between working on some code related contributions. Feel free to assign tasks to me, or alternatively I can just start from the top of the list.

Blargian avatar Mar 01 '24 15:03 Blargian

Hey @Blargian, sorry for the delay in getting back to you. I've just seen a few comment and PRs from yourself, so thanks for that! If you're still interested in helping out, just keep plugging away at the issues. If you'd like specific issues to be assigned to yourself, just leave a comment in the issue saying something like "I'll work on this one" -- I'll assign it to you then :)

johnnymatthews avatar Mar 12 '24 13:03 johnnymatthews