dataframe [Feature] OpenAPI/Swagger JSON type schema support

As described by https://github.com/Kotlin/dataframe/issues/142

This PR contains a custom OpenAPI -> DF Marker conversion and implementation in the Gradle- and KSP plugin. There are also a lot of tests, but I'm not yet 100% confident I've caught every edge case (there are many...). Docs are updated too. Let me know if there are any major things (or minor ones, all are welcome) that need changing or if you have some more testing ideas.

Sep 30 '22 14:09 Jolanrensen

Okay, I overlooked additionalProperties... https://swagger.io/docs/specification/data-models/dictionaries/ Let's see how I can add that

Oct 03 '22 13:10 Jolanrensen

Since I changed so much I'm gonna provide a small overview of the changes I made. Might make it easier to review. I'll go through the changed files:

Linting changes (sorry, but improves readability)
Docs for OpenAPI and jsonoptions
replaced all occurrences of "splitted" with "split" because English
updated kotlinDatetime and made it api() for jupyter etc.
ImportDataSchema gained jsonOptions with typeClashTactic and keyValuePaths
starting with kdoc in some functions (more will come later)
convertTo
- convertIf in dsl: way more powerful than convert(KType, KType), allows you to specify via a condition function whether you want to do a certain conversion and provides fromType and toSchema in ConverterScope. convert still has priority over convertIf
- convertTo can now happen for any empty df (either 0 rows or 0 columns)
- manual converters can now happen for any type of target column, not just value columns. user conversion happens before df conversion making some conversions that were previously impossible possible.
- Value columns of datarows can become column groups, same as value columns with all nulls.
- value columns of dataframes can become frame columns
- absent columns can be created iff it's a nullable typed value column, DataRow<Something?> for a group column or frame column.
containsNoData() helper function (created that once but don't use it anymore, might still be useful? opinion needed)
enums can implement DataSchemaEnum to control how they are (en/de)coded from/to datasets instead of using just their name
code generation can now generate enums and type aliases too
if a generated interface contains a reference to another @DataSchema, the type will no longer be wrapped in a DataRow since that is unnecessary.
DefaultReadDfMethod now provides actual Marker instead of just a name
nullability helpers for FieldType
bugfix for MarkersExtractor regarding nullableProperties
contains in BaseColumn is now operator fun
Bugfix for concat of DataFrames without columns (rows must still count up, else schema conversions will break)
String+Number can become Comparable, this was always the case but order dependent. That now always gives the same result thanks to a bugfix in commonParents() and commonType()
ISO_DATE_TIME support in parser from string
CodeGenerator.Companion.urlReader is now split into urlDfReader and urlCodeGenReader where the former is used to generate a dataframe from the data and types from that, while the latter directly generates types (for openapi)
Similar split for SupportedFormat into SupportedDataFrameFormat and SupportedCodeGenerationFormat
createColumn()
- bugfix for empty iterables always making column groups. Now uses guesstype or creates value column
- bugfix for iterables with just nulls always making a frame column, now also uses guesstype or creates value column
- can now create Column group with iterable of datarows too
small fix in printing newlines of data schemas
createEmptyColumn and createEmptyDataFrame variants with numberOfRows
ColumnSchema, aside from type, now also has contentType for extra type info for Group- and Frame columns (instead of just getting DataRow<*> or DataFrame<*>). Very useful for the new convertIf method. intersectSchema also tries to merge these contentTypes if possible. extractSchema will make them Any?.
toSnakeCase() helper function
json reader:
- distinction in guess.kt between normal json and openapi json
- typeClashTactics ANY_COLUMNS and ARRAY_AND_VALUE_COLUMNS (original and default)
- keyValuePaths: objects at the given JsonPaths will be read as DataFrame<KeyValueProperty>
openapi:
- can decode json/yaml openapi 3.0 type specifications and produce CodeWithConverter with all DataSchema interfaces, type aliases and enums, as well as readJson functions which automatically fill in keyValuePaths and other conversions necessary.
- objects with just additionalProperties will become key/value dataframes.
- objects with properties & additionalProperties will ignore additionalProperties
importDataSchema() function for Jupyter for SupportedCodeGenerationFormats like openapi
DSL for JupyterConfiguration
bugfix for overload resolution ambiguity in DataFrame.get(vararg IntRange)
examples for openapi and keyvalue, both in jupyter and normal kt files
JsonOptions in KSP and Gradle plugin
tests

Edit:
Values in columns are now only "listified" if suggestedType is given as list, not automatically anymore. Also adds optional listifyValues argument to guessValueType() and baseType()
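
To illustrate the additionalProperties rule from the openapi items above, here is a minimal, self-contained sketch. It does not use the PR's actual classes: ObjectSchema, Target, KeyValueFrame, SchemaInterface and classify are hypothetical names made up for this example, and types are represented as plain strings. What happens to an object with neither properties nor additionalProperties is not stated in the overview, so the fallback below is an assumption.

```kotlin
// Hypothetical, simplified model of an OpenAPI object schema (not the PR's real types).
data class ObjectSchema(
    val properties: Map<String, String> = emptyMap(), // property name -> type name
    val additionalProperties: String? = null,         // value type of the dictionary, if any
)

// Hypothetical outcomes of the conversion.
sealed interface Target
data class KeyValueFrame(val valueType: String) : Target          // read as a key/value dataframe
data class SchemaInterface(val fields: Map<String, String>) : Target // generated as a @DataSchema interface

fun classify(schema: ObjectSchema): Target {
    val extra = schema.additionalProperties
    return if (schema.properties.isEmpty() && extra != null) {
        // Only additionalProperties: treat the object as a dictionary.
        KeyValueFrame(extra)
    } else {
        // Properties present: additionalProperties is ignored.
        // Assumption: an object with neither also ends up here, with no fields.
        SchemaInterface(schema.properties)
    }
}

fun main() {
    println(classify(ObjectSchema(additionalProperties = "integer")))
    // prints: KeyValueFrame(valueType=integer)
    println(classify(ObjectSchema(mapOf("id" to "string"), additionalProperties = "integer")))
    // prints: SchemaInterface(fields={id=string})
}
```

In the real implementation the dictionary case corresponds to DataFrame<KeyValueProperty>, with the matching keyValuePaths filled in by the generated readJson functions.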

Nov 10 '22 12:11 Jolanrensen