
[Feature] OpenAPI/Swagger JSON type schema support

Open Jolanrensen opened this issue 3 years ago • 1 comments

As described by https://github.com/Kotlin/dataframe/issues/142

This PR contains a custom OpenAPI -> DF Marker conversion and implementation in the Gradle- and KSP plugin. There are also a lot of tests, but I'm not yet 100% confident I've caught every edge case (there are many...). Docs are updated too. Let me know if there are any major things (or minor ones, all are welcome) that need changing or if you have some more testing ideas.

Jolanrensen avatar Sep 30 '22 14:09 Jolanrensen

Okay, I overlooked additionalProperties: https://swagger.io/docs/specification/data-models/dictionaries/ Let's see how I can add that.

Jolanrensen avatar Oct 03 '22 13:10 Jolanrensen

Since I changed so much I'm gonna provide a small overview of the changes I made. Might make it easier to review. I'll go through the changed files:

  • Linting changes (sorry, but improves readability)
  • Docs for OpenAPI and jsonoptions
  • replaced all occurrences of "splitted" with "split" because English
  • updated kotlinDatetime and made it api() for jupyter etc.
  • ImportDataSchema gained jsonOptions with typeClashTactic and keyValuePaths
  • starting with kdoc in some functions (more will come later)
  • convertTo
    • convertIf in dsl: way more powerful than convert(KType, KType), allows you to specify via a condition function whether you want to do a certain conversion and provides fromType and toSchema in ConverterScope. convert still has priority over convertIf
    • convertTo can now happen for any empty df (both 0 rows or 0 columns)
    • manual converters can now happen for any type of target column, not just value columns. user conversion happens before df conversion making some conversions that were previously impossible possible.
    • Value columns of datarows can become column groups, same as value columns with all nulls.
    • value columns of dataframes can become frame columns
    • absent columns can be created iff it's a nullable typed value column, DataRow<Something?> for a group column or frame column.
  • containsNoData() helper function (created that once but don't use it anymore, might still be useful? opinion needed)
  • enums can implement DataSchemaEnum to control how they are (en/de)coded from/to datasets instead of using just their name
  • code generation can now generate enums and type aliases too
  • if a generated interface contains a reference to another @DataSchema, the type will no longer be wrapped in a DataRow since that is unnecessary.
  • DefaultReadDfMethod now provides actual Marker instead of just a name
  • nullability helpers for FieldType
  • bugfix for MarkersExtractor regarding nullableProperties
  • contains in BaseColumn is now operator fun
  • Bugfix for concat of DataFrames without columns (rows must still count up, else schema conversions will break)
  • String+Number can become Comparable. This was always the case, but the result depended on the order of the types. It now always gives the same result thanks to a bugfix in commonParents() and commonType()
  • ISO_DATE_TIME support in parser from string
  • CodeGenerator.Companion.urlReader is now split into urlDfReader and urlCodeGenReader where the former is used to generate a dataframe from the data and types from that, while the latter directly generates types (for openapi)
  • Similar split for SupportedFormat into SupportedDataFrameFormat and SupportedCodeGenerationFormat
  • createColumn()
    • bugfix for empty iterables always making column groups. Now uses guesstype or creates value column
    • bugfix for iterables with just nulls always making a frame column, now also uses guesstype or creates value column
    • can now create Column group with iterable of datarows too
  • small fix in printing newlines of data schemas
  • createEmptyColumn and createEmptyDataFrame variants with numberOfRows
  • ColumnSchema, aside from type, now also has contentType for extra type info for Group- and Frame columns (instead of just getting DataRow<*> or DataFrame<*>). Very useful for the new convertIf method. intersectSchema also tries to merge these contentTypes if possible. extractSchema will make them Any?.
  • toSnakeCase() helper function
  • json reader:
    • distinction in guess.kt between normal json and openapi json
    • typeClashTactics ANY_COLUMNS and ARRAY_AND_VALUE_COLUMNS (original and default)
    • keyValuePaths: the objects at each given JsonPath will be read as DataFrame<KeyValueProperty>
  • openapi:
    • can decode json/yaml openapi 3.0 type specifications and produce CodeWithConverter with all DataSchema interfaces, type aliases and enums, as well as readJson functions which automatically fill in keyValuePaths and other conversions necessary.
    • objects with just additionalProperties will become key/value dataframes.
    • objects with properties & additionalProperties will ignore additionalProperties
  • importDataSchema() function for Jupyter for SupportedCodeGenerationFormats like openapi
  • DSL for JupyterConfiguration
  • bugfix for overload resolution ambiguity in DataFrame.get(vararg IntRange)
  • examples for openapi and keyvalue, both in jupyter and normal kt files
  • JsonOptions in KSP and Gradle plugin
  • tests

Edit:

  • Values in columns are now only "listified" if suggestedType is given as list, not automatically anymore. Also adds optional listifyValues argument to guessValueType() and baseType()
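
To illustrate the "objects with just additionalProperties will become key/value dataframes" item above: such an OpenAPI object is a dictionary, so its keys are data rather than a fixed set of column names. Below is a minimal, library-free sketch of that idea. `KeyValueRow` and `dictionaryToRows` are hypothetical names used only for illustration; they are not the PR's `KeyValueProperty` type or its reader code.

```kotlin
// Hypothetical stand-in for one key/value row; NOT the library's KeyValueProperty.
data class KeyValueRow<T>(val key: String, val value: T)

// Turns a dictionary-like object into one row per entry, so that the keys end up
// in a "key" column instead of each becoming a column of its own.
fun <T> dictionaryToRows(dictionary: Map<String, T>): List<KeyValueRow<T>> =
    dictionary.map { (key, value) -> KeyValueRow(key, value) }
```

Calling `dictionaryToRows(mapOf("en" to 3, "nl" to 5))` gives two rows with a stable two-column shape, no matter which keys the dictionary happens to contain.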

Jolanrensen avatar Nov 10 '22 12:11 Jolanrensen
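
For the DataSchemaEnum item in the overview above, here is a rough, library-free sketch of why a custom value is useful: OpenAPI enum values such as "in-progress" are not valid Kotlin identifiers, so decoding by constant name alone cannot work. `ValueBackedEnum`, `Status` and `decodeEnum` are hypothetical stand-ins made up for this example; the actual interface and decoding logic live in the library and may differ.

```kotlin
// Hypothetical stand-in for the idea behind DataSchemaEnum; not the library's interface.
interface ValueBackedEnum {
    val value: Any
}

// The raw values are assumed for illustration only.
enum class Status(override val value: String) : ValueBackedEnum {
    IN_PROGRESS("in-progress"),
    DONE("done")
}

// Decodes by the custom value instead of by the constant name.
// Throws NoSuchElementException when no constant carries the given value.
inline fun <reified E> decodeEnum(raw: Any): E where E : Enum<E>, E : ValueBackedEnum =
    enumValues<E>().first { it.value == raw }
```

With this sketch, `decodeEnum<Status>("in-progress")` resolves to `Status.IN_PROGRESS`, and encoding is simply reading `Status.IN_PROGRESS.value`.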