dataframe
Structured data processing in Kotlin
The solution could be based on the PR #386
The draft list of data sources:
1. SQL Databases based on JDBC
2. XML
3. Protobuf
4. Parquet
5. ORC
6. SparkSQL
7. different files on the FileSystem
8. NoSQL...
- [ ] PGbox
- [ ] PGcircle
- [ ] PGpath
I tried out the new SQL support using the Amazon-provided [JDBC driver](https://docs.aws.amazon.com/redshift/latest/mgmt/jdbc20-download-driver.html) and the connection-oriented APIs. Redshift is a flavor of PostgreSQL and the driver should...
Aside from merging https://github.com/Kotlin/dataframe/pull/372 we also need to update the columns selection page on the website. Need to look into getting the DSL grammar parts copied over neatly as well.
Some functions introduced by https://github.com/Kotlin/dataframe/pull/372 trigger a compiler/front-end warning that can be ignored: `CANDIDATE_CHOSEN_USING_OVERLOAD_RESOLUTION_BY_LAMBDA_ANNOTATION`. This warning is caused by a workaround for [this YouTrack issue](https://youtrack.jetbrains.com/issue/KT-64092/OVERLOADRESOLUTIONAMBIGUITY-caused-by-lambda-argument) (and [this](https://youtrack.jetbrains.com/issue/KT-53104/Overload-resolution-ambiguity-introduced-when-adding-lambda-parameter))...
## Reproduce

1. Take the ramen dataset: https://www.kaggle.com/code/sujan97/complete-analysis-of-ramen-ratings/input
2. ```kotlin
   val df = DataFrame.readCSV("ramen-ratings.csv").renameToCamelCase()
   ```
   ```kotlin
   df.filter { !stars.startsWith("Un") }.convert { stars }.toDouble()
   ```
3. convert `stars` column to a...
With this data, https://covid.ourworldindata.org/data/owid-covid-data.json, which is surprisingly a "wide" JSON, I found that `keyValuePaths` can be helpful. I tried this:
```
val df = DataFrame.readJson(
    "https://covid.ourworldindata.org/data/owid-covid-data.json",
    keyValuePaths = listOf(JsonPath())
)
```
...
It appears that the API already supports `median()` and `medianFor()` but not arbitrary percentiles. To bring it on par with other DataFrame APIs, it would be desirable to have support for `percentile(percentile...
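
To make the percentile request concrete, here is a minimal sketch in plain Kotlin of what such a statistic could compute. The function `percentile` below is a hypothetical standalone helper, not part of the Kotlin DataFrame API. It assumes linear interpolation between the closest ranks, which is only one of several common percentile definitions.

```kotlin
import kotlin.math.ceil
import kotlin.math.floor

// Hypothetical helper for illustration only; NOT a Kotlin DataFrame function.
// Computes the p-th percentile (0..100) of a non-empty list, assuming
// linear interpolation between the two closest ranks.
fun percentile(values: List<Double>, p: Double): Double {
    require(values.isNotEmpty()) { "values must not be empty" }
    require(p in 0.0..100.0) { "p must be in [0, 100]" }

    val sorted = values.sorted()
    // Fractional position of the requested percentile in the sorted list.
    val rank = p / 100.0 * (sorted.size - 1)
    val lower = floor(rank).toInt()
    val upper = ceil(rank).toInt()
    // How far the rank lies between the lower and upper neighbours.
    val weight = rank - lower
    return sorted[lower] + weight * (sorted[upper] - sorted[lower])
}

fun main() {
    val data = listOf(15.0, 20.0, 35.0, 40.0, 50.0)
    println(percentile(data, 50.0)) // 35.0, the same value as the median
    println(percentile(data, 25.0)) // 20.0
}
```

Under this definition `percentile(values, 50.0)` agrees with the median, so a library-level `percentile` could plausibly sit alongside the existing `median()`; the actual signature and interpolation rule are for the maintainers to decide.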