dataframe issues

Add column name repairing for all IO sources as a common step for dataframe creation

2

The solution could be based on PR #386

zaleslaw

enhancement
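
To make the idea of "column name repairing" concrete, here is a minimal sketch in plain Kotlin. It is not taken from PR #386, whose contents are not shown here, and `repairColumnNames` is a hypothetical helper rather than part of the Kotlin DataFrame API. The two rules it applies (blank names get a positional placeholder, duplicates get a numeric suffix) are assumptions chosen for illustration only.

```kotlin
// Hypothetical helper, NOT part of the Kotlin DataFrame API.
// Assumed repair rules: trim whitespace, replace blank names with
// "untitled<index>", and append a numeric suffix to duplicates.
fun repairColumnNames(names: List<String>): List<String> {
    val used = mutableSetOf<String>()
    return names.mapIndexed { index, raw ->
        val base = raw.trim().ifEmpty { "untitled$index" }
        var candidate = base
        var suffix = 1
        // Set.add returns false when the name is already taken,
        // so keep trying suffixed variants until one is free.
        while (!used.add(candidate)) {
            candidate = "$base$suffix"
            suffix++
        }
        candidate
    }
}

fun main() {
    println(repairColumnNames(listOf("a", "a", "", " b ")))
}
```

Running such a step once, right after any reader produces its raw header, would give every IO source the same naming behavior, which seems to be the intent of the issue.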

Prepare a survey (or GitHub Discussion) about data sources

12

The draft list of data sources:

1. SQL Databases based on JDBC
2. XML
3. Protobuf
4. Parquet
5. ORC
6. SparkSQL
7. different files on the FileSystem
8. NoSQL...

zaleslaw

research

Add support for PostgreSQL geometry types

- [ ] PGbox
- [ ] PGcircle
- [ ] PGpath

zaleslaw

enhancement

databases

Redshift not supported

I tried out the new SQL support using the Amazon-provided [JDBC driver](https://docs.aws.amazon.com/redshift/latest/mgmt/jdbc20-download-driver.html) and the connection-oriented APIs. Redshift is a flavor of PostgreSQL and the driver should...

aajtodd

enhancement

databases

Documentation needed for Columns Selection DSL

Aside from merging https://github.com/Kotlin/dataframe/pull/372 we also need to update the columns selection page on the website. Need to look into getting the DSL grammar parts copied over neatly as well.

Jolanrensen

documentation

Meaningless warnings for some `KProperty<DataRow<C>>` extension functions: `CANDIDATE_CHOSEN_USING_OVERLOAD_RESOLUTION_BY_LAMBDA_ANNOTATION`

There are some functions introduced by https://github.com/Kotlin/dataframe/pull/372 which throw compiler/front-end warnings that can be ignored: `CANDIDATE_CHOSEN_USING_OVERLOAD_RESOLUTION_BY_LAMBDA_ANNOTATION`. This warning is caused by a workaround for [this YouTrack issue](https://youtrack.jetbrains.com/issue/KT-64092/OVERLOADRESOLUTIONAMBIGUITY-caused-by-lambda-argument) (and [this](https://youtrack.jetbrains.com/issue/KT-53104/Overload-resolution-ambiguity-introduced-when-adding-lambda-parameter)). ![image](https://github.com/Kotlin/dataframe/assets/17594275/0fe003f7-7007-4150-889b-6f6af4bea002)...

Jolanrensen

bug

invalid

Casting strings to double using `with { it.toDouble()}` and `toDouble()` gives different results

4

## Reproduce

1. Take the ramen dataset: https://www.kaggle.com/code/sujan97/complete-analysis-of-ramen-ratings/input
2. Read it and filter out the unrated rows:

```kotlin
val df = DataFrame.readCSV("ramen-ratings.csv").renameToCamelCase()
```

```kotlin
df.filter { !stars.startsWith("Un") }.convert { stars }.toDouble()
```

3. convert `stars` column to a...

devcrocod

bug

readJson with keyValuePaths parameter produces unexpected DataFrame

3

With this data https://covid.ourworldindata.org/data/owid-covid-data.json, which is surprisingly a "wide" JSON, I found that keyValuePaths can be helpful. I tried this:

```
val df = DataFrame.readJson(
    "https://covid.ourworldindata.org/data/owid-covid-data.json",
    keyValuePaths = listOf(JsonPath())
)
```

...

koperagen

bug

invalid

Add support for percentiles

10

It appears that the API already supports `median()` and `medianFor()` but not arbitrary percentiles. To make it on par with other DataFrame APIs it would be desirable to have support for `percentile(percentile...

tklinchik

enhancement

research
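
As an illustration of what an arbitrary-percentile aggregation computes, the following plain-Kotlin sketch uses linear interpolation between the two nearest ranks. The function name, its signature, and the choice of interpolation method are assumptions made for this example; it is not the Kotlin DataFrame API, and the signature proposed in the issue is truncated above.

```kotlin
import kotlin.math.ceil
import kotlin.math.floor

// Hypothetical helper, NOT part of the Kotlin DataFrame API.
// Computes the p-th percentile (0..100) of a non-empty list using
// linear interpolation between the two closest ranks (an assumed method).
fun percentile(values: List<Double>, p: Double): Double {
    require(values.isNotEmpty()) { "values must not be empty" }
    require(p in 0.0..100.0) { "p must be within 0..100" }
    val sorted = values.sorted()
    // Fractional position of the requested percentile in the sorted list.
    val rank = p / 100.0 * (sorted.size - 1)
    val lower = floor(rank).toInt()
    val upper = ceil(rank).toInt()
    val weight = rank - lower
    return sorted[lower] + weight * (sorted[upper] - sorted[lower])
}

fun main() {
    val data = listOf(4.0, 1.0, 3.0, 2.0)
    println(percentile(data, 50.0))
    println(percentile(data, 25.0))
}
```

Other libraries offer several interpolation methods for percentiles, so a real implementation would likely need to state which one it uses.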


Add column name repairing for all IO sources as a common step for dataframe creation

Prepare a survey (or GitHub Discussion) about data sources

Upgrade to JUnit 5

Add support for PostgreSQL geometry types

Redshift not supported

Documentation needed for Columns Selection DSL

Meaningless warnings for some `KProperty<DataRow<C>>` extension functions: `CANDIDATE_CHOSEN_USING_OVERLOAD_RESOLUTION_BY_LAMBDA_ANNOTATION`

Casting strings to double using `with { it.toDouble()}` and `toDouble()` gives different results

readJson with keyValuePaths parameter produces unexpected DataFrame

Add support for percentiles
