
pandas fixed width file support

Status: Open. johnayoub opened this issue 4 years ago · 8 comments

Is there a plan to support the fixed-width file API (read_fwf)?

johnayoub, Jul 26 '21 19:07

Koalas is now officially part of Apache Spark, so let's file an issue there. From a cursory look, it seems we could implement it by (1) distributing an input StringIO and (2) reading any file from a distributed file source.
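To make the proposed plan a little more concrete, here is a minimal pure-Python sketch of the per-line parsing step that a distributed read_fwf could run on each chunk of input, whether that chunk comes from a distributed StringIO or from a file split. The helper name parse_fwf_lines is hypothetical, and the colspecs convention (0-based, half-open (start, end) intervals) is borrowed from pandas' read_fwf. This is not the eventual Spark API; header handling and type inference are deliberately left out.

```python
from io import StringIO
from typing import Iterable, Iterator, List, Sequence, Tuple

# Assumption: pandas-style colspecs, i.e. 0-based, half-open (start, end)
# character intervals, one per output column.
ColSpec = Tuple[int, int]


def parse_fwf_lines(lines: Iterable[str], colspecs: Sequence[ColSpec]) -> Iterator[List[str]]:
    """Hypothetical helper: slice each line into fields and strip padding.

    Each line is parsed independently, so the same function could be
    applied to any chunk of lines, e.g. one partition of a distributed file.
    """
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines
        # Slicing past the end of a short line simply yields "".
        yield [line[start:end].strip() for start, end in colspecs]


# Step (1) of the plan: in-memory StringIO input, which iterates line by line.
buffer = StringIO("id  name  score\n1   alice    90\n2   bob      85\n")
rows = list(parse_fwf_lines(buffer, [(0, 4), (4, 10), (10, 15)]))
print(rows)
```

Because the parser only ever looks at one line at a time, distributing the work is a matter of handing each worker its own slice of lines.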

HyukjinKwon, Jul 27 '21 00:07

cc @xinrong-databricks and @itholic, since you are triaging the issues.

HyukjinKwon, Jul 27 '21 00:07

Thanks @HyukjinKwon. Let me know if you need me to open an issue there.

johnayoub, Jul 27 '21 14:07

@johnayoub Sure, can you open an issue in the Apache Spark JIRA?

itholic, Jul 30 '21 02:07

@itholic I opened a new issue there.

johnayoub, Aug 03 '21 13:08

@itholic @HyukjinKwon Any update on this, and when can I expect it to be included in Koalas?

johnayoub, Aug 12 '21 00:08

Hi, @johnayoub

Unfortunately, we have no concrete plan to add read_fwf yet (at the earliest, it would be available in Spark 3.3 or later).

In any case, it will be added to PySpark first and to Koalas afterwards. (So we'd recommend using PySpark rather than Koalas, since Koalas is now in maintenance mode.)

FYI, you can easily convert your Koalas code to PySpark with a single-line change, as below:

# import databricks.koalas as ks
import pyspark.pandas as ks

By the way, just in case: if you want to read files over HTTP, support for that will take longer, since PySpark doesn't support reading from such file sources yet. Refer to https://github.com/databricks/koalas/issues/1219 for more details about HTTP support.

itholic, Aug 12 '21 01:08

Thanks @itholic!

The import you mentioned isn't supported yet, is it?

import pyspark.pandas as ks

Also, in the meantime, do you have any recommendations for handling the fixed-width format in Spark?

johnayoub, Aug 13 '21 04:08
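On the question of handling fixed-width data in Spark in the meantime: one common workaround, sketched here under assumptions, is to read the file with spark.read.text (one row per line, in a single string column named value) and cut each field out with pyspark.sql.functions.substring. The helper names read_fixed_width and colspecs_to_substring_args are hypothetical. The conversion step exists because substring takes a 1-based position and a length, while pandas-style colspecs are 0-based, half-open (start, end) intervals.

```python
from typing import List, Sequence, Tuple


def colspecs_to_substring_args(colspecs: Sequence[Tuple[int, int]]) -> List[Tuple[int, int]]:
    """Hypothetical helper: convert 0-based, half-open (start, end) colspecs
    into the 1-based (pos, len) pairs that pyspark.sql.functions.substring expects."""
    return [(start + 1, end - start) for start, end in colspecs]


def read_fixed_width(spark, path, names, colspecs):
    """Hypothetical helper: read a fixed-width text file with plain PySpark.

    spark.read.text yields one row per line in a string column named "value";
    each field is cut out with substring and its padding trimmed.
    All resulting columns are strings.
    """
    # Imported lazily so the conversion helper above works without Spark installed.
    from pyspark.sql import functions as F

    lines = spark.read.text(path)
    return lines.select(
        *[
            F.trim(F.substring(F.col("value"), pos, length)).alias(name)
            for name, (pos, length) in zip(names, colspecs_to_substring_args(colspecs))
        ]
    )


print(colspecs_to_substring_args([(0, 4), (4, 10), (10, 15)]))  # [(1, 4), (5, 6), (11, 5)]
```

For example, read_fixed_width(spark, path, ["id", "name", "score"], [(0, 4), (4, 10), (10, 15)]) would return three trimmed string columns. A header line, if present, remains as an ordinary row, and the values still need casting to the desired types.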