Reconsider handling of spaces in PartitionSpec$partitionToPath

Open aokolnychyi opened this issue 4 years ago • 3 comments

If we migrate existing tables to Iceberg and then continue writing to them, file locations are not consistent because of how we handle spaces in PartitionSpec$partitionToPath.

For example, we have the following locations:

/partitioned_table/data/dAtA sPaced=some+key+value/filename.parquet // Iceberg
/partitioned_table/data/dAtA sPaced=some key value/filename.parquet // Spark
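The difference between the two locations can be reproduced with a small sketch. It assumes the escaping is plain `java.net.URLEncoder` form-encoding applied to the value only, as the thread below suggests. The class and method names are hypothetical and are not Iceberg's actual code.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper, not part of Iceberg: contrasts the two path styles.
final class PartitionPathSketch {

    // Value is URL-encoded, field name is left as-is (assumed Iceberg-style behavior).
    static String escapedPath(String fieldName, String value) {
        try {
            return fieldName + "=" + URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always supported by the JVM, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    // Neither name nor value is escaped (matches the Spark location shown above).
    static String unescapedPath(String fieldName, String value) {
        return fieldName + "=" + value;
    }
}
```

Calling `PartitionPathSketch.escapedPath("dAtA sPaced", "some key value")` yields `dAtA sPaced=some+key+value`, while `unescapedPath` with the same arguments yields `dAtA sPaced=some key value`.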

aokolnychyi avatar Sep 18 '20 21:09 aokolnychyi

Should we just not be escaping?

      sb.append(field.name()).append("=").append(escape(valueString));

Because even if this was what we wanted, it would be wrong for field names, since we are only escaping values.

RussellSpitzer avatar Sep 18 '20 22:09 RussellSpitzer

@RussellSpitzer @aokolnychyi

Yeah, it looks like this change can fix the issue, since URLEncoder converts a space to +.

Or does the path have to be decoded the same way it was encoded?

I tested a column_name field with spaces. It works if we create the table using Iceberg's internal APIs; with Spark, the name has to be quoted with backticks (`).

I tested with no escaping and it seems to work for this test case.
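On the decoding option raised above, here is a rough sketch of what a round trip looks like with the standard `URLEncoder` / `URLDecoder` pair. The class name is hypothetical, and the sketch assumes paths would be decoded with `URLDecoder`, which the thread does not confirm. Note that decoding is only safe for values that were actually encoded: a literal `+` in a path written without escaping would be turned into a space.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

// Hypothetical helper, not part of Iceberg: encode/decode round trip for partition values.
final class PartitionValueRoundTrip {

    static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    static String decode(String pathValue) {
        try {
            return URLDecoder.decode(pathValue, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }
}
```

Here `decode(encode("some key value"))` returns the original `some key value`, but `decode("a+b")` returns `a b`, so a value written unescaped as `a+b` would be misread. That is one reason mixing escaped and unescaped writers on the same table is fragile.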

manishmalhotrawork avatar Sep 19 '20 03:09 manishmalhotrawork

This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in the next 14 days if no further activity occurs. To permanently prevent this issue from being considered stale, add the label 'not-stale', but commenting on the issue is preferred when possible.

github-actions[bot] avatar Feb 25 '24 00:02 github-actions[bot]

This issue has been closed because it has not received any activity in the last 14 days since being marked as 'stale'

github-actions[bot] avatar Mar 11 '24 00:03 github-actions[bot]