streaming
Azure Databricks MDS write fails: mapInPandas write_mds reports "Spark higher-order functions are not supported in Unity Catalog"
Environment
Azure Databricks Runtime 14.3 LTS on a distributed cluster.
To reproduce
Steps to reproduce the behavior:
- Code as follows:
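# Imports were omitted in the original snippet; these are the assumed ones.
import pyspark.pandas as ps
from streaming.base.converters import dataframe_to_mds

# Small all-string test dataset.
d = {
    'code': ['test', 'test', 'test'],
    'age': ['test', 'test', 'test'],
    'id': ['ib4', '7h!', '67h'],
}
# Build a pandas-on-Spark DataFrame and convert it to a Spark DataFrame.
df_test = ps.DataFrame(data=d).to_spark()

# Arguments forwarded to the MDS writer.
mds_kwargs = {
    'out': 'test',
    'columns': {
        'code': 'str',
        'age': 'str',
        'id': 'str',
    },
    'keep_local': True,
}

# Fails with the Unity Catalog error quoted in the title.
dataframe_to_mds(
    dataframe=df_test,
    merge_index=True,
    mds_kwargs=mds_kwargs,
)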
Expected behavior
The script should write the created DataFrame in MDS format to Unity Catalog or to local storage.
Additional context
The documentation is not accessible today. We are using version 0.7.x.
This is most likely not a bug on the streaming side. I have seen it before.
Can you try using non-serverless compute?
Hi @XiaohanZhangCMU, sorry, I'm not sure I understand. Could you please be more specific: what do you mean by non-serverless compute? I used a regular Databricks distributed cluster.
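One way to narrow this down, independent of the compute question, is to write the same three rows with MDSWriter directly on the driver, without going through Spark's mapInPandas. The sketch below is only a suggestion under assumptions: the helper names rows_from_columns and write_rows_locally and the output directory test_local are hypothetical, and it assumes the installed 0.7.x release exposes MDSWriter at the package top level. If the local write succeeds, the error most likely comes from how the cluster runs the Spark job rather than from the MDS writer itself.

```python
from typing import Dict, List


def rows_from_columns(data: Dict[str, List[str]]) -> List[Dict[str, str]]:
    """Turn a column-oriented dict into one dict per row (one MDS sample each)."""
    names = list(data)
    return [dict(zip(names, values)) for values in zip(*(data[n] for n in names))]


def write_rows_locally(rows: List[Dict[str, str]], out: str, columns: Dict[str, str]) -> None:
    """Write rows with MDSWriter on the driver, bypassing Spark entirely."""
    # Imported lazily so rows_from_columns stays usable without the library.
    from streaming import MDSWriter

    with MDSWriter(out=out, columns=columns) as writer:
        for row in rows:
            writer.write(row)


# Same data as in the report.
d = {
    'code': ['test', 'test', 'test'],
    'age': ['test', 'test', 'test'],
    'id': ['ib4', '7h!', '67h'],
}
rows = rows_from_columns(d)

# Run on the cluster driver; 'test_local' is a hypothetical output directory.
# write_rows_locally(rows, out='test_local',
#                    columns={'code': 'str', 'age': 'str', 'id': 'str'})
```

The write call is left commented out so the snippet can be pasted into a notebook cell and enabled once the output path has been adjusted.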