
HiveTableConnection: pathPrefix should be optional

Open • Geheiner opened this issue 3 years ago • 1 comment

Is your feature request related to a problem? Please describe.
Often you only need a connection for reading data. In that case, you should not need to define pathPrefix.

Describe the solution you'd like
Make pathPrefix optional so that it can be omitted for read-only connections. If the user then tries to write through a connection that has pathPrefix=None, SDLB should abort. This has the additional benefit of enforcing that the connection is read-only. Right now, if your user is allowed to write, there is no safeguard to prevent you from writing to a table by mistake. An omitted pathPrefix could also serve as such a safeguard.
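
To illustrate the proposed behaviour, here is a minimal sketch. It does not use the real HiveTableConnection class: `HiveConnectionSketch`, `writePath` and the example ids and paths are hypothetical names made up for this illustration, and the actual change in SDLB may well look different. The only idea taken from the request is that pathPrefix becomes an `Option[String]` and that any write attempt aborts when it is `None`.

```scala
// Hypothetical, simplified stand-in for HiveTableConnection; not the actual SDLB class.
final case class HiveConnectionSketch(
  id: String,
  db: String,
  pathPrefix: Option[String] = None // None marks the connection as read-only
) {
  // Resolve the location of a table that is about to be written.
  // Aborts with an exception if no pathPrefix is configured.
  def writePath(table: String): String = pathPrefix match {
    case Some(prefix) => s"${prefix.stripSuffix("/")}/$table"
    case None =>
      throw new IllegalStateException(
        s"($id) pathPrefix is not defined, so this connection is read-only " +
          s"and cannot be used to write table $db.$table")
  }
}

object HiveConnectionSketchDemo {
  def main(args: Array[String]): Unit = {
    // Connection with a pathPrefix: writing is allowed.
    val readWrite = HiveConnectionSketch("rwCon", "default", Some("hdfs:///data/"))
    println(readWrite.writePath("orders"))

    // Connection without a pathPrefix: reading would still work, writing aborts.
    val readOnly = HiveConnectionSketch("roCon", "default")
    println(scala.util.Try(readOnly.writePath("orders")).isFailure)
  }
}
```

Reading code would simply never call `writePath`, so a connection defined without pathPrefix stays usable for reads.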

Additional context
The entry point for this change is here: https://github.com/smart-data-lake/smart-data-lake/blob/develop-spark3/sdl-core/src/main/scala/io/smartdatalake/workflow/connection/HiveTableConnection.scala#L37

Geheiner • Jun 10 '22 12:06

Good idea. We can also think about an attribute (general, not only for HiveTable) to define a DataObject as read-only. Doing it through pathPrefix is a bit implicit and limited to this DataObject.
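
A rough sketch of what such a general attribute could look like, purely as an assumption for discussion: `WritableSketch`, `TableObjectSketch`, `readOnly` and `assertWritable` are hypothetical names and not part of the current SDLB API.

```scala
// Hypothetical trait; names are illustrative and not part of the SDLB API.
trait WritableSketch {
  def id: String
  def readOnly: Boolean

  // Intended to be called at the start of every write path.
  def assertWritable(): Unit =
    if (readOnly)
      throw new IllegalStateException(s"($id) is declared read-only; write aborted")
}

// Any kind of DataObject could mix in the trait, not only Hive tables.
final case class TableObjectSketch(id: String, readOnly: Boolean = false)
    extends WritableSketch {

  def write(rows: Seq[String]): Int = {
    assertWritable()
    rows.size // placeholder for the actual write; returns the number of rows
  }
}
```

With an explicit flag the intent is visible in the configuration, instead of being implied by a missing pathPrefix.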

pgruetter • Jun 10 '22 12:06