hyper-api-samples
add support for CREATE EXTERNAL TABLE
Currently, only temporary external tables are supported. It would be nice to remove that limitation, please.
From an implementation side, what you are asking for is straightforward. In fact, it was already discussed a couple of times internally. Unfortunately, there is more than just implementation to this feature. In particular, usability and security pose challenges.
The main issues currently are:
- If you move a `.hyper` file around, how does Hyper locate the external files? Through paths relative to the Hyper file? Absolute file paths?
- Should there be a way to package external files together with the `.hyper` file? E.g., when uploading it to Tableau Cloud.
- If you send a Hyper file via email and some other person opens it, should Hyper read whichever external files are specified in the `.hyper` file? What if someone maliciously added an external table that reads `/etc/passwd` as an external CSV, or some other sensitive data?
- What if you upload a Hyper file to Tableau Cloud? Should that file be allowed to instruct Hyper to read `/etc/passwd` and display it as part of some visualization?
While the answer for `/etc/passwd` is clearly "no, this should not be allowed", it's hard to draw the line here.
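The first and third questions interact: if locations were stored relative to the `.hyper` file, a crafted relative path could still climb out of that directory. Below is a minimal sketch in Python, assuming a purely hypothetical resolution rule (absolute paths used as-is, relative paths anchored at the directory containing the `.hyper` file). `resolve_external_path` and `stays_next_to_hyper_file` are illustrative names, not part of the Hyper API.

```python
import posixpath


def resolve_external_path(hyper_file: str, external_path: str) -> str:
    """Resolve an external-table location under a hypothetical rule:
    absolute paths are used as-is, relative paths are anchored at the
    directory that contains the .hyper file."""
    if not posixpath.isabs(hyper_file):
        raise ValueError("expected an absolute path to the .hyper file")
    if posixpath.isabs(external_path):
        return posixpath.normpath(external_path)
    base = posixpath.dirname(hyper_file)
    # normpath collapses "..", so a relative path can climb out of base.
    return posixpath.normpath(posixpath.join(base, external_path))


def stays_next_to_hyper_file(hyper_file: str, external_path: str) -> bool:
    """True if the resolved location lies inside the .hyper file's directory."""
    base = posixpath.dirname(posixpath.normpath(hyper_file))
    resolved = resolve_external_path(hyper_file, external_path)
    # commonpath compares whole path components, so /data/salesX is not
    # treated as being inside /data/sales.
    return posixpath.commonpath([base, resolved]) == base


sales = "/data/sales/sales.hyper"
print(resolve_external_path(sales, "orders.parquet"))       # /data/sales/orders.parquet
print(resolve_external_path(sales, "../../etc/passwd"))     # /etc/passwd
print(stays_next_to_hyper_file(sales, "../../etc/passwd"))  # False
```

So "relative to the Hyper file" alone does not answer the security questions; some containment check like this one would still be needed.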
1. Absolute paths.
2. No, that would defeat the purpose of an external table; the data has to live in shared storage.
3. It's not Hyper's fault if someone stores sensitive data without encryption; moreover, only the user can see it anyway. But I am not a security expert.
4. An option in Tableau Cloud to block reading internal data.
My use case is reading Parquet files from remote storage, which I think is a very common pattern these days with lakehouses and the like :)
Thanks a lot for your reply.
Agree with @djouallah: permanent external tables (and views) are missing in Hyper. To answer your questions:
1. Both (relative and absolute).
2. External means external, so no packing of external data into the Hyper file.
3. Limit external file extensions to CSV and Parquet.
4. Limiting external file extensions should manage the problem. Also limit the number of files in globs to 1000. Limits could also apply to directories (no /etc, /usr, /opt, C:\Windows, C:\Program Files, ...).
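The limits proposed above are concrete enough to sketch as a check that would run before an external table is opened. This is a hypothetical illustration in Python, not anything Hyper or Tableau Cloud implements: `ExternalTablePolicy` and `check_external_files` are invented names, the extension allowlist, the 1000-file glob cap and the directory denylist are taken from the comment, and it assumes the glob has already been expanded into a list of paths.

```python
import posixpath
from dataclasses import dataclass


@dataclass(frozen=True)
class ExternalTablePolicy:
    # Values taken from the proposal in this thread; the class is hypothetical.
    allowed_extensions: frozenset = frozenset({".csv", ".parquet"})
    max_glob_files: int = 1000
    denied_prefixes: tuple = ("/etc", "/usr", "/opt", "c:/windows", "c:/program files")


def check_external_files(paths, policy=ExternalTablePolicy()):
    """Return a list of policy violations; an empty list means the files are allowed."""
    problems = []
    if len(paths) > policy.max_glob_files:
        problems.append(f"glob matches {len(paths)} files, limit is {policy.max_glob_files}")
    for path in paths:
        # Use forward slashes, collapse "..", and lower-case so that C:\Windows
        # and c:/windows compare equal (this over-blocks on case-sensitive
        # filesystems, which errs on the safe side).
        norm = posixpath.normpath(path.replace("\\", "/")).lower()
        # POSIX normpath keeps a leading "//"; collapse it so "//etc" cannot slip past.
        if norm.startswith("/"):
            norm = "/" + norm.lstrip("/")
        ext = posixpath.splitext(norm)[1]
        if ext not in policy.allowed_extensions:
            problems.append(f"{path}: extension {ext or '(none)'} is not allowed")
        for prefix in policy.denied_prefixes:
            # Match whole path components, so /usrdata is not caught by /usr.
            if norm == prefix or norm.startswith(prefix + "/"):
                problems.append(f"{path}: directory {prefix} is blocked")
                break
    return problems
```

A denylist like this is easy to sidestep (symlinks, bind mounts, sensitive directories missing from the list), which illustrates the maintainers' point that the line is hard to draw.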
Couldn't these security concerns also be blocked by a security tool (EDR or antivirus) on the Tableau Cloud clusters?
Object storage is great, but there are also fast parallel remote filesystems such as pNFS or Lustre that provide excellent performance for accessing remote data.
Recently, DuckDB added an option to turn off reading from the local filesystem. I guess you could do the same for Tableau Cloud and turn it off by default for security reasons.
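The suggestion to disable local file access by default can be sketched as a gate applied to every external-table location before it is opened. This is illustrative Python only, assuming a hypothetical `allow_local_filesystem` switch that a shared deployment such as Tableau Cloud would leave off; it does not reproduce DuckDB's actual setting or any Hyper API, and `is_local` / `check_location` are invented names.

```python
from urllib.parse import urlparse


def is_local(location: str) -> bool:
    """True if the location refers to the machine's own filesystem."""
    scheme = urlparse(location).scheme.lower()
    # No scheme ("/etc/passwd"), an explicit file:// URL, or a one-letter
    # "scheme" that is really a Windows drive letter (C:\data\x.csv).
    return scheme in ("", "file") or len(scheme) == 1


def check_location(location: str, allow_local_filesystem: bool = False) -> None:
    """Raise PermissionError for local paths unless local access was enabled.

    The switch defaults to off, mirroring the suggestion to disable local
    reads by default on a shared service; the parameter name is hypothetical."""
    if is_local(location) and not allow_local_filesystem:
        raise PermissionError(f"local filesystem access is disabled: {location}")


check_location("s3://my-bucket/lake/orders.parquet")  # remote: allowed
check_location("/mnt/lake/orders.parquet", allow_local_filesystem=True)  # explicitly opted in
```

With the default left off, remote object-storage reads (the lakehouse use case above) keep working, while a shared `.hyper` file can no longer point at files on the server itself.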