
[Feature Request]: Add "SQL from file" functionality to "Table Input" transform

Open gmitter-ef opened this issue 9 months ago • 12 comments

What would you like to happen?

Similar to the "SQL" action, functionality to read SQL code from a file would be extremely helpful in the "Table Input" transform.

Currently our SQL queries are externalized to files, and we use the "Load file content in memory" transform to load the whole query into a variable (e.g. ${sql_file_content}), which then becomes the content of the SQL input field of the "Table Input" transform, with "Replace variables in script" checked.

This is very cumbersome; being able to read the SQL query directly from the file system would help.

Issue Priority

Priority: 3

Issue Component

Component: Transforms, Component: Hop Gui

gmitter-ef avatar Apr 15 '25 13:04 gmitter-ef

We have the Execute row sql script transform for that. Have you taken a look at it?

hansva avatar Apr 15 '25 19:04 hansva

I think it is not usable for our use case:

  • our SQL code contains a lot of variables, and there seems to be no way to enable "Use variables in script" or similar
  • explicitly assembling the path to the SQL file is necessary, especially when the path is composed of variable components (a concat transform with ${PROJECT_HOME} and other path parts, for example); this is far simpler in the "Table input" transform
  • the actual result set of a SELECT query does not seem to be passed as a stream to the output of the transform (?)
  • the docs say "For good performance turn to dedicated transforms like Table Output (insert into), Table Input (Select), Update, Delete, etc.": we are in high need of good performance, as we read millions of rows

gmitter-ef avatar Apr 16 '25 06:04 gmitter-ef

it will require an additional pipeline, but what you could do is read your SQL query from a file ("Load file content in memory" transform) and pass the query as a parameter to a second pipeline. In that second pipeline, you can pick up the query that was passed in as a parameter. Your query field would then contain a single variable, e.g. ${PRM_QUERY}.

bamaer avatar Apr 16 '25 08:04 bamaer

Thanks @bamaer, this is exactly what we already do (see the description, unless I misunderstood?), but we would like to avoid it; reading the code directly from a file via the "Table input" transform seems like a reasonable extension to us.

gmitter-ef avatar Apr 16 '25 08:04 gmitter-ef

right, missed that line in your initial post ;-)

bamaer avatar Apr 16 '25 08:04 bamaer

Another way would be to use the new variable resolvers, which can run a pipeline to resolve a variable. I am not against the idea of reading the SQL from a file; I am just wondering what problem it solves in this case.

The SQL is usually tightly coupled to the pipeline. Do you want to externalize it because of its reusability? Is it because of maintainability, because you write the SQL in a different tool and just want to point to the same file for both?

How do you solve the initialization of the variable during pipeline development? You will still have to copy and paste the query in the default variable during development, otherwise, you have no metadata from the table input.

hansva avatar Apr 17 '25 09:04 hansva

Hi! Some reasons why we used SQL files as outsourced files:

  • during development a proper source code editor / IDE can be used to do refactorings, renamings, alignments and the like
  • the SQL queries have to be adapted by consultants (WHERE conditions, hard-coded constants, additional conditions,...) at the customer's site as part of configuration tasks; we do not want them to edit the XMLs of the pipelines and we do not want them to need HOP to be able to adapt the queries
  • version control of the SQL files separately from the pipelines can prove handy

How do you solve the initialization of the variable during pipeline development? You will still have to copy and paste the query in the default variable during development, otherwise, you have no metadata from the table input.

Do you mean the variable that contains the whole query, or the variables used inside the query? Yes, this is problematic. For the variable that contains the query: yes, copy-and-pasting is needed, or setting the content of the query as a parameter (without newlines). But I think exactly this problem could be solved by allowing the query to be read directly from a file?

gmitter-ef avatar Apr 22 '25 06:04 gmitter-ef

Alright, makes sense.

hansva avatar Apr 22 '25 06:04 hansva

What I forgot to state explicitly is that our main objective with this feature request is to be able to test pipelines from the GUI in as much isolation as possible, without having to run the whole project or parent/grandparent workflows. This is why we switched from using a variable that contains the SQL query (read by "Load file content in memory" in some previous pipeline) back to having the SQL code directly in the table input, accepting the disadvantages (harder configurability of the queries, having to copy/paste back and forth with a code editor for easier editing/formatting, ...).

gmitter-ef avatar Jul 22 '25 07:07 gmitter-ef

Hi @hansva, I tested whether I can use the variable resolver to read an SQL file and then use it directly in the Table Input step.

Image

However, it turns out that if the content of the SQL — meaning the statement itself — contains another variable, that variable is not resolved in the Table Input step.

The stream output

Image

replacement_value should be TEST-TEST as defined here

Image

Therefore, the SQL statements that are externalized into SQL files must not contain any additional variables.

Is this behavior intended, or should the variable be replaced in the Table Input step after being resolved?
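The observed behavior is consistent with single-pass substitution. A minimal sketch (hypothetical names, plain JavaScript, not Hop internals): the resolver expression is expanded to the file's content, but the ${...} placeholders that arrive *inside* that content are never scanned again.

```javascript
// Single-pass resolution: each ${...} in the input is expanded once;
// placeholders introduced by an expansion are not revisited.
function resolveOnce(text, resolvers) {
  return text.replace(/\$\{([^}]+)\}/g, (match, name) =>
    name in resolvers ? resolvers[name] : match
  );
}

const resolvers = {
  // Stand-in for what the file resolver returns (the SQL file's content):
  "resolve-sql-file-input:query.sql":
    "SELECT '${replacement_value}' AS replacement_value",
  replacement_value: "TEST-TEST",
};

const query = resolveOnce("${resolve-sql-file-input:query.sql}", resolvers);
// query still contains the literal text ${replacement_value}: the inner
// placeholder only appeared after the single pass was already finished.
```

Under this model, resolving the result a second time would produce the expected TEST-TEST, which matches the question of whether the Table Input step should re-resolve variables after the resolver runs.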

dsanderbi avatar Sep 09 '25 13:09 dsanderbi

Just verified that this issue is still present in 2.16.0: using a variable resolver to paste the content of a SQL file into the "Table input" works, but the variables used in the SQL code are still not resolved, in contrast to the variant where the result of a "Load file content in memory" without a resolver is used in the table input.

gmitter-ef avatar Nov 20 '25 11:11 gmitter-ef

Hi all!

I found a way to use variables in the outsourced SQL files that get injected into the "Table input" transform via the "resolve-sql-file-input" resolver: use yet another resolver ;-) in the outsourced SQL file. The resolver takes any variable name (here: "MY_VAR"), and a JavaScript snippet returns the content of the variable with that name.

Resolver "resolve-variable" with relevant JavaScript code:

Image

Usage in the outsourced SQL code that gets injected into the "Table input" with the "resolve-sql-file-input" resolver:

Image

In contrast to writing "${MY_VAR}" directly in the SQL code (which does not get resolved when the SQL file is injected via the first resolver), the variable is correctly resolved this way, using the "resolve-variable" resolver!
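The idea behind the "resolve-variable" resolver can be sketched as follows (hypothetical names, plain JavaScript; in Hop's JavaScript scripting this lookup would presumably use something like getVariable): the resolver accepts an arbitrary variable name and returns that variable's value, so the lookup happens at resolver-expansion time rather than relying on a second substitution pass.

```javascript
// Stand-in for the pipeline's variable space (illustrative values only).
const pipelineVariables = { MY_VAR: "TEST-TEST" };

// Stand-in for the resolver's JavaScript: look up a variable by name
// and return its content (empty string if undefined).
function resolveVariable(name) {
  return name in pipelineVariables ? pipelineVariables[name] : "";
}

// Inside the outsourced SQL file one would then write
//   ${resolve-variable:MY_VAR}
// instead of ${MY_VAR}; the resolver call is expanded even where the
// plain variable reference is not.
const sql = "SELECT '" + resolveVariable("MY_VAR") + "' AS replacement_value";
```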

gmitter-ef avatar Nov 21 '25 06:11 gmitter-ef