`include` other prql files & module system

Open Baoqi opened this issue 3 years ago • 5 comments

Currently, it seems we only have a single stdlib.prql, which is included when parsing every other PRQL file. Maybe we could expose this "include" as a feature.

Then I could define some custom functions in a separate PRQL file:

func day_of_week col -> s' ......'
func date_trunc  col unit -> s'.....'

In the main PRQL file, I could first include them and then use those functions:

prql include:datetime_util

from table1
derive[ a = day_of_week(datetime_col)]

We could also support specifying a library search path for different dialects, like:

--- current folder
    --   datetime_util.prql (default one)
    --   mysql folder
         --- datetime_util.prql (overload one)

Baoqi avatar Jul 01 '22 07:07 Baoqi

Good point.

As we will also add database schema definitions in PRQL (#381), one would probably want to keep them in separate files and include them in each of the query files.

But this would require some kind of "module" system with hierarchical namespaces. As with many things, I like Rust's approach to modules:

  • each file is a module,
  • each directory containing a mod.rs is a module,
  • any file can also contain inner modules by using the mod name { ... } syntax.

Each module can then have annotations (for example #[cfg(test)], which includes the module only in the test configuration). We could replace such annotations with:

# default impl
func day_of_week col -> s' ......'

mod dialect:mysql { 
  # overloaded impl
  func day_of_week col -> s' ......'
}

aljazerzen avatar Jul 01 '22 10:07 aljazerzen

What about filename-based dialecting, so that the primary implementation doesn't need to be altered when new dialects are added? Essentially something like how React Native handles it: it allows platform-dependent code through filename patterns.

A directory structure like this might be nice?

[root]
   |-[my-lib]
   |      |-my-lib.prql             # Default implementation
   |      |-my-lib.postgres.prql    # PostgreSQL version
   |      |-my-lib.mysql.prql       # MySQL version
   |-[other-lib]
   |      |

An import could then use folders to resolve the modules:

prql dialect:postgres

import my-lib
....

Module resolution would then search through that folder and, since the dialect is already chosen, give precedence to the "my-lib.postgres.prql" file based on the suffix in the filename.

Since the dialect options are an enumeration anyway, they seem like the proper suffix.

As a note, filename suffixes may not even be needed if a directory-based structure is used: one could just have my-lib/postgres.prql and my-lib/generic.prql if wanted.
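To make the lookup order concrete, here is a minimal sketch in Python (not the compiler's language, just convenient for illustration). The function name `resolve_module` and the exact fallback order are my assumptions and nothing like this exists in PRQL today. It follows the suffix-based layout from the tree above.

```python
from pathlib import Path
from typing import Optional


def resolve_module(root: Path, name: str, dialect: Optional[str] = None) -> Path:
    """Pick the file that `import <name>` would load (hypothetical helper).

    Assumed lookup order inside <root>/<name>/:
      1. <name>.<dialect>.prql  (only if a dialect was chosen)
      2. <name>.prql            (default implementation)
    """
    lib_dir = root / name
    candidates = []
    if dialect is not None:
        candidates.append(lib_dir / f"{name}.{dialect}.prql")
    candidates.append(lib_dir / f"{name}.prql")

    # Return the first candidate that actually exists on disk.
    for candidate in candidates:
        if candidate.is_file():
            return candidate
    raise FileNotFoundError(f"no module {name!r} found under {root}")
```

With `prql dialect:postgres` in the header, `resolve_module(root, "my-lib", "postgres")` would pick the PostgreSQL file if it exists and the default file otherwise, so adding a new dialect never touches the default implementation.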

chris-pikul avatar Jul 13 '22 03:07 chris-pikul

@chris-pikul I think that's a great idea!

max-sixty avatar Jul 13 '22 21:07 max-sixty

Is there a specific implementation plan and timetable for this?

songlinshu avatar Aug 12 '22 02:08 songlinshu

We don't have a specific plan, but we're very open to contributions towards this.

(I would like to get more people involved in development and have been thinking of ways of making the current codebase more approachable, so possibly this is a good case)

I quite like @chris-pikul's proposal. I think a smaller version could be built non-invasively on top of the current compiler, with something that collects the files and basically concatenates them together. That could run from the CLI without even needing an import statement; it could be managed from the original command.

Then it would be a modest change to add the import my-lib functionality (though it would require some intermediate Rust work, since that can't live in the wasm target, which doesn't support files).

Somewhat relatedly, I've also done lots of work on the dbt integration (https://github.com/prql/dbt-prql/), which is a very viable way of building bigger projects of queries (more so than functions, though)[^1].

If anyone is interested in exploring this, I and others are very happy to help; hit us up here or on Discord!
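As a rough illustration of the "collect and concatenate" idea, here is a small Python sketch. It is only a sketch under assumptions: `concatenate_sources` is a made-up name, the real thing would presumably live in the Rust CLI, and it does not handle hoisting a query header such as `prql dialect:postgres` to the top of the combined source, which would likely be needed.

```python
from pathlib import Path
from typing import Iterable


def concatenate_sources(library_files: Iterable[Path], main_file: Path) -> str:
    """Join library files and the main query into one PRQL source string.

    Hypothetical helper: library files come first so their functions are
    defined before the main query uses them.
    """
    parts = []
    for path in list(library_files) + [main_file]:
        # A PRQL comment marks where each file starts, to ease debugging.
        parts.append(f"# --- {path.name} ---\n{path.read_text().rstrip()}\n")
    return "\n".join(parts)
```

The resulting string could then be handed to the existing compiler unchanged, which is what keeps this approach non-invasive.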

[^1]: One note: it currently only works for databases that use backticks for identifiers (e.g. BigQuery). That's something I've been trying to engineer around and have been discussing with the folks from dbt.

max-sixty avatar Aug 12 '22 19:08 max-sixty

Ref #2129 #2567 #2570

aljazerzen avatar May 15 '23 12:05 aljazerzen