
Support SQLite3 as a data source for scanning in-memory data

Open vitormussa opened this issue 3 years ago • 2 comments

Is your feature request related to a problem? Please describe. In ELT-oriented projects, it is important to test data at every step of the pipeline. If a pure Python step extracts data from a data source, we should be able to test that data before loading it into a database.

Describe the solution you'd like A possible solution is to support SQLite3, since it can be instantiated as an in-memory database from Python. That would work better than an engine such as Pandas, which does not run SQL queries natively. SQLite3 is also a widely used database engine with a built-in Python API. Although it is not recommended for production, it could serve as storage for small datasets.

Additional context Instantiating a SQLite3 database in memory with Python is a simple task:

import sqlite3
conn = sqlite3.connect(':memory:')

Then we can run soda-sql scans against it to test the data before sending it downstream.
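To make the idea concrete, here is a minimal sketch of the workflow using only the standard library. The table name `extracted`, its columns, and the sample rows are hypothetical, and the plain SQL checks only stand in for what a scan would compute; no soda-sql API is used here.

```python
import sqlite3

# Hypothetical rows produced by a pure-Python extraction step.
rows = [
    (1, "alice", 34.5),
    (2, "bob", None),
    (3, "carol", 12.0),
]

# Load the extracted rows into an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE extracted (id INTEGER, name TEXT, amount REAL)")
conn.executemany("INSERT INTO extracted VALUES (?, ?, ?)", rows)

# Plain SQL checks standing in for scan metrics (row count, missing values).
row_count, missing_amount = conn.execute(
    "SELECT COUNT(*), SUM(amount IS NULL) FROM extracted"
).fetchone()

print(f"rows={row_count}, missing amount={missing_amount}")
```

Because the database lives only in memory, nothing is written to disk and the data disappears when the connection is closed.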

vitormussa avatar Jan 26 '22 18:01 vitormussa

Hey @vitormussa !

Thank you for opening the issue - I did start with a simple SQLite implementation some time ago. There are some limitations though, because by default SQLite doesn’t come with math functions unless you change the compilation settings (the SQLITE_ENABLE_MATH_FUNCTIONS compile-time option).
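One possible workaround, sketched here as an assumption and not as what the linked branch does, is to register the missing functions from Python with the standard `sqlite3` module's `create_function` and `create_aggregate`. The names `py_sqrt` and `py_stddev`, the table `t`, and the choice of functions are all hypothetical; which functions a dialect would actually need is not stated in this thread.

```python
import math
import sqlite3


class StdDev:
    """Sample standard deviation, accumulated with Welford's algorithm."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0

    def step(self, value):
        # Ignore NULLs, as SQL aggregates usually do.
        if value is None:
            return
        self.n += 1
        delta = value - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (value - self.mean)

    def finalize(self):
        # Undefined for fewer than two values, so return NULL.
        if self.n < 2:
            return None
        return math.sqrt(self.m2 / (self.n - 1))


conn = sqlite3.connect(":memory:")

# Register a scalar function and an aggregate under hypothetical names.
conn.create_function("py_sqrt", 1, lambda x: None if x is None else math.sqrt(x))
conn.create_aggregate("py_stddev", 1, StdDev)

conn.execute("CREATE TABLE t (x REAL)")
conn.executemany(
    "INSERT INTO t VALUES (?)",
    [(2.0,), (4.0,), (4.0,), (4.0,), (5.0,), (5.0,), (7.0,), (9.0,)],
)

(stddev,) = conn.execute("SELECT py_stddev(x) FROM t").fetchone()
(root,) = conn.execute("SELECT py_sqrt(16.0)").fetchone()
```

Functions registered this way exist only on the connection that registered them, so this approach would require the scanning code to share or control that connection.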

Feel free to check this branch https://github.com/sodadata/soda-sql/tree/sqlite-dialect and if you want to take a stab at it - I’ll be happy to help you with making it complete :slightly_smiling_face:

vijaykiran avatar Jan 27 '22 10:01 vijaykiran

Cool, @vijaykiran! I started looking at the other dialects and was wondering if I could start developing the SQLite one. I'll look at this branch and see what I can do then. Thanks for the fast answer :)

vitormussa avatar Jan 27 '22 11:01 vitormussa