sqlite_orm Reverse engineering complex databases - automatically generate storage definition and data structures

I have databases with many complex tables (lots of columns, foreign keys, range checks etc). Manually creating the storage definition and all the structs would take maybe a week. It is also likely that errors are made in the process.

Is there some kind of parser that generates the structs and the storage definition from an existing SQLite db?

If not, I am tempted to write a SQLite3 -> sqlite_orm definition parser that reads sqlite_master and outputs a .h file with all you need in order to use the db with sqlite_orm.

Jan 11 '20 10:01 prospectumdev

Hi. Thanks for using the lib. How many tables do you have? I am just curious. I thought about codegen from SQL query. I want to make a website where you can type an SQL query and press generate and you would get C++ code written with sqlite_orm. E.g. you type:

CREATE TABLE contacts (
    contact_id INTEGER PRIMARY KEY,
    first_name TEXT NOT NULL,
    last_name TEXT NOT NULL,
    email TEXT NOT NULL UNIQUE,
    phone TEXT NOT NULL UNIQUE
);

will generate

struct Contact {
    int id = 0;
    std::string firstName;
    std::string lastName;
    std::string email;
    std::string phone;
};
auto storage = make_storage("db.sqlite",
                            make_table("contacts",
                                       make_column("contact_id", &Contact::id, primary_key()),
                                       make_column("first_name", &Contact::firstName),
                                       make_column("last_name", &Contact::lastName),
                                       make_column("email", &Contact::email, unique()),
                                       make_column("phone", &Contact::phone, unique())));

or if you type

SELECT first_name, salary 
FROM emp_master;

you will get

auto rows = storage.select(columns(&EmpMaster::firstName, &EmpMaster::salary));

What do you think?

Jan 11 '20 15:01 fnc12

There should be somewhere around 70 tables (2 different schema definitions). Many of them have several dozens of columns.

The results you describe are exactly what is needed (although it seems that NOT NULL is ignored in the example). That would make sqlite_orm even more efficient to use and attractive!

I suppose that when it comes to details (e.g. the determining the right order of the make_table entities in the presence of foreign keys), one needs to be careful.

May I ask how you are planning to implement this transformation?

Jan 11 '20 16:01 prospectumdev

parse a query with sqlite vanilla parser
analyze model obtained by parsing
generate code or show error with text "this feature is not supported right now"

Jan 12 '20 07:01 fnc12

Sounds good! What priority does this code generation thing have for you?

Jan 12 '20 12:01 prospectumdev

Bugs
Minor features
Codegen
Major features

Jan 12 '20 13:01 fnc12

I will need to have the tables converted in about 2 weeks. Do you think the code generator might be ready then? If not, no problem, then I'll write a simple object oriented parser just for 'create table'.

Jan 13 '20 07:01 prospectumdev

Probably it will be ready for create table

Jan 13 '20 08:01 fnc12

Awesome, thank you so much!

Jan 13 '20 08:01 prospectumdev

Quick question: How are types translated into sqlite_orm, e.g. UNSIGNED INT(4), CHARACTER(1), VARCHAR(64) etc.? In the case of UNSIGNED INT(4) => int, information is lost so that the databases created by sqlite_orm are not identical to the original ones.

Jan 13 '20 14:01 prospectumdev

And another Question: here...

struct User{
	...
	std::unique_ptr<std::string> imageUrl;
	...
};

...unique_ptr means that imageUrl = nullptr is allowed. Correct?

Jan 13 '20 14:01 prospectumdev

https://github.com/fnc12/sqlite_orm/blob/master/include/sqlite_orm/sqlite_orm.h#L164
yes, nullable types skip NOT NULL constraint during syncing a schema. Nullable types are std::unique_ptr, std::shared_ptr and std::optional. You can specify any other type as nullable in your application

Jan 13 '20 15:01 fnc12

Thanks! In the meantime I have written a quick and dirty code generator based on the SQLite C/C++ API. It can already create the structs. Next, I'll do the make_storage part. I can make the code available on github if you like.

Jan 13 '20 15:01 prospectumdev

This is cool! But it is not required cause I have already started making injections in sqlite source to make available statement analyzing. I thought about creating custom parser but native parser is always better cause it will parse 100% of queries but non-native will parse <100% queries and must be regularly supported. Anyway parser creation is a good exercise to get amazing experience!

Jan 13 '20 17:01 fnc12

Generated Code:

The compiler says I must use the command line option /bigobj in order to get it to compile. I am using /bigobj but it is still not compiling (Error C1128) :-(

Jan 13 '20 17:01 prospectumdev

Ok, I put /bigobj into the linker command line options instead of the compiler. Obviously I need a break but it seems that I am almost there. Takes long to compile though.

Jan 13 '20 17:01 prospectumdev

Ok, I can create an sqlite3 db using sqlite_orm and the generated definition (103 KB of code, I am happy that I did not have to write this manually). Now I need to have a look at the foreign keys. But maybe I should call it day for now :-)

Jan 13 '20 17:01 prospectumdev

I am stuck with the data type mapping. Seems like it is not possible to create a db definition for sqlite_orm without losing the detailed data type specification of the SQLite db to be modelled. That might be a big dealbreaker because then the Host software is most likely going to refuse to read dbs created with sqlite_orm...

I need to exactly recreate the existing SQLite db. Is this possible?

Jan 13 '20 18:01 prospectumdev

How does sqlite_orm deal with circular foreign key references? I just discovered that the schema I am currently working with has 7 circular references...

Jan 13 '20 22:01 prospectumdev

@prospectumdev what will be lost? sqlite_orm cannot handle circular references. This is a reason to improve sync_schema call

Jan 14 '20 05:01 fnc12

what will be lost?

The data type definitions as specified in the original databases. For instance, 'UNSIGNED INT(4)' is mapped to 'INT', so that databases created with sqlite_orm will always differ from original ones. However, contrary to my expectations it seems that the host software I am using does not care. So it is not a big problem.

sqlite_orm cannot handle circular references

In my tables there are self references and circular references. is it possible / are you planning to support these references? As for timing, is this considered a minor or a major feature?

Plus it seems that it is not possibly to specify validity checks via the check constraint, like CHECK(CustomDisplay1_PosPixels_Y=0 OR CustomDisplay1_PosPixels_Y BETWEEN 0 AND 8192). The databases I am using heavily rely on checks. Again, the host software does not seem to care but I know from experience that there is a good reason for all those checks to be there.

From my perspective, my assessment of these three issues is the following:

Circular references: Important feature.
Checks: Important feature.
Non-SQLite data type names: Seems to be more of a nice-to-have feature.

This is a reason to improve sync_schema call

So you are planning to add support for circular references and self references? That is great news!

What about check constraints? If sqlite_orm is supposed to do the checking itself, this might be a lot of work. However, why not simply add the check constraints to the database and have SQLite itself do the checking? In that case it should be much easier to add this feature.

As for the data type names: It seems that SQLite allows for data type name aliases to be stored. While I am not sure if it is a good practice to use those at all, SQLite does it and so, for completeness, it might be nice to represent that, too. Or sqlite_orm could explicitly refuse to use data types other than those specified by the 5 storage classes and rely on checks in order to ensure validity. While this is a plausible approach, my experience from product development tells me that for psychological reasons this is likely to have a (small) detrimental effect on the user acceptance of sqlite_orm.

Jan 14 '20 08:01 prospectumdev

If you have a database with a column with type UNSIGNED INT(4) and a member mapped to this column with type int the column will never be altered during sync_schema cause UNSIGNED INT(4) maps to int
circular references are delayed until someone asks for it. Looks like it is time to implement this feature
CHECK constraint is also delayed until someone asks I don't see the problem in different types' names cause names are only aliases. I think I shall implement CHECK constraint first, circular references next and SQL parse after that. I am afraid I can not make it in two weeks so you should use you own parser probably.

Jan 14 '20 10:01 fnc12

Thanks a lot, I'll find a way to get something going that I can work with at the moment. If the host software accepts databases without foreign keys and checks, that's ok for now.

I think you are doing an excellent job with sqlite_orm, and I think it can become very popular once the features are there and it is convenient to use (example: code generation). May I ask you about your plans for sqlite_orm? Is it a private or professional project for you? Are you planning to develop/support it long term?

Jan 14 '20 10:01 prospectumdev

Feedback: It kind of works: I can now read an existing database, generate the sqlite_orm header file (no foreign keys, no check() constraints) and make sqlite_orm produce a working database. 'working means' that I can insert all the data of the preexisting database into the new one and the new one is accepted by the host software! Unfortunately, when I try to open an existing database with sqlite_orm, it is completely overwritten. The preserve flag (sync_schema(true)) does not help. So it seems that I will have to transfer the data manually at the moment.

Jan 14 '20 18:01 prospectumdev

Data is lost cause something in schema differs. Please show me the results of sync_schema call

Jan 14 '20 18:01 fnc12

My quick and dirty code: https://github.com/prospectumdev/200111_sqlite_orm_codegen

I'll post the results of sync_schema shortly.

Jan 14 '20 18:01 prospectumdev

sync_schema() returns "old table dropped and recreated" for all tables sync_schema(true) also returns "old table dropped and recreated" for all tables

I found the reason: It is happening because of the data type names. For testing, I modified a small table, setting data type names to SQLite standards, while leaving all other tables alone. Now, for the modified table, the output is "table and storage is already in sync."

Maybe custom data type name support isn't a 'nice to have' feature at all...

Jan 14 '20 19:01 prospectumdev

Optional data type aliases could solve the problem. Then

make_column("CustomDisplay1_PosPixels_X", &Coupler::CustomDisplay1_PosPixels_X, default_value(0))

turns into

make_column("CustomDisplay1_PosPixels_X", &Coupler::CustomDisplay1_PosPixels_X, "UNSIGNED INT(6)", default_value(0))

If available, the aliases can be used to set table_info[columnid].type when tables are created.

Jan 14 '20 20:01 prospectumdev

what type has Coupler::CustomDisplay1_PosPixels_X?

Jan 15 '20 06:01 fnc12

original db : UNSIGNED INT(6) new db created with sqlite_orm: INTEGER struct created by the code generator: unique_ptr<unsigned int>

Jan 15 '20 07:01 prospectumdev

what C++ type has Coupler::CustomDisplay1_PosPixels_X or what type you expect it to have? int, std::string or what?

Jan 15 '20 08:01 fnc12

sqlite_orm sqlite_orm copied to clipboard

Reverse engineering complex databases - automatically generate storage definition and data structures

sqlite_orm
sqlite_orm copied to clipboard