sqlite_orm
sqlite_orm copied to clipboard
Reverse engineering complex databases - automatically generate storage definition and data structures
I have databases with many complex tables (lots of columns, foreign keys, range checks etc). Manually creating the storage definition and all the structs would take maybe a week. It is also likely that errors are made in the process.
Is there some kind of parser that generates the structs and the storage definition from an existing SQLite db?
If not, I am tempted to write a SQLite3 -> sqlite_orm definition parser that reads sqlite_master and outputs a .h file with all you need in order to use the db with sqlite_orm.
Hi. Thanks for using the lib. How many tables do you have? I am just curious.
I thought about codegen from SQL query. I want to make a website where you can type an SQL query and press generate and you would get C++ code written with sqlite_orm. E.g. you type:
CREATE TABLE contacts (
contact_id INTEGER PRIMARY KEY,
first_name TEXT NOT NULL,
last_name TEXT NOT NULL,
email TEXT NOT NULL UNIQUE,
phone TEXT NOT NULL UNIQUE
);
will generate
struct Contact {
int id = 0;
std::string firstName;
std::string lastName;
std::string email;
std::string phone;
};
auto storage = make_storage("db.sqlite",
make_table("contacts",
make_column("contact_id", &Contact::id, primary_key()),
make_column("first_name", &Contact::firstName),
make_column("last_name", &Contact::lastName),
make_column("email", &Contact::email, unique()),
make_column("phone", &Contact::phone, unique())));
or if you type
SELECT first_name, salary
FROM emp_master;
you will get
auto rows = storage.select(columns(&EmpMaster::firstName, &EmpMaster::salary));
What do you think?
There should be somewhere around 70 tables (2 different schema definitions). Many of them have several dozens of columns.
The results you describe are exactly what is needed (although it seems that NOT NULL is ignored in the example). That would make sqlite_orm even more efficient to use and attractive!
I suppose that when it comes to details (e.g. the determining the right order of the make_table entities in the presence of foreign keys), one needs to be careful.
May I ask how you are planning to implement this transformation?
- parse a query with sqlite vanilla parser
- analyze model obtained by parsing
- generate code or show error with text "this feature is not supported right now"
Sounds good! What priority does this code generation thing have for you?
- Bugs
- Minor features
- Codegen
- Major features
I will need to have the tables converted in about 2 weeks. Do you think the code generator might be ready then? If not, no problem, then I'll write a simple object oriented parser just for 'create table'.
Probably it will be ready for create table
Awesome, thank you so much!
Quick question: How are types translated into sqlite_orm, e.g. UNSIGNED INT(4), CHARACTER(1), VARCHAR(64) etc.? In the case of UNSIGNED INT(4) => int, information is lost so that the databases created by sqlite_orm are not identical to the original ones.
And another Question: here...
struct User{
...
std::unique_ptr<std::string> imageUrl;
...
};
...unique_ptr means that imageUrl = nullptr is allowed. Correct?
- https://github.com/fnc12/sqlite_orm/blob/master/include/sqlite_orm/sqlite_orm.h#L164
- yes, nullable types skip
NOT NULLconstraint during syncing a schema. Nullable types arestd::unique_ptr,std::shared_ptrandstd::optional. You can specify any other type as nullable in your application
Thanks! In the meantime I have written a quick and dirty code generator based on the SQLite C/C++ API. It can already create the structs. Next, I'll do the make_storage part. I can make the code available on github if you like.
This is cool! But it is not required cause I have already started making injections in sqlite source to make available statement analyzing. I thought about creating custom parser but native parser is always better cause it will parse 100% of queries but non-native will parse <100% queries and must be regularly supported. Anyway parser creation is a good exercise to get amazing experience!
Generated Code:
The compiler says I must use the command line option /bigobj in order to get it to compile. I am using /bigobj but it is still not compiling (Error C1128) :-(
Ok, I put /bigobj into the linker command line options instead of the compiler. Obviously I need a break but it seems that I am almost there. Takes long to compile though.
Ok, I can create an sqlite3 db using sqlite_orm and the generated definition (103 KB of code, I am happy that I did not have to write this manually). Now I need to have a look at the foreign keys. But maybe I should call it day for now :-)
I am stuck with the data type mapping. Seems like it is not possible to create a db definition for sqlite_orm without losing the detailed data type specification of the SQLite db to be modelled. That might be a big dealbreaker because then the Host software is most likely going to refuse to read dbs created with sqlite_orm...
I need to exactly recreate the existing SQLite db. Is this possible?
How does sqlite_orm deal with circular foreign key references? I just discovered that the schema I am currently working with has 7 circular references...
@prospectumdev what will be lost?
sqlite_orm cannot handle circular references. This is a reason to improve sync_schema call
what will be lost?
The data type definitions as specified in the original databases. For instance, 'UNSIGNED INT(4)' is mapped to 'INT', so that databases created with sqlite_orm will always differ from original ones. However, contrary to my expectations it seems that the host software I am using does not care. So it is not a big problem.
sqlite_orm cannot handle circular references
In my tables there are self references and circular references. is it possible / are you planning to support these references? As for timing, is this considered a minor or a major feature?
Plus it seems that it is not possibly to specify validity checks via the check constraint, like CHECK(CustomDisplay1_PosPixels_Y=0 OR CustomDisplay1_PosPixels_Y BETWEEN 0 AND 8192). The databases I am using heavily rely on checks. Again, the host software does not seem to care but I know from experience that there is a good reason for all those checks to be there.
From my perspective, my assessment of these three issues is the following:
- Circular references: Important feature.
- Checks: Important feature.
- Non-SQLite data type names: Seems to be more of a nice-to-have feature.
This is a reason to improve sync_schema call
So you are planning to add support for circular references and self references? That is great news!
What about check constraints? If sqlite_orm is supposed to do the checking itself, this might be a lot of work. However, why not simply add the check constraints to the database and have SQLite itself do the checking? In that case it should be much easier to add this feature.
As for the data type names: It seems that SQLite allows for data type name aliases to be stored. While I am not sure if it is a good practice to use those at all, SQLite does it and so, for completeness, it might be nice to represent that, too. Or sqlite_orm could explicitly refuse to use data types other than those specified by the 5 storage classes and rely on checks in order to ensure validity. While this is a plausible approach, my experience from product development tells me that for psychological reasons this is likely to have a (small) detrimental effect on the user acceptance of sqlite_orm.
- If you have a database with a column with type
UNSIGNED INT(4)and a member mapped to this column with typeintthe column will never be altered duringsync_schemacauseUNSIGNED INT(4)maps toint - circular references are delayed until someone asks for it. Looks like it is time to implement this feature
- CHECK constraint is also delayed until someone asks I don't see the problem in different types' names cause names are only aliases. I think I shall implement CHECK constraint first, circular references next and SQL parse after that. I am afraid I can not make it in two weeks so you should use you own parser probably.
Thanks a lot, I'll find a way to get something going that I can work with at the moment. If the host software accepts databases without foreign keys and checks, that's ok for now.
I think you are doing an excellent job with sqlite_orm, and I think it can become very popular once the features are there and it is convenient to use (example: code generation). May I ask you about your plans for sqlite_orm? Is it a private or professional project for you? Are you planning to develop/support it long term?
Feedback: It kind of works: I can now read an existing database, generate the sqlite_orm header file (no foreign keys, no check() constraints) and make sqlite_orm produce a working database. 'working means' that I can insert all the data of the preexisting database into the new one and the new one is accepted by the host software! Unfortunately, when I try to open an existing database with sqlite_orm, it is completely overwritten. The preserve flag (sync_schema(true)) does not help. So it seems that I will have to transfer the data manually at the moment.
Data is lost cause something in schema differs. Please show me the results of sync_schema call
My quick and dirty code: https://github.com/prospectumdev/200111_sqlite_orm_codegen
I'll post the results of sync_schema shortly.
sync_schema() returns "old table dropped and recreated" for all tables sync_schema(true) also returns "old table dropped and recreated" for all tables
I found the reason: It is happening because of the data type names. For testing, I modified a small table, setting data type names to SQLite standards, while leaving all other tables alone. Now, for the modified table, the output is "table and storage is already in sync."
Maybe custom data type name support isn't a 'nice to have' feature at all...
Optional data type aliases could solve the problem. Then
make_column("CustomDisplay1_PosPixels_X", &Coupler::CustomDisplay1_PosPixels_X, default_value(0))
turns into
make_column("CustomDisplay1_PosPixels_X", &Coupler::CustomDisplay1_PosPixels_X, "UNSIGNED INT(6)", default_value(0))
If available, the aliases can be used to set table_info[columnid].type when tables are created.
what type has Coupler::CustomDisplay1_PosPixels_X?
original db : UNSIGNED INT(6)
new db created with sqlite_orm: INTEGER
struct created by the code generator: unique_ptr<unsigned int>
what C++ type has Coupler::CustomDisplay1_PosPixels_X or what type you expect it to have? int, std::string or what?