TomboloDigitalConnector
Re-visiting the architecture of DC
Description
Consider an example table of size 6×8, with 8 attributes and 6 subjects. In the current implementation we save the combination of every subject with every attribute, which for a table this small takes 48 operations instead of just 6, since the subject is common to all 8 attributes of a single record. This significantly slows down saving records to the database when the dataset is large: for an Excel sheet with 100,000 rows and 50 attributes, DC runs 5 million operations instead of just 100,000.
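To make the arithmetic concrete, here is a minimal sketch in plain Java (not DC code; the method names are hypothetical) contrasting the two write strategies:

```java
// Contrast the per-cell write strategy (current) with a per-row
// strategy that batches all attribute values of a shared subject.
public class WriteCostSketch {

    // Current behaviour: one database operation per (subject, attribute) cell.
    static long perCellOps(long rows, long attributes) {
        return rows * attributes;
    }

    // Proposed behaviour: one batched operation per subject row.
    static long perRowOps(long rows) {
        return rows;
    }

    public static void main(String[] args) {
        System.out.println(perCellOps(6, 8));         // 48
        System.out.println(perRowOps(6));             // 6
        System.out.println(perCellOps(100_000, 50));  // 5,000,000
        System.out.println(perRowOps(100_000));       // 100,000
    }
}
```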
Error log
NA
While working on the issue, I took the approach of generating fixed_value and timed_value tables at runtime for every datasource (see the sketch after the lists below).
Benefits:
- As explained in the issue, it avoids saving the same subject once per attribute, which inflates the number of transactions.
- Every datasource would get one table for timed_values and one for fixed_values, which allows Java to access these tables through multiple threads.
- Corrupting one datasource will not affect the other datasources.
Disadvantages:
- Keeping track of all the tables that get generated for every datasource.
- A lot of code refactoring.
- Managing communication between different tables at runtime.
- If the link of the datasource changes, the randomly generated names for the timed and fixed value tables would change too, which could create conflicts.
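A hedged sketch of what the runtime-table approach could look like, assuming plain JDBC against a PostgreSQL backend; the table layout, column names, and datasource label are all hypothetical, and deriving the table name deterministically from the datasource label (rather than randomly) is one way to sidestep the naming conflict in the last disadvantage:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.sql.Statement;

public class RuntimeTablesSketch {

    // Deterministic, SQL-safe table name derived from the datasource label,
    // so re-importing the same datasource always targets the same tables.
    static String tableName(String datasourceLabel, String kind) {
        String safe = datasourceLabel.toLowerCase().replaceAll("[^a-z0-9]", "_");
        return kind + "_value_" + safe;
    }

    // Create one fixed_value and one timed_value table for the datasource.
    static void createTablesFor(Connection conn, String datasourceLabel)
            throws SQLException {
        try (Statement st = conn.createStatement()) {
            st.executeUpdate("CREATE TABLE IF NOT EXISTS "
                    + tableName(datasourceLabel, "fixed")
                    + " (subject_id BIGINT, attribute_id BIGINT, value TEXT)");
            st.executeUpdate("CREATE TABLE IF NOT EXISTS "
                    + tableName(datasourceLabel, "timed")
                    + " (subject_id BIGINT, attribute_id BIGINT,"
                    + " timestamp TIMESTAMP, value TEXT)");
        }
    }

    public static void main(String[] args) throws SQLException {
        // Connection details are placeholders.
        try (Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost/tombolo", "user", "password")) {
            createTablesFor(conn, "london-air-quality");
        }
    }
}
```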
Another approach we could take is using a NoSQL database, which would allow the same database structure DC currently has while also addressing the issues listed above.
Because it is a schema-less approach, we can reduce the number of operations, since one row could have 10 columns while another could have 100.
There would be no need to generate tables at runtime.
However, it would require rewriting the backend from scratch, though it would be a more robust approach.
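For illustration, here is a hedged sketch of the document model, using MongoDB as one possible NoSQL store (the issue does not name a specific product); the database, collection, and field names are hypothetical. Each row becomes a single document, so rows with different numbers of attributes still cost one insert each:

```java
import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import org.bson.Document;

public class NoSqlSketch {
    public static void main(String[] args) {
        try (MongoClient client = MongoClients.create("mongodb://localhost")) {
            MongoCollection<Document> values = client
                    .getDatabase("tombolo")
                    .getCollection("fixed_values");

            // One subject with three attributes, one with a single attribute:
            // documents need not share a schema, and each row is one insert.
            values.insertOne(new Document("subject", "E09000001")
                    .append("population", 8500)
                    .append("area_km2", 2.9)
                    .append("mean_income", 63620));
            values.insertOne(new Document("subject", "E09000002")
                    .append("population", 212000));
        }
    }
}
```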