tern
Proposal: Create a database backend with an associated API
It could be useful to have a database backend so that data can be more easily organized and queried. I think SQLite would be a good fit (at least at first) due to its ease of setup and management via the sqlite3 module in the standard library. Eventually we can add support for other databases.
@PrajwalM2212 recommended SQLite as well: I think we can just choose sqlite3 because:
1. It is faster.
2. It is good for applications where the code that executes SQL statements and the application reside on the same machine.
3. It supports large amounts of data, up to 140 TB, with good performance.
4. It is provided as part of the Python standard library.

https://www.sqlite.org/whentouse.html
The main requirement is that the storage be self-contained, right? Is that why Redis is not an option? @nishakm
@zoek1 That was one of the reasons why I suggested SQLite. Since we are only using the cache for analysis purposes (our internal use), SQLite gives the best value.
At this time, my main concern is to move away from storing data in a YAML file and into something that is queryable. The discussion I would really like to have is whether we should be using a key-value store (like Redis) or a relational database (like sqlite). One thing about choosing a relational database is that you will need to put time into designing the database. Once done, it is difficult to undo. Key-value stores are easier to change, but suffer from the same problems as the flat YAML file which is that as more data gets added, it becomes less queryable. I am personally leaning towards implementing this in sqlite because we already have a data model and making an API for queries means the database can be switched with something else.
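To illustrate the point about putting an API in front of the database, here is a minimal sketch using only the standard library's `sqlite3` module. The names `PackageStore` and `SqlitePackageStore`, and the `packages` table with its columns, are hypothetical and are not taken from tern's actual data model. The idea is only that callers depend on the abstract interface, so a different backend could be swapped in later.

```python
import sqlite3
from abc import ABC, abstractmethod


class PackageStore(ABC):
    """Hypothetical query API; callers never see the database behind it."""

    @abstractmethod
    def add_package(self, layer_hash, name, version):
        ...

    @abstractmethod
    def packages_in_layer(self, layer_hash):
        ...


class SqlitePackageStore(PackageStore):
    """SQLite-backed implementation of the hypothetical API above."""

    def __init__(self, path=":memory:"):
        self.conn = sqlite3.connect(path)
        # Assumed schema: one row per (layer, package) pair.
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS packages ("
            "layer_hash TEXT NOT NULL, name TEXT NOT NULL, version TEXT, "
            "PRIMARY KEY (layer_hash, name))"
        )

    def add_package(self, layer_hash, name, version):
        # The connection context manager commits on success.
        with self.conn:
            self.conn.execute(
                "INSERT OR REPLACE INTO packages VALUES (?, ?, ?)",
                (layer_hash, name, version),
            )

    def packages_in_layer(self, layer_hash):
        rows = self.conn.execute(
            "SELECT name, version FROM packages "
            "WHERE layer_hash = ? ORDER BY name",
            (layer_hash,),
        )
        return rows.fetchall()


store = SqlitePackageStore()  # in-memory database for the example
store.add_package("abc123", "zlib", "1.2.11")
store.add_package("abc123", "bash", "5.0")
print(store.packages_in_layer("abc123"))
```

A key-value backend would only need to implement the same two methods, which is what keeps the choice of database reversible.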
My research shows that using a json file as a backend greatly improves performance:
- YAML backend: 76 seconds
- JSON backend: 0.47 seconds
We would still like a database backend so folks can set up a centralized repository that is queryable, but for now, switching the cache format from YAML to JSON is an easy improvement.
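For reference, a JSON-backed cache needs nothing beyond the standard library. The sketch below is only an illustration: the cache layout, the file name `cache.json`, and the helper names `save_cache` and `load_cache` are assumptions, not tern's actual code.

```python
import json
import os
import tempfile

# Hypothetical cache layout: layer hash -> list of packages.
cache = {"abc123": {"packages": [{"name": "bash", "version": "5.0"}]}}


def save_cache(cache, path):
    """Write the whole cache to disk as JSON."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(cache, f)


def load_cache(path):
    """Read the cache back into a plain dict."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


# Round-trip through a temporary directory for the example.
path = os.path.join(tempfile.mkdtemp(), "cache.json")
save_cache(cache, path)
loaded = load_cache(path)
```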
- Design CRUD API for different items in the database #792
- Implement the database #863
- Implement the sync mechanism #862
What's the status of this proposal and can I work on it?
I don't know if it is possible, but since we are aiming to store container images in a database, couldn't we convert the Docker image data to JSON and then store that JSON in a Redis database? JSON greatly improves performance, and accessing data through Redis is also faster.