CRUD API for the cache

Open timovandeput opened this issue 5 years ago • 7 comments

My understanding: Tern relies on ScanCode Toolkit to automatically detect the licenses mentioned in the source code of packages, and stores newly detected licenses in a central cache for performance reasons.

From experience with ScanCode Toolkit I know that non-license text in README files and file comments can sometimes lead to the detection of licenses that do not actually apply to the package. (This situation typically leads to reporting additional licenses.)

My question is:

Does Tern provide any means of manually correcting (such additional) licenses?

timovandeput avatar Aug 25 '20 07:08 timovandeput

> My understanding: Tern relies on ScanCode Toolkit to automatically detect the licenses mentioned in the source code of packages, and stores new detected licenses in a central cache for performance reasons.

The cache is there for performance reasons, but it is not meant to be a "central source of truth" for all containers. At this time, it can use a source of truth API to analyze a container image via a custom extension.

> From experience with ScanCode Toolkit I know that non-license text in README files and file comments can sometimes lead to the detection of licenses that do not actually apply to the package. (This situation typically leads to reporting additional licenses.)
>
> My question is:
>
> Does Tern provide any means of manually correcting (such additional) licenses?

Not yet! We have an issue filed for creating a CRUD API for the local cache for spot corrections.

nishakm avatar Aug 25 '20 13:08 nishakm

@nishakm Is https://github.com/tern-tools/tern-api helpful in resolving this issue?

ashok-arora avatar Nov 08 '21 18:11 ashok-arora

> @nishakm Is https://github.com/tern-tools/tern-api helpful in resolving this issue?

Not really. That is the web API. This issue is for developing CRUD operations on a cache database, which means adding operations to add, update, and remove objects, similar to those in https://github.com/tern-tools/tern/blob/main/tern/utils/cache.py

My suggestion would be:

  1. Create a folder in tern called API
  2. Create an abstract base class with methods to implement similar to cache.py
  3. Inherit from this class to recreate the same operations in cache.py
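The three steps above could be sketched roughly as follows. This is only an illustration, not code from the tern repository: the class names `CacheBackend` and `InMemoryCache`, the `get_layer` and `update_layer` methods, and the dictionary-backed storage are all assumptions. Only `add_layer` and `remove_layer` are names taken from cache.py.

```python
from abc import ABC, abstractmethod


class CacheBackend(ABC):
    """Hypothetical base class describing CRUD operations on cached layers."""

    @abstractmethod
    def get_layer(self, layer_hash):
        """Return the data stored for layer_hash, or None if absent."""

    @abstractmethod
    def add_layer(self, layer_hash, layer_data):
        """Store layer_data under layer_hash."""

    @abstractmethod
    def update_layer(self, layer_hash, layer_data):
        """Replace the data of an existing entry."""

    @abstractmethod
    def remove_layer(self, layer_hash):
        """Delete the entry; return True if something was removed."""


class InMemoryCache(CacheBackend):
    """Illustrative backend that keeps everything in a plain dict."""

    def __init__(self):
        self._layers = {}

    def get_layer(self, layer_hash):
        return self._layers.get(layer_hash)

    def add_layer(self, layer_hash, layer_data):
        self._layers[layer_hash] = layer_data

    def update_layer(self, layer_hash, layer_data):
        # Refuse to silently create entries during an update.
        if layer_hash not in self._layers:
            raise KeyError(layer_hash)
        self._layers[layer_hash] = layer_data

    def remove_layer(self, layer_hash):
        return self._layers.pop(layer_hash, None) is not None
```

Other backends (a file-based one, or a database) would subclass the same base class, so callers only depend on the four methods.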

cc @rnjudge

nishakm avatar Nov 09 '21 14:11 nishakm

@nishakm Thanks for the suggestions.

I had a couple of doubts:

  1. Will there be a new CLI argument for performing cache CRUD operations?

  2. In https://github.com/tern-tools/tern/blob/main/tern/utils/cache.py, the remove_layer function takes layer_hash as input:

def remove_layer(layer_hash):
    '''Remove from cache the object referenced by the layer hash'''

whereas the add_layer function takes layer_obj as input:

def add_layer(layer_obj):
    '''Given a layer object, add it to the cache
    We use the layer's to_dict object and make a dictionary such that
    the key is the layer object's fs_hash function and the value is the
    rest of the dictionary'''

What would be the appropriate input for the different CRUD operations?

ashok-arora avatar Nov 26 '21 10:11 ashok-arora

@nishakm @rnjudge Can I use InquirerPy to make the CRUD command-line?

ashok-arora avatar Nov 30 '21 15:11 ashok-arora

> @nishakm Thanks for the suggestions.
>
> I had a couple of doubts:
>
> 1. Will there be a new CLI argument for performing cache CRUD operations?

That's a good question! Yes, I would expect to have an option in the command line that indicates what "backend" tern will be using to store data. Something like tern -s/--storage etcd. But for this issue, we just need to define and implement the API.
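To make the idea concrete, a storage option like that could be parsed along these lines. This is a hypothetical sketch using argparse, not tern's actual command line: the option does not exist yet, and the `build_parser` helper and the default value "local" are assumptions.

```python
import argparse


def build_parser():
    """Build a toy parser with a hypothetical -s/--storage option."""
    parser = argparse.ArgumentParser(prog="tern")
    parser.add_argument(
        "-s", "--storage",
        default="local",  # assumed default: the existing local cache
        help="name of the backend used to store data",
    )
    return parser


# Parse an explicit argument list instead of sys.argv for demonstration.
args = build_parser().parse_args(["--storage", "etcd"])
print(args.storage)  # prints "etcd"
```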

> 2. In https://github.com/tern-tools/tern/blob/main/tern/utils/cache.py, the remove_layer function takes layer_hash as input:
>
> def remove_layer(layer_hash):
>     '''Remove from cache the object referenced by the layer hash'''
>
> whereas the add_layer function takes layer_obj as input:
>
> def add_layer(layer_obj):
>     '''Given a layer object, add it to the cache
>     We use the layer's to_dict object and make a dictionary such that
>     the key is the layer object's fs_hash function and the value is the
>     rest of the dictionary'''
>
> What would be the appropriate input for the different CRUD operations?

The "key" in the database is the layer's hash. But in order to fill all the information collected for a layer, you need to provide the whole object. Instead of inputting the layer object, you can input the layer dictionary (the output of layer.to_dict()).
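As a rough illustration of that split, the hash can be pulled out of the dictionary to serve as the key while the rest becomes the stored value. The shape of the dictionary here is an assumption (a flat dict with an "fs_hash" entry and made-up package fields); the real output of layer.to_dict() may look different, and `split_layer_dict` is a hypothetical helper.

```python
def split_layer_dict(layer_dict):
    """Return (key, value) for the cache from a layer dictionary.

    Assumes the dictionary carries the layer hash under "fs_hash".
    """
    data = dict(layer_dict)  # shallow copy so the caller's dict is untouched
    layer_hash = data.pop("fs_hash")
    return layer_hash, data


cache = {}
layer_dict = {
    "fs_hash": "abc123",  # made-up hash for illustration
    "packages": [{"name": "zlib", "pkg_license": "Zlib"}],
}

key, value = split_layer_dict(layer_dict)
cache[key] = value  # create or update
cache.pop(key)      # delete only needs the hash
```

This also shows why remove only needs the hash while add and update need the full data.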

nishakm avatar Nov 30 '21 15:11 nishakm

> @nishakm @rnjudge Can I use InquirerPy to make the CRUD command-line?

You don't need to do that for this issue, but it's definitely something that looks like a good addition to the project :)

nishakm avatar Nov 30 '21 15:11 nishakm