astrocats
Convert import and cleanup tasks to their own classes; separate behavior from entry and catalog
We have a Task object, defined in astrocats/catalog/task.py, which represents a to-do task. Each task currently stores only the name of a function (and the submodule it is located in) which should be executed to complete the task. Instead, the Task object should be expanded to include the function itself. Likely, this should be something like:
    class Task:
        def __init__(self, ...):
            # setup all of the variables currently in the `Task` objects
        def load(self, ...):
            # The actual function to complete the task
A list of Task objects will be created, and each will just have a few parameters (like they do now). If a Task is active (i.e. Task.active == True) then Task.load() will be called in the import script.
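A minimal runnable sketch of this idea follows. Only the active flag and the load() method come from the proposal above; the name and priority attributes, the ExampleTask subclass, the plain-list catalog, and the run_tasks function (standing in for the import script) are all hypothetical.

```python
class Task:
    """Sketch of a task that carries its own behavior (attribute names are assumptions)."""

    def __init__(self, name, active=True, priority=0):
        self.name = name
        self.active = active
        self.priority = priority

    def load(self, catalog):
        """The actual work for this task; subclasses are expected to override it."""
        raise NotImplementedError(f"Task '{self.name}' does not define load()")


class ExampleTask(Task):
    """Hypothetical task that adds one placeholder entry to the catalog."""

    def load(self, catalog):
        catalog.append(f"entry from {self.name}")


def run_tasks(tasks, catalog):
    """Stand-in for the import script: call load() on each active task, in priority order."""
    for task in sorted(tasks, key=lambda t: t.priority):
        if task.active:
            task.load(catalog)
    return catalog


catalog = run_tasks(
    [
        ExampleTask("second", priority=2),
        ExampleTask("skipped", active=False, priority=1),
        ExampleTask("first", priority=0),
    ],
    catalog=[],
)
```

The inactive task is never loaded, so only the "first" and "second" tasks contribute entries.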
Benefits:
- Make it easy to subclass the general-catalog tasks as needed by individual catalogs.
- Remove the need for a tasks.json input file. Instead, all of the tasks can live in the tasks directory, which will be searched. The default settings will be stored in the class definitions.
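One way the second benefit might work is sketched below, under the assumption that every task is a subclass of Task and that the defaults live as class attributes. The registry dictionary, the active_tasks helper, and the two example subclasses are hypothetical. In a real package, each module in the tasks directory would first have to be imported (for instance with importlib.import_module over the names yielded by pkgutil.iter_modules) so that its subclasses register themselves; that step is left out here to keep the example in a single file.

```python
class Task:
    """Base class; each subclass registers itself as soon as it is defined."""

    registry = {}
    # Defaults stored in the class definition rather than in tasks.json
    # (the specific attributes and values are assumptions).
    active = True
    priority = 0

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        Task.registry[cls.__name__] = cls

    def load(self, catalog):
        raise NotImplementedError


class InternalTask(Task):
    """Hypothetical task that is enabled by default."""
    priority = 1

    def load(self, catalog):
        catalog.append("internal")


class ExternalTask(Task):
    """Hypothetical task that is disabled by default."""
    active = False
    priority = 2

    def load(self, catalog):
        catalog.append("external")


def active_tasks():
    """Instantiate every registered task class whose default is active."""
    classes = sorted(Task.registry.values(), key=lambda c: c.priority)
    return [cls() for cls in classes if cls.active]


names = [type(task).__name__ for task in active_tasks()]
```

An individual catalog could then change a default by subclassing a general task and overriding the class attribute, without touching a separate configuration file.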
Continuing on the previous train of thought, the Tasks (and really the whole import/processing procedure) might be better handled by their own class, instead of being mixed in with the Catalog base class, to clean things up and keep them better organized. The import class would always be given the catalog, of course, and thus have access to any required attributes/functions. Likely, the same should happen with cleanup and sanitization: instead of just being a task used during import, this might be better as its own class, with one of the import tasks being one way of triggering it. Cleaning could also be merged with exporting/saving.
For debugging and data improvements, it would really help if both directions (import and cleaning/export) had a particular function that was run on each event, in addition to one for each task. That way it would be easier to target particular events. For example:
Preserve a Task as simply a record of the task to be done, i.e. a simple wrapper for some JSON data (including module, activity, etc). For each task, subclass a new Importer class.
    class Importer:
        def __init__(self, catalog, ...):
            # setup all of the variables currently in the `Task` objects
        def import_task(self, ...):
            # Load files, source information, etc; setup progress bar
            # ...
            for entry in entry_list:
                self.import_entry(entry, ...)
        def import_entry(self, ...):
            # Load/parse each entry, add to catalog
            # ...
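A runnable version of that outline might look like the sketch below. It assumes the catalog is a plain dictionary keyed by event name, which is a simplification and not the real Catalog class. The load_entries method, the only argument used to target particular events, the ListImporter subclass, and the sample event names are all hypothetical.

```python
class Importer:
    """Sketch of a per-task importer that is handed the catalog (here a plain dict)."""

    def __init__(self, catalog, task_name):
        self.catalog = catalog
        self.task_name = task_name

    def load_entries(self):
        """Return the raw entries for this task; subclasses override this."""
        raise NotImplementedError

    def import_task(self, only=None):
        """Import every entry, or just the event names listed in `only`."""
        count = 0
        for raw in self.load_entries():
            if only is not None and raw["name"] not in only:
                continue
            self.import_entry(raw)
            count += 1
        return count

    def import_entry(self, raw):
        """Parse one raw entry and add it to the catalog."""
        self.catalog[raw["name"]] = {"name": raw["name"], "source": self.task_name}


class ListImporter(Importer):
    """Hypothetical importer whose entries come from an in-memory list."""

    def __init__(self, catalog, task_name, rows):
        super().__init__(catalog, task_name)
        self.rows = rows

    def load_entries(self):
        return self.rows


catalog = {}
rows = [{"name": "SN1987A"}, {"name": "SN2011fe"}]
importer = ListImporter(catalog, "example", rows)
imported = importer.import_task(only={"SN2011fe"})
```

Because import_entry is its own method, a single event can be re-imported or debugged in isolation without rerunning the whole task.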
A similar structure could be used for cleaning/exporting, with a clean_task() function and a separate clean_entry() function.
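The cleaning side could be sketched the same way. The Cleaner class name, the dictionary catalog, and the specific cleanup rules (stripping whitespace and dropping empty values) are assumptions made for illustration; only the clean_task() and clean_entry() names come from the proposal.

```python
class Cleaner:
    """Sketch of a cleaning class that mirrors the importer structure."""

    def __init__(self, catalog):
        self.catalog = catalog

    def clean_task(self):
        """Clean every entry in the catalog and return the names that were processed."""
        cleaned = []
        for name in sorted(self.catalog):
            self.clean_entry(self.catalog[name])
            cleaned.append(name)
        return cleaned

    def clean_entry(self, entry):
        """Strip whitespace from string fields and drop empty or missing values."""
        for key in list(entry):
            value = entry[key]
            if isinstance(value, str):
                value = value.strip()
                entry[key] = value
            if value in ("", None):
                del entry[key]


catalog = {"SN2011fe": {"name": " SN2011fe ", "alias": "", "host": None}}
cleaned = Cleaner(catalog).clean_task()
```

An import task could trigger the cleaning simply by constructing a Cleaner with the same catalog and calling clean_task(), and an export step could do likewise before saving.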