please document required setup for scheduling tasks from python code

Open hamr opened this issue 2 years ago • 2 comments

The documentation helpfully provides examples of how to instantiate a Scheduler and run tasks, e.g.

scheduler = Scheduler()
result = scheduler.run(main())

However, that does not appear to take advantage of caching -- tasks run every time -- so it's not quite analogous with running something like

client = RedunClient()
client.execute(["redun", "run", "tasks.py", "main"])

I'm guessing that's because the scheduler object isn't making use of the database. I expect it's also relevant that I see this message when calling scheduler.run():

INFO     redun:__init__.py:1199 Upgrading db from version -1.0 to 3.1...

How do you set up a scheduler object so that it behaves more like calling redun on the command line? Or do you recommend using RedunClient instead? And could you please add that to the docs?

hamr, Aug 16 '22 22:08

Hi @hamr, thanks for posting this question. It does appear we haven't fully documented this case; we'll add that. In the meantime, here is how you configure the Scheduler to use a persistent database (e.g. SQLite). By default, Scheduler() uses an in-memory database that does not persist the cache between executions, which is what you are seeing.

from redun import Scheduler
from redun.config import Config

scheduler = Scheduler(config=Config({
    "backend": {
        "db_uri": "sqlite:///redun.db",
    }
}))
scheduler.load()  # Auto-creates the redun.db file as needed and starts a db connection.
result = scheduler.run(main())

In our own code, where we embed a redun Scheduler inside a larger Python application, we instantiate Scheduler as shown above. RedunClient() is really only used in tests.
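
To illustrate why an in-memory database cannot carry a cache from one run to the next, here is a minimal sketch using only Python's standard sqlite3 module. It is not redun's own code: the `cache` table and the `record_and_reopen` helper are hypothetical names made up for the illustration. It writes one row, closes the connection, then opens a fresh connection (standing in for a second run of the program) and counts what survived.

```python
import os
import sqlite3
import tempfile


def record_and_reopen(db_path: str) -> int:
    """Write one row, close the connection, reopen, and count surviving rows.

    Hypothetical helper for illustration only; not part of redun.
    """
    conn = sqlite3.connect(db_path)
    conn.execute("CREATE TABLE IF NOT EXISTS cache (key TEXT, value TEXT)")
    conn.execute("INSERT INTO cache VALUES ('main', 'result')")
    conn.commit()
    conn.close()

    # A second connection stands in for a second run of the program.
    conn = sqlite3.connect(db_path)
    conn.execute("CREATE TABLE IF NOT EXISTS cache (key TEXT, value TEXT)")
    count = conn.execute("SELECT COUNT(*) FROM cache").fetchone()[0]
    conn.close()
    return count


# ":memory:" creates a fresh, empty database for every new connection.
in_memory_rows = record_and_reopen(":memory:")

# A file-backed database keeps its contents across connections.
with tempfile.TemporaryDirectory() as tmp_dir:
    on_disk_rows = record_and_reopen(os.path.join(tmp_dir, "example.db"))

print(in_memory_rows, on_disk_rows)
```

The in-memory database comes back empty on the second connection, while the file-backed one still holds the row. Pointing db_uri at a file gives the Scheduler the second kind of storage.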

Let me know if that helps.

mattrasmus, Aug 17 '22 23:08

Excellent, thank you!

hamr, Aug 18 '22 13:08