HPI Discussion: configuring modules


Open · karlicoss opened this issue 4 years ago · 9 comments

Discussion around the design decisions here

May 10 '20 08:05 karlicoss

Just some thoughts on configuring/modifying this.

This is heavily biased, coming from someone who is familiar with Python, and I expected that I'd have to fork/modify HPI to fit my needs.

Edit: Though, I'm not sure if forking/modifying is the common case, so take this with a grain of salt. It's just my thought process on using this so far.

In general, I'm happy with configuration/tooling that this has provided around a layer to load data. Splitting up the configuration into ~/.config/my and using dataclasses to validate attributes makes sense once I've seen it a few times.

Though, when configuration is programmatic (Python) and there's a lack of documentation/dataclass (e.g. my.coding.commits), it's unclear what exactly to do. I was able to figure out which fields were necessary by inference, but in a larger namespace/module that might not be obvious.

(Side note: I believe it's this; feel free to correct me if I'm wrong.)

from os import environ
from pathlib import Path
from typing import List

class commits:
    names: List[str] = ["Sean Breckenridge"]
    emails: List[str] = ["[email protected]", "email two...."]
    roots: List[Path] = [Path(environ["REPOS"])]

I'd think that everyone has different modules/needs. Not everyone is going to use the same hardware, social media, or systems for note taking, so if your objective is to have programmatic access to all of your data, I don't think pip install hpi would work in most cases. Either pip install -e . or with_my seem like fine solutions to someone who wants to heavily configure/modify this.

I feel like using HPI is closer to the process of curating a vim/emacs/dotfiles configuration. This provides a useful starting point/core for handling data, and the exporters (rexport/ghexport) can be plugged in for commonly used services. Then (optionally), you write your own custom modules for your own data sources. It's slightly more annoying in this case, because of all the parsing/normalization required for ever-changing APIs/data sources, but when/if exporters/DALs are somewhat standardized/pushed to PyPI, I think a configuration file/object in ~/.config/my that specifies which modules are active/inactive could be useful (I'm a fan of how doom emacs does this). Similar to doom/spacemacs, each module/namespace package could (optionally) have a doctor.py which would do checks (run tests or check if packages like cachew/gitpython/pytz are installed), model.py for the NamedTuple representation, and/or __init__.py/<modulename>.py, which would be the Python module one would import from the REPL to get access to the data.

I'm not saying this should be extended to the extent that spacemacs has been, just that splitting it into namespace packages with an option to enable/disable 'modules' with common files makes it easier for someone to understand/edit, and it provides more structure than the current solution (I'd prefer to not mess with core on my fork whenever possible).

Commenting on the other thread, I'm sort of wary of the idea that users who are less familiar with Python could place custom Python files as modules in ~/.config/my/plugins. For a newer user, it would also make possible code reuse across namespaces more complicated. Personally, I prefer the idea of cloning/editing this, and then using with_my/pip install -e . to allow import my... in a REPL/Python script, but it's possible it could work either way.

As a side note, the random TODO and general thought-process comments have been helpful in modifying/creating modules of my own.

Aug 27 '20 06:08 seanbreckenridge

Hey, thanks for taking the time to dig into the code, really appreciated!

Though, not sure if forking/modifying is the common case

It's hard to tell what's a common case; the whole thing is pretty experimental! It would be great to get the "APIs" right so they suit everyone, but I always keep in mind that you can't really please everyone, so forking and modifying is definitely one of the use cases I'd like to support!

I feel like using HPI is closer to the process of curating a vim/emacs/dotfiles configuration.

Yes!

Though, when configuration is programmatic (Python) and there's a lack of documentation/dataclass (e.g. my.coding.commits), it's unclear what exactly to do. I was able to figure out which fields were necessary by inference, but in a larger namespace/module that might not be obvious.

Yep, definitely agree! For an example that's hopefully a bit easier to follow and similar to what you suggest, check out the Polar module: https://github.com/karlicoss/HPI/blob/master/my/reading/polar.py#L24-L30 , and the corresponding bit of documentation, autoextracted from the code.

I'm writing a bit more about the approach I'm currently taking here (after the "My conclusion was using a combined approach" line). Perhaps I should add an example with default attributes there.
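A minimal sketch of that combined approach with default attributes, assuming the user's section is a plain class and the module declares a dataclass that inherits from it. The names user_polar and build_config are hypothetical, and HPI's actual helpers may differ. The user's values are passed through the constructor because the dataclass's own class attributes (the defaults) would otherwise shadow the ones set on the user's base class.

```python
from dataclasses import dataclass, fields
from pathlib import Path
from typing import Any, Dict, Type, TypeVar

C = TypeVar("C")

# Stand-in for the user's section in ~/.config/my/my/config/__init__.py (hypothetical values).
class user_polar:
    polar_dir = "/data/polar"

@dataclass
class polar(user_polar):
    # Module-side schema: types, plus defaults for anything the user didn't set.
    polar_dir: str = str(Path("~/.polar").expanduser())
    defensive: bool = True

def build_config(cls: Type[C]) -> C:
    """Instantiate the dataclass, passing in the attributes set on its user-config base class."""
    base = cls.__bases__[0]
    user_props: Dict[str, Any] = {k: v for k, v in vars(base).items() if not k.startswith("__")}
    known = {f.name for f in fields(cls)}
    return cls(**{k: v for k, v in user_props.items() if k in known})

config = build_config(polar)
print(config.polar_dir, config.defensive)  # /data/polar True
```

If the user's section doesn't set polar_dir, the default applies. Unknown attributes in the user's section are silently dropped here; a real implementation might prefer to warn about them.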

Similar to doom/spacemacs, each module/namespace package could (optionally) have a doctor.py which would do checks (run tests or check if packages (like cachew/gitpython/pytz) are installed).

Yep, good idea! Currently hpi doctor searches for a stats() function in the module and calls it, e.g. https://github.com/karlicoss/HPI/blob/07dd61ca6ae2b6de20d6954ca1584accede8b762/my/bluemaestro/init.py#L102-L104 , so in a sense this does end-to-end testing of the module, along with giving a short summary of your data.

~ ✔ ❯ hpi doctor my.bluemaestro --verbose
✅ config file: /home/karlicos/.config/my/my/config/__init__.py
✅ mypy check: success
   Success: no issues found in 4 source files
✅ OK  : my.bluemaestro
✅     - stats: {'measurements': {'count': 1151314, 'last': datetime.datetime(2020, 8, 26, 10, 45, 6, 988000)}}

~ ✔ ❯ hpi doctor my.emfit --verbose
✅ config file: /home/karlicos/.config/my/my/config/__init__.py
✅ mypy check: success
   Success: no issues found in 4 source files
✅ OK  : my.emfit
✅     - stats: {'pre_dataframe': {'count': 691, 'errors': 9}}
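A rough, self-contained sketch of the mechanism described above: import a module, look for a stats() function, and call it. The stat helper, the throwaway module and check_module are hypothetical stand-ins rather than HPI's code, and the real hpi doctor also runs the config and mypy checks shown in the output.

```python
import importlib
import sys
import types
from datetime import datetime
from typing import Any, Dict, Iterable, Iterator

def stat(items: Iterable[Dict[str, Any]]) -> Dict[str, Any]:
    """Hypothetical helper: count the items and keep the timestamp of the last one."""
    count = 0
    last: Any = None
    for item in items:
        count += 1
        last = item["dt"]
    return {"count": count, "last": last}

# A throwaway module standing in for a data module like my.bluemaestro (made-up data).
def measurements() -> Iterator[Dict[str, Any]]:
    yield {"dt": datetime(2020, 8, 26, 10, 0)}
    yield {"dt": datetime(2020, 8, 26, 10, 45)}

fake = types.ModuleType("fake_data_module")
fake.measurements = measurements  # type: ignore[attr-defined]
fake.stats = lambda: {"measurements": stat(measurements())}  # type: ignore[attr-defined]
sys.modules["fake_data_module"] = fake

def check_module(name: str) -> str:
    """Doctor-style check: import the module, then call its stats() if it defines one."""
    try:
        module = importlib.import_module(name)
    except Exception as e:  # a missing config section or pip package would surface here
        return f"FAIL: {name} loading failed: {e!r}"
    stats = getattr(module, "stats", None)
    if stats is None:
        return f"OK  : {name} (no stats() to call)"
    return f"OK  : {name} - stats: {stats()}"

print(check_module("fake_data_module"))
print(check_module("module_that_does_not_exist"))
```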

Another nice thing about this is that these stats can be used as part of a script that regularly checks that new data gets pulled in, similar to the "metrics" idea described here.
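For the "regularly check that new data gets pulled in" use, a sketch assuming stats shaped like the my.bluemaestro output above (a dict per data source, with an optional 'last' timestamp); stale_sources is a hypothetical name.

```python
from datetime import datetime, timedelta
from typing import Any, Dict, List

def stale_sources(stats: Dict[str, Dict[str, Any]], now: datetime, max_age: timedelta) -> List[str]:
    """Names of stats entries whose 'last' timestamp is older than max_age.

    Entries without a 'last' key (e.g. only a count) are skipped, since freshness can't be judged.
    """
    return [
        name
        for name, s in stats.items()
        if s.get("last") is not None and now - s["last"] > max_age
    ]

# Shaped like the my.bluemaestro stats above.
stats = {"measurements": {"count": 1151314, "last": datetime(2020, 8, 26, 10, 45, 6, 988000)}}
print(stale_sources(stats, now=datetime(2020, 8, 30), max_age=timedelta(days=2)))  # ['measurements']
```

A cron job could call this and exit non-zero (or send a notification) whenever the returned list is non-empty.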

While this will obviously error if some optional pip package is missing, I definitely agree that ideally these checks should be a bit more user-friendly. It would also be nice to integrate it automatically with the setup script.

Ideally I think I'd like to make it at least possible to keep everything in a single file (to make writing new modules as easy as possible). But these are fine details, and it could be a combined approach, e.g. if doctor.py doesn't exist, search for class Doctor in __init__.py, or something.
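The combined lookup could be sketched like this, assuming a doctor.py submodule takes priority and a Doctor class in the module itself is the fallback. find_doctor and the Doctor class are hypothetical; nothing here is an existing HPI API.

```python
import importlib
import sys
import types
from typing import Any, Optional

def find_doctor(module_name: str) -> Optional[Any]:
    """Prefer a `<module>.doctor` submodule, else fall back to a `Doctor` attribute in the module."""
    try:
        return importlib.import_module(f"{module_name}.doctor")
    except ModuleNotFoundError as e:
        if e.name != f"{module_name}.doctor":
            raise  # doctor.py exists, but one of its own imports (e.g. cachew) is missing
    module = importlib.import_module(module_name)
    return getattr(module, "Doctor", None)

# A single-file module with an inline Doctor class (throwaway stand-in).
single = types.ModuleType("single_file_module")

class Doctor:
    checks = ["config present"]

single.Doctor = Doctor  # type: ignore[attr-defined]
sys.modules["single_file_module"] = single

print(find_doctor("single_file_module") is Doctor)  # True
```

Checking e.name keeps a genuinely broken doctor.py (for example, one importing a missing pip package) from being silently mistaken for "no doctor.py".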

enabling/disabling modules

Hm, can you elaborate on 'enabling' modules? In Emacs this makes sense because many modules are on hooks and depend on the file mode, whereas in HPI, modules are imported explicitly, so if you don't import it, you won't even know about the module!

I guess personally I can see a couple of points where explicitly enabling/disabling modules could be useful:

- hpi doctor: perhaps useful to restrict modules when it's run without any arguments, to avoid visual spam from the modules one is not using
- I was thinking of maybe providing an automatic interface where one could explore all their data in a browser (based on NamedTuples/pandas DataFrame, etc). It could also be useful there to only import the stuff that you want.

And it provides more structure than the current solution (I'd prefer to not mess with core on my fork whenever possible)

Ah, definitely agree! The thing in util.py is hardly a solution, more of a temporary hack for hpi doctor CLI till we figure out something better. It only ended up in the core because the CLI is in the core :)

As a side note, the random TODO and general thought-process comments have been helpful in modifying/creating modules of my own.

Thanks! Glad that my habit of random todos is useful not just for me.

Aug 30 '20 19:08 karlicoss

Hm, can you elaborate on 'enabling' modules? In Emacs this makes sense because many modules are on hooks and depend on the file mode, whereas in HPI, modules are imported explicitly, so if you don't import it, you won't even know about the module!

May have been naive optimism on my part; dynamic code loading like this isn't something I've ever dealt with before.

To elaborate a bit on my thought process when initially modifying this: on master (your repo), there are lots of things that I'm not using. I don't use twitter, or bluemaestro, or lastfm, and I have my own modules that I'd replace them with.

So, for example, when I was modifying my.body.weight (I don't use emacs much, though org mode would probably be the reason I would), instead of using org mode, I wrote a little TUI to read/write from a CSV file (unrelated, but I ended up creating an entire package to create those TUIs for me, so I can manually log lots of other things, like food, weight, water etc., since I don't have org mode tables/tags). But your my.body.weight does from ..notes import orgmode, which means modules load functionality from other (non-core) modules, and they're not always independent.

Modules not being independent is fine, but it's what led me to fork this and modify it by removing/adding my own modules, instead of trying to figure out some way to add to ~/.config/my/plugins as described here.

Perhaps modules using other modules should be put behind a block like:

import warnings

try:
    from ..notes import orgmode
except ImportError as e:
    # so that this appears in the `hpi doctor` output in addition to the `my.config` import error
    warnings.warn(f"Could not import orgmode from notes, perhaps you've modified that? ({e})")
    orgmode = None

# functions below could check whether `orgmode` is None before using it (e.g. in a `stats` function in `my.body.weight`)

It's not really that bad technically, because hpi doctor gives me the --verbose flag, and I can go and add/edit those imports if I want, but that wasn't obvious to me to begin with. I think it'd just be nice to have from the perspective of someone modifying this, to better understand how the module system/doctor works.

Currently, a bunch of the doctor calls fail:

❗ FAIL: my.pdfs                        loading failed; pass --verbose to print more information
❗ FAIL: my.photos                      loading failed; pass --verbose to print more information
...
✅ OK  : my.reddit
✅     - stats: {'saved': {'count': 50, 'last': datetime.datetime(2015, 7, 30, 5, 27, 54, tzinfo=datetime.timezone.utc)}, 'comments': {'count': '100+'}, 'submissions': {'count': 39}, 'upvoted': {'count': '100+'}}
❗ FAIL: my.rss.all                     loading failed; pass --verbose to print more information
❗ FAIL: my.rss.feedbin                 loading failed; pass --verbose to print more information

(presumably because I don't have the blocks in my.config)

(Yeah)

If you only have few modules set up, lots of them will error for you, which is expected, so check the ones you expect to work.

I'm just leaving my thought process here; I'm not sure the ImportError-catching blocks are really needed, since HPI in general has to be personally customized anyway. It's not strange that modules aren't completely independent, and perhaps that's fine.

For an example that's hopefully a bit easier to follow and similar to what you suggest, check out Polar module

Yeah, makes sense to me.

I'll take a look at polar and perhaps add it back; I still don't have a great solution for reading/documents/pdfs.

But when I was initially trying to parse through the modules, I wasn't sure whether polar was something I'd want to use, and since I wasn't sure whether all these different modules were interconnected (the way my.notes is to my.body.weight), I deleted some of the ones I wouldn't end up using.

Ah, definitely agree! The thing in util.py is hardly a solution, more of a temporary hack for hpi doctor CLI till we figure out something better. It only ended up in the core because the CLI is in the core :)

Ah, yeah. I think this was the major misunderstanding for me.

It seems that it already picks up the modules I've defined; the exclude list in util.py made me think it wouldn't.

>>> from my.core import util
>>> util.get_modules()
['my.body', 'my.body.weight', 'my.body.weight_prompt', 'my.browsing', 'my.coding.commits', 'my.demo', 'my.github.all', 'my.github.common', 'my.github.gdpr', 'my.github.ghexport', 'my.google.takeout.html', 'my.google.takeout.paths', 'my.media.movies', 'my.media.youtube', 'my.pdfs', 'my.photos', 'my.reading.polar', 'my.reddit', 'my.rss.all', 'my.rss.feedbin', 'my.rss.feedly', 'my.smscalls', 'my.stackexchange', 'my.todotxt', 'my.zsh']
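For reference, discovering the submodules of a package can be done with the standard library's pkgutil.walk_packages. This is a sketch of the general technique; HPI's own get_modules may be implemented differently (for instance, it also applies its exclusion list). get_submodules is a hypothetical name, and the stdlib json package stands in for my so the example runs without HPI installed.

```python
import json
import pkgutil
from types import ModuleType
from typing import List

def get_submodules(package: ModuleType) -> List[str]:
    """List a package's submodules by scanning its __path__.

    Note: walk_packages imports subpackages (to recurse into them),
    but plain leaf modules are only found, not imported.
    """
    return sorted(
        info.name
        for info in pkgutil.walk_packages(package.__path__, prefix=package.__name__ + ".")
    )

# With HPI installed, this would be get_submodules(my) after `import my`.
print(get_submodules(json))
```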

I didn't realize doctor already did this (searching for the stats function and calling it directly); it makes more sense now. I guess what I wanted wouldn't be that hard to implement.

perhaps useful to restrict modules when it's run without any arguments to avoid visual spam from the modules one is not using

This is mostly what I meant by enabling/disabling. One would still be able to import my.module, even if the module is 'ignored'. Disabling a module would mean it doesn't show up in hpi modules or hpi doctor, and perhaps in any future dashboard/browser-like projects like the ones you describe (I eventually plan to do something similar).

Or perhaps hpi modules shows:

$ hpi modules
- my.bluemaestro [disabled] # with terminal color red?
- my.body
- my.body.weight

Perhaps util.py could optionally import a user_config from my.config, that looks something like:

class modules:
    ignored = [
        "bluemaestro",
        "polar",
    ]

class github:
    ...  # existing github config

and those could be added to the values that ignored returns, which influences hpi doctor/hpi modules.
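A sketch of how the proposed ignored list could drive the [disabled] marker in an hpi modules-style listing. The modules section, is_ignored and render_modules are hypothetical. The sketch assumes entries are module paths with the my. prefix dropped (so polar would be written reading.polar), matching the format of the generated list below, and that an entry also covers its submodules.

```python
from typing import Iterable, List

class modules:  # hypothetical my.config section, following the proposal above
    ignored = ["bluemaestro", "reading.polar"]

def is_ignored(module: str, ignored: Iterable[str]) -> bool:
    """Entries are module paths without the `my.` prefix; an entry also covers its submodules."""
    short = module[len("my."):] if module.startswith("my.") else module
    return any(short == name or short.startswith(name + ".") for name in ignored)

def render_modules(all_modules: Iterable[str], ignored: Iterable[str]) -> List[str]:
    """Format an `hpi modules`-style listing, tagging ignored modules instead of hiding them."""
    ignored = list(ignored)  # iterated once per module, so materialize it
    return [f"- {m} [disabled]" if is_ignored(m, ignored) else f"- {m}" for m in all_modules]

listing = ["my.bluemaestro", "my.body", "my.body.weight", "my.reading.polar"]
for line in render_modules(listing, modules.ignored):
    print(line)
```

Matching on a trailing "." keeps an entry like body from accidentally covering an unrelated module such as my.bodyx.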

The reference to doom emacs was also partly because it provides me with a list of modules when I'm starting out.

Perhaps hpi modules could be extended or something else added to generate something like the output of this?

$ hpi modules | sed -e 's/my\.//' -e 's/^- / # "/' -e 's/$/",/' | awk 'BEGIN {print "class modules:\n ignored = ["} {print $0} END {print "]"}'

class modules:
  ignored = [
    # "body",
    # "body.weight",
    # "body.weight_prompt",
    # "browsing",
    # "coding.commits",
    # "demo",
....

(or just generate a list and put it in the doc folder somewhere?)
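The same template could be produced in Python instead of sed/awk. A sketch, with ignored_template as a hypothetical helper taking module names such as those returned by util.get_modules() above:

```python
from typing import Iterable

def ignored_template(all_modules: Iterable[str]) -> str:
    """Build a `class modules` snippet listing every module, commented out.

    Uncommenting an entry would then mark that module as ignored.
    """
    lines = ["class modules:", "    ignored = ["]
    for m in all_modules:
        short = m[len("my."):] if m.startswith("my.") else m
        lines.append(f'        # "{short}",')
    lines.append("    ]")
    return "\n".join(lines)

print(ignored_template(["my.body", "my.body.weight", "my.coding.commits"]))
```

Since every entry starts out commented, the generated snippet is valid Python as-is and ignores nothing until entries are uncommented.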

That way I have a hook into core.util, even if it is a hack for now. It would mean I could keep polar, bluemaestro and anything else I won't be using for a while from showing up in doctor, without explicitly deleting the files.

Of lesser importance, but it may also mean fewer patch issues/merge conflicts when I merge from upstream into my fork, since I haven't removed those files.

I know I could just edit the list in the excluded function myself, but I think it'd be nicer if it was configurable without editing core.

at least possible to keep everything in a single file (to make writing new modules as easy as possible).

Yeah, I think this is fine as well. I agree that keeping everything in a single file is nice when starting modules.

e.g. if doctor.py doesn't exist, search for class Doctor in __init__.py, or something.

Yeah, sounds good.

I think how to modify doctor (and perhaps the stats function/Doctor class) could be described in more detail somewhere here or here? I think that's what led to most of my confusion.

https://brokensandals.net/technical/backup-tooling/making-backup-validation-easier/

Yeah, I can see how that'd be a nice way of providing a snapshot of the amount of data by using doctor. Will probably do the same for my modules now that I know it works like that.

This has cleared up a couple of my issues, thanks.

Aug 31 '20 01:08 seanbreckenridge

So, for example, when I'm modifying my.body.weight

Yeah, agree it would be a problem for modules like that! Just like you mentioned for food, weight, water -- people use a gazillion different formats/methods for logging it, and it's unlikely it can be handled in a generic way (even I have used many different methods for that throughout my life!). Perhaps this is where people would have to write their own modules, unless they are using some widespread app (e.g. myfitnesspal or something).

I guess I only shared my org-mode my.body.weight module as a demonstration. I think later, I'll move it away completely to my 'personal' overlay, so there is less confusion!

Ideally, I think my.body.weight would only contain a 'common' interface with a "reasonable", minimalist schema, for example with dt: datetime and weight: float fields. Tools that consume HPI weight data would do it through this common interface; as an example of such a tool, I'm working on sharing the quantified self dashboard I'm using (some screenshots here). Then, whoever wants to connect their weight data sources to the dashboard would need to write a small adapter to this schema, and that's it. It would also allow using multiple data sources and combining/merging them with just a few lines of code.

To be more specific, I tried this with some existing modules:

- https://github.com/karlicoss/HPI/blob/master/my/rss/all.py
- https://github.com/karlicoss/HPI/blob/master/my/github/all.py
- https://github.com/karlicoss/HPI/blob/master/my/twitter/all.py
- also some docs on this approach: https://github.com/karlicoss/HPI/blob/master/doc/SETUP.org#twitter

This is pretty experimental, there are some issues with this approach, so very open to suggestions if you have some thoughts on this!
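A minimal sketch of the common-interface idea for weight: one small schema, one adapter per data source, and a merge step. Entry, the adapters and the CSV column names (when, kg) are all hypothetical; heapq.merge is from the standard library and assumes each source is already sorted by time.

```python
import csv
import io
from datetime import datetime
from heapq import merge
from typing import Iterable, Iterator, NamedTuple, Tuple

class Entry(NamedTuple):
    """Hypothetical minimal common schema for weight data."""
    dt: datetime
    weight: float

def from_csv(text: str) -> Iterator[Entry]:
    """Adapter for a hypothetical CSV log with 'when' (ISO 8601) and 'kg' columns."""
    for row in csv.DictReader(io.StringIO(text)):
        yield Entry(dt=datetime.fromisoformat(row["when"]), weight=float(row["kg"]))

def from_pairs(pairs: Iterable[Tuple[datetime, float]]) -> Iterator[Entry]:
    """Adapter for any other source that already yields (datetime, value) pairs."""
    for dt, value in pairs:
        yield Entry(dt=dt, weight=value)

def all_entries(*sources: Iterable[Entry]) -> Iterator[Entry]:
    """Combine several sources in time order (each source must already be sorted by dt)."""
    return merge(*sources, key=lambda e: e.dt)

csv_log = "when,kg\n2020-08-01T08:00:00,70.5\n2020-08-03T08:00:00,70.1\n"
other = [(datetime(2020, 8, 2, 9, 0), 70.3)]
print([e.weight for e in all_entries(from_csv(csv_log), from_pairs(other))])  # [70.5, 70.3, 70.1]
```

Consumers such as a dashboard would only depend on Entry, so adding a new data source means writing one more adapter, in the spirit of the all.py modules listed above.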

I wrote a little TUI to read/write from a CSV file instead (unrelated, but ended up creating an entire package to create those TUIs for me, so I can manually log lots of other things, like food, weight, water etc., since I don't have org mode tables/tags)

Nice 👍 I landed on org-capture (which can quickly record an org-mode table row), but I like the automatic NamedTuple conversions. I had some ideas regarding type safety for org-capture, but so far there are only a few such data sources for me, so I put it on hold.

Since HPI in general has to be personally customized anyway. It's not strange that modules aren't completely independent, and perhaps that's fine.

Yep, as I said above, this particular module (my.body.weight) is pretty specific to my use case, so that's how it ended up with a dependency. I think it's possible for modules to have meaningful dependencies that would be useful for everyone, e.g. if it's something to do with the filesystem perhaps? In a sense, my.core is one such common dependency. But yeah, mostly I'd expect such complicated modules to be private/personal. The most common use case I imagine is similar to my.notes: someone could have Google Keep, or Nomie, or something else as their datapoint storage, and having a common module that extracts data helps.

This is mostly what I meant by enabling/disabling. Perhaps hpi modules could be extended or something else added to generate something like the output of this?

Agreed, makes sense. I think there could be both enabled and disabled config sections; so depending on which one you specify it either only enables modules you listed explicitly, or enables everything and disables only the ones you listed. I guess disabled/ignored is mostly useful to me at the moment rather than other people, but hopefully doesn't hurt to have :)
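One way the two proposed config sections could interact, sketched under the assumption that specifying both is treated as an error (another reasonable choice would be letting one override the other). active_modules is a hypothetical name, and modules are matched by exact name for simplicity.

```python
from typing import Iterable, List, Optional, Sequence

def active_modules(
    all_modules: Iterable[str],
    enabled: Optional[Sequence[str]] = None,
    disabled: Optional[Sequence[str]] = None,
) -> List[str]:
    """Allowlist mode if `enabled` is given, denylist mode if `disabled` is given, else everything."""
    if enabled is not None and disabled is not None:
        raise ValueError("specify either enabled or disabled modules, not both")
    if enabled is not None:
        return [m for m in all_modules if m in enabled]
    if disabled is not None:
        return [m for m in all_modules if m not in disabled]
    return list(all_modules)

mods = ["my.body", "my.reddit", "my.reading.polar"]
print(active_modules(mods, disabled=["my.reading.polar"]))
```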

Of lesser importance, but may also mean that when I'm merging from upstream on my fork, might have less patch issues/merge conflicts, since I haven't removed those files.

Yes! Definitely wouldn't want people to remove files just because they get in the way.

think how to modify doctor (and perhaps the stats function/Doctor class) could be described in more detail somewhere here or here?

Yep! stats is somewhat recent, so I was reluctant to document it until it's "stable", but agree that ultimately it's useful to add to the docs.

P.S. Sorry for lag in replying, just got back from holiday and things piled up. Normally I reply quicker!

Sep 02 '20 20:09 karlicoss

Some initial work on enabled/disabled modules, would be happy for feedback! https://github.com/karlicoss/HPI/pull/85

Sep 28 '20 23:09 karlicoss

After some fiddling to get it to work on my fork, it works great!

Sep 30 '20 20:09 seanbreckenridge

Relevant issue: https://github.com/karlicoss/HPI/pull/211 . Importing a module when you don't have a config section for it results in a somewhat obscure import error; I wonder if we can improve it.

Feb 09 '22 23:02 karlicoss

Yeah... it may be possible to inspect the error to see if it's importing from my.config?

I can see if that's easy to do.

Feb 09 '22 23:02 seanbreckenridge

Yeah, ideally it would be nice to achieve lazy configs with some magic, so attributes are evaluated on first use. But perhaps parsing the exception is a nice start.

Feb 09 '22 23:02 karlicoss
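As a starting point for parsing the exception: when from my.config import something fails because the attribute is missing, CPython raises an ImportError whose name attribute is the module being imported from. A sketch using that, with a throwaway fake_my_config module standing in for my.config and explain_import_error as a hypothetical helper:

```python
import sys
import types

# Throwaway stand-in for a user's my.config that has no `reddit` section.
sys.modules["fake_my_config"] = types.ModuleType("fake_my_config")

def explain_import_error(e: ImportError, config_module: str = "fake_my_config") -> str:
    """Add a hint when the failed import came from the config module; pass other errors through."""
    if e.name == config_module:
        return f"{e}. Perhaps you need to add this section to your config?"
    return str(e)

message = ""
try:
    from fake_my_config import reddit  # noqa: F401
except ImportError as e:
    message = explain_import_error(e)

print(message)
```

Errors coming from anywhere else (say, a missing pip package) keep their original message, so the hint only appears when the config module is the likely culprit.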
