promnesia Usage documentation for reddit source/expected usage pattern of the project.

I've spent the last 4 hours banging my head against a wall to try and get the reddit source functioning. I've dug through layer after layer of code... and come to the conclusion that pulling reddit data is a manual process. Maybe I'm wrong about that, but there doesn't appear to be any info about how to properly provide the secrets file that rexport needs, except by command line. Both rexport and HPI use an "exports_path" variable, but as far as I can tell, that just reads existing data that was already dumped.

With this much confusion about using basic connections, am I mistaken in thinking that this tool should do some amount of automation for tracking history? Or is it expected that I manually create files of things I reviewed lately? I still suspect that I was on the right track originally due to what the browser extension did, but any deviation from auto.index, or the browser extension, seems to come up with very little info on usage. It appears that everything assumes there is something I should already understand about the source, but whatever this gap is, I can't figure it out.

I've tried including the stuff from the secrets.py file in the HPI config, as well as directly in promnesia/config.py. No errors in either, and promnesia index doesn't error, but it also comes up empty, with no data.

I guess what I'm asking is, is there solid documentation on how to set up each source, from a blank slate? And specifically, what is the expected process for setting up automatic indexing from reddit?

Mar 01 '21 02:03 Faeranne

Ok, after reviewing https://beepb00p.xyz/myinfra_files/myinfra.svg again, I'm going to assume that there is no automation for pulling down reddit or other sources, but that they have to manually be pulled down.

Mar 01 '21 02:03 Faeranne

Hey, sorry about your experience! I understand the frustration of dealing with underdocumented systems, happens all the time to me as well :( (also admire the "4 hours" part -- I'm similarly stubborn, many other people might just give up). Trying to improve the docs for my projects, but only have limited time, and also hard to have an 'outside the box' view to figure out the most difficult things for other people.

Regarding your questions:

> and come to the conclusion that pulling reddit data is a manual process. Maybe I'm wrong about that, but there doesn't appear to be any info about how to properly provide the secrets file that rexport needs, except by command line.

> It appears that everything has this idea that there is something I should already understand about the source or something, but whatever this gap is, I can't figure it out.

Yep, I feel like the gap is that Promnesia itself and HPI (which it uses for Reddit data) don't know anything about API tokens: they read data from exported JSON files on disk. So access to the data is completely 'offline', and the exporting itself happens in a separate process (which can be manual, but is usually automatic, e.g. scheduled in cron).
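To make the 'offline' part concrete, here is a minimal sketch of what "reading exported JSON from disk" amounts to. This is not HPI's actual code: `load_latest_export` and the directory layout are hypothetical, and it assumes the dumps have timestamped names, so that sorting by name also sorts by time.

```python
import json
from pathlib import Path


def load_latest_export(export_dir: Path):
    """Hypothetical helper: parse the newest JSON dump in export_dir.

    Returns None when the directory holds no dumps yet, which is the
    'no errors, but no data' situation: nothing has been exported.
    """
    # Timestamped names such as reddit-20210301T020300Z.json sort chronologically.
    dumps = sorted(export_dir.glob('*.json'))
    if not dumps:
        return None
    return json.loads(dumps[-1].read_text())
```

Note that no API token appears anywhere in this step; the secrets are only needed by whatever process writes the dumps.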

I tried illustrating that here

> I guess what I'm asking is, is there solid documentation on how to set up each source, from a blank slate? And specifically, what is the expected process for setting up automatic indexing from reddit?

You're right, perhaps there needs to be at least one detailed guide; other sources that use HPI are similar. So to answer your question, here is basically what is needed:

set up rexport (https://github.com/karlicoss/rexport#setting-up), preferably so that it dumps timestamped files and puts them in export_path
set up HPI
- main package: https://github.com/karlicoss/HPI/blob/master/doc/SETUP.org#install-main-hpi-package
- configure Reddit module: add the config section with export_path (https://github.com/karlicoss/HPI/blob/master/doc/MODULES.org#myreddit). This guide (https://github.com/karlicoss/HPI/blob/master/doc/SETUP.org#private-configuration-myconfig) can give more clues. In particular, `hpi doctor my.reddit` can be useful to check that HPI has indeed loaded the Reddit data.
set up promnesia: https://github.com/karlicoss/promnesia#setup Basically, the config should just be:
```
from promnesia.sources import reddit
SOURCES = [reddit]
```
This guide https://github.com/karlicoss/promnesia/blob/master/doc/TROUBLESHOOTING.org hopefully can help to identify common issues.
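The glue between the first two steps can be sketched roughly like this. It is an illustration under assumptions, not a copy of the official docs: the directory is a placeholder, `timestamped_export_file` is a hypothetical helper for naming the dumps, and the `reddit` class only shows the assumed shape of the config section with `export_path` (the MODULES.org link above is the authoritative reference).

```python
from datetime import datetime, timezone
from pathlib import Path

# Placeholder directory; any location works as long as both steps agree on it.
EXPORT_DIR = Path('/path/to/exports/reddit')


def timestamped_export_file(export_dir: Path, now: datetime) -> Path:
    """Hypothetical helper: name each dump after the time it was taken."""
    return export_dir / f"reddit-{now.strftime('%Y%m%dT%H%M%SZ')}.json"


# Assumed shape of the Reddit section in the HPI private config:
# HPI only needs to know where the dumps are, not how they were produced.
class reddit:
    export_path = str(EXPORT_DIR / '*.json')
```

The export job (e.g. run from cron) would write rexport's output to `timestamped_export_file(EXPORT_DIR, datetime.now(timezone.utc))`, and the glob in `export_path` then picks up every dump in that directory.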

I've been thinking about setting up some repository/docker container/something like that which does all this wiring (even if paths/etc are hardcoded), so ideally the user only has to supply the API tokens, but haven't got to it yet. Hopefully, soon!

Thanks for trying and creating the issue, it helps to know where I should concentrate! I'll update here once the process is simplified, in case you'd still be up for trying.

Mar 01 '21 22:03 karlicoss

Wow. Thanks for the response. It's gotten kinda rare for devs to follow through on issues related to documentation. Sounds like I ended up on the money at the end there. After reading your blog post further, I understand now why it's structured this way; it just left things a bit confusing. More docs always help.

If you do update (or make a Docker image, which I'd gladly try out), I'll be glad to help see how well the documentation works. Should I re-open this issue?

Mar 01 '21 23:03 Faeranne

Yeah, let's reopen it :)

Mar 01 '21 23:03 karlicoss