Generate a browsable index.html thread listing

Open DanielOaks opened this issue 10 years ago • 11 comments

One of the great ideas suggested by @antonizoon is generating an index.html file in the root or somewhere we can use to browse the various threads in our archive. That or a JSON file or something similar, but I think a decent little HTML file generated with some templates and showing a thread listing similar to the pages in a 4chan board shouldn't be too much trouble.

Would make it really nice to browse archives on our personal machines, and I can see this being a great feature.

DanielOaks avatar Dec 04 '14 11:12 DanielOaks

This is indeed an awesome idea, but the implementation isn't that simple imho, at least if the archive is somewhat big.

  • The index needs to be regenerated with every new thread?
  • There needs to be some kind of pagination?
  • How is this handled with multiple boards? One index for each board?
  • Is it even useful to have an index with lots of threads? Some kind of search would be very useful then.

ghost avatar Jan 03 '15 00:01 ghost

That's fair enough, especially with big archives.

If we were implementing something like this, I'd try to make the main 'index' a JSON file, stored either in the base site directory (/4chan/index.json) or under the specific board (/4chan/q/index.json). The actual HTML file would be in the same directory and just read from that JSON file, using JavaScript to dynamically read the first post and another post or two from each thread to produce a standard chan-like thread browsing experience (e.g. how these threads are displayed, as a list). The pagination would probably all be handled via that one index.html file, with JavaScript letting us obtain query string params and all that fun stuff.

I'd try to make the 'index' file as simple as possible so we could easily append to it with each added thread, and so that we wouldn't have to edit the actual html file each time we add a new thread.
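A minimal sketch of what "easily append to it" could look like, assuming index.json is a flat JSON list with one object per thread. The function name `append_thread` and the entry fields are hypothetical, not anything BASC-Archiver defines.

```python
import json
from pathlib import Path


def append_thread(index_path, entry):
    """Add one thread entry to the index file, creating the file if needed.

    Assumption: the index is a plain JSON list of per-thread dicts.
    """
    path = Path(index_path)
    # Start from the existing list, or an empty one for a fresh archive.
    if path.exists():
        threads = json.loads(path.read_text(encoding="utf-8"))
    else:
        threads = []
    threads.append(entry)
    # The whole list is rewritten; the HTML file itself is never touched.
    path.write_text(json.dumps(threads, indent=2), encoding="utf-8")
    return threads
```

A flat list keeps the file trivial to read from both Python and JavaScript, at the cost of rewriting the whole file on each append.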

That said, that method may not play nicely with browsers, which often refuse to let a page opened from a file:// URL load other local resources. If that turns out to be a problem, we'd just do it some other way. We'd find the nicest way to structure it with the thread URLs and such.

I'll also look at how FoolFuuka displays things and handles thread searching, likely taking some inspiration from there when I implement this feature.

That said, this feature will likely be a while off from getting implemented, but if you do have any other ideas, please let us know!

DanielOaks avatar Jan 03 '15 12:01 DanielOaks

This is related to the utility described in #21

DanielOaks avatar Oct 01 '15 23:10 DanielOaks

So there will probably be three functions that build this thread listing:

  • Jinja2 Template - Just insert data from index.json into a template to create index.html, a viewable thread directory.
  • Regenerate - Build a new index.json that lists the OP posts of all threads that are currently archived. This only needs to be done for legacy archives, or if you've deleted one. Activated with command line argument.
    • It does this by crawling the subdirectories for the first OP Post.
    • If index.json already exists, it will merge the new data into it.
  • List Update Hook - From now on, every archival process will append the current OP post to the index.json file. It will not regenerate the list.
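The Regenerate step above could be sketched roughly like this. It assumes each thread lives in its own subdirectory of the board directory and holds a JSON dump in the 4chan API shape (a `posts` list whose first element is the OP). That layout, the function names, and the entry fields are assumptions for illustration only.

```python
import json
from pathlib import Path


def op_entry(thread_json_path):
    """Extract the OP (first post) of one archived thread as an index entry."""
    data = json.loads(Path(thread_json_path).read_text(encoding="utf-8"))
    op = data["posts"][0]
    return {
        "thread_id": op["no"],
        "subject": op.get("sub", ""),
        "comment": op.get("com", ""),
        "time": op.get("time"),
    }


def regenerate(board_dir, index_name="index.json"):
    """Crawl thread subdirectories and merge their OP posts into the index."""
    board = Path(board_dir)
    index_path = board / index_name
    # Key existing entries by thread id so crawled data merges into them.
    merged = {}
    if index_path.exists():
        for entry in json.loads(index_path.read_text(encoding="utf-8")):
            merged[entry["thread_id"]] = entry
    # "*/*.json" only matches files one level down, so index.json is skipped.
    for thread_file in sorted(board.glob("*/*.json")):
        entry = op_entry(thread_file)
        merged[entry["thread_id"]] = {**merged.get(entry["thread_id"], {}), **entry}
    threads = sorted(merged.values(), key=lambda e: e["thread_id"])
    index_path.write_text(json.dumps(threads, indent=2), encoding="utf-8")
    return threads
```

Merging by thread id means extra fields already present in index.json survive a regenerate, while the OP data is refreshed from disk.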

antonizoon avatar Oct 02 '15 00:10 antonizoon

I was thinking similarly. Regenerate the thread index from scratch if it doesn't exist, otherwise just load and modify the existing index (though the index may need to be inserted into the created thread HTML, because of file: URI loading issues with browser protection stuff).

DanielOaks avatar Oct 02 '15 00:10 DanielOaks

May also need to be inserted into the created index html file, that is. We'll have it in there so the loading works on non-served versions, and the separate index.json file because our tool will use and modify and regenerate based off that.

One requirement I have for the index is that it uses no external files, only that one HTML file. Having external files (stylesheets, JS files, etc.) that need to be stored somewhere else would be cluttered and unnecessary.
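A rough sketch of the single-file idea: the style block and the index data are both embedded in the one HTML page. To keep it standard-library only, it uses `string.Template` as a stand-in for the Jinja2 template discussed in this thread. The markup, the `index-data` id, and `render_index` are made up for illustration.

```python
import html
import json
from string import Template

# Everything the page needs is inline: no external stylesheet or script.
PAGE = Template("""<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<title>$title</title>
<style>body { font-family: sans-serif; } td { padding: 2px 8px; }</style>
</head>
<body>
<table>
<tr><th>Thread</th><th>Subject</th></tr>
$rows
</table>
<script type="application/json" id="index-data">$data</script>
</body>
</html>
""")


def render_index(threads, title="Thread index"):
    """Render one self-contained HTML page from a list of thread entries."""
    rows = "\n".join(
        "<tr><td>{}</td><td>{}</td></tr>".format(
            html.escape(str(t["thread_id"])), html.escape(t.get("subject", ""))
        )
        for t in threads
    )
    # Escape "</" so thread text can never close the script element early.
    data = json.dumps(threads).replace("</", "<\\/")
    return PAGE.substitute(title=html.escape(title), rows=rows, data=data)
```

Because the data sits inside the page, any script added later can read it from the `index-data` element without fetching anything, which sidesteps the file:// loading problem.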

DanielOaks avatar Oct 02 '15 00:10 DanielOaks

Ok, so we can have a cut down Futaba CSS theme as a <style> block.

Because we have Python + Jinja at our fingertips, we can just insert data into the HTML file each time, and append to the JSON file for the script to read from later.

Truthfully, a feature I kind of felt was critical was sortable tables. Even though the sortable attribute is already in the HTML5 spec, browsers still do not support it, so we have to use JavaScript to do it. I guess we will have to toss this JavaScript into the index.html file then.

http://github.hubspot.com/sortable/docs/welcome/

antonizoon avatar Oct 02 '15 02:10 antonizoon

Yep, exactly.

I'll start the CSS/templating after I've merged threaded into master and pushed out 0.9.0. We'll make it all nice and shiny and responsive.

Will probably go with Compass+SASS for the styling and whatever JS we need, and Python can stitch it all together with the Jinja templates.

Never used Sortable tables, will take a look!

On 2 October 2015 at 12:24, Lawrence Wu wrote:

Ok, so we can have a cut down Futaba CSS theme as a <style> block.

Because we have Python + Jinja at our fingertips, we can just insert data into the HTML file each time, and append to the JSON file for later.

Truthfully, a feature I kind of felt was critical was Sortable tables. Even though the Sortable clause is already in HTML5, browsers still do not support it, so we have to use JavaScript to do it. I guess we will have to toss this JavaScript into the index.html file then. Maybe we could have an argument not to include it?

http://www.kryogenix.org/code/browser/sorttable/sorttable.js

— Reply to this email directly or view it on GitHub https://github.com/bibanon/BASC-Archiver/issues/4#issuecomment-144900142 .

DanielOaks avatar Oct 02 '15 02:10 DanielOaks

@antonizoon, just wondering, were you thinking sortable tables as in being able to sort the list of threads (by date, etc), something like that? I can work that out, and I'm thinking being able to search the OP / thread title, etc, similar to how 4chan's catalog lets you.

DanielOaks avatar Oct 02 '15 06:10 DanielOaks

Since I am also working on making a World4ch viewer (displaying archived databases from dis.4chan.org), how I did it there was to create a large table listing all threads, and used a sortable table to arrange threads ascending or descending by date, bump time, thread number, or number of posts. Basically, a poor man's data analytics, with Ctrl-F as the search system.
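For comparison, the same kind of ordering can be done at generation time in Python, before the table is written out. This is only a sketch: `sort_threads` and the column names are hypothetical, and it does not replace in-browser sorting.

```python
def sort_threads(threads, column="time", descending=True):
    """Return a new list ordered by one column; entries missing it go last."""
    present = [t for t in threads if t.get(column) is not None]
    missing = [t for t in threads if t.get(column) is None]
    return sorted(present, key=lambda t: t[column], reverse=descending) + missing
```

Calling it with `column="replies"` or `column="thread_id"` would give the other orderings mentioned above, assuming the index entries carry those fields.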

One issue with this approach, inherent to static HTML, is that there is no pagination, though pagination may not be something we want anyway.

I guess an approach similar to the 4chan Catalog would fit for an Imageboard, but any kind of search or rearrangement system requires a lot of extra javascript.

antonizoon avatar Oct 02 '15 19:10 antonizoon

I don't think it'll be too bad. I'll play around with it and see what I can get cooked up!

DanielOaks avatar Oct 02 '15 21:10 DanielOaks