
Include HTML meta "keywords" tag in search index

Open korshavn opened this issue 5 years ago • 4 comments

Hello James! Your work is impressive! Stork is really great already; I love it and hope you will expand it and possibly get more people like yourself developing and maintaining it. That last point (Stork being a one-man side project) is the main reason my employer is reluctant to adopt it. Otherwise we love the architecture and how it all works.

Just adding this feature request (and a couple more on other PRs). Hope to see the feature on the roadmap.

HTML files have some metadata that is worth including in the index, specifically the <meta name="keywords"> tag inside the document's <head>.
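
To make the request concrete, here is a minimal sketch of the extraction step, written in Python with only the standard library. It is not Stork's code and says nothing about how Stork's indexer is structured; the names `KeywordsExtractor` and `extract_keywords` are hypothetical, and splitting the `content` attribute on commas is an assumption based on how the keywords tag is conventionally written.

```python
from html.parser import HTMLParser


class KeywordsExtractor(HTMLParser):
    """Collects terms from <meta name="keywords" content="..."> tags.

    Hypothetical helper for illustration; not part of Stork.
    """

    def __init__(self):
        super().__init__()
        self.keywords = []

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags such as <meta ... />.
        if tag != "meta":
            return
        attr_map = {name: (value or "") for name, value in attrs}
        if attr_map.get("name", "").strip().lower() != "keywords":
            return
        # Assumption: keywords are comma-separated in the content attribute.
        for term in attr_map.get("content", "").split(","):
            term = term.strip()
            if term:
                self.keywords.append(term)


def extract_keywords(html):
    """Return the keyword terms found in an HTML string, in document order."""
    parser = KeywordsExtractor()
    parser.feed(html)
    parser.close()
    return parser.keywords
```

An indexer could then add the returned terms to the same entry as the page's body text, so that a search for a keyword matches the page even when the word never appears in the visible content.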

korshavn avatar Dec 21 '20 08:12 korshavn

Thanks for writing in, @korshavn! To clarify: when Stork is indexing HTML files, this suggestion is requesting that <meta name="keywords" /> terms be indexed alongside a file?

That sounds like a good idea! I'll add it to my feature list :)

James

jameslittle230 avatar Dec 22 '20 01:12 jameslittle230

What about other metadata such as microformats, schema.org, etc.? (Suddenly Stork turned into portable Google :P)

fauno avatar Dec 29 '20 13:12 fauno

@jameslittle230 It'd be good to support document metadata more generally, in the form of a hash map (Map from string to string), for instance.

That would allow me to use Stork in https://neuron.zettel.page/ to search by tag or any other Markdown frontmatter metadata.

This would naturally extend to supporting typed queries like foo (as stork supports currently) and foo tag:mytag (same search, but limited by tag) and tag:mytag (return all documents with this tag).
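
A rough sketch of the query semantics described here, in Python, assuming each document carries a string-to-string metadata map. None of this reflects Stork's actual API or index format: `parse_query`, `search`, and the document shape (`title`, `text`, `meta`) are hypothetical, and plain substring matching stands in for real full-text search.

```python
def parse_query(query):
    """Split a query into free-text terms and key:value filters.

    "foo tag:mytag" -> (["foo"], {"tag": "mytag"})
    """
    terms, filters = [], {}
    for token in query.split():
        key, sep, value = token.partition(":")
        if sep and key and value:
            filters[key.lower()] = value
        else:
            terms.append(token.lower())
    return terms, filters


def search(documents, query):
    """Return titles of documents matching every term and every filter.

    Each document is a dict with "title", "text", and an optional "meta"
    dict mapping strings to strings (a hypothetical shape for illustration).
    """
    terms, filters = parse_query(query)
    results = []
    for doc in documents:
        meta = doc.get("meta", {})
        # Skip documents whose metadata does not satisfy every filter.
        if any(meta.get(key) != value for key, value in filters.items()):
            continue
        text = doc["text"].lower()
        # With no free-text terms, all() is True, so a filter-only query
        # returns every document that passes the filters.
        if all(term in text for term in terms):
            results.append(doc["title"])
    return results
```

With this shape, `foo` searches everything, `foo tag:mytag` runs the same search restricted to one tag, and `tag:mytag` alone lists every document carrying that tag. A string-to-string map allows only one value per key, so documents with several tags would need a different value type or a delimiter convention.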

srid avatar Mar 30 '21 17:03 srid

@srid - Yeah! This issue is more about indexing different types of content parsed from HTML, but the idea of attaching metadata to specific entries is captured fairly well in #90.

The infrastructure is there to attach metadata to documents and even words within that document -- that's how Stork's SRT handling can link you right to the specific timestamp of the thing you search for. It's currently not exposed anywhere user-accessible, though, just to keep scope down initially.

If you have a proposed API for how you'd want to include metadata in the search index and read it in JavaScript, I'm all ears, but maybe as a comment in issue #90 to keep discussion about the same topic together. :)

jameslittle230 avatar Mar 31 '21 01:03 jameslittle230