Mango [Feature Request] Use embedded metadata like ComicInfo.xml

Open schemen opened this issue 3 years ago • 20 comments

Is your feature request related to a problem? Please describe. ComicInfo.xml is in all of my chapters with relevant information. It would be great if metadata were handled that way: use existing systems like ComicInfo.xml. You can read and update metadata that way.

Describe the solution you'd like Mango reads embedded metadata to set titles, tags, authors etc.

Describe a small use-case for this feature request I can search Manga better by tags

May 19 '21 10:05 schemen

Hi, thanks for reaching out!

I know it's from ComicRack, but I've never used it, so I am not sure how it works. I did find some example ComicInfo.xml files online, but it would be great if there's a clear documentation of the format. Do you have any suggestions?

I can see that this feature would be helpful for some users, but it has a lower priority in my list. I am focusing on refactoring the plugin system and getting Mango to work with the MangaDex API v5. If you or anyone else are interested, feel free to submit a PR for this!

May 20 '21 23:05 hkalexling

I noticed that ComicRack has fallen and that the format is not really official anymore but more of a lingering thing. I guess it would be cool to implement, but since it's not a good format anymore, having a low priority is fine.

Some form of embedded metadata would be cool though!

May 25 '21 07:05 schemen

I guess it would be cool to implement, but since it's not a good format anymore

To be honest, xml was never a good file format. It is difficult to parse and read.

I would suggest the toml file format:

# lines starting with # are comments
title = "Grand Blue (Dreaming)"
# this should be supported too:
# author = "Inoue, Kenji (Story)"
# which would be a shorthand?
authors = [
    "Inoue, Kenji (Story)",
    "Yoshioka, Kimitake (Art)", # trailing commas are okay
]
description = """
Among the seaside town of Izu's ocean waves and rays of shining sun, Iori Kitahara is just beginning his
freshman year at Izu University. As he moves into his uncle's scuba diving shop, Grand Blue, he eagerly
anticipates his dream college life, filled with beautiful girls and good friends.

But things don't exactly go according to plan.
"""

genres = [
    "Comedy",
    "Slice of Life",
    "Seinen"
]

# the date format is: YYYY-MM-DD
publish_start = 2014-04-07

which has the benefit that you can have multi-line strings, comments, and it is easy to read by humans.

There is a toml parser for crystal available: https://github.com/crystal-community/toml.cr

Aug 15 '21 15:08 Luro02

@Luro02 I guess toml, yaml and ini are all great choices for this. Also I think it's important to make writing the embedded metadata file an optional feature and have it disabled by default as some users might find it too intrusive.

Aug 16 '21 12:08 hkalexling

I don't know if you still needed documentation for ComicInfo.xml, but in the komga repository, there is a resource that lists all "officially supported" tags as well as example files. While I agree ComicInfo is a pretty awful standard that has never even really been fully documented, it's a somewhat legacy format that a not insignificant amount of people use/have used. I think supporting it is a good idea even if it in itself is not ideal. Adding support to at least read from them, if not write, is a good idea as people could at least transfer their metadata to whatever format becomes prevalent or that Mango decides on.

Sep 01 '21 00:09 shrublet

I agree wholeheartedly!

It's VERY prevalent and extended with unofficial tags a bunch, the reason being that ComicRack died (RIP sweet prince, one of the kinds of software that taught me to select my applications largely by availability of source code by now! ESPECIALLY since running my own rack server at home :D)

It's common to find metadata scrapers that output in that format, so... A necessary evil. The Komga project only wants to support official tags from it, which is.... Missing the point if you ask me, but hey.. Godspeed to them I guess.

Sep 13 '21 03:09 GlassedSilver

If possible, is there a way to support parsing a custom metadata format? The downloader that I use makes its own text file for metadata (author, tags, etc.). Could it be done the way download plugins are?

Oct 05 '21 09:10 Leeingnyo

If possible, is there a way to support parsing a custom metadata format? The downloader that I use makes its own json file for metadata (author, tags, etc.). Could it be done the way download plugins are?

Love this idea! Would really make it customizable and adaptable to all sorts of downloaders!

Oct 05 '21 11:10 GlassedSilver

Yeah great idea! We could have a special directory in the plugins folder (e.g., ~/mango/plugins/parsers) that contains a list of metadata parsers written in JS and allow users to contribute, but I worry about the performance of that. We might need to parse tens of thousands of entries on scan, and this approach might be too slow with the additional overhead to bridge Crystal and JS. Also JS itself is not very performant.

Another option would be to build these parsers directly into the main app, and this would certainly be much faster, but then people would have to contribute in Crystal which is not as widely used as JS.

Oct 06 '21 01:10 hkalexling

What if digestion of this kinda stuff is done through a sample file that will contain the exact scheme?

Where the actual file would contain the title, you'd put tokens, similar to how adding custom search engines to your browser often results in www.sample.com/search.php?q=%QUERY%, where %QUERY% is something the application expects to find.

That'd be language-independent and you could always adapt it to any possible back-end changes without re-writing those plugins.

Because basically, the parser for that only needs to know the structure; you don't really need to rewrite parsing logic every time for every kind of file, right?

Also lowers the barrier of entry for people to submit these logic samples. All you gotta do is grab one of the files your software generates, look up the token for a given value like title and then fill in that token in a file that seems easy enough to read and understand.

Oct 06 '21 04:10 GlassedSilver

@GlassedSilver Good point but I don't think an example file is accurate enough. For example you can have your example file like this

title: %TITLE%
author: %AUTHOR%

but it's still ambiguous. Do the rows have to stay in the same order? Can we add in multiple spaces after :? Can we use non-standard whitespaces? Can we have whitespaces before each line? This goes on and on.
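To make the ambiguity concrete, here is a minimal sketch that compiles such a template into a regex under the strictest reading (literal text must match exactly, rows stay in order). The names `escapeRegex`, `templateToRegex` and `extract` are hypothetical and not part of Mango.

```javascript
// Hypothetical helper: escape characters that are special in a regex.
function escapeRegex(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Compile a %TOKEN% template into a strict regex.
// Assumption: everything outside the tokens must match character for character.
function templateToRegex(template) {
  var names = [];
  var parts = template.split(/%([A-Z]+)%/); // odd indices hold token names
  var source = '';
  for (var i = 0; i < parts.length; i++) {
    if (i % 2 === 1) {
      names.push(parts[i].toLowerCase());
      source += '(.*)'; // a token captures the rest of its line
    } else {
      source += escapeRegex(parts[i]);
    }
  }
  return { regex: new RegExp('^' + source + '$'), names: names };
}

// Returns an object of captured values, or null when the file does not match.
function extract(template, contents) {
  var compiled = templateToRegex(template);
  var match = compiled.regex.exec(contents);
  if (!match) return null;
  var result = {};
  for (var i = 0; i < compiled.names.length; i++) {
    result[compiled.names[i]] = match[i + 1];
  }
  return result;
}
```

With the two-line template above, swapped rows make `extract` return null, while an extra space after the colon silently ends up inside the captured title. Both are edge cases that a bare example file leaves undefined.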

Oct 06 '21 05:10 hkalexling

Hmmm, good point. Maybe there's an existing good library for this kind of stuff?

Also, I think a standardized output (from an application) will write predictable information. As in: there will likely not be an unpredictable amount of spaces, etc...

A different beast would be manually written info files, but I tend to believe that applications should output rather predictable files.

Maybe additionally tag the "search" terms that signal a certain part is coming.

As in:

$$<TITLE>$$%"A wonderful Manga"%$$</TITLE>$$

Something like that I guess.

Oct 06 '21 09:10 GlassedSilver

@GlassedSilver Haha the searching method you mentioned is already not far away from regex, which is the proper and well-defined way to perform text searching. It's certainly unambiguous and language-independent, but again not really novice-friendly.

Now that I think about it, I guess the performance is not a big issue. The metadata parser can slowly run as a background task. Maybe by default we can configure it to run every 24 hours, and the parsed metadata will be cached and kept between runs anyway so we don't need to re-parse existing entries. JS plugins might be a good idea.
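A rough sketch of the "don't re-parse existing entries" idea, assuming entries are identified by path and that a changed modification time is what marks an entry as stale. `createMetadataCache` and `getOrParse` are hypothetical names, and a real cache would be persisted rather than kept in memory.

```javascript
// Hypothetical in-memory cache; a real one would be persisted between runs.
function createMetadataCache() {
  var store = {};
  return {
    // `parse` is only invoked when the entry is new or its mtime changed
    // (using mtime as the staleness signal is an assumption).
    getOrParse: function (path, mtime, parse) {
      var known = Object.prototype.hasOwnProperty.call(store, path);
      if (known && store[path].mtime === mtime) {
        return store[path].metadata;
      }
      var metadata = parse(path);
      store[path] = { mtime: mtime, metadata: metadata };
      return metadata;
    }
  };
}
```

A background scan could then call `getOrParse` for every entry and only pay the parsing cost for new or changed files.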

Oct 07 '21 09:10 hkalexling

@GlassedSilver Haha the searching method you mentioned is already not far away from regex, which is the proper and well-defined way to perform text searching. It's certainly unambiguous and language-independent, but again not really novice-friendly.

Now that I think about it, I guess the performance is not a big issue. The metadata parser can slowly run as a background task. Maybe by default we can configure it to run every 24 hours, and the parsed metadata will be cached and kept between runs anyway so we don't need to re-parse existing entries. JS plugins might be a good idea.

Dear Lord, praise the day I finally feel comfortable with Regex, my old nemesis.

YOHOOO I never thought it'd be re-figuring out everything every time anyhow. Instead I was thinking of a conversion process. As in, leave the OG file intact and then generate our own internal record from that for permanent storage.

Only re-process already imported files if the processing module gets updated (fixing possible bogus reads through a bugfix) or the template gets an update (possibly reading more values in a later release).
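That rule could look roughly like this, assuming each stored record remembers which module and template version produced it. The record shape and the name `needsReimport` are hypothetical.

```javascript
// Hypothetical record shape: { metadata, moduleVersion, templateVersion }.
// `current` holds the versions of the installed module and template.
function needsReimport(record, current) {
  if (!record) return true; // never imported before
  return record.moduleVersion !== current.moduleVersion ||
         record.templateVersion !== current.templateVersion;
}
```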

Cache sounds a bit too temporary. If the imported metadata is to my liking I'd prefer for it to be as permanent as can be.

This is what I don't like about Plex... if the online source has a hiccup it could theoretically bork my metadata unless I manually lock everything, which is stupid. At least here we'd not be dependent on online sources but local files, so there's that, but still.

Oct 07 '21 09:10 GlassedSilver

I think info.json can be initialized with a metadata file at the first scan only, but cached metadata would be okay. Then would the priority go like this? For example, priority of display_name: the value in info.json set by the user > cached metadata from local files > filename
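As a small sketch, that priority order could be resolved like this. The function name `resolveDisplayName` and the `title` field on the cached metadata are assumptions for illustration; `display_name` is the property named above.

```javascript
// Resolve the name to show, highest priority first:
// user-set value in info.json > cached metadata from local files > filename.
function resolveDisplayName(userInfo, cachedMetadata, filename) {
  if (userInfo && userInfo.display_name) return userInfo.display_name;
  if (cachedMetadata && cachedMetadata.title) return cachedMetadata.title; // assumed field
  return filename;
}
```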

JS plugins might be a good idea.

Good! I wrote some simple code as a plugin interface proposal:

// ComicInfo.js
function getFilename() {
  return 'ComicInfo.xml'; // find the matching entry in the zip file and parse it
}

var parser = new DOMParser(); // but duktape doesn't support this...

function parseMetadata(contents) { // '<ComicInfo><Title>Hi</Title></ComicInfo>'
  var parsed = parser.parseFromString(contents, 'text/xml');
  var title = parsed.querySelector('Title').textContent; // 'Hi'
  return JSON.stringify({
    title,
    // ... other metadata tags etc.
  });
}

And we need to decide which properties to support in Mango. For now I can think of:

display name (title)
tags (#175 resolved first)

Oct 07 '21 14:10 Leeingnyo

@Leeingnyo that looks like it depends on every type of metadata file having the same names for the same data.

This will look for "Title" tags for titles, another app might call it "name" and yet other apps that may not even use very approachable tags might even go as far as calling it "tag1". You get my drift. :p

I think the info wrapper definitely has to define what is contained within it.

Oct 07 '21 18:10 GlassedSilver

@Leeingnyo Duktape won't be an issue here. We can just expose some Crystal functions to JS like what I did in https://github.com/hkalexling/Mango/blob/master/src/plugin/plugin.cr

@GlassedSilver It's infeasible for us to support every downloader out there. If you or any user encounter a ComicInfo file with an unexpected structure, you can always contribute and update the parser to look for the right tag. For example, with JS you can instruct the parser to look for the "Title" tag for the title, and if it doesn't exist, look for "name", and then "MangaTitle" and so on.
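A minimal sketch of that fallback lookup, written without DOMParser so it does not depend on browser APIs. The regex-based `readTag` is a deliberately naive stand-in for a real XML parser (it assumes no attributes, nesting of the same tag, or CDATA), and both function names are hypothetical.

```javascript
// Naive tag reader (assumption: plain <Tag>text</Tag> with no attributes or CDATA).
function readTag(xml, tag) {
  var match = new RegExp('<' + tag + '>([\\s\\S]*?)</' + tag + '>').exec(xml);
  return match ? match[1].trim() : null;
}

// Try each candidate tag in order and return the first non-empty value.
function findTitle(xml) {
  var candidates = ['Title', 'name', 'MangaTitle']; // order taken from the comment above
  for (var i = 0; i < candidates.length; i++) {
    var value = readTag(xml, candidates[i]);
    if (value) return value;
  }
  return null;
}
```

Supporting another downloader would then mostly mean appending one more tag name to `candidates`.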

Oct 08 '21 05:10 hkalexling

@GlassedSilver It's infeasible for us to support every downloader out there.

I never said I'd expect as much. o.O

I was only talking about a user-friendly template system for the user (or by extension the community) to fill gaps where they may come across such.

And it's only an idea as well. I just don't know how my downloader might change from what I have now to something else in the future, because development for it might fork into another new contender yet again (FMD2 currently btw) or because something else, better comes along (maybe something that can run on Linux as a container).

Or how I may eventually want to add my smutty doujinshis to Mango as well if HPX ever turns out to not be the kind of software that it seems to be promising to be. (I mean I'm happy with its concept so far, but who knows)

So the ingressed parts of HPX (a media library manager and viewer, but it can also download content) and the tool I use to feed it with (another downloader, I just - at least for now - separate my originals from my doujinshi collection) may need coverage in the future as well.

Nothing is for certain.

I know that there's probably a lot of non-devs who would rather work up some brain muscle for a text template file to wrap some easy to understand terms with rather than being told to "just do something in JS to manipulate a component of the software".

First thing that comes to my mind is: something else that may break, when a template file is standardized to a given spec to wrap things with, etc... idk okay... Maybe the peace of mind is something that sets in if you're close to the code of the project, but to the average user it's a bit of a leap of faith and a pretty big degree of uncertainty.

That's all I'm saying. :)

Oct 12 '21 05:10 GlassedSilver

@hkalexling if you ever want to support this, note that we have brought ComicInfo under the Anansi Project umbrella, where we steer small non-breaking evolutions of the schema: https://github.com/anansi-project/comicinfo

If you do want to support comicinfo.xml we would be happy to have you as a supporter of the initiative, and participate in discussions about the schema itself.

We also have documentation and accepted usage of the various fields: https://anansi-project.github.io/docs/comicinfo/documentation

Jan 21 '22 06:01 gotson

@gotson Awesome, thanks for letting me know!

Supporting ComicInfo is not a priority for me atm, but I can see the value of this, and I will be happy to be a supporter and contribute in discussions :D

Jan 21 '22 10:01 hkalexling
