How to store metadata about a feed

Open • Benaiah opened this issue 8 years ago • 65 comments

A number of different issues and ideas have made clear the need for a place to specify metadata about a twtxt.txt feed. For instance, essentially every idea for notifications so far needs to know where the notifications should go (technical details vary based on the proposal). The question then is how to store metadata.

Discussion in #22 has suggested a general comment character, which would let each client decide how to store metadata. I suggest building on this: allow general comments, but reserve the following format specifically for metadata:

# this is a regular comment

# the next line is a metadata entry
# nick = benaiah

This echoes the .ini format of the twtxt config file, which I think gives it a nice consistency.
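To make the proposed format concrete, here is a minimal Python sketch of how a client might separate metadata entries from ordinary comments. It assumes a metadata entry is any comment line of the form `# key = value`; the regular expression and the `parse_metadata` name are illustrative, not part of twtxt.

```python
import re

# Assumption: a metadata entry is a comment of the form "# key = value";
# every other line starting with "#" is an ordinary comment.
METADATA_RE = re.compile(r"^#\s*([A-Za-z0-9_.-]+)\s*=\s*(.*?)\s*$")


def parse_metadata(lines):
    """Collect '# key = value' entries; a later entry for the same key wins."""
    metadata = {}
    for line in lines:
        match = METADATA_RE.match(line)
        if match:
            key, value = match.groups()
            metadata[key] = value
    return metadata


feed = [
    "# this is a regular comment",
    "# the next line is a metadata entry",
    "# nick = benaiah",
    "2016-02-09T22:00:00Z\tHello twtxt!",
]
print(parse_metadata(feed))  # {'nick': 'benaiah'}
```

One consequence of this design: an ordinary comment that happens to contain `word = something` would also be read as metadata.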

The other main suggestion for metadata is to have another file. I dislike this approach because it complicates the protocol, significantly increases how often twtxt has to hit the network, and requires one of the following: a second URL for each person (for the metadata file), switching twtxt.txt to hold metadata and moving the feed to another file, or putting a metadata entry in twtxt.txt that points to the metadata file.

Benaiah, Feb 09 '16 22:02

Pros of a second file:

  • It can be cached (If-Modified-Since, ETag).
  • A single file must be fully loaded and parsed to find the metadata, which could get painful for a huge file. It's easy to only care about the very top or very bottom of a twtxt file.
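As a rough illustration of the caching point, here is a minimal Python sketch of a conditional GET for a hypothetical separate metadata file. The cache layout (a dict keyed by URL holding `etag`, `last_modified` and `body`) and the function names are assumptions for this example; only the `If-None-Match`/`If-Modified-Since` request headers and the 304 Not Modified status come from HTTP itself.

```python
import urllib.request
from urllib.error import HTTPError


def conditional_headers(entry):
    """Build HTTP validators from a previously cached response (hypothetical cache layout)."""
    headers = {}
    if entry.get("etag"):
        headers["If-None-Match"] = entry["etag"]
    if entry.get("last_modified"):
        headers["If-Modified-Since"] = entry["last_modified"]
    return headers


def fetch_cached(url, cache):
    """Fetch url, reusing cache[url]["body"] when the server answers 304 Not Modified."""
    entry = cache.get(url, {})
    request = urllib.request.Request(url, headers=conditional_headers(entry))
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            cache[url] = {
                "etag": response.headers.get("ETag"),
                "last_modified": response.headers.get("Last-Modified"),
                "body": response.read().decode("utf-8"),
            }
    except HTTPError as err:
        # urllib reports 304 as an HTTPError; any other error is re-raised
        if err.code != 304 or url not in cache:
            raise
    return cache[url]["body"]
```

A 304 response carries no body, so the client skips both the download and the re-parse. The same mechanism works for a single twtxt.txt, but only for the file as a whole.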

One file has the advantage of showing metadata that changes over time, for instance "added new profile pic on date" or "followed @<x X> on date", if we use a syntax similar to the non-commented version:

# [date] \t key = value
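A minimal Python sketch of how a client could read such a log, assuming `[date]` is a timestamp followed by a tab and that entries are replayed in file order. The function names are hypothetical; dates are kept as plain strings here, so no particular timestamp format is assumed.

```python
def parse_metadata_log(lines):
    """Return (date, key, value) for lines like '# <date>\t<key> = <value>', in file order."""
    entries = []
    for line in lines:
        if not line.startswith("#"):
            continue  # tweets are not metadata
        date, tab, rest = line[1:].strip().partition("\t")
        key, eq, value = rest.partition("=")
        if tab and eq:  # anything else is a plain comment
            entries.append((date.strip(), key.strip(), value.strip()))
    return entries


def current_metadata(entries):
    """Replay the log in file order: the last value for each key wins."""
    state = {}
    for _date, key, value in entries:
        state[key] = value
    return state


def history(entries, key):
    """Every (date, value) a key has had, e.g. to show 'added new profile pic on <date>'."""
    return [(date, value) for date, k, value in entries if k == key]
```

Keeping the old entries in the file is what makes the "changed on date" display possible. The current state is just the last value seen for each key.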

tedder, Feb 09 '16 22:02

I like the idea of doing this in comments at the top of the file. I think the advantages of having everything in the same file outweigh any added complexity when swapping out the files if they get too big or whatever.

However, I think we would quickly hit the limitations of a simple key-value system. How would you easily store a list of follows with it, for example?

A good format could be YAML, I think. It's human-readable and -writable, and widely supported; we would just need to strip the comment character from the start of each line before parsing it.

I imagine the header for twtxt would then look something like this:

# the three dashes indicate the start of the data block, so we know where
# to start converting to yaml
# ---
# username: reednj
# following: 
#  - buckket http://buckket.org/twtxt.txt
#  - xena https://xena.greedo.xeserv.us/files/xena.txt
#  - whatever http://whatever.com/twtxt.txt

Edit: somehow forgot to add the urls to the user list...
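A minimal Python sketch of the stripping step, assuming the header is the run of comment lines after a `# ---` line and that exactly one space follows each `#`, so YAML indentation survives. `extract_yaml_header` is a hypothetical name; the resulting text could then be handed to a YAML library such as PyYAML's `yaml.safe_load` (third-party, not used here).

```python
def extract_yaml_header(lines):
    """Return the YAML text commented out after a '# ---' line (hypothetical convention)."""
    block = []
    started = False
    for line in lines:
        line = line.rstrip("\n")
        if not line.startswith("#"):
            if started:
                break  # the first non-comment line ends the header
            continue
        text = line[1:]
        if text.startswith(" "):
            text = text[1:]  # drop one space only, so YAML indentation survives
        if not started:
            started = text.strip() == "---"  # skip ordinary comments before the marker
            continue
        block.append(text)
    return "\n".join(block)
```

In this header each `following` item is a single string such as `buckket http://buckket.org/twtxt.txt`, so a client would still have to split nick and URL itself after parsing the YAML.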

reednj, Feb 10 '16 13:02

@tedder consider that if you use a separate file for metadata and it also supports including messages, you quickly make the twtxt format obsolete as the syndication format of choice. Every client will just use the format that provides more data. Thus, the original twtxt format would be mainly useful as input (like Markdown or reStructuredText) for scripts generating feeds.

erlehmann, Feb 10 '16 17:02

@erlehmann I said nothing about including messages in a second file.

@reednj I like the idea of metadata at the top, instead of happening anywhere in twtxt. I (personally) like yml, it's extensible in cases like this.

tedder, Feb 10 '16 17:02

@tedder to demonstrate: Yeah, you do not have to include messages. But any format that is powerful enough to include the metadata can be utilized for that and then you are back at using a single file. I have written a small shell script that converts a twtxt feed to the format described in RFC 4287, which describes how to convey author name/email, contributor name/email, the time of publication and the last update for a document. Since RFC 4287 also describes how to include messages, I just included them!

Here is the input file: http://daten.dieweltistgarnichtso.net/tmp/docs/twtxt.txt Here is the output file: http://daten.dieweltistgarnichtso.net/tmp/docs/twtxt.xml
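erlehmann's converter is a shell script and is not reproduced here. As a rough idea of what such a conversion involves, here is a minimal Python sketch using only the standard library. It assumes lines in `TIMESTAMP\tMESSAGE` form in chronological order; `twtxt_to_atom`, the feed title and the `#TIMESTAMP` entry ids are illustrative choices, not taken from the original script.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"
ET.register_namespace("", ATOM_NS)  # serialize Atom as the default namespace


def _sub(parent, tag, text=None, **attrs):
    element = ET.SubElement(parent, f"{{{ATOM_NS}}}{tag}", attrs)
    if text is not None:
        element.text = text
    return element


def twtxt_to_atom(lines, feed_url, nick):
    """Convert 'TIMESTAMP\tMESSAGE' lines into an Atom (RFC 4287) document."""
    tweets = []
    for line in lines:
        if line.startswith("#"):
            continue  # comments carry no messages
        stamp, tab, text = line.rstrip("\n").partition("\t")
        if tab:
            tweets.append((stamp, text))

    feed = ET.Element(f"{{{ATOM_NS}}}feed")
    _sub(feed, "id", feed_url)
    _sub(feed, "title", f"{nick} on twtxt")
    _sub(_sub(feed, "author"), "name", nick)
    _sub(feed, "link", rel="self", href=feed_url)
    # Assumes the file is in chronological order, so the last tweet is the newest.
    _sub(feed, "updated", tweets[-1][0] if tweets else "1970-01-01T00:00:00Z")
    for stamp, text in tweets:
        entry = _sub(feed, "entry")
        _sub(entry, "id", f"{feed_url}#{stamp}")
        _sub(entry, "title", text)
        _sub(entry, "updated", stamp)
        _sub(entry, "content", text, type="text")
    return ET.tostring(feed, encoding="unicode")
```

Note that the author name is passed in separately here: exactly the kind of metadata the plain twtxt file has no place for yet.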

erlehmann, Feb 10 '16 17:02

@reednj RFC 5005 describes a mechanism to link together several physical documents that form one logical document. It does not seem that hard, as long as the first document contains the metadata about the aggregate.

erlehmann, Feb 10 '16 17:02

@reednj I see a problem with your example as it does not give URLs in the source, only nicknames. In reality, you would need the URL.

erlehmann, Feb 10 '16 17:02

@reednj I am not familiar with yaml. How can you do namespaces in yaml? As far as I see, you would need namespacing for forwards compatibility.

erlehmann, Feb 10 '16 17:02

So sounds like commented YAML could be the way to go? I wonder if @buckket has an opinion?

Also, please no namespaces, that is the very definition of YAGNI

reednj, Feb 11 '16 00:02

@reednj, could you explain how a format can be extensible without namespaces, other than by basically ignoring everything in the file that is not in the default namespace? Or is the metadata format you envision a fixed format without any additional semantics, ever?

erlehmann, Feb 11 '16 20:02

Personally I would love to see twtxt either commit to a truly minimalist “no metadata” stance, or simply use Atom as the default format in a single file. Atom has everything you need. It is not the most terse file format; the existing twtxt format is the most terse if that’s what you’re shooting for. But as soon as we start trying to approximate feature-parity with Twitter, it’s likely we’ll just end up reinventing Atom/RSS poorly. Atom is human-readable, it’s a truly well-made and well-defined standard, there’s widespread support for it.

otherjoel, Feb 12 '16 03:02

You can have meta data about the user at the top of the file, without having any meta data about the messages, which is basically what I'm pushing for.

I don't think we can or should or need to compete with Twitter. The appeal of twtxt is its simplicity, and XML is the opposite of that in every way.

reednj, Feb 12 '16 04:02

I second @reednj.

twtxt is a decentralised, minimalist microblogging service for hackers.

The minimalist part here needs to stay. The fact that each tweet takes only one line (or two soon?) makes it simple and clear to use.

mkody, Feb 12 '16 08:02

You can have meta data about the user at the top of the file, without having any meta data about the messages, which is basically what I'm pushing for.

I agree - we need user data for any sort of network propagation, but the messages themselves should remain as ephemeral and simple as they are currently. I think you hit the nail on the head.

Benaiah, Feb 12 '16 08:02

@mkody as I said, twtxt can be an input format for an already existing representation, like Markdown. Try http://news.dieweltistgarnichtso.net/bin/twtxt2atom out and you might see what I am proposing.

@Benaiah what is “network propagation” ?

erlehmann, Feb 12 '16 21:02

the messages themselves should remain as ephemeral and simple as they are currently

So to be clear, official support for things like replies to chain messages together in conversations are absolutely off the table? If so, then that feels consistent and I can dig it.

otherjoel, Feb 12 '16 21:02

@erlehmann So you mean that we could keep the twtxt file and generate an Atom feed from it? For the Atom feed to have some sort of metadata, our input (the twtxt file) would need to contain it somewhere too. It feels redundant to use two files for the same purpose, and to convert the file every time.

mkody, Feb 12 '16 22:02

I like the way @reednj posted!

Advantages:

  • people can add comments without thinking about metadata at all
  • the --- marks where the YAML data starts (that's a very common convention)
  • having one file with also "following" etc resolves the issue of syncing following list

I really like Atom, and especially the Atom sync protocol, but twtxt's simplicity, and the fact that posting to your feed is as simple as TIMESTAMP\tmessage, is what makes it a very nice format to host on whatever webspace and post to with whatever client you have.

Everything we add with #, like I suggested in #22, is an extra and should not be mandatory. That said, having YAML in twtxt like @reednj posted could make the config file nearly unnecessary ;).

DracoBlue, Feb 13 '16 13:02

After thinking about this topic for a few days, I'm sure Benaiah's first suggestion would be a very good fit for twtxt. If we just use comments like

# follow david http://example.org/david.txt
# unfollow http://example.org/user.txt
# nick mdom
# twturl http://example.org/user.txt 

somewhere in the file, it would be very easy for even the simplest client to read and write metadata in the feed, whereas with something like YAML or INI you couldn't just read the file line by line and would probably need a parser to do the work. This format would also let you keep a record of who you once followed, or of your old twturl, if somebody needs that. As for the argument about needing to parse the whole twtxt file just to get the metadata: we currently parse the complete file every time we build the timeline, so I'm not sure this is even an issue.

I have the strong feeling we should just use the easiest and most minimal solution one can think of. I mean, that's what twtxt is all about, right? :)
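A minimal Python sketch of that line-by-line reading, assuming the four commands shown (`follow NICK URL`, `unfollow URL`, `nick NAME`, `twturl URL`) and replaying them in file order. `read_feed_metadata` and `follow_line` are hypothetical names.

```python
def read_feed_metadata(lines):
    """Replay '# follow', '# unfollow', '# nick' and '# twturl' comments in file order."""
    meta = {}
    following = {}  # url -> nick
    for line in lines:
        if not line.startswith("#"):
            continue  # tweets
        words = line[1:].split()
        if not words:
            continue
        command, args = words[0], words[1:]
        if command == "follow" and len(args) == 2:
            nick, url = args
            following[url] = nick
        elif command == "unfollow" and len(args) == 1:
            following.pop(args[0], None)
        elif command in ("nick", "twturl") and len(args) == 1:
            meta[command] = args[0]
        # any other comment is just a comment


    return meta, following


def follow_line(nick, url):
    """Writing is just appending one line to the feed."""
    return f"# follow {nick} {url}\n"
```

Because later lines override earlier ones, the file keeps the history (old follows, an old twturl) while the result reflects the current state. The price of reusing comments is ambiguity: a plain comment such as `# follow me please` has the same shape as a follow entry.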

mdom, Mar 06 '16 12:03

mdom's suggestion sounds very reasonable. I also like the log style approach therein.

archusr, Mar 06 '16 12:03

We talked a little about it on IRC, and we would also propose adding a timestamp to the comment, so the client can reorder metadata as it sees fit. Some clients would leave it interspersed in the file and others could move metadata to the top of the file.

mdom, Mar 06 '16 20:03

to still allow for simple sorting by timestamps, irc style commands could be an alternative to # comments:

# 2016-03-06T23:23:23Z  follow user https://example.org/user/twtxt.txt
2016-03-06T23:23:23Z    /follow user https://example.org/user/twtxt.txt
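A minimal Python sketch that accepts both spellings and sorts metadata and tweets together by timestamp, so metadata can be reordered as proposed above. It assumes RFC 3339 timestamps, a tab between timestamp and message in the slash form, and treats timestamps without an offset as UTC; `parse_line` and `timeline` are hypothetical names.

```python
from datetime import datetime, timezone


def parse_timestamp(stamp):
    if stamp.endswith("Z"):
        stamp = stamp[:-1] + "+00:00"  # fromisoformat() accepts "Z" only from Python 3.11
    when = datetime.fromisoformat(stamp)
    return when if when.tzinfo else when.replace(tzinfo=timezone.utc)  # assume UTC


def parse_line(line):
    """Return (when, kind, payload) with kind 'meta' or 'tweet', or None for other lines."""
    line = line.rstrip("\n")
    if line.startswith("#"):  # '# TIMESTAMP  action ...'
        parts = line[1:].split(None, 1)
        if len(parts) != 2:
            return None
        stamp, payload, kind = parts[0], parts[1].rstrip(), "meta"
    else:  # 'TIMESTAMP\tmessage', where '/action ...' marks metadata
        stamp, tab, payload = line.partition("\t")
        if not tab:
            return None
        kind = "meta" if payload.startswith("/") else "tweet"
        if kind == "meta":
            payload = payload[1:]
    try:
        return parse_timestamp(stamp), kind, payload
    except ValueError:
        return None  # e.g. an ordinary comment without a timestamp


def timeline(lines):
    """All dated lines in timestamp order; sorted() is stable, so ties keep file order."""
    parsed = (parse_line(line) for line in lines)
    return sorted((p for p in parsed if p is not None), key=lambda p: p[0])
```

Either spelling ends up as the same `(when, "meta", "follow user ...")` record, so the choice between them only affects how the raw file looks and how old clients display it.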

archusr, Mar 06 '16 21:03

to still allow for simple sorting by timestamps, irc style commands could be an alternative to # comments

Then tweets cannot start with a '/' (0x2F) character anymore. I don't think that's much of a bother compared to what metadata storage can do, and I assume it's easier to parse than having to check whether the first character is a '#' and then parse the date and metadata together. Here you can just parse lines naturally using the existing methods, and if the first character of the message is a '/', store that line as metadata rather than as a tweet. When I started thinking about storing metadata, I also wondered: where would we store it once it's downloaded? Of course I thought of the Cache, but it isn't very generic; it was designed to store tweets, and adding metadata handling to it requires some twisting of its current methods...

Lymkwi, Mar 07 '16 19:03

Though I still prefer lines starting with a comment character, this would also be a fine choice. It's a good point that you wouldn't have to add special syntax. But I wonder how often users want to start tweets with /me or path names, in which case you'd need some kind of escaping mechanism... :/

mdom, Mar 07 '16 20:03

If this is the approach, it would be better to use some uncommon Unicode character (e.g. ☞, U+261E) instead of a slash.

otherjoel, Mar 07 '16 21:03

Maybe a vertical tab would work :P

Benaiah, Mar 07 '16 21:03

Maybe we can use the C99 one-line comment syntax. Using // would be visually distinctive, shouldn't be that common in normal tweets, and feels like a rather nice fit for a service for hackers.

mdom, Mar 08 '16 08:03

One could also use a twtxt tweet (but autogenerated):

/me is following @<dracoblue https://dracoblue.nez/twtxt.txt>

and parse this on client side.

But for general metadata, like the preferred nickname, real metadata without a timestamp would be more useful.

DracoBlue, Mar 13 '16 11:03

After rereading the entire issue:

I think:

TIMESTAMP\t/ACTION parameters

where ACTION is something like "follow", "unfollow" or whatever, is the best way. And it is up to the creator of the twtxt to keep the "important" meta data (like nick) within the file, if older tweets are removed.

The nice thing about this is: clients can implement /follow dracoblue https://dracoblue.net/twtxt.txt as normal command and can format it when it gets printed (e.g. "is following @dracoblue" or "changed nickname to @dracoblue").

And the best: it is 100% backwards compatible.
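To make the display side concrete, here is a minimal Python sketch of how a client might render such messages. It assumes the message part (after the tab) is passed in; `render` and the `/nick` action name are hypothetical, while `/follow` and the two phrasings come from the comment above. A client that knows none of this simply shows the raw `/follow ...` text, which is the backwards-compatibility point.

```python
def render(message, nick):
    """Turn a '/ACTION parameters' message into display text; other messages pass through."""
    if not message.startswith("/"):
        return f"{nick}: {message}"
    action, _, params = message[1:].partition(" ")
    args = params.split()
    if action == "follow" and len(args) == 2:  # /follow NICK URL
        return f"{nick} is following @{args[0]}"
    if action == "nick" and len(args) == 1:  # /nick NEWNICK (hypothetical action name)
        return f"{nick} changed nickname to @{args[0]}"
    return f"{nick}: {message}"  # unknown action: show it verbatim, like an old client would
```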

DracoBlue, Mar 13 '16 11:03

thanks for picking that up, these three variants are equally appealing to me:

timestamp     /action parameters
timestamp     // action parameters
timestamp     # action parameters
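The three variants differ only in the marker in front of the action. A minimal Python sketch that recognizes each one, assuming the timestamp and message are separated by a tab as in regular twtxt lines; the `VARIANTS` table and `parse_action` are illustrative names, not an agreed spec.

```python
import re

# One pattern per proposed variant; each captures ACTION and the raw parameters.
VARIANTS = {
    "slash": re.compile(r"^/(?!/)(\w+)\s*(.*)$"),
    "double-slash": re.compile(r"^//\s*(\w+)\s*(.*)$"),
    "hash": re.compile(r"^#\s*(\w+)\s*(.*)$"),
}


def parse_action(line, variant):
    """Return (timestamp, action, parameters) if the message uses the given variant, else None."""
    stamp, tab, message = line.partition("\t")
    if not tab:
        return None
    match = VARIANTS[variant].match(message)
    if not match:
        return None  # an ordinary tweet
    action, params = match.groups()
    return stamp, action, params.split()
```

The sketch also shows mdom's escaping concern: with the plain slash variant, a tweet such as `/usr/bin is full` matches as the action `usr`. `//` is less likely to start an ordinary message, though a tweet that begins with a hashtag would collide with the `#` variant in the same way.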

archusr, Mar 13 '16 11:03