request for clues on the message storage format

Open amlynnworth opened this issue 6 years ago • 7 comments

This is a documentation request. I would be very interested to find out what the message storage format is. Looking at some files on my disk from an old copy of XanaNews, I see *.dat files, and inside them I see reasonably human-readable content with some binary separators, such as 0F 00 after the message number and before the path.

Does anyone have a write-up on these details, for the current XanaNews?

amlynnworth avatar Jun 08 '18 06:06 amlynnworth

I'll see if I can put some documentation together for you. It would be good to have such documentation inside the repository anyway.

graemeg avatar Jun 24 '18 21:06 graemeg

Hi - definitely still interested. As of this January, I'm looking at compiling XanaNews with Delphi 10.2.3 and then 10.3.

Even general hints about what code to study in order to find details about the data format would be very much appreciated.

Thank you.

amlynnworth avatar Jan 03 '19 21:01 amlynnworth

Wow - I'm amazed that people are still actively looking at XanaNews!

Here's the format of messages.dat...

The format is:
  'X-Msg:'    6 char header
  xxxxxxxx    8 char hex string containing message length
  nn          word: length of first extra header
  char (nn)   nn char first extra header string
  nn          word: length of second extra header string
  char (nn)   nn char second extra header string
  ...
  nn          word 0
              Then follows the message - length xxxxxxxx
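
Read literally, the layout above can be walked with a small sequential reader. The following is a minimal Python sketch, not XanaNews code. It assumes that a "word" is a 16-bit little-endian integer (which would fit the 0F 00 bytes mentioned at the top of this thread), that the hex length counts bytes, and that records follow one another with no padding. The names `read_record` and `iter_records` are made up for this illustration.

```python
import io
import struct

MAGIC = b"X-Msg:"  # 6-byte record marker, from the format description


def _read_exact(stream, n):
    """Read exactly n bytes or raise if the file ends early."""
    data = stream.read(n)
    if len(data) != n:
        raise ValueError(f"truncated record: wanted {n} bytes, got {len(data)}")
    return data


def read_record(stream):
    """Return (extra_headers, body) for the next record, or None at end of file."""
    magic = stream.read(len(MAGIC))
    if not magic:
        return None  # clean end of file
    if magic != MAGIC:
        raise ValueError(f"expected {MAGIC!r}, found {magic!r}")

    # 8 ASCII hex digits giving the message length.
    body_len = int(_read_exact(stream, 8).decode("ascii"), 16)

    # Length-prefixed extra headers, ended by a zero-length word.
    # ASSUMPTION: the word is 16-bit little-endian ("<H").
    extra_headers = []
    while True:
        (n,) = struct.unpack("<H", _read_exact(stream, 2))
        if n == 0:
            break
        extra_headers.append(_read_exact(stream, n))

    return extra_headers, _read_exact(stream, body_len)


def iter_records(stream):
    """Yield (offset, extra_headers, body) for every record in the stream."""
    while True:
        offset = stream.tell()
        record = read_record(stream)
        if record is None:
            return
        yield (offset, *record)


if __name__ == "__main__":
    # Synthetic record built to match the description; not real XanaNews data.
    sample = (b"X-Msg:" + b"0000000B"
              + struct.pack("<H", 5) + b"Path:"
              + struct.pack("<H", 0)
              + b"hello world")
    for offset, headers, body in iter_records(io.BytesIO(sample * 2)):
        print(offset, headers, body)
```

Against a real file, the stream would be opened in binary mode, for example `open("messages.dat", "rb")`. Keeping the offsets that `iter_records` yields during one full scan would be one way to seek straight to a given record later.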

Colin (The original author)

wilsoncpw avatar Jan 07 '19 18:01 wilsoncpw

And I am amazed to get such a succinct answer from you, Colin!

I maintain www.codenewsfast.com on a very back burner, volunteer basis. I want to replace the part of my process that downloads over NNTP with your XanaNews code for downloading. Everything within CodeNewsFast has been written in Delphi. And I know XanaNews is more reliable than the code I have been trying to maintain all these years. This way, I should be able to download to the XanaNews DAT files and process the articles from there, reliably and at my convenience, into the Firebird SQL database that holds the articles in the format I need for the public.

So I think this is for a good cause.

Thank you so much for the info & have a great year.

Ann (Lynnworth of HREF Tools Corp.)

amlynnworth avatar Jan 08 '19 01:01 amlynnworth

Thanks, Colin, for your reply. Yes, XanaNews is still the best NNTP news client around! I even use it under FreeBSD and Linux via Wine (the Windows compatibility layer).

@amlynnworth: I believe the code you are looking for is in the unitNNTPServices.pas unit.

graemeg avatar Jan 08 '19 09:01 graemeg

Hello again. Would anyone care to explain the other .dat file structure? What separates articles? Is there a fast way to index in and know which bytes to read for a particular message?

We have been studying unitNNTPServices.pas in the last week but have not fully figured this out yet.

amlynnworth avatar Dec 15 '19 21:12 amlynnworth

Okay: a tab separates the fields within an article line, and CRLF separates the lines, each of which holds the basic facts for one article.

This is the remaining puzzle. I will ask about one example which is easy to see, and tiny:

embarcadero.public.announce\articles.dat

It has only 3 articles, all from 2009.

My question is about the trailing integer fields. I see the #Lines, and then 3 more integers. If someone could explain what those are, that would be great.

The content of that articles.dat file follows, with full respect to John Kaster.

5	ANN: Scheduled quick maintenance	John Kaster <>	Fri, 29 May 2009 22:22:29 GMT	<[email protected]>		261	10	33554496	0
8	ANN: System Alert: Server maintenance	John Kaster <>	Fri, 12 Jun 2009 22:37:24 GMT	<[email protected]>		248	10	33554496	604
9	ANN: Electrical power testing on Saturday, June 27, 2009	John Kaster <>	Sat, 27 Jun 2009 00:40:53 GMT	<[email protected]>		367	14	33554496	1195

In article #5, line count is 261. What do the integers at the end of that line, i.e. 10, 33554496, 0, refer to?
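
For experimenting with these lines, a small Python sketch can split each one on tabs and keep the four trailing integers together. It assumes every line has exactly ten tab-separated fields, as in the three samples above; `parse_article_line` and the field names are illustrative, not taken from the XanaNews source.

Three observations may help, none of them confirmed in this thread. The first eight fields resemble the NNTP overview order (number, subject, from, date, message-id, references, byte count, line count); if that holds, 261 would be the byte count and 10 the line count. The value 33554496 is 0x02000040 in hex, which looks more like a set of bit flags than a count. And the last field grows from line to line (0, 604, 1195), which would be consistent with a byte offset into messages.dat.

```python
from typing import NamedTuple, Tuple


class ArticleLine(NamedTuple):
    number: int
    subject: str
    sender: str
    date: str
    message_id: str
    references: str
    trailing: Tuple[int, ...]  # the four integers whose meaning is being asked about


def parse_article_line(line: str) -> ArticleLine:
    """Split one articles.dat line (ASSUMES exactly 10 tab-separated fields)."""
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) != 10:
        raise ValueError(f"expected 10 fields, got {len(fields)}")
    return ArticleLine(
        number=int(fields[0]),
        subject=fields[1],
        sender=fields[2],
        date=fields[3],
        message_id=fields[4],
        references=fields[5],
        trailing=tuple(int(f) for f in fields[6:]),
    )


if __name__ == "__main__":
    # First sample line from the file, with tabs written as \t.
    sample = ("5\tANN: Scheduled quick maintenance\tJohn Kaster <>\t"
              "Fri, 29 May 2009 22:22:29 GMT\t<[email protected]>\t\t"
              "261\t10\t33554496\t0\r\n")
    article = parse_article_line(sample)
    print(article.trailing)          # (261, 10, 33554496, 0)
    print(hex(article.trailing[2]))  # 0x2000040
```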

Many thanks.

Ann

amlynnworth avatar Dec 17 '19 15:12 amlynnworth