
Document index file formats - Issue/981

Open fmassot opened this issue 4 years ago • 4 comments

related to #981

This is a work in progress, posted to get early feedback on the documentation structure.

fmassot avatar Apr 19 '21 15:04 fmassot

I like the Lucene way of describing file formats (it is not actually specific to Lucene; I have seen it elsewhere), e.g.

https://lucene.apache.org/core/3_0_3/fileformats.html#Segments%20File

label ---> <B>^N means B repeated N times.
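To make the repetition notation concrete, here is a minimal Rust sketch. It assumes a purely hypothetical rule, `Values --> Count, <Value>^Count`, with `Count` stored as a little-endian `u32` and each `Value` as a little-endian `u64`. Neither the rule nor the names `write_values` and `read_values` come from tantivy or Lucene; they only illustrate how `<B>^N` maps to a count followed by N repeated items.

```rust
use std::convert::TryInto;

// Hypothetical rule for illustration only (not a tantivy or Lucene format):
//   Values --> Count, <Value>^Count
//   Count  --> u32, little-endian
//   Value  --> u64, little-endian

/// Serializes `values` as Count followed by Count repetitions of Value.
fn write_values(values: &[u64]) -> Vec<u8> {
    let mut out = Vec::with_capacity(4 + 8 * values.len());
    out.extend_from_slice(&(values.len() as u32).to_le_bytes());
    for value in values {
        out.extend_from_slice(&value.to_le_bytes());
    }
    out
}

/// Parses the same layout. Returns `None` if the header is missing or
/// the body is not exactly Count * 8 bytes long.
fn read_values(bytes: &[u8]) -> Option<Vec<u64>> {
    // Count: the first 4 bytes.
    let count = u32::from_le_bytes(bytes.get(..4)?.try_into().ok()?) as usize;
    // <Value>^Count: exactly `count` 8-byte items must follow.
    let body = bytes.get(4..)?;
    if body.len() != count.checked_mul(8)? {
        return None;
    }
    let values = body
        .chunks_exact(8)
        .map(|chunk| u64::from_le_bytes(chunk.try_into().unwrap()))
        .collect();
    Some(values)
}
```

Under these assumptions, `write_values(&[1, 2, 3])` produces a 4-byte count followed by three 8-byte values, and `read_values` recovers the original list from those bytes.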

fulmicoton avatar Apr 20 '21 00:04 fulmicoton

I like the Lucene way of describing file formats (it is not actually specific to Lucene; I have seen it elsewhere), e.g.

https://lucene.apache.org/core/3_0_3/fileformats.html#Segments%20File

label ---> <B>^N means B repeated N times.

Yes, I finally understand it and will use it. I really did not want to keep the current format!

fmassot avatar Apr 20 '21 07:04 fmassot

Hi, thanks for your effort! Would it be possible to strictly separate each "data structure (data type)" from its "description"? The relatively recent Lucene format documentation is written this way: https://lucene.apache.org/core/8_8_1/core/org/apache/lucene/codecs/lucene84/Lucene84PostingsFormat.html#Termdictionary

FYI, here is one bad example: the old Lucene file format documentation mixes several kinds of information together, and unfortunately it became really difficult to understand as it grew. https://lucene.apache.org/core/8_8_1/core/org/apache/lucene/codecs/lucene50/Lucene50TermVectorsFormat.html

mocobeta avatar May 07 '21 15:05 mocobeta

I think this is a great way of describing a format: https://github.com/mocobeta/lucene-postings-format, well done @mocobeta

PSeitz avatar May 21 '21 05:05 PSeitz