GH-41673: [Format][Docs] Add arrow format introductory page
Rationale for this change
The documentation for Arrow Format could be improved:
- not all types are listed
- not all layouts are explained
What changes are included in this PR?
This PR includes:
- motivation behind the columnar format
- explanations of the different physical layouts, each with a diagram comparing an example type to its physical layout
- Arrow terminology
- Arrow C Data interface
in a separate "introduction" page with no technical details. The Specifications index page is also restructured to include captions and make the left sidebar menu better organised.
Note: a table listing all types together with their physical layouts will be added to the existing Columnar.rst page in a separate PR: https://github.com/apache/arrow/issues/14752
Are these changes tested?
No, this is a docs change.
Are there any user-facing changes?
No.
- GitHub Issue: #41673
cc @amoeba this could use a look already. I think all I wanted to add is here. Will need to do a general look through one more time before marking it ready for review though.
@github-actions crossbow submit preview-docs
Revision: 3cdd97a0af2bea1914b7e9b0e7a04e47e04e0c41
Submitted crossbow builds: ursacomputing/crossbow @ actions-4a1cc2326d
| Task | Status |
|---|---|
| preview-docs |
@github-actions crossbow submit preview-docs
Revision: 4d2bf8ad6103d094ebdfc8ac3546a2692c08ff23
Submitted crossbow builds: ursacomputing/crossbow @ actions-cc7da250f4
| Task | Status |
|---|---|
| preview-docs |
Not sure why the captions in the left sidebar menu are not visible in the crossbow preview build but are visible for me locally.
Update: I have removed the change in docs/source/format/index.rst (captions for the Specifications section) and will move it to a separate PR, see https://github.com/apache/arrow/pull/41593/commits/97e4217ab68040167d31516a752fba6acd226177.
Hey @AlenkaF, this is so great to see. I think the text and diagrams will be useful and the pairing looks useful. I left some suggestions for style and:
- Did an editing pass over the text. Feel free to ignore any you don't like.
- I'm a bit late in the process here but I noticed in the diagrams that we use `-` for null. It kinda looks like a minus symbol sometimes instead of indicating a missing element. I wonder if a `_` or `?` might be more clear? I realize part of the issue here is due to limitations of Excalidraw.
Edit: There was some phrasing that I think could still be tweaked that I didn't add as suggestions so if you'd be okay with that I could do another pass over the text. Lemme know.
> - Did an editing pass over the text. Feel free to ignore any you don't like.
Thank you a bunch, this is very very helpful!
> - I'm a bit late in the process here but I noticed in the diagrams that we use `-` for null. It kinda looks like a minus symbol sometimes instead of indicating a missing element. I wonder if a `_` or `?` might be more clear? I realize part of the issue here is due to limitations of Excalidraw.
Not late at all, I can still make changes and I really want the diagrams to be as clear as possible. And you are correct, for example in the fixed size list this issue becomes very visible when -7 is used. I am not sure about `?` though, `_` feels a bit better. The specifications use the term "unspecified". I am not sure what Matt uses in his book as I do not have it with me at the moment, but I think it is some kind of abbreviation (U or UN maybe?)
> Edit: There was some phrasing that I think could still be tweaked that I didn't add as suggestions so if you'd be okay with that I could do another pass over the text. Lemme know.
That would be super great, if you have time, thank you!
@github-actions crossbow submit preview-docs
Revision: 0af1708219682c9b32f04ac3904a11e62affaeed
Submitted crossbow builds: ursacomputing/crossbow @ actions-dde0f75093
| Task | Status |
|---|---|
| preview-docs |
Not sure what it is but there seems to be something going on with the left-hand side panel. If I go to the columnar format page the tutorial link does not appear:
If I manually go to the Intro page it appears, but the nesting level after Arrow Columnar Format is wrongly increased, see the image below:
@raulcd thanks for checking the sidebar! It should be corrected with https://github.com/apache/arrow/pull/41593/commits/2a990b42f85df862af6d6246e18cfc03ea3f4cbb
@github-actions crossbow submit preview-docs
Revision: 830ac9ae527d1d599efd49bd66b2c5ca044ea414
Submitted crossbow builds: ursacomputing/crossbow @ actions-ed83863144
| Task | Status |
|---|---|
| preview-docs |
@jorisvandenbossche I have addressed all of your comments.
@github-actions crossbow submit preview-docs
Revision: 7312c2c708e6ac7a53bb6008768952ce01bf0d35
Submitted crossbow builds: ursacomputing/crossbow @ actions-042f60b807
| Task | Status |
|---|---|
| preview-docs |
This is a bigger PR but only documentation and would need some 👀 in case anybody has time: http://crossbow.voltrondata.com/pr_docs/41593/format/Intro.html @felipecrv @paleolimbot @danepitkin
@github-actions crossbow submit preview-docs
Revision: 9f9bbff0c00026fe8544292bc28e358b7c2ffa47
Submitted crossbow builds: ursacomputing/crossbow @ actions-cee8fb4563
| Task | Status |
|---|---|
| preview-docs |
Fresh link to the html version: http://crossbow.voltrondata.com/pr_docs/41593/format/Intro.html
@github-actions crossbow submit preview-docs
Sorry for taking a bit of time to get back to this PR. It is in good shape now and I would like to get it into the next release, 18.0.0. Pinging all for a last round of review ;)
Revision: 581daf346e80dceffc5291f78cb5149e6b4d3c4e
Submitted crossbow builds: ursacomputing/crossbow @ actions-ace5dab0c4
| Task | Status |
|---|---|
| preview-docs |
Thank you all for reviewing this PR, not a small chunk of content! Will keep it open for comments till Monday and then merge if there is nothing new.
@github-actions crossbow submit preview-docs
Revision: 158ee3275883aab30f781e2c1ff7322243d1c21c
Submitted crossbow builds: ursacomputing/crossbow @ actions-a04a6c8710
| Task | Status |
|---|---|
| preview-docs |
I am not sure why the link to the preview is not loading http://crossbow.voltrondata.com/pr_docs/41593? @assignUser is it just me?
I will build the docs locally to check the HTML version before I merge.
> I am not sure why the link to the preview is not loading http://crossbow.voltrondata.com/pr_docs/41593?
It seems to be loading for me now