
GH-41673: [Format][Docs] Add arrow format introductory page

Open AlenkaF opened this issue 1 year ago • 12 comments

Rationale for this change

The documentation for Arrow Format could be improved:

  • not all types are listed
  • not all layouts are explained

What changes are included in this PR?

This PR includes:

  • motivation behind the columnar format
  • the different physical layouts, explained with diagrams that compare an example type with its physical layout
  • Arrow terminology
  • Arrow C Data interface

all in a separate "introduction" page with no technical details. The Specifications index page is also restructured to include captions and to better organise the left sidebar menu.

Note: a table listing all types together with their physical layouts will be added to the existing Columnar.rst page in a separate PR: https://github.com/apache/arrow/issues/14752

Are these changes tested?

No, this is a docs change.

Are there any user-facing changes?

No.

  • GitHub Issue: #41673

AlenkaF avatar May 08 '24 15:05 AlenkaF

cc @amoeba this could use a look already. I think all I wanted to add is here. Will need to do a general look through one more time before marking it ready for review though.

AlenkaF avatar May 09 '24 13:05 AlenkaF

@github-actions crossbow submit preview-docs

AlenkaF avatar May 09 '24 13:05 AlenkaF

Revision: 3cdd97a0af2bea1914b7e9b0e7a04e47e04e0c41

Submitted crossbow builds: ursacomputing/crossbow @ actions-4a1cc2326d

Task Status
preview-docs GitHub Actions

github-actions[bot] avatar May 09 '24 13:05 github-actions[bot]

@github-actions crossbow submit preview-docs

AlenkaF avatar May 13 '24 16:05 AlenkaF

Revision: 4d2bf8ad6103d094ebdfc8ac3546a2692c08ff23

Submitted crossbow builds: ursacomputing/crossbow @ actions-cc7da250f4

Task Status
preview-docs GitHub Actions

github-actions[bot] avatar May 13 '24 16:05 github-actions[bot]

Not sure why the captions in the left sidebar menu are not visible in the crossbow preview build:

Screenshot 2024-05-13 at 19 47 01

but are visible for me locally:

Screenshot 2024-05-13 at 19 46 44

AlenkaF avatar May 13 '24 17:05 AlenkaF

Update: I have removed the change in docs/source/format/index.rst (captions for the Specifications section) and will move it to a separate PR, see https://github.com/apache/arrow/pull/41593/commits/97e4217ab68040167d31516a752fba6acd226177.

AlenkaF avatar May 15 '24 12:05 AlenkaF

Hey @AlenkaF, this is so great to see. I think the text and diagrams will be useful, and pairing them works well. I left some suggestions for style, and also:

  • Did an editing pass over the text. Feel free to ignore any you don't like.
  • I'm a bit late in the process here but I noticed in the diagrams that we use - for null. It kinda looks like a minus symbol sometimes instead of indicating a missing element. I wonder if a _ or ? might be more clear? I realize part of the issue here is due to limitations of Excalidraw.

Edit: There was some phrasing that I think could still be tweaked that I didn't add as suggestions so if you'd be okay with that I could do another pass over the text. Lemme know.

amoeba avatar May 15 '24 23:05 amoeba

  • Did an editing pass over the text. Feel free to ignore any you don't like.

Thank you a bunch, this is very very helpful!

  • I'm a bit late in the process here but I noticed in the diagrams that we use - for null. It kinda looks like a minus symbol sometimes instead of indicating a missing element. I wonder if a _ or ? might be more clear? I realize part of the issue here is due to limitations of Excalidraw.

Not late at all, I can still make changes and I really wish for the diagrams to be as clear as possible. And you are correct, for example in the fixed size list this issue gets very visible when -7 is used. I am not sure about ? though, _ feels a bit better. The specifications use the term unspecified. I am not sure what Matt uses in his book as I do not have it with me at the moment, but I think it is some kind of an abbreviation (U or UN maybe?)

Edit: There was some phrasing that I think could still be tweaked that I didn't add as suggestions so if you'd be okay with that I could do another pass over the text. Lemme know.

That would be super great, if you have time, thank you!

AlenkaF avatar May 16 '24 09:05 AlenkaF

@github-actions crossbow submit preview-docs

raulcd avatar May 16 '24 23:05 raulcd

Revision: 0af1708219682c9b32f04ac3904a11e62affaeed

Submitted crossbow builds: ursacomputing/crossbow @ actions-dde0f75093

Task Status
preview-docs GitHub Actions

github-actions[bot] avatar May 16 '24 23:05 github-actions[bot]

Not sure what it is, but there seems to be something going on with the left-hand side panel. If I go to the columnar format page, the tutorial link does not appear:

image

If I manually go to the Intro page it appears, but there is a wrongly increased level after Arrow Columnar Format, see the image below:

image

raulcd avatar May 20 '24 12:05 raulcd

@raulcd thanks for checking the sidebar! It should be corrected with https://github.com/apache/arrow/pull/41593/commits/2a990b42f85df862af6d6246e18cfc03ea3f4cbb

AlenkaF avatar May 21 '24 07:05 AlenkaF

@github-actions crossbow submit preview-docs

AlenkaF avatar May 21 '24 07:05 AlenkaF

Revision: 830ac9ae527d1d599efd49bd66b2c5ca044ea414

Submitted crossbow builds: ursacomputing/crossbow @ actions-ed83863144

Task Status
preview-docs GitHub Actions

github-actions[bot] avatar May 21 '24 07:05 github-actions[bot]

@jorisvandenbossche I have addressed all of your comments.

AlenkaF avatar May 27 '24 09:05 AlenkaF

@github-actions crossbow submit preview-docs

AlenkaF avatar May 27 '24 09:05 AlenkaF

Revision: 7312c2c708e6ac7a53bb6008768952ce01bf0d35

Submitted crossbow builds: ursacomputing/crossbow @ actions-042f60b807

Task Status
preview-docs GitHub Actions

github-actions[bot] avatar May 27 '24 09:05 github-actions[bot]

This is a bigger PR, but documentation only, and it could use some 👀 if anybody has time: http://crossbow.voltrondata.com/pr_docs/41593/format/Intro.html @felipecrv @paleolimbot @danepitkin

AlenkaF avatar Jun 04 '24 13:06 AlenkaF

@github-actions crossbow submit preview-docs

AlenkaF avatar Jun 11 '24 04:06 AlenkaF

Revision: 9f9bbff0c00026fe8544292bc28e358b7c2ffa47

Submitted crossbow builds: ursacomputing/crossbow @ actions-cee8fb4563

Task Status
preview-docs GitHub Actions

github-actions[bot] avatar Jun 11 '24 04:06 github-actions[bot]

Fresh link to the html version: http://crossbow.voltrondata.com/pr_docs/41593/format/Intro.html

AlenkaF avatar Jun 11 '24 07:06 AlenkaF

@github-actions crossbow submit preview-docs

AlenkaF avatar Sep 18 '24 11:09 AlenkaF

Sorry for taking a bit of time to get back to this PR. It is in good shape now and I would like to get it into the next release, 18.0.0. Pinging all for a last round of review ;)

AlenkaF avatar Sep 18 '24 11:09 AlenkaF

Revision: 581daf346e80dceffc5291f78cb5149e6b4d3c4e

Submitted crossbow builds: ursacomputing/crossbow @ actions-ace5dab0c4

Task Status
preview-docs GitHub Actions

github-actions[bot] avatar Sep 18 '24 11:09 github-actions[bot]

Thank you all for reviewing this PR, not a small chunk of content! Will keep it open for comments till Monday and then merge if there is nothing new.

AlenkaF avatar Sep 26 '24 17:09 AlenkaF

@github-actions crossbow submit preview-docs

AlenkaF avatar Sep 30 '24 17:09 AlenkaF

Revision: 158ee3275883aab30f781e2c1ff7322243d1c21c

Submitted crossbow builds: ursacomputing/crossbow @ actions-a04a6c8710

Task Status
preview-docs GitHub Actions

github-actions[bot] avatar Sep 30 '24 17:09 github-actions[bot]

I am not sure why the link to the preview is not loading: http://crossbow.voltrondata.com/pr_docs/41593 @assignUser is it just me?

I will build the docs locally to check the HTML version before I merge.

AlenkaF avatar Oct 01 '24 04:10 AlenkaF

I am not sure why the link to the preview is not loading: http://crossbow.voltrondata.com/pr_docs/41593

It seems to be loading for me now

raulcd avatar Oct 01 '24 08:10 raulcd