make row order deterministic in more cases

Open mathemancer opened this issue 2 years ago • 6 comments

Fixes #1786

Now, we automatically append a last-step ordering by the primary key field(s) of a table when getting its records.

Technical details

This does not handle the case where both are true:

  • The user does not specify a fully-determined order, and
  • there is no primary key.

It would be possible to eke out a bit more determinism by appending all non-included columns to the requested order_by clause, but I chose not to: sorting by non-indexed columns would have a performance impact, and any table created through Mathesar will have a primary key anyway.
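As a rough illustration of the behavior described above (not the actual code in this PR), appending a primary-key tiebreaker could look something like the sketch below. The function name `with_pk_tiebreaker` and the `{"field": ..., "direction": ...}` shape of each sort spec are assumptions made for the example.

```python
def with_pk_tiebreaker(order_by, pk_columns):
    """Return a new order_by list with missing primary key columns appended.

    `order_by` is a list of {"field": ..., "direction": ...} dicts (an assumed
    format for this sketch). Primary key columns the user already sorts by are
    left alone, so an explicit direction is never overridden.
    """
    already_sorted = {spec["field"] for spec in order_by}
    tiebreakers = [
        {"field": col, "direction": "asc"}
        for col in pk_columns
        if col not in already_sorted
    ]
    return list(order_by) + tiebreakers


requested = [{"field": "last_name", "direction": "desc"}]

# Table with a primary key: the key is appended as the last sort step.
final_order = with_pk_tiebreaker(requested, ["id"])

# Table without a primary key: the requested order is returned unchanged,
# which is the unhandled case described above.
no_pk_order = with_pk_tiebreaker(requested, [])
```

Because the primary key is unique, sorting by it last breaks every tie left by the requested columns, and the sort on the key itself can use its index.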

Checklist

  • [X] My pull request has a descriptive title (not a vague title like Update index.md).
  • [X] My pull request targets the master branch of the repository
  • [X] My commit messages follow best practices.
  • [X] My code follows the established code style of the repository.
  • [ ] I added tests for the changes I made (if applicable).
  • [X] I added or updated documentation (if applicable).
  • [X] I tried running the project locally and verified that there are no visible errors.

Developer Certificate of Origin

Developer Certificate of Origin
Version 1.1

Copyright (C) 2004, 2006 The Linux Foundation and its contributors.
1 Letterman Drive
Suite D4700
San Francisco, CA, 94129

Everyone is permitted to copy and distribute verbatim copies of this
license document, but changing it is not allowed.


Developer's Certificate of Origin 1.1

By making a contribution to this project, I certify that:

(a) The contribution was created in whole or in part by me and I
    have the right to submit it under the open source license
    indicated in the file; or

(b) The contribution is based upon previous work that, to the best
    of my knowledge, is covered under an appropriate open source
    license and I have the right under that license to submit that
    work with modifications, whether created in whole or in part
    by me, under the same open source license (unless I am
    permitted to submit under a different license), as indicated
    in the file; or

(c) The contribution was provided directly to me by some other
    person who certified (a), (b) or (c) and I have not modified
    it.

(d) I understand and agree that this project and the contribution
    are public and that a record of the contribution (including all
    personal information I submit with it, including my sign-off) is
    maintained indefinitely and may be redistributed consistent with
    this project or the open source license(s) involved.

mathemancer avatar Oct 12 '22 09:10 mathemancer

@seancolsen The only review I'm requesting from you is that this solves the problem you described from the user perspective.

mathemancer avatar Oct 12 '22 09:10 mathemancer

Codecov Report

Base: 92.44% // Head: 92.48% // Increases project coverage by +0.03% :tada:

Coverage data is based on head (8148e52) compared to base (72cfaff). Patch coverage: 94.64% of modified lines in pull request are covered.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1810      +/-   ##
==========================================
+ Coverage   92.44%   92.48%   +0.03%     
==========================================
  Files         146      147       +1     
  Lines        7122     7155      +33     
==========================================
+ Hits         6584     6617      +33     
  Misses        538      538              
Flag Coverage Δ
pytest-backend 92.48% <94.64%> (+0.03%) :arrow_up:

Impacted Files Coverage Δ
mathesar/api/db/viewsets/records.py 97.05% <ø> (-0.03%) :arrow_down:
mathesar/models/base.py 92.87% <ø> (ø)
db/transforms/base.py 93.70% <66.66%> (-0.84%) :arrow_down:
db/records/operations/sort.py 95.65% <95.65%> (ø)
db/queries/base.py 98.71% <100.00%> (ø)
db/records/exceptions.py 100.00% <100.00%> (ø)
db/records/operations/select.py 97.56% <100.00%> (+4.70%) :arrow_up:

codecov-commenter avatar Oct 12 '22 09:10 codecov-commenter

Fascinating. I clicked on a couple tables to make sure the order looked good, but didn't notice anything. Now, with more info, I've found that I clicked the wrong tables to check. I've determined that:

  • Authors, Publications, and Publishers are all consistently out of order on my machine.
  • Checkouts, Items, and Patrons are all consistently properly ordered on my machine.

@seancolsen Is this the case on your machine as well?

mathemancer avatar Oct 13 '22 02:10 mathemancer

Moving to draft status while I figure out why the fix isn't working properly.

mathemancer avatar Oct 13 '22 03:10 mathemancer

@mathemancer

  • Authors, Publications, and Publishers are all consistently out of order on my machine.
  • Checkouts, Items, and Patrons are all consistently properly ordered on my machine.

Is this the case on your machine as well?

No. Currently all of that is true for me except that Items is out of order. I have wiped out my .volumes and rebuilt Mathesar a number of times since we began using this Library Management schema and I'm about 80% confident that I've observed different sorting behavior after rebuilding. Although I only have a rudimentary understanding of the inner workings of Postgres, I would expect to observe these ordering inconsistencies, given this excerpt from the Postgres docs (emphasis added):

The actual order in that case will depend on the scan and join plan types and the order on disk, but it must not be relied on

My hunch is that when we load the library data, it gets placed on disk with an ordering subject to the fragmentation of other data on disk at that point, though I also understand even less about how SSDs work nowadays. Just a hunch. I'm about 60% confident that ordering is consistent after I re-build, but re-building seems to shuffle the ordering somewhat.

seancolsen avatar Oct 13 '22 12:10 seancolsen

I'm about 60% confident that ordering is consistent after I re-build, but re-building seems to shuffle the ordering somewhat.

This is my experience as well. I finally figured out the problem. Unbeknownst to me, we'd added another way to get records from a table so we can join previews in. I fixed the ordering (and the problem you'd noticed was already fixed) on a method for getting table records that's not used in most cases any more. The downside is, I'm now trying to solve a much more complicated problem of providing a default ordering on query results (where a primary key is often not present). The plus side is, when I succeed, query results will also be consistently ordered in the data explorer.
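For readers following along, here is one possible fallback for results without a primary key, sketched with Python's built-in `sqlite3` module rather than Postgres. It is not necessarily the approach this PR takes: it sorts by the requested columns first and then by every remaining output column. The helper name `default_order_clause` and the `loans` table are hypothetical.

```python
import sqlite3


def default_order_clause(requested, all_columns):
    """Build an ORDER BY clause: requested columns first, then all the others.

    Returns an empty string when there are no columns to sort by.
    """
    remaining = [c for c in all_columns if c not in requested]
    columns = list(requested) + remaining
    if not columns:
        return ""
    return "ORDER BY " + ", ".join(f'"{c}"' for c in columns)


conn = sqlite3.connect(":memory:")
# Hypothetical table with no primary key and a repeated value in `patron`.
conn.execute("CREATE TABLE loans (patron TEXT, title TEXT)")
conn.executemany(
    "INSERT INTO loans VALUES (?, ?)",
    [("Kim", "Dune"), ("Ada", "Emma"), ("Kim", "Ariel")],
)

clause = default_order_clause(["patron"], ["patron", "title"])
rows = conn.execute(f"SELECT patron, title FROM loans {clause}").fetchall()
# rows == [("Ada", "Emma"), ("Kim", "Ariel"), ("Kim", "Dune")]
```

With this fallback, the only rows whose relative order remains unspecified are exact duplicates, which are indistinguishable in the output anyway. The trade-off is the cost of sorting by non-indexed columns.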

mathemancer avatar Oct 14 '22 03:10 mathemancer

@seancolsen @dmos62 This needs another review, since I had to make some major changes.

As noted above, we're using the querying infrastructure for many (but not all) table row requests now, and I'd neglected that part of things. Unfortunately, adding default sorting to general queries makes this PR quite a bit more complicated.

mathemancer avatar Oct 14 '22 08:10 mathemancer

This time, I checked all tables to make sure they were properly ordered.

mathemancer avatar Oct 14 '22 08:10 mathemancer