dataverse
Allow installations to determine order of files on Dataset page
Some installations will find it useful to define the order in which files in a dataset are presented to users. An example of this is QDR, where "0_" is prepended to the names of documentation files as a not-so-great way to get those documentation files to the top, since the ordering is alphabetical:
https://data.qdr.syr.edu/dataset.xhtml?persistentId=doi:10.5064/F6BUAX58
Consider an installation-level setting for ordering files by a specific tag, by date, alphanumerically, etc., or some other solution.
- Related: #5280
Based on earlier IQSS/QDR discussions, we've just implemented a way to 'statically' order files by tag, using a new setting that specifies the tag sort order. (Files are sorted by tag, and alphabetically among those with the same tag.) The order can include the standard Documentation, Data, and Code tags as well as custom ones created in an instance. The files are sorted after retrieval, relying on the fact that DV retrieves the whole file list (versus one page's worth). On a dataset with 2K files, having a few near the end with tags that have to get sorted to the top doesn't seem to affect display speed much (~6 seconds with or without sorting in my test).
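For anyone following along, here is a minimal sketch of the ordering described above: rank each file by the position of its best-ranked tag in a configured order, then break ties alphabetically by file name. It is not the code from the PR. `FileEntry`, `rank`, and `byTagThenLabel` are hypothetical names used only for illustration, and placing untagged files last and comparing names case-insensitively are assumptions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class TagOrderSketch {

    // Hypothetical stand-in for FileMetadata: a file name plus its tags.
    public static final class FileEntry {
        final String label;
        final List<String> tags;

        public FileEntry(String label, String... tags) {
            this.label = label;
            this.tags = Arrays.asList(tags);
        }

        public String getLabel() {
            return label;
        }
    }

    // Rank = index of the file's best-ranked tag in the configured order.
    // Files with no listed tag get tagOrder.size(), so they sort last (assumption).
    static int rank(FileEntry f, List<String> tagOrder) {
        int best = tagOrder.size();
        for (String tag : f.tags) {
            int idx = tagOrder.indexOf(tag);
            if (idx >= 0 && idx < best) {
                best = idx;
            }
        }
        return best;
    }

    // Sort by tag rank first, then alphabetically by label within the same rank.
    public static Comparator<FileEntry> byTagThenLabel(List<String> tagOrder) {
        return Comparator
                .comparingInt((FileEntry f) -> rank(f, tagOrder))
                .thenComparing(FileEntry::getLabel, String.CASE_INSENSITIVE_ORDER);
    }

    // Convenience helper: returns the labels in sorted order without
    // modifying the input list.
    public static List<String> sortedLabels(List<FileEntry> files, List<String> tagOrder) {
        List<FileEntry> copy = new ArrayList<>(files);
        copy.sort(byTagThenLabel(tagOrder));
        List<String> labels = new ArrayList<>();
        for (FileEntry f : copy) {
            labels.add(f.label);
        }
        return labels;
    }
}
```

With a tag order of Documentation, Data, Code, a readme tagged Documentation would come first regardless of its name, which removes the need for the "0_" prefix workaround.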
I'll go ahead and create a PR to get the code out, but it may not make sense to merge if it won't work with the overall page redesign (it might if the underlying code to handle the list of files doesn't change), or if there's progress on #5280, etc.
One note w.r.t. the code: I implemented sorting by replacing the comparator used at https://github.com/IQSS/dataverse/blob/develop/src/main/java/edu/harvard/iq/dataverse/DatasetVersion.java#L205. However, in doing that, I discovered, at QDR, that this sort did not work when the List<FileMetadata> was an org.eclipse.persistence.indirection.IndirectList. I suspect that https://bugs.eclipse.org/bugs/show_bug.cgi?id=446236 is involved. As far as I can tell, the FileMetadatas are generally alphabetical to start with, so I'm not sure that the failure to sort by label (file name) would have been noticed. To sort by category, I make sure that the List is an ArrayList which gets sorted properly. This step could go away once the org.eclipse.persistence.indirection.IndirectList class is updated (I'm guessing it's in glassfish somewhere?).
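The workaround mentioned above (making sure the list is an ArrayList before sorting) can be sketched roughly as follows. This is an illustration, not the code in the PR: `sortedCopy` is a hypothetical helper, and it assumes that iterating the source list yields all of its elements, so that the copy is a plain `ArrayList` whose `sort` behaves normally whatever the source list's own implementation does.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class SortOnCopySketch {

    // Copy the elements into a plain ArrayList and sort the copy, rather than
    // sorting the source list in place. The source list is left untouched.
    public static <T> List<T> sortedCopy(List<T> source, Comparator<? super T> order) {
        List<T> copy = new ArrayList<>(source);
        copy.sort(order);
        return copy;
    }
}
```

The cost is one extra list allocation per sort, which is probably negligible next to retrieving the full file list in the first place.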
Thanks @qqmyers. As you mention, I don't anticipate we'll merge this due to dataset and file redesign efforts in flight, but others in the community may be interested in the code.
I think the biggest impact the redesign would have on this issue is the inclusion of a "Sort" button above the file table (see attached) that allows the user to toggle between multiple sort options.

It appears this pull request does not include any UI impact as of yet. My feedback would be to include the sort functionality in the UI. That feature was originally requested in issue #2506, which was closed and consolidated into all of the file UI improvements in issue #3404, but that doesn't mean it needs to wait for all of the other redesign efforts.
If we're not waiting for all the other redesign efforts we might want @tjanek to take a look at pull request #5485 since he pushed some related code to a branch (unmerged) for #5280.
FWIW - I think this PR is consistent with a sort button: it could just be one more option along with by date, alphabetical, and by category. In that sense it could be separated from the GUI update if desired.
Hi @qqmyers and all -- since we have some user testing efforts related to sort in the next few weeks, we’ll use that opportunity to learn more about what other scenarios are in play for default sort options. I’m going to close out this PR for now as I don’t want to take action until we have some more information about what’s best for as many default sort scenarios as possible. As you and others have mentioned, the user-driven option can be implemented independently of this, and we’ll plan to do that (cc: @TaniaSchlatter).
@qqmyers - we released the sort options in 4.15 as part of #5584. Should we revisit this in the post-4.15 world?
cc: @TaniaSchlatter
We would like to vote to have this revisited, as we have several large datasets that would greatly benefit from having a way to sort the index or documentation (readme) to the top of the pages. The idea of having a config setting to enable/disable the feature of sorting the "Documentation" or other such tag to the top of the list would be sufficient. For now we are also just naming those files with "0_" or other means.