web3.storage
web3.storage copied to clipboard
File load & pagination issues for user with many files
User observations
- Users with lots of uploads in web3.storage experience really slow loading times.
- Users with > 1000 uploads are not currently able to see them all in the web interface. It stops at 100 pages of 10 files (loads 1000 first ones only).
Acceptance criteria
- As a user I’m able to navigate through all my uploads
- Loading uploads should be reasonably quick
Notes
There are 2 parts to this:
- Have proper paginations to the page ( not loading 1000 items and then paginate in memory)
- Look into query optimisation
- This was first raised and looked into for ticket 1147
https://images.zenhubusercontent.com/378893240/704a835d-8695-4f22-853d-85ad199a1058/screen_recording_2022_03_21_at_10_38_42_720p.mp4
Done some initial investigation on this, there's a few ways to approach this task and am keen to get some feedback before tackling it!
Look like right now 1000 results are requested by the frontend, Pagination, searching and sorting are completed on the Frontend.
If we want to keep all "sort by" and "search" functionality, we would have to replicate this in the API.
Pagination
- To show the numerical "pages" of the results, we may have to change the API response to also deliver some metadata on total uploads for the frontend to calculate the amount of pages. This change could be avoided if we were to simply show "next" and "previous" instead of numerical total pages.
- I'd also like to add an
offset
param to the endpoint that works with the existingsize
param to chunk results by "page".
Sorting
- Looks like the API already supports sorting by Date and Name, meaning we'd also have to add sorting by size to this.
- We could use the existing
before
andafter
parameters to paginate through the API. I'm not a huge fan of this approach since it means we cannot sort by size or name since we'd be chunking the uploads by date. Additionally, since date/time stamps are not unique, an edge case could occur in which two or more uploads with the same upload time stamp (an unlikely, event, unless a user is uploading files in parallel). Paginating by datetime with duplicated timestamps could result in missing uploads listed by the API.
Searching
- I find searching a little odd, we currently allow searching by CID, but would a user ever search for results using a partial CID?
- Instead we could add a
CID
parameter to theGET user/uploads
endpoint to only return a single response, or add a new endpoint of/user/uploads/<CID>
to fetch a single result.
I could fix this all in a full fat solution in order keep parity with existing functionality, or I could implement an MVP hotfix for pagination using mainly existing API features but resulting in losing the sorting for file size and name.
Would love to hear your thoughts on this @dchoi27, @alanshaw & @orvn!
Look like right now 1000 results are requested by the frontend, Pagination, searching and sorting are completed on the Frontend.
Thanks for this @joshJarr! Yes, I agree, this is far from ideal and was really just implemented as a stopgap. It definitely needs to be replaced because:
- Response will be slow for large queries
- Once the uploads list is > 1000 objects, the search won't even address the entire collection
Looks like the API already supports sorting by Date and Name, meaning we'd also have to add sorting by size to this.
Since there's a split table design (not yet in production), where we show files that have been uploaded via the Pinning Services API separately, I think these features (sort
, size
, offset
, etc.) should be applied to the /pins
endpoint as well?
We could use the existing before and after parameters to paginate through the API. I'm not a huge fan of this approach since it means we cannot sort by size or name since we'd be chunking the uploads by date.
Agree as well, +1 to using offset
and size
in place of before
and after
dates.
I find searching a little odd, we currently allow searching by CID, but would a user ever search for results using a partial CID?
Most of the time, if my file is named correctly/semantically, I'll search by file name. But I'll also search by a fragment of a CID (to reliably find a specific file in a really large list).
Thanks @orvn! I've got a WIP PR for these changes: https://github.com/web3-storage/web3.storage/pull/1408
Since there's a split table design (not yet in production), where we show files that have been uploaded via the Pinning Services API separately, I think these features (sort, size, offset, etc.) should be applied to the /pins endpoint as well?
That's a great shout. I think we can perhaps make a new ticket for applying this logic to the /pins
endpoint. I'd love to see the new split design to see how best to do this
Agree as well, +1 to using offset and size in place of before and after dates.
Cool, I've implemented this in my WIP PR, currently weighing up using an offset
vs page
, as I think the latter reduces the effort of calculating page & page size on the frontend.
Most of the time, if my file is named correctly/semantically, I'll search by file name. But I'll also search by a fragment of a CID (to reliably find a specific file in a really large list).
Cool - this is great to know. I may move adding a search endpoint/filtering by query to a new ticket. This could mean temporarily removing this functionality unless we block this ticket from being released until the search is implemented.
I also may move sorting by size to a new ticket. Currently the file size is not stored on the upload entity, so we may need to add this column via migration as it is currently denormalised on request, so I cannot sort the uploads by this efficiently. Keen to know the importance of sort by size and if this too can temporarily be removed to keep this fix small.
hey @joshJarr
Thanks for going through these issues. The historical reason for no pages appearing is that this was not possible with FaunaDB (DB we were using before). Now we can add that if it is important.
The offset here will map to use https://supabase.com/docs/reference/dart/range#examples with the page, is it correct? I would call it size and page, given we already name like this in other APIs and prefer to keep things consistent.
Hey @vasco-santos, thanks for the context. Yep have it currently mapping to a .range
in my PR.
Yep that's a great point around offset vs page, I'll drop offset
in favour for page
and use page numbers instead of calculating the offset in the client. This approach keeps it more consistent and easier to integrate with.
Thanks!
@joshJarr @mbommerez Just let me or @drewdelano know once the pagination updates to the api lands so we can update the frontend to use it.
@JeffLowe Thanks, we are reviewing this internally and get a review from PL next and then get this ready for the next API release.
To clarify, this ticket's scope should be on pagination for the upload api. The utilization of the api by the frontend is ticketed in #1501
Thanks, will do @JeffLowe.
@vasco-santos I've requested your review on the PR, would be great to get your input and ensure it's consistent with existing functionality before merging. Many thanks!
Hey @JeffLowe @drewdelano Just an FYI, the API pagination has been merged and deployed! There's a minor chore to update the HTTP documentation with the new params, that's awaiting a website deploy.
Pagination
For now, if you add the page=1
param to the GET /user/uploads
you should see a paginated response. You can still set page size with the size
param.
I've added pagination metadata to the API response headers that you should be able to use to calculate page numbers and such. These are:
-
Count
: Number of total uploads. -
Size
: The requested size. -
Page
: The requested page.
Sorting
Also sorting can be achieved by adding sortBy
param, this supports 'Date' | 'Name'
for now.
sortOrder
can be used to order the sort, accepting 'Asc' | 'Desc'
.
When running the Web documentation locally, there should be more info there.
Let me know if you have any questions or feedback!
Thank you so much for this into @joshJarr !!
Reassigning to @joshJarr. PR #1699 needs completing and review.