web3.storage icon indicating copy to clipboard operation
web3.storage copied to clipboard

File load & pagination issues for user with many files

Open mbommerez opened this issue 2 years ago • 12 comments

User observations

  • Users with lots of uploads in web3.storage experience really slow loading times.
  • Users with > 1000 uploads are not currently able to see them all in the web interface. It stops at 100 pages of 10 files (loads 1000 first ones only).

Acceptance criteria

  • As a user I’m able to navigate through all my uploads
  • Loading uploads should be reasonably quick

Notes

There are 2 parts to this:

  • Have proper paginations to the page ( not loading 1000 items and then paginate in memory)
  • Look into query optimisation
  • This was first raised and looked into for ticket 1147

https://images.zenhubusercontent.com/378893240/704a835d-8695-4f22-853d-85ad199a1058/screen_recording_2022_03_21_at_10_38_42_720p.mp4

mbommerez avatar May 26 '22 13:05 mbommerez

Done some initial investigation on this, there's a few ways to approach this task and am keen to get some feedback before tackling it!

Look like right now 1000 results are requested by the frontend, Pagination, searching and sorting are completed on the Frontend.

If we want to keep all "sort by" and "search" functionality, we would have to replicate this in the API.

Pagination

  • To show the numerical "pages" of the results, we may have to change the API response to also deliver some metadata on total uploads for the frontend to calculate the amount of pages. This change could be avoided if we were to simply show "next" and "previous" instead of numerical total pages.
  • I'd also like to add an offset param to the endpoint that works with the existing size param to chunk results by "page".

Sorting

  • Looks like the API already supports sorting by Date and Name, meaning we'd also have to add sorting by size to this.
  • We could use the existing before and after parameters to paginate through the API. I'm not a huge fan of this approach since it means we cannot sort by size or name since we'd be chunking the uploads by date. Additionally, since date/time stamps are not unique, an edge case could occur in which two or more uploads with the same upload time stamp (an unlikely, event, unless a user is uploading files in parallel). Paginating by datetime with duplicated timestamps could result in missing uploads listed by the API.

Searching

  • I find searching a little odd, we currently allow searching by CID, but would a user ever search for results using a partial CID?
  • Instead we could add a CID parameter to the GET user/uploads endpoint to only return a single response, or add a new endpoint of /user/uploads/<CID> to fetch a single result.

I could fix this all in a full fat solution in order keep parity with existing functionality, or I could implement an MVP hotfix for pagination using mainly existing API features but resulting in losing the sorting for file size and name.

Would love to hear your thoughts on this @dchoi27, @alanshaw & @orvn!

joshJarr avatar May 27 '22 11:05 joshJarr

Look like right now 1000 results are requested by the frontend, Pagination, searching and sorting are completed on the Frontend.

Thanks for this @joshJarr! Yes, I agree, this is far from ideal and was really just implemented as a stopgap. It definitely needs to be replaced because:

  1. Response will be slow for large queries
  2. Once the uploads list is > 1000 objects, the search won't even address the entire collection

Looks like the API already supports sorting by Date and Name, meaning we'd also have to add sorting by size to this.

Since there's a split table design (not yet in production), where we show files that have been uploaded via the Pinning Services API separately, I think these features (sort, size, offset, etc.) should be applied to the /pins endpoint as well?

We could use the existing before and after parameters to paginate through the API. I'm not a huge fan of this approach since it means we cannot sort by size or name since we'd be chunking the uploads by date.

Agree as well, +1 to using offset and size in place of before and after dates.

I find searching a little odd, we currently allow searching by CID, but would a user ever search for results using a partial CID?

Most of the time, if my file is named correctly/semantically, I'll search by file name. But I'll also search by a fragment of a CID (to reliably find a specific file in a really large list).

orvn avatar May 27 '22 23:05 orvn

Thanks @orvn! I've got a WIP PR for these changes: https://github.com/web3-storage/web3.storage/pull/1408

Since there's a split table design (not yet in production), where we show files that have been uploaded via the Pinning Services API separately, I think these features (sort, size, offset, etc.) should be applied to the /pins endpoint as well?

That's a great shout. I think we can perhaps make a new ticket for applying this logic to the /pins endpoint. I'd love to see the new split design to see how best to do this

Agree as well, +1 to using offset and size in place of before and after dates.

Cool, I've implemented this in my WIP PR, currently weighing up using an offset vs page, as I think the latter reduces the effort of calculating page & page size on the frontend.

Most of the time, if my file is named correctly/semantically, I'll search by file name. But I'll also search by a fragment of a CID (to reliably find a specific file in a really large list).

Cool - this is great to know. I may move adding a search endpoint/filtering by query to a new ticket. This could mean temporarily removing this functionality unless we block this ticket from being released until the search is implemented.

I also may move sorting by size to a new ticket. Currently the file size is not stored on the upload entity, so we may need to add this column via migration as it is currently denormalised on request, so I cannot sort the uploads by this efficiently. Keen to know the importance of sort by size and if this too can temporarily be removed to keep this fix small.

joshJarr avatar Jun 07 '22 10:06 joshJarr

hey @joshJarr

Thanks for going through these issues. The historical reason for no pages appearing is that this was not possible with FaunaDB (DB we were using before). Now we can add that if it is important.

The offset here will map to use https://supabase.com/docs/reference/dart/range#examples with the page, is it correct? I would call it size and page, given we already name like this in other APIs and prefer to keep things consistent.

vasco-santos avatar Jun 14 '22 13:06 vasco-santos

Hey @vasco-santos, thanks for the context. Yep have it currently mapping to a .range in my PR. Yep that's a great point around offset vs page, I'll drop offset in favour for page and use page numbers instead of calculating the offset in the client. This approach keeps it more consistent and easier to integrate with. Thanks!

joshJarr avatar Jun 17 '22 13:06 joshJarr

@joshJarr @mbommerez Just let me or @drewdelano know once the pagination updates to the api lands so we can update the frontend to use it.

JeffLowe avatar Jun 20 '22 21:06 JeffLowe

@JeffLowe Thanks, we are reviewing this internally and get a review from PL next and then get this ready for the next API release.

mbommerez avatar Jun 22 '22 14:06 mbommerez

To clarify, this ticket's scope should be on pagination for the upload api. The utilization of the api by the frontend is ticketed in #1501

JeffLowe avatar Jun 24 '22 15:06 JeffLowe

Thanks, will do @JeffLowe.

@vasco-santos I've requested your review on the PR, would be great to get your input and ensure it's consistent with existing functionality before merging. Many thanks!

joshJarr avatar Jun 28 '22 14:06 joshJarr

Hey @JeffLowe @drewdelano Just an FYI, the API pagination has been merged and deployed! There's a minor chore to update the HTTP documentation with the new params, that's awaiting a website deploy.

Pagination For now, if you add the page=1 param to the GET /user/uploads you should see a paginated response. You can still set page size with the size param.

I've added pagination metadata to the API response headers that you should be able to use to calculate page numbers and such. These are:

  • Count: Number of total uploads.
  • Size: The requested size.
  • Page: The requested page.

Sorting Also sorting can be achieved by adding sortBy param, this supports 'Date' | 'Name' for now. sortOrder can be used to order the sort, accepting 'Asc' | 'Desc'.

When running the Web documentation locally, there should be more info there.

Let me know if you have any questions or feedback!

joshJarr avatar Jun 30 '22 13:06 joshJarr

Thank you so much for this into @joshJarr !!

JeffLowe avatar Jun 30 '22 13:06 JeffLowe

Reassigning to @joshJarr. PR #1699 needs completing and review.

mbommerez avatar Aug 09 '22 10:08 mbommerez