My Resources Page: Performance Enhancements Needed

Open horsburgh opened this issue 8 years ago • 24 comments

Currently it takes several seconds for the My Resources page to load. This is tedious for frequent users of HydroShare who are switching between the My Resources page and the landing pages for individual resources. It's gotten to the point that it is affecting the usability of the system. I suggest that it's time to take a critical look at the My Resources page to see if we can speed up the initial page load. The following are potential options @Maurier and I have discussed, although not in depth:

  1. Add a paginator to the table of resources on the My Resources page
  2. Create a different API call, or modify the existing one, that returns information for this page. Right now it returns far more information than is needed to populate the table on the page.
  3. Save the state of the filters in the browser's local storage to allow users to use their own defaults for the filter states.

Other strategies for speeding up this page may be needed. An additional performance consideration is the "Favoriting" functionality that is very slow if a large number of resources are selected for favoriting at the same time.

horsburgh avatar Oct 26 '17 04:10 horsburgh

@pkdash see https://github.com/hydroshare/hydroshare/blob/develop/hs_core/views/init.py#L1057 A good start would be to replace this method with a more specific one designed for the My Resources page. Ideally it would pass only the information needed. The one used right now passes way too much information to the view making it very slow.
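To illustrate "pass only the information needed", here is a minimal sketch in plain Python rather than the actual HydroShare view. The field names in `TABLE_FIELDS` and the sample record are made up for illustration; the real table columns may differ.

```python
from typing import Any, Dict, Iterable, List

# Hypothetical column set for the table; the real My Resources columns may differ.
TABLE_FIELDS = ("short_id", "title", "resource_type", "owner", "last_modified")


def to_table_rows(resources: Iterable[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Keep only the fields the table renders, dropping everything else."""
    return [{field: res.get(field) for field in TABLE_FIELDS} for res in resources]


# Made-up record standing in for what a general-purpose view might load.
full_record = {
    "short_id": "abc123",
    "title": "Example resource",
    "resource_type": "CompositeResource",
    "owner": "example-user",
    "last_modified": "2017-10-26",
    # Heavy extras that the table never displays.
    "metadata_xml": "<rdf>...</rdf>",
    "file_listing": ["a.csv", "b.csv"],
}

rows = to_table_rows([full_record])
print(sorted(rows[0]))
```

In Django the same trimming is usually better done at the query level (for example with `QuerySet.values()` or `only()`), so the unused columns are never fetched in the first place.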

Maurier avatar Nov 13 '17 20:11 Maurier

Here is the network analysis for loading the page: [image] Notice how most of the time is spent on the time to first byte. This indicates that the server is taking too long to process the data before the transfer begins.
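For anyone who wants to reproduce this measurement outside the browser dev tools, the following is a rough sketch using only the Python standard library. The function name `time_to_first_byte` is my own, and the measurement includes connection setup, so it will not match the browser's breakdown exactly.

```python
import http.client
import time
from urllib.parse import urlsplit


def time_to_first_byte(url: str, timeout: float = 30.0) -> dict:
    """Time a GET request: seconds until headers arrive (TTFB) and until the body is read."""
    parts = urlsplit(url)
    conn_cls = (
        http.client.HTTPSConnection if parts.scheme == "https" else http.client.HTTPConnection
    )
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query

    conn = conn_cls(parts.netloc, timeout=timeout)
    try:
        start = time.perf_counter()
        conn.request("GET", path)        # connects (if needed) and sends the request
        response = conn.getresponse()    # returns once the status line and headers arrive
        ttfb = time.perf_counter() - start
        response.read()                  # download the body
        total = time.perf_counter() - start
    finally:
        conn.close()
    return {"status": response.status, "ttfb": ttfb, "total": total}
```

If `ttfb` is close to `total`, most of the wait is server-side processing rather than transfer, which is the pattern in the screenshot. A logged-in page like My Resources would also need the session cookie sent with the request, which this sketch does not handle.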

Maurier avatar Nov 13 '17 20:11 Maurier

I am blocked from profiling the /my-resources/ View-Template, due to Mezzanine-managed URLs for that endpoint. I will now go learn Mezzanine and its interactions with Django, so I can complete the Profiling. I am able to properly gather information about /collaborate/ and other Django native URLs in hydroshare/urls.py

This is the start of the trail I'm tracing: https://github.com/hydroshare/hydroshare/blob/76d3697828954ab4028dc6aaafd871282c115a4f/hydroshare/urls.py#L176

ghost avatar Feb 21 '19 00:02 ghost

@sblack-usu I git blamed you on this. Can you explain this comment? https://github.com/hydroshare/hydroshare/blob/76d3697828954ab4028dc6aaafd871282c115a4f/hydroshare/settings.py#L157

ghost avatar Feb 21 '19 00:02 ghost

Can Mezzanine be completely removed from HydroShare or is this just a small aspect?

ghost avatar Feb 21 '19 00:02 ghost

The TODO should've been removed. There was a period of time when we were using two authentication backends (the Mezzanine one, which was removed, and the case-insensitive one). We're only using the case-insensitive one now.

sblack-usu avatar Feb 21 '19 00:02 sblack-usu

> Can Mezzanine be completely removed from HydroShare or is this just a small aspect?

It's just the authentication backend.

sblack-usu avatar Feb 21 '19 00:02 sblack-usu

With more common use-case listings of 40 resources, load time is closer to 6 seconds. Unpinning issue, but leaving priority at medium to cover edge-case users with hundreds of items.

ghost avatar Feb 25 '19 19:02 ghost

@Maurier do you want to remove yourself as assignee as this is likely a backend/fullstack issue?

ghost avatar Feb 25 '19 19:02 ghost

With recent enhancements, performance of the My Resources page is now workable for me.

However, it still takes over 10 seconds to load the page for me. I own 27 resources, but 241 are shared with me, so I guess I may have more resources in the system than many users would. However, even at the 6-second load time reported above, I think for frequent users of the system, this might be tedious. There may be larger fish to fry at the moment, and to get the load time down, a different approach may be needed for getting the information needed to render this page. I would be OK with demoting this high priority issue to lower priority (but still important).

horsburgh avatar Apr 23 '19 16:04 horsburgh

I just tested and found it still to be extremely slow (16 seconds timed); however, I have something like 300 resources, 169 of them shared with me. The common user will not encounter this problem.

Lizabrazil avatar Apr 23 '19 17:04 Lizabrazil

Related or duplicated https://github.com/hydroshare/hydroshare/issues/2405

ghost avatar Apr 25 '19 18:04 ghost

I just tested this in my account and it takes 12.71 seconds to load "My Resources". Below are some details about the content that was loaded:

  • Owned resources: 151
  • Shared with me: 209
  • Added by me: 5
  • Favorites: 15

Castronova avatar Sep 10 '20 14:09 Castronova

Limitations here are shared by the Communities resources page, which are being fixed with ongoing Communities/Indexing work. At a future date, the option for My Resources to reuse patterns in Communities work will be a solid option.

ghost avatar Jan 25 '21 21:01 ghost

Options should be investigated for how to resolve this, per issue review meeting on 4/28.

eclark-cuahsi avatar Apr 29 '22 17:04 eclark-cuahsi

I think @pkdash has already addressed the low-hanging fruit for queries/back-end in his work on #4091. I'm not convinced that simple backend pagination will actually help significantly. The reason I'm skeptical: pagination on the backend before annotation and prefetching_related_objects would certainly help with query speed. But we have client-side filters on this page, so simple backend pagination would render the JS filtering inoperable. To make the filters function, the backend would have to be made "aware" of the client filters, thus complicating and slowing down the backend queries. Bit of a Catch-22. So if we want pagination, I think we will indeed need to create an endpoint that is "aware" of the frontend filter state and returns only relevant info (the current view just returns everything).
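A toy example of the problem, in plain Python with made-up data (100 resources, every tenth one "owned"). It is only meant to show why the order of filtering and pagination matters, not how the real view works.

```python
from typing import Dict, List

# Made-up data: 100 resources, every 10th one owned by the user, the rest shared.
RESOURCES: List[Dict[str, object]] = [
    {"id": i, "relation": "owned" if i % 10 == 0 else "shared"} for i in range(100)
]
PAGE_SIZE = 40


def paginate(items: List[dict], page: int, page_size: int = PAGE_SIZE) -> List[dict]:
    """Return the 1-indexed page of items."""
    start = (page - 1) * page_size
    return items[start:start + page_size]


def filter_by_relation(items: List[dict], relation: str) -> List[dict]:
    """Keep only the items with the given relation to the user."""
    return [item for item in items if item["relation"] == relation]


# Naive approach: the backend paginates, then the browser filters the page it received.
client_side = filter_by_relation(paginate(RESOURCES, page=1), "owned")

# Filter-aware approach: the backend gets the filter state and applies it before paginating.
server_side = paginate(filter_by_relation(RESOURCES, "owned"), page=1)

print(len(client_side), len(server_side))  # prints "4 10"
```

With the naive ordering the "owned" filter shows only 4 of the user's 10 owned resources, because the other 6 sit on pages the browser never received.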

devincowan avatar Jun 22 '22 20:06 devincowan

@horsburgh I've created a prototype with some potential improvements for testing. It includes all 3 of the ideas that you had at the onset of this issue:

  1. Add a paginator to the My Resources table
  2. Modifying the backend Django view that retrieves the resources, so that it only retrieves info selected in filters
  3. Save filter state

@pkdash also did work on this page's performance as part of https://github.com/hydroshare/hydroshare/issues/4091. He was able to cut the load time in half for many cases!

The further changes I've prototyped shave an additional 40-50% off load time depending on the use scenario. For example, a user who owns 400 resources gets an average of:

  • 13.5 sec load time as of 1.56.1
  • 8.27 sec load time with Pabitra's modifications
  • 5.73 sec load time paginating with 40 resources/page

and a 1.98 sec load time using a different user with whom 400 resources are shared but only 12 are owned (the 400 shared are filtered out).

So hey, fast is good, right? Yes, but my modifications come with a tradeoff: LIMITED SEARCH.

The pagination performance gain can only be realized if we are willing to compromise. Either:

  • Search would be limited to only the results on the given page (so for our example above, it would only query within the n=40 resources on the current page)
  • OR search would have to wait for the paginator to "reset" and show all resources on one page before searching.

How do we feel about those tradeoffs? Another option (which is my personal favorite) is to scrap the pagination and keep the improved filtering (from me) and queries (from Pabitra). This would still stand as a huge performance gain.

devincowan avatar Jun 30 '22 16:06 devincowan

@devincowan - I don't think making the search work only on the 40 resources on the current page makes sense. So, I wouldn't vote for that.

The Search box on the My resources page should search within the set of resources that appear on the My Resources page (if paginated, search should work within all of the pages). If there are filters set, then the search should work within the filtered resources. In other words, it would be great to speed up the functionality we have now.

Yes, I like having cake and eating it!

horsburgh avatar Jun 30 '22 20:06 horsburgh

@horsburgh I think that you're right: sacrificing search in order to improve performance isn't a good way to go. So I scrapped the pagination that I had implemented. Just this AM I revisited this to implement some async filtering etc. that should give users with large numbers of resources some significant improvement. Still need to test the performance. I've pinged @pkdash to take a look too.

devincowan avatar Jul 29 '22 16:07 devincowan

Discussed today on Dev call, some other options for My Resources page.

  1. Current draft https://github.com/hydroshare/hydroshare/pull/4713 uses DataTables.js existing My Resources implementation and improves performance by returning resources by filter.
  2. @martinseul suggested modifying My Resources to use same tooling as Discover. Right now, Discover is a Vue app that uses axios to get from /discoverapi. That django view uses haystack/Solr. Looking at the endpoint, it seems to have what we would need for the My Resources page, except:
  3. Martin also had the idea to continue to use DataTables.js and use its server-side processing API. This would require a whole new endpoint that can ingest the sent params and return as expected.

1 is almost ready to go.

2 I would imagine will further improve performance during searching/load. But would entail a full revamp of the My Resources front and back end. @Maurier might have an opinion about how difficult that would be. Might be as simple as porting most of Search.vue over and adding the above 3 items to /discoverapi or getting them from another endpoint. But if we want the frontend to be styled like it has up to this point, that will take some work. And this would essentially require nesting another npm project within hs_core. That has implications for our entire build/deploy process and I think we would have to figure out things like staticfiles etc. What do you think Mauriel? I'm inclined to keep the progress that @pkdash made on My Resources queries and the work that I've done on the filters rather than moving to Vue+haystack but open to opinions

3 I would not vote for because it would be a fair amount of work that would increase our dependence on DataTables.js. If we want to invest this amount of work, I think we should do so creating continuity between Discover and My Resources.
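For reference on what option 3 would involve: in server-side mode DataTables sends parameters such as `draw`, `start`, `length` and `search[value]`, and expects a JSON reply with `draw`, `recordsTotal`, `recordsFiltered` and `data`. Below is a rough, framework-free sketch of that contract in Python. The function name and sample rows are made up, and ordering and per-column search are left out.

```python
from typing import Dict, List


def datatables_response(params: Dict[str, str], rows: List[Dict[str, str]]) -> Dict[str, object]:
    """Build a reply in the shape DataTables expects from a server-side source."""
    draw = int(params.get("draw", "0"))        # echoed back so the client can match replies
    start = max(int(params.get("start", "0")), 0)
    length = int(params.get("length", "10"))   # DataTables sends -1 to request all rows
    term = params.get("search[value]", "").strip().lower()

    # Search the whole set first, so results are not limited to the current page.
    if term:
        matched = [r for r in rows if any(term in str(v).lower() for v in r.values())]
    else:
        matched = list(rows)

    page = matched[start:] if length < 0 else matched[start:start + length]
    return {
        "draw": draw,
        "recordsTotal": len(rows),          # size of the unfiltered set
        "recordsFiltered": len(matched),    # size after search, before paging
        "data": page,
    }


# Made-up rows: 95 resources alternating between two types.
sample_rows = [
    {"title": f"Resource {i}", "type": "Composite" if i % 2 else "Collection"}
    for i in range(95)
]
reply = datatables_response(
    {"draw": "3", "start": "40", "length": "40", "search[value]": "composite"},
    sample_rows,
)
print(reply["recordsTotal"], reply["recordsFiltered"], len(reply["data"]))  # prints "95 47 7"
```

Because the search runs before the slice, this approach would keep search working across all pages, but the real endpoint would need to do the same thing in the database query rather than in Python lists.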

devincowan avatar Aug 01 '22 17:08 devincowan

I agree with @devincowan's evaluation. Unless we index all resources including those that are private, I am not sure how we could make my resources page load like discover page.

pkdash avatar Aug 01 '22 21:08 pkdash

Many thanks to Pabitra for helping me with some of these queries! I have a PR ready https://github.com/hydroshare/hydroshare/pull/4717 and plan to stage it somewhere with enough data that it can be adequately tested. I think users who have lots of resources shared with them will be mighty happy with the load time!

devincowan avatar Aug 04 '22 20:08 devincowan

@pkdash has helped extensively with reviewing my PR and deploying to dev-2. @horsburgh and @Castronova please feel free to test at https://cuahsi-dev-2.hydroshare.org/my-resources and let us know what you think! In particular, be sure to try different filter combinations, and note that the filter state is now stored in the URL so you can bookmark a desired configuration.
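To show the general idea of keeping filter state in the URL, here is a small Python sketch using `urllib.parse`. The filter names and the `f` query parameter are assumptions for illustration only and are not necessarily what the PR uses.

```python
from typing import List
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical filter names; the query parameter name "f" is also an assumption.
KNOWN_FILTERS = ("owned", "shared", "added", "favorites")


def filters_to_url(base_url: str, active: List[str]) -> str:
    """Append the active filters as repeated query parameters, in a stable order."""
    chosen = [name for name in KNOWN_FILTERS if name in active]
    query = urlencode([("f", name) for name in chosen])
    return f"{base_url}?{query}" if query else base_url


def filters_from_url(url: str) -> List[str]:
    """Recover the active filters from a URL, ignoring unknown values."""
    values = parse_qs(urlsplit(url).query).get("f", [])
    return [name for name in KNOWN_FILTERS if name in values]


url = filters_to_url("https://example.org/my-resources/", ["favorites", "owned"])
print(url)                    # https://example.org/my-resources/?f=owned&f=favorites
print(filters_from_url(url))  # ['owned', 'favorites']
```

Since the state round-trips through the URL, a bookmarked link restores the same filter selection, and the backend can read the same parameters to return only the matching resources.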

devincowan avatar Aug 15 '22 19:08 devincowan

@horsburgh @Castronova sounds like we might have to deploy another PR for some other testing this afternoon, so you can delay your My Resources testing until tomorrow. Thanks!

devincowan avatar Aug 15 '22 19:08 devincowan

@horsburgh and @Castronova not sure if you had a chance to test the updated My Resources Page. @pkdash will need to use the dev-2 machine for another deploy tomorrow afternoon, so that is the deadline for testing ;)

devincowan avatar Aug 18 '22 18:08 devincowan

@devincowan I just tested and below are my results:

| url | page | load time |
| --- | --- | --- |
| cuahsi-dev-2.hydroshare.org | my-resources | ~2 sec |
| hydroshare.org | my-resources | ~9 sec |

Enabling and disabling the filters for the first time appears to be slower than www but this is probably because these data need to be queried.

This is great work and a real improvement to the my-resources page.

Castronova avatar Aug 18 '22 20:08 Castronova

@Castronova Just curious how many resources you see on initial page load? I think 99% of users will see my-resources page load in < 2 sec.

pkdash avatar Aug 18 '22 21:08 pkdash

I probably have more content on my my-resources page than most users:

image

Castronova avatar Aug 18 '22 22:08 Castronova

thanks for taking a look @Castronova

devincowan avatar Aug 18 '22 23:08 devincowan

Thanks guys - I got to this too late to make a difference testing, but it looks like you have it covered. I will look forward to the My Resources page being faster!!!!!

horsburgh avatar Aug 19 '22 15:08 horsburgh