ipfs-webui

Webui Hangs after adding heavy workload to ipfs cluster peers

Status: Open · trendsetter37 opened this issue 1 year ago · 12 comments

Description: I've come across a strange situation. My private cluster started out empty, and the only operations we performed were adding small test files to watch how they were distributed. During this time, the webui for each private node continued to work as usual.

However, after adding a heavier load (~10 GB of data) to the private cluster using ipfs-cluster-ctl, the webui stopped working on every node: navigating to its URL in the browser just hangs. The good news is that the gateways still work, but this seems like a strange bug.

To Reproduce

Steps to reproduce the behavior:

  1. Install and start private IPFS nodes using swarm keys.
  2. Check the webui (still works at this point).
  3. Install ipfs-cluster and connect the nodes to the cluster service.
  4. Check the webui (still works).
  5. Load at least 10 GB of data through the cluster.
  6. Check the webui (it starts to hang when accessed).

Expected behavior: the webui loads and works normally.

Desktop:

  • OS: Linux servers (Arch, Raspbian)
  • Browsers: Brave, Firefox, Pale Moon
  • Version: most recent

Additional context: it's interesting that the problem only occurs after loading more than a trivial amount of data into the cluster. I'm not sure what the lower threshold is.

trendsetter37, Jul 26 '23 17:07

Thank you for submitting your first issue to this repository! A maintainer will be here shortly to triage and review. In the meantime, please double-check that you have provided all the necessary information to make this process easy! Any information that can help save additional round trips is useful! We currently aim to give initial feedback within two business days. If this does not happen, feel free to leave a comment. Please keep an eye on how this issue will be labeled, as labels give an overview of priorities, assignments and additional actions requested by the maintainers:

  • "Priority" labels will show how urgent this is for the team.
  • "Status" labels will show if this is ready to be worked on, blocked, or in progress.
  • "Need" labels will indicate if additional input or analysis is required.

Finally, remember to use https://discuss.ipfs.io if you just need general support.

welcome[bot], Jul 26 '23 17:07

Thanks for submitting this issue @trendsetter37. I am not surprised that this happened; the performance degradation could have multiple causes. Before we jump to conclusions about what happened, I would like to learn more. Can you please:

  • Describe the shape of the data you were benchmarking against:
    • Number of files
    • Levels, Nodes/Level, Edges
    • Type of Data
  • Confirm whether a single 10 GB file can cause this, or what the minimum number of files is that triggers the issue.
  • Share logs and configs that might help us understand what happens on the node when this content is added.
  • Share a dataset that makes it easy to reproduce.

We have huge datasets in the Explore section, which seem to work fine, so it's most likely the file browser struggling with the number of nodes it needs to work with. A reproducible example would help us identify the root cause and triage this better.

Thanks!

whizzzkid, Aug 03 '23 07:08
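To answer the shape questions above (number of files, levels, entries per level) without sharing filenames, a small script can summarize a directory before it is added. The following is a minimal Python sketch; the function name summarize_tree and its output fields are hypothetical, and it measures the filesystem tree rather than the resulting UnixFS DAG, so it only approximates what webui has to render.

```python
import os
from collections import Counter
from pathlib import Path


def summarize_tree(root: str) -> dict:
    """Summarize a directory tree's shape: counts, total size, depth, entries per level.

    Depth 1 means direct children of `root`. Symlinks are counted as entries,
    but their targets are neither followed nor sized.
    """
    root_path = Path(root)
    files = dirs = total_bytes = 0
    per_level: Counter = Counter()  # depth -> number of entries (files + dirs) at that depth
    for dirpath, dirnames, filenames in os.walk(root_path):
        # Entries inside `dirpath` sit one level below it.
        depth = len(Path(dirpath).relative_to(root_path).parts) + 1
        count = len(dirnames) + len(filenames)
        if count:
            per_level[depth] += count
        dirs += len(dirnames)
        files += len(filenames)
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):
                total_bytes += os.path.getsize(path)
    return {
        "files": files,
        "dirs": dirs,
        "bytes": total_bytes,
        "max_depth": max(per_level, default=0),
        "entries_per_level": dict(sorted(per_level.items())),
    }
```

For example, summarize_tree("/path/to/dataset") returns a small dict that can be pasted into the issue instead of a full file listing.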

Let's see: probably more than 50 files, mostly PDFs at this point, plus a few video and audio files. I believe it was when I added the audio files (~11 GB) that I first noticed this. I did choose the trickle layout for those, since I wanted to take advantage of its more efficient Merkle DAG layout for linearly read files.

Note that I continued to add files even after the webui stopped working, so I'll go back and spin up a fresh cluster to see if I can reproduce this with fewer variables.

Thanks for reaching out @whizzzkid !

trendsetter37, Aug 03 '23 13:08

@whizzzkid Can you say more about "I am not surprised that this happened"? It sounds as if you already expected something like this based on your knowledge of the webui/IPFS Cluster code. I can poke around the relevant code as well to look for clues.

trendsetter37, Aug 03 '23 19:08

@trendsetter37 the reason I'm not surprised is that UIs can slow down significantly as the number of DOM nodes grows; that's why I'm really interested in the shape of the data the UI is trying to render.

The easiest way to capture the structure is tree -ahs; you can dump it to a file with tree -ahs -o /tmp/report.log. I attached the output for the node_modules folder of ipfs-webui (report.log), but if you don't want to share your filenames, that's understandable.

I tested a few things:

  • Tried a large (35+ GB) folder with 15 files, and webui had no issues.
  • Adding node_modules seems to make webui unresponsive, which sort of confirms my hunch that it's the inode count plus nesting, rather than the size itself, that makes things harder.

whizzzkid, Aug 04 '23 01:08
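Following the hunch that entry count and nesting matter more than total size, a reproduction dataset can be generated synthetically instead of sharing real files. This is a minimal Python sketch; make_nested_dataset and its parameters (depth, fanout, files_per_dir, file_size) are hypothetical knobs, not part of webui or IPFS Cluster.

```python
import hashlib
import os


def make_nested_dataset(root: str, depth: int, fanout: int,
                        files_per_dir: int, file_size: int = 64) -> int:
    """Write a synthetic tree of small files under `root` and return the file count.

    Directories at levels 0..depth-1 each get `fanout` subdirectories; every
    directory (leaves included) gets `files_per_dir` files of `file_size` bytes.
    """
    written = 0

    def build(path: str, level: int) -> None:
        nonlocal written
        os.makedirs(path, exist_ok=True)
        rel = os.path.relpath(path, root)
        for i in range(files_per_dir):
            # Content derived from a hash of the file's relative path, so files
            # differ and do not all deduplicate to the same block.
            seed = hashlib.sha256(f"{rel}/{i}".encode()).hexdigest().encode()
            data = (seed * (file_size // len(seed) + 1))[:file_size]
            with open(os.path.join(path, f"file_{i}.txt"), "wb") as f:
                f.write(data)
            written += 1
        if level < depth:
            for j in range(fanout):
                build(os.path.join(path, f"dir_{j}"), level + 1)

    build(root, 0)
    return written
```

For example, depth=4, fanout=6, files_per_dir=10 produces (1 + 6 + 36 + 216 + 1296) × 10 = 15,550 small files: tiny by size, large by entry count. The resulting folder can then be added through the cluster the same way as the original data to check whether webui hangs.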

@trendsetter37 thanks for reporting this.

Can you confirm the following?

  1. You added content to your ipfs node via CLI (i.e. outside of webui)
  2. You attempted to load the main webui page and it hangs (127.0.0.1:5001/webui on a local Kubo node)

For #1, I want to make sure we're looking at the correct issue. For #2, if webui hangs on the initial load (regardless of page), this could be an issue exacerbated by the content on the node, or an ipfsd-ctl issue when connecting to that node, rather than a self-contained webui issue (i.e. some bug in how we process large data). Either way, we still have some work to do.

If it hangs when accessing a particular page, then operations specific to that page are more likely at fault.

I'm just trying to narrow down where our focus should be.

Thanks for your continued input!

SgtPooki, Aug 04 '23 22:08
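To help separate node-side problems from webui-side ones, one check is whether the node's RPC API itself answers promptly, since webui talks to the node through that API. This is a minimal Python sketch, assuming the default API address http://127.0.0.1:5001 and Kubo's POST /api/v0/id endpoint; probe_rpc is a hypothetical helper name. It only checks /api/v0/id, and webui makes other calls too, so a fast answer here makes a webui-side cause more likely but does not rule out the node.

```python
import json
import time
import urllib.request


def probe_rpc(api_base: str = "http://127.0.0.1:5001", timeout: float = 10.0) -> dict:
    """POST to the Kubo RPC /api/v0/id endpoint and report whether it answered in time."""
    url = api_base.rstrip("/") + "/api/v0/id"
    # The Kubo RPC API expects POST; an empty body is enough for `id`.
    req = urllib.request.Request(url, data=b"", method="POST")
    start = time.monotonic()
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            body = json.loads(resp.read().decode())
            return {"ok": True, "status": resp.status,
                    "seconds": time.monotonic() - start, "peer_id": body.get("ID")}
    except (OSError, ValueError) as exc:
        # OSError covers URLError/HTTPError, refused connections and timeouts;
        # ValueError covers a non-JSON response body.
        return {"ok": False, "error": str(exc), "seconds": time.monotonic() - start}
```

Comparing probe_rpc()["seconds"] with how long webui takes to load gives a rough first hint about where the time goes.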

@SgtPooki Yes, all content was added via the CLI, and it's the main page that never loads. I haven't tried loading any other pages.

Also, what @whizzzkid found (the same behaviour after adding node_modules/) seems pretty close to what may be happening here.

trendsetter37, Aug 04 '23 23:08

OK, thanks 🙏. I figured, but wanted to confirm.

SgtPooki, Aug 04 '23 23:08

Update here: I'm also getting a 302 status when attempting to go directly to http://localhost:5001/webui. Is that expected?

trendsetter37, Aug 13 '23 23:08

@trendsetter37 That should not happen. Is this after you added content?

whizzzkid, Aug 25 '23 07:08

@whizzzkid Well, I didn't check the status code before adding content. I can spin up a fresh install in Docker to see if I get the same result. For now, the relevant variables are that these nodes are part of a private cluster and are serving substantial data: 4 IPFS nodes collectively holding about 2.5 TB.

Current freespace:

12D3KooWDgX6pqihEMvq7a1TaK1VHDacsKsJWoG1JQtm6HTkwtq7 | freespace: 3.6 TB | Expires in: 23 seconds from now
12D3KooWFXXoYCfVMhpiRgWfuCw8DtuAkc1eWfRaNij2whsiboz9 | freespace: 3.0 TB | Expires in: 27 seconds from now
12D3KooWFfWM7eGQhibVzC4jqRrzBXWG8txCH5YXirG8ttqx32rF | freespace: 3.6 TB | Expires in: 18 seconds from now
12D3KooWReWQoDSB77UyKwxz2cH9NqMxMb9mFx8LUhjKU7AkNiq8 | freespace: 4.6 TB | Expires in: 20 seconds from now

And I'm actually adding another node today.

trendsetter37, Aug 26 '23 14:08

If you're running a Kubo node, it does a temporary redirect (302) from the /webui path to the appropriate CID for the webui, so the 302 is expected.

SgtPooki, Aug 28 '23 18:08
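To see the /webui redirect directly instead of letting the browser follow it, the page can be requested without following redirects. This is a minimal Python sketch using the standard library's http.client, which never follows redirects on its own; the default URL assumes the local Kubo address 127.0.0.1:5001 used earlier in the thread, check_webui_redirect is a hypothetical helper name, and the exact Location value depends on the Kubo version.

```python
import http.client
from urllib.parse import urlsplit


def check_webui_redirect(url: str = "http://127.0.0.1:5001/webui",
                         timeout: float = 10.0) -> tuple:
    """GET `url` without following redirects; return (status, Location header or None).

    Plain http:// only; this sketch does not handle https.
    """
    parts = urlsplit(url)
    conn = http.client.HTTPConnection(parts.hostname, parts.port or 80, timeout=timeout)
    try:
        conn.request("GET", parts.path or "/")
        resp = conn.getresponse()
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()
```

A result of (302, "/ipfs/…") means the redirect itself works; if webui then hangs, the delay is in loading the redirected page or its API calls, not in the redirect.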