browsertrix
Switch running crawl updates to use websocket
When a crawl is running, the watch view is updated via websockets (directly from the crawlers), while the rest of the updates, especially the crawl queue, require polling. It would probably be more efficient to also update the crawl state over a websocket connection. This could be used to:
- update crawl queue
- check exclusion regex
- add or remove exclusion regex
- other crawl state updates (pages, etc...)
Updating the crawl queue would be most effective since this is currently polled regularly.
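Here is a minimal sketch, in Python, of a client that applies pushed crawl-state messages to a local view instead of re-polling. The message types (`queue`, `addExclusion`, `removeExclusion`, `stats`), their fields, and the `CrawlState` shape are all hypothetical and not an existing Browsertrix protocol. The websocket transport itself is omitted, so each message arrives as a raw JSON string.

```python
import json
import re
from dataclasses import dataclass, field


@dataclass
class CrawlState:
    """Client-side view of a running crawl (hypothetical shape)."""
    queue: list[str] = field(default_factory=list)
    exclusions: list[str] = field(default_factory=list)
    pages_done: int = 0


def apply_message(state: CrawlState, raw: str) -> CrawlState:
    """Apply one pushed JSON message to the local crawl state.

    The message types handled here are hypothetical, for illustration only.
    """
    msg = json.loads(raw)
    kind = msg["type"]
    if kind == "queue":
        # Server pushes a fresh queue snapshot, replacing periodic polling
        state.queue = list(msg["urls"])
    elif kind == "addExclusion":
        regex = msg["regex"]
        re.compile(regex)  # raises re.error if the regex is invalid
        if regex not in state.exclusions:
            state.exclusions.append(regex)
    elif kind == "removeExclusion":
        state.exclusions = [r for r in state.exclusions if r != msg["regex"]]
    elif kind == "stats":
        # Other crawl state updates, e.g. page counts
        state.pages_done = msg["pagesDone"]
    else:
        raise ValueError(f"unknown message type: {kind}")
    return state


def matching_queue_urls(state: CrawlState, regex: str) -> list[str]:
    """'Check exclusion regex': which queued URLs would this regex exclude?"""
    pattern = re.compile(regex)
    return [u for u in state.queue if pattern.search(u)]
```

For example, after a `queue` message with `["https://example.com/a", "https://example.com/b?x=1"]`, calling `matching_queue_urls(state, r"\?")` returns only the second URL. In this sketch the server stays authoritative: adding an exclusion only records the regex locally, and the server would follow up with a new `queue` snapshot.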
@ikreymer is this still relevant?
I think this would still be a good optimization; I'll think about how high a priority it is. It would reduce the amount of polling and improve performance. It shouldn't be too hard to implement, on the backend at least.