
Add lightweight UI for Crawly Management

Open: oltarasenko opened this issue 5 years ago • 2 comments

As discussed in https://github.com/oltarasenko/crawly/pull/97#issuecomment-626565242, we want to build a lightweight (probably HTTP-based) UI for single-node Crawly operations, for people who don't want (or don't need) the more complex https://github.com/oltarasenko/crawly_ui.

As far as we see it now, we need to develop a lightweight HTTP client (alternatively, we might look into command-line clients like https://github.com/ndreynolds/ratatouille); a rough sketch follows the list below. It should be:

  1. without an external database as a dependency
  2. able to schedule/stop jobs on the given node
  3. able to show currently running jobs (ideally with metrics such as crawl speed)
  4. able to schedule jobs with given parameters, such as concurrency
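For illustration, here is a minimal sketch of what such an HTTP interface could look like, assuming Crawly's public Engine API (`Crawly.Engine.start_spider/1`, `stop_spider/1`, `running_spiders/0`) and a plain Plug router; the module name and routes are hypothetical:

```elixir
defmodule Crawly.SimpleUI do
  use Plug.Router

  plug :match
  plug :dispatch

  # List currently running jobs on this node
  get "/jobs" do
    running =
      Crawly.Engine.running_spiders()
      |> Map.keys()
      |> Enum.map(&inspect/1)

    send_resp(conn, 200, Enum.join(running, "\n"))
  end

  # Schedule a spider, e.g. POST /schedule?spider=Elixir.MySpider
  post "/schedule" do
    conn = fetch_query_params(conn)
    spider = String.to_existing_atom(conn.query_params["spider"])

    case Crawly.Engine.start_spider(spider) do
      :ok -> send_resp(conn, 200, "scheduled")
      error -> send_resp(conn, 422, inspect(error))
    end
  end

  # Stop a running spider
  post "/stop" do
    conn = fetch_query_params(conn)
    spider = String.to_existing_atom(conn.query_params["spider"])
    :ok = Crawly.Engine.stop_spider(spider)
    send_resp(conn, 200, "stopped")
  end

  match _ do
    send_resp(conn, 404, "not found")
  end
end
```

Passing per-job parameters such as concurrency (requirement 4) would need additional plumbing into the spider settings, so the sketch covers only scheduling, stopping, and listing.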

oltarasenko • May 12 '20 11:05

I don't think it would be good to duplicate effort with a separate UI solution, as much of the functionality is already in the crawly_ui repo. I think slowly breaking up that repo would take the least effort.

For example, large-scale crawling could have :crawly, :crawly_repo (Ecto interface), and :crawly_webui (Phoenix interface) as deps, while keeping the same functionality as the current crawly_ui.

For single-node crawls without a dedicated database but with UI components, one could have :crawly and :crawly_webui.

For flat-file crawls, one could have :crawly and :crawly_tui.
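For illustration, the dependency sets above might look like this in mix.exs; note that :crawly_repo, :crawly_webui, and :crawly_tui are proposed packages that would come out of splitting crawly_ui, not published ones, and the version numbers are placeholders:

```elixir
# Large-scale crawling with a database-backed Phoenix UI (proposed split)
defp deps do
  [
    {:crawly, "~> 0.13"},
    {:crawly_repo, "~> 0.1"},   # Ecto interface (proposed)
    {:crawly_webui, "~> 0.1"}   # Phoenix interface (proposed)
  ]
end

# Single-node crawls with UI but no dedicated DB: drop :crawly_repo.
# Flat-file crawls: just {:crawly, ...} and {:crawly_tui, ...}.
```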

Ziinc • May 13 '20 17:05

@oltarasenko I think the UI is a good idea for implementing a scheduler, but I think you should focus on a better reporter implementation first. After digging a bit, the room for improvement starts with documenting `manager_operations_timeout: 5 * 60_000` in the config, plus a PR to do some math, because that setting swaps the timeout interval for jobs that have a terminus ("paginated search results"). I think you can do pretty much anything you would need with v0.13.

What we do over here at Pyroclastic is have cron call a new mix task on Pepsi's ChoreRunner service. In the end, it's easy enough to test these jobs with ChoreRunnerUI, which is a LiveView. We also have a telemetry hook in the spider to report these metrics. It's kinda hacky on our end, but it would be nice to see a `%Crawly.Reporter{metric: count, ...}` struct come out so we can pass it to Telemetry; that way we get notifications for admins in our app, but mainly a place to harvest Crawly logs. We would really like to see this project improve there. I'm still learning some of the specifics of the language, but I picked this project apart pretty quickly.

Below is how we currently report:

```elixir
{:stored_items, items_count} = Crawly.DataStorage.stats(__MODULE__)

:telemetry.execute(
  [:crawler, :log, :notify],
  %{items_count: items_count},
  %{title: title}
)
```
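For illustration, the admin-notification side of this could be a plain :telemetry handler attached to the event emitted above; the handler id and the `MyApp.Notifications.notify_admins/1` function are hypothetical:

```elixir
# Attach a handler to the [:crawler, :log, :notify] event from the snippet above.
:telemetry.attach(
  "crawly-admin-notifications",          # unique handler id (hypothetical)
  [:crawler, :log, :notify],             # event name used in the report snippet
  fn _event, %{items_count: count}, %{title: title}, _config ->
    # Forward the metric to whatever notification channel the app uses
    MyApp.Notifications.notify_admins("#{title}: #{count} items stored")
  end,
  nil
)
```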

cigzigwon • Feb 12 '22 20:02