Add lightweight UI for Crawly Management
As discussed in https://github.com/oltarasenko/crawly/pull/97#issuecomment-626565242, we want to build a lightweight (probably HTTP-based) UI for single-node Crawly operations, for people who don't want (or don't need) the more complex https://github.com/oltarasenko/crawly_ui.
As far as we see it now, we need to develop a lightweight HTTP client (alternatively, we might look into command-line clients like https://github.com/ndreynolds/ratatouille; see the sketch after this list) which:
- has no external database as a dependency
- allows scheduling/stopping jobs on the given node
- allows seeing currently running jobs [ideally with metrics like crawl speed]
- allows scheduling jobs with given parameters, like concurrency
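A minimal sketch of what that HTTP client could look like, assuming Plug and Cowboy as the only extra deps and Crawly's engine functions (`Crawly.Engine.start_spider/1`, `stop_spider/1`, `running_spiders/0`); the module name and routes are hypothetical:

```elixir
# A hypothetical single-node management router (Plug + Cowboy assumed).
defmodule CrawlyManage.Router do
  use Plug.Router

  plug :match
  plug :dispatch

  # List spiders currently running on this node.
  get "/jobs" do
    spiders =
      Crawly.Engine.running_spiders()
      |> Map.keys()
      |> Enum.map_join("\n", &inspect/1)

    send_resp(conn, 200, spiders)
  end

  # Schedule a spider, e.g. POST /jobs/Elixir.MySpider
  post "/jobs/:spider" do
    spider
    |> String.to_existing_atom()
    |> Crawly.Engine.start_spider()

    send_resp(conn, 201, "scheduled")
  end

  # Stop a running spider.
  delete "/jobs/:spider" do
    spider
    |> String.to_existing_atom()
    |> Crawly.Engine.stop_spider()

    send_resp(conn, 204, "")
  end
end
```

Such a router could be started from the app's supervision tree with `Plug.Cowboy.http(CrawlyManage.Router, [], port: 4050)`; passing per-job parameters like concurrency would additionally need a corresponding option in the engine API.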
I don't think it would be good to duplicate effort with a separate UI solution, as much of the functionality is already in the crawly_ui repo. I think slowly breaking up that repo would take the least effort.
For example, a large-scale crawling setup can have :crawly, :crawly_repo (Ecto interface), and :crawly_webui (Phoenix interface) as deps, while having the same functionality as the current crawly_ui.
For single-node crawls without a dedicated DB but with UI components, one can have :crawly and :crawly_webui.
For flat-file crawls, one can have :crawly and :crawly_tui.
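For illustration, the three profiles could look like this in mix.exs (:crawly_repo, :crawly_webui, and :crawly_tui are the proposed split, not published packages, and the versions are placeholders):

```elixir
# Large-scale crawling: engine + Ecto-backed storage + Phoenix UI
defp deps do
  [
    {:crawly, "~> 0.13"},
    {:crawly_repo, "~> 0.1"},  # hypothetical Ecto interface
    {:crawly_webui, "~> 0.1"}  # hypothetical Phoenix interface
  ]
end

# Single-node crawls, UI but no dedicated DB:
#   [{:crawly, "~> 0.13"}, {:crawly_webui, "~> 0.1"}]
#
# Flat-file crawls with a terminal UI:
#   [{:crawly, "~> 0.13"}, {:crawly_tui, "~> 0.1"}]
```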
@oltarasenko I think the UI is a good idea for implementing a scheduler, but I think you should focus on a better reporter implementation first.
After digging a bit... the room for improvement starts with documenting manager_operations_timeout: (5 * 60_000) in the config, plus a PR to do some math there, because that setting swaps the timeout interval for jobs that have a terminus ("paginated search results"). I think you can do pretty much anything you would need with v0.13.

What we do over here at Pyroclastic is have cron call a new mix task on Pepsi's ChoreRunner service. In the end it's easy enough to test these jobs with ChoreRunnerUI, which is a LiveView. We also have a telemetry hook in the spider to report these metrics... it's kind of hacky on our end, but it would be nice to see %Crawly.Reporter{metric: count, ...} come out so we can pass it to Telemetry. That way we get notifications for admins in our app, but mainly a place to harvest Crawly logs. We would really like to see this project improve there.

I'm still learning some of the specifics of the language, but I picked this project apart pretty quickly. Below is how we currently report:
```elixir
{:stored_items, items_count} = Crawly.DataStorage.stats(__MODULE__)

:telemetry.execute(
  [:crawler, :log, :notify],
  %{items_count: items_count},
  %{title: title}
)
```
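For context, an event like the one above can be consumed with a standard :telemetry handler. This is a minimal sketch, assuming the :telemetry package is available; the handler id is arbitrary, and the plain log line stands in for real admin notifications:

```elixir
# Attach a handler for the [:crawler, :log, :notify] event emitted above.
# Handler receives (event_name, measurements, metadata, config).
:telemetry.attach(
  "crawly-admin-notify",
  [:crawler, :log, :notify],
  fn _event, %{items_count: items_count}, %{title: title}, _config ->
    # Stand-in for real admin notifications / log harvesting.
    IO.puts("#{title}: #{items_count} items stored")
  end,
  nil
)
```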