
[WIP] Add support for tulu.la event aggregator

Open m0sth8 opened this issue 6 years ago • 7 comments

I added a --source argument to the refresh command that allows the user to choose between different sources.

conrad refresh --source=tulula imports 770 new events from tulu.la.

fix #28

m0sth8 avatar Nov 01 '19 02:11 m0sth8

@m0sth8 Thanks for the PR!

The updating and consumption of data/events.json are decoupled by design. The plan is to run every scraper in the scrapers folder daily, remove duplicates, update data/events.json, and raise a bot pull request. This will also help ensure that all these changes are version controlled. I'll work on this flow this weekend.
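
The daily scrape-and-dedupe step could be sketched roughly like this. Note that the event fields and the (name, start_date) dedup key here are assumptions for illustration, not conrad's actual schema:

```python
import json

def dedupe_events(events):
    """Merge scraped events, keeping the first occurrence of each
    (name, start_date) pair -- a hypothetical dedup key."""
    seen = set()
    unique = []
    for event in events:
        key = (event["name"].strip().lower(), event["start_date"])
        if key not in seen:
            seen.add(key)
            unique.append(event)
    return unique

# Example: two scrapers report the same conference under slightly
# different casing; only the first occurrence is kept.
scraped = [
    {"name": "PyCon US", "start_date": "2020-04-15", "source": "papercall"},
    {"name": "pycon us", "start_date": "2020-04-15", "source": "tulula"},
    {"name": "EuroPython", "start_date": "2020-07-20", "source": "tulula"},
]
merged = dedupe_events(scraped)
print(json.dumps(merged, indent=2))
```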

conrad refresh should give the user all events in data/events.json, which they can then filter by tag, name, source etc.
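
In code, that filtering step might look something like this sketch (the field names and filter parameters are assumptions, not conrad's actual CLI options):

```python
def filter_events(events, tag=None, name=None, source=None):
    """Filter a list of event dicts by tag, name substring, or source.
    The field names here are illustrative assumptions."""
    result = events
    if tag:
        result = [e for e in result if tag in e.get("tags", [])]
    if name:
        result = [e for e in result if name.lower() in e["name"].lower()]
    if source:
        result = [e for e in result if e.get("source") == source]
    return result

# Example: pick only events that came from the tulu.la source.
events = [
    {"name": "PyCon US", "tags": ["python"], "source": "papercall"},
    {"name": "FOSDEM", "tags": ["open-source"], "source": "tulula"},
]
tulula_events = filter_events(events, source="tulula")
print(tulula_events)
```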

At the same time, I think we should add a list of all sources at the top of the README for attribution.

Can you give me push access to this branch on your fork? I will make the required changes after setting up that flow, update this PR and merge it.

Thanks for building http://tulu.la!

vinayak-mehta avatar Nov 01 '19 12:11 vinayak-mehta

Oh, I see!

Unfortunately, we don't want data from Tulu.la to be decoupled from the service itself and stored elsewhere, for many reasons, including content rights: we can't guarantee permission for third parties (Tulu.la is partly user-generated content and partly curated content).

What we can do is create and support a fork for those who want to sync with both conrad's events.json and tulu.la.

Thank you for your project.

m0sth8 avatar Nov 01 '19 20:11 m0sth8

Unfortunately, we don't want data from Tulu.la to be decoupled from the service itself and stored outside because of many reasons

@m0sth8 I see, but with the current crawler implementation, the data will be stored outside of tulu.la in a user's system.

I also went through the tulu.la terms of service which states that "You may crawl the forum and site to index it for a publicly available search engine, if you run one." - https://tulu.la/policy/terms/

If possible, would love to add a tulu.la crawler to conrad. It'll run once/twice a week. More details here: https://conference-radar.readthedocs.io/en/latest/dev/adding-crawlers.html
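
For illustration, such a crawler might look roughly like this sketch. The class name and the transform step are assumptions (the real interface is described in the adding-crawlers docs linked above); here a pre-fetched API payload is parsed instead of making a network request:

```python
import json

class TululaCrawler:
    """Hypothetical tulu.la crawler sketch. It transforms a pre-fetched
    JSON payload into event dicts; the field names on both sides are
    illustrative assumptions, not tulu.la's or conrad's real schema."""

    def transform(self, payload):
        events = []
        for item in json.loads(payload):
            events.append({
                "name": item["title"],
                "url": item["url"],
                "start_date": item["starts_at"][:10],  # keep YYYY-MM-DD
                "source": "tulula",
            })
        return events

# Example: a tiny fake payload as the tulu.la API might return it.
payload = json.dumps([
    {"title": "KubeCon", "url": "https://tulu.la/e/1",
     "starts_at": "2019-11-18T09:00:00Z"},
])
events = TululaCrawler().transform(payload)
print(events)
```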

And if not, I could prioritize https://github.com/vinayak-mehta/conrad/issues/3, which would work kind of like a git remote add, letting the user configure a new source so that they can consume events from it.

Would love to know your thoughts on this. :)

vinayak-mehta avatar Nov 03 '19 19:11 vinayak-mehta

Hi @vinayak-mehta,

It's fine to store data on the user's system, like a cache in a browser. We don't want the data stored in third-party systems, e.g. events.json on GitHub.

I'm not sure that I understand #3. How is it going to work?

m0sth8 avatar Nov 04 '19 17:11 m0sth8

Similar to this, but with a different interface than conrad refresh. The data will not be stored on GitHub and will go directly to the user's computer. Can you give me push access to this branch on your fork? I'll make updates to this PR sometime this/next week.

vinayak-mehta avatar Nov 05 '19 18:11 vinayak-mehta

@vinayak-mehta sounds good! I think you should already have access to the PR as a maintainer of conrad.

At least GitHub says: "If checked, users with write access to vinayak-mehta/conrad can add new commits to your 28 branch. You can always change this setting later."

m0sth8 avatar Nov 06 '19 16:11 m0sth8

Oh, I'll check it out.

vinayak-mehta avatar Nov 06 '19 17:11 vinayak-mehta