
🚀 [Feature] Separate Data from Tool

Open · ulfgebhardt opened this issue 3 years ago · 3 comments

🚀 Feature

It is common practice to store a scraper and its data separately, but here this is not the case, or only partly.

We have a data folder containing JSON files: https://github.com/bundestag/gesetze-tools/tree/master/data

But there is a repo associated with this scraper as well: https://github.com/bundestag/gesetze

It is still unclear to me how the tool produces the output stored in the gesetze repo.

Nevertheless, I consider it useful to keep all data separate from the tools that create it. I think it would be wise to create a new repository for the scraped data (in English, please).

Design & Layout

Data in a data repository should be stored in a `data` folder.


ulfgebhardt · Apr 02 '21 00:04

There should always be a README.md, IMHO.

I suggest separate repositories for separate data sets (bgbl, banz, ...).

darkdragon-001 · Apr 02 '21 08:04

The repositories should have proper names; "banz" means nothing to most readers. Even though I asked for English names, "Bundesanzeiger" is acceptable as an entity name, I guess, and the reader understands what the repository is about.

ulfgebhardt · Apr 02 '21 15:04
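The layout proposed so far (one repository per data set, a descriptive repository name, a `data` folder and a README.md in each) could be scaffolded roughly as follows. This is a minimal Python sketch under assumptions: the `DATASETS` mapping, the helper names `create_data_repo_skeleton` and `create_all`, and the README wording are hypothetical, and nothing here reflects how gesetze-tools currently writes its output.

```python
import tempfile
from pathlib import Path

# Hypothetical mapping from the short data-set codes (bgbl, banz) to
# descriptive repository names; the final naming is still open.
DATASETS = {
    "bgbl": "Bundesgesetzblatt",
    "banz": "Bundesanzeiger",
}


def create_data_repo_skeleton(root: Path, repo_name: str, code: str) -> Path:
    """Create <root>/<repo_name>/ containing a data/ folder and a README.md."""
    repo = root / repo_name
    # All scraped files would go into data/, as suggested in the issue.
    (repo / "data").mkdir(parents=True, exist_ok=True)
    readme = repo / "README.md"
    # Only write a README if none exists yet, so manual edits are kept.
    if not readme.exists():
        readme.write_text(
            f"# {repo_name}\n\n"
            f"Scraped data for the '{code}' data set.\n"
            "All data files live in the data/ folder.\n",
            encoding="utf-8",
        )
    return repo


def create_all(root: Path) -> list[Path]:
    """Create one repository skeleton per entry in DATASETS."""
    return [create_data_repo_skeleton(root, name, code) for code, name in DATASETS.items()]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for repo in create_all(Path(tmp)):
            print(repo.name, sorted(p.name for p in repo.iterdir()))
```

Keeping the code-to-name mapping in one place means the short codes used by the scraper can stay as they are while the published repositories get readable names.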

We may not even need a separate repository. Using separate branches would probably get us most of the way. We should then have a cleaned-up version of the tools branch that omits all the data commits, so that the tools themselves are quick to clone.

mk-pmb · Nov 14 '23 16:11