CA data source
Write CA data source.
@zstumgoren Does every state need separate -data, -results, and -sources repos? Would a scraper go into the -data repo?
I am interested in picking up some of the CA work. Making sure I understand your process.
Hi @rkiddy: thanks! Every state gets a -results repository at the end of the process: that's where the raw results are published. The -data repos contain results that are pre-processed before we load them into our system (usually this means converting from PDFs or other formats into CSV files). -sources repos are for results files that we get from official sources that aren't posted online. Does that help?
Yep. What @dwillis said. Also, to emphasize, you don't necessarily need to have a -data repo. Our normal pipeline is intended to handle most processing tasks, as long as the files fit the mold. The pre-processing code in -data repos is used when the available files require some extra initial wrangling to whip them into shape for the normal processing pipeline.
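To make the "extra initial wrangling" idea concrete, here is a minimal sketch of what a pre-processing step in a -data repo might look like. It is only an illustration: the function name `raw_to_csv`, the pipe-delimited input, and the column names are all assumptions, not the project's actual code or the format the pipeline expects.

```python
import csv
import io

# Hypothetical column layout; the headers the real pipeline expects
# are not stated in this thread.
FIELDS = ["county", "office", "candidate", "votes"]


def raw_to_csv(raw_text, delimiter="|"):
    """Convert delimiter-separated raw text into CSV text with a header row."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(FIELDS)
    for line in raw_text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        parts = [p.strip() for p in line.split(delimiter)]
        if len(parts) != len(FIELDS):
            continue  # skip rows that don't match the assumed layout
        # Drop thousands separators so the vote count is a plain integer string
        parts[-1] = parts[-1].replace(",", "")
        writer.writerow(parts)
    return out.getvalue()
```

Something like `raw_to_csv("Alameda | Governor | Jane Doe | 1,234")` would yield a header row plus one cleaned row. Files that already arrive as clean CSV would skip a step like this and go straight to the normal pipeline.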