Waldo Jaquith
As of August 1, the SCC is publishing CSV instead of their gnarly fixed-width format. As of October 31, they will no longer publish the old format. So that provides...
Further to #114, modify Elasticsearch to have the index type not be a number, but instead the name of the data file.
Now that the source data is CSV, we have actual column names. Modify every table map:
- Change `name` to the new column name
- Remove the `start` field
-...
Right now, we're getting the data from `http://www.scc.virginia.gov/clk/data/CISbemon.CSV.zip`, but the server supports TLS. Requesting the data via HTTPS yields this error: ``` urllib2.URLError: ``` Hence the present use of HTTP....
`Corp.csv`, `LLC.csv`, `Name.History.csv`, and `Officer.csv` all claim to be UTF-8, but contain invalid characters. (They're all people's names, and I guarantee you that well over 90% of those people are...
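A possible approach, assuming the invalid bytes are actually Windows-1252 (a common cause of "claims UTF-8, isn't" in name data — this is an assumption, not confirmed from the files). A fallback decode sketch, with hypothetical sample bytes:

```python
def decode_fallback(raw):
    """Try UTF-8 first; on failure, fall back to Windows-1252 (a few
    bytes are undefined there, so fall back again to Latin-1, which
    maps every possible byte to a character)."""
    for encoding in ("utf-8", "windows-1252", "latin-1"):
        try:
            return raw.decode(encoding)
        except UnicodeDecodeError:
            continue

# Hypothetical example: "José" encoded as Windows-1252 is invalid UTF-8.
print(decode_fallback(b"Jos\xe9"))  # → José
```

This keeps correctly encoded rows untouched and only guesses for the bad ones.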
2/3 of the entries in `LP.csv` have a `Duration` of `9999-99-99`. Convert these to `NULL`.
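A minimal sketch of that conversion, assuming it happens in Python while loading rows (the function name is hypothetical; the database layer is assumed to write `None` as `NULL`):

```python
# The SCC uses 9999-99-99 as a "no duration" sentinel in LP.csv.
SENTINEL = "9999-99-99"

def clean_duration(value):
    """Map the sentinel duration to None so it is stored as NULL."""
    return None if value == SENTINEL else value

print(clean_duration("9999-99-99"))  # → None
print(clean_duration("2050-12-31"))  # → 2050-12-31
```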
Why _these_ categories? Are some always exempt from local taxation?
At present, the addresses table is retrieved from S3. But if that table doesn't exist, Crump has no way to create it anew. Add the functionality to create that table,...
This just takes too long to run. It's not a problem (except when debugging), but it just ain't right. Figure out how to speed this up. It shouldn't be hard.
I'm not sure if this is even possible, but it's well worth trying.