Source and RSocrata column names and order are different

Open finestjava opened this issue 5 years ago • 4 comments

For years I have been using the Chicago Police Department "Crimes 2001 to Present" data set via direct "TSV for Excel" downloads: 'https://data.cityofchicago.org/resource/6zsd-86xi.csv'

I just started using RSocrata for access and am finding that the output is completely different: it adds completely new columns and changes the capitalization of others.

What's up with this?

Thanks for some insight.

finestjava avatar Mar 01 '19 22:03 finestjava

Thanks for trying this out and hope it's been helpful.

Would you be able to copy/paste the RSocrata command that you're using and highlight a couple of columns in particular where you noticed a difference?

tomschenkjr avatar Mar 01 '19 23:03 tomschenkjr

Here's one example for food inspections.

CSV query: https://data.cityofchicago.org/resource/4ijn-s7e5.csv?$where=inspection_date>'2018-12-31T00:00:00'
CSV columns: 
[1] "Inspection.ID" "DBA.Name" "AKA.Name" "License.." "Facility.Type" 
[6] "Risk" "Address" "City" "State" "Zip" 
[11] "Inspection.Date" "Inspection.Type" "Results" "Violations" "Latitude" 
[16] "Longitude" "Location" 
JSON query: https://data.cityofchicago.org/resource/4ijn-s7e5.json?$where=inspection_date>'2018-12-31T00:00:00'
JSON columns: 
[1] "zip" "address" "city" "violations" 
[5] "latitude" "inspection_date" "dba_name" "aka_name" 
[9] "inspection_id" "risk" "location.latitude" "location.needs_recoding"
[13] "location.longitude" "facility_type" "state" "inspection_type" 
[17] "results" "license_" "longitude"

The change in column order and names combined makes it difficult to compare the two outputs.
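One way to make the two listings comparable is to normalize the names and diff them as sets. The following is a minimal sketch in Python, not anything RSocrata does itself; the `normalize` helper is hypothetical, and the column names are copied from the two listings above.

```python
import re

# Column names copied from the CSV and JSON listings above.
csv_columns = [
    "Inspection.ID", "DBA.Name", "AKA.Name", "License..", "Facility.Type",
    "Risk", "Address", "City", "State", "Zip",
    "Inspection.Date", "Inspection.Type", "Results", "Violations", "Latitude",
    "Longitude", "Location",
]
json_columns = [
    "zip", "address", "city", "violations",
    "latitude", "inspection_date", "dba_name", "aka_name",
    "inspection_id", "risk", "location.latitude", "location.needs_recoding",
    "location.longitude", "facility_type", "state", "inspection_type",
    "results", "license_", "longitude",
]


def normalize(name):
    """Hypothetical helper: lowercase the name, collapse each run of
    non-alphanumeric characters into one underscore, and trim underscores
    from both ends."""
    return re.sub(r"[^0-9a-z]+", "_", name.lower()).strip("_")


csv_names = {normalize(c) for c in csv_columns}
json_names = {normalize(c) for c in json_columns}

# Sets ignore ordering, so only genuine differences in columns remain.
only_in_csv = sorted(csv_names - json_names)
only_in_json = sorted(json_names - csv_names)

print("only in CSV: ", only_in_csv)
print("only in JSON:", only_in_json)
```

With these two listings, the only leftovers are the single `Location` column on the CSV side and the three flattened `location.*` columns on the JSON side; every other column matches once case and separators are ignored.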

Also @levyj this is the example I mentioned

geneorama avatar Apr 05 '19 16:04 geneorama

Actually this is a duplicate of #32, although this is better worded / documented. I would prefer to keep this one open because it is more active.

geneorama avatar Apr 05 '19 16:04 geneorama

I happened to open an issue about this (with Socrata) earlier this week in the course of doing other work.

Socrata does not support column ordering in the JSON response, but we could implement it using their views endpoint. We have talked about it, and as I recall there was hesitancy to rely on the views endpoint because it's not documented and there's no guarantee that it will always be there for every data set.

I've edited Socrata's response a bit, but it was essentially this:

If you want to set the column order to match the dataset, the views API does have that info and it's what I would recommend in this case. For example https://data.cityofchicago.org/views/4ijn-s7e5/columns would give column-specific information.

I do want to confirm that we do not explicitly maintain column ordering in the JSON endpoint. For example, if you look at the SODA 2.1 endpoint for the dataset, https://data.cityofchicago.org/resource/cwig-ma7x.json, you can see that this endpoint orders the columns alphabetically. This will be the future state of all dataset endpoints.

This is using my food inspections example.
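For illustration, reordering the JSON columns from views metadata might look roughly like the sketch below. It is written in Python rather than as an RSocrata change. The views endpoint is undocumented, so the `fieldName` and `position` keys are assumptions about its response shape; `order_like_dataset` is a hypothetical helper; and the sample metadata is hand-written, not fetched from the portal.

```python
def order_like_dataset(columns_meta, json_columns):
    """Hypothetical helper: order json_columns by the dataset's own column
    positions. Columns missing from the metadata (for example flattened
    location.* fields) keep their relative order and go last.

    Assumes each metadata entry has 'fieldName' and 'position' keys."""
    ranked = sorted(columns_meta, key=lambda col: col["position"])
    wanted = [col["fieldName"] for col in ranked]
    present = set(json_columns)
    ordered = [name for name in wanted if name in present]
    seen = set(ordered)
    extras = [name for name in json_columns if name not in seen]
    return ordered + extras


# Hand-written sample metadata (assumed shape, not real endpoint output).
sample_meta = [
    {"fieldName": "dba_name", "position": 2},
    {"fieldName": "inspection_id", "position": 1},
    {"fieldName": "zip", "position": 10},
]
sample_json_columns = ["zip", "dba_name", "inspection_id", "location.latitude"]

reordered = order_like_dataset(sample_meta, sample_json_columns)
print(reordered)
```

Putting unknown columns at the end instead of dropping them means the result still covers every column if the metadata is incomplete or the endpoint is unavailable for a data set.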

geneorama avatar Apr 05 '19 16:04 geneorama