datasette-cluster-map icon indicating copy to clipboard operation
datasette-cluster-map copied to clipboard

Clustermap misinterprets SQL calculations

Open MichaelTiemannOSC opened this issue 3 years ago • 0 comments

This query, when applied to the PUDL 0.4.0 dataset, provides a somewhat expected result:

select plant_id_eia, plant_name_eia, city, county, iso_rto_code, latitude, plants_entity_eia.longitude, primary_purpose_naics_id, sector_name, sector_id, service_area, state, street_address, zip_code, timezone from plants_entity_eia where "longitude" >= 0 order by plant_id_eia limit 11

That is to say, it shows 10 cases of "dirty data" due either to deliberate zeroes used to indicate uncertain future plans, or erroneously missing minus signs putting US-based power plants in Chinese longitudes.

What I wanted to do was to re-plot these erroneous points by using a SQL query to select them and then flip the sign of the longitude result. It did not work. I simplified to just using the plants_entity_eia table name to qualify the terms of the conditional to ignore the selection and just pay attention to raw data in the table:

select plant_id_eia, plant_name_eia, city, county, iso_rto_code, latitude, plants_entity_eia.longitude, primary_purpose_naics_id, sector_name, sector_id, service_area, state, street_address, zip_code, timezone from plants_entity_eia where "plants_entity_eia.longitude" >= 0 order by plant_id_eia limit 11

It turns out that putting that term inside the quotes results in what can only be defined as undefined behavior. When I removed the quotes, I was able to get the correct results with this query:

select plant_id_eia, plant_name_eia, city, county, iso_rto_code, latitude, -1*plants_entity_eia.longitude as longitude, primary_purpose_naics_id, sector_name, sector_id, service_area, state, street_address, zip_code, timezone from plants_entity_eia where plants_entity_eia.longitude >= 0 order by plants_entity_eia.longitude limit 11

MichaelTiemannOSC avatar Sep 05 '21 18:09 MichaelTiemannOSC