db-pluto
db-pluto copied to clipboard
Refactor scratch pad
This issue will be used as a scratch pad to document potential refactors
Terminologies:
bbl(the actual bbl)billing_bbl(the actual bbl for non-condo lots, 75 condo bbl for condo lots)prime_bbl(the same as billing bbl, this is the intended join key for all DOF related tables)
Note that the
bblfield in pluto is actually theprime_bbl, which is a combo ofbblandbilling_bblfor condo lots
cama_bsmtcode.sql
replacing create_cama_primebbl.sql and cama_bsmttype.sql
SELECT
DISTINCT
x.prime_bbl,
COALESCE(b.bsmtcode, '5') as bsmtcode
FROM (
SELECT
COALESCE(dof_condo.condo_bill, pluto_input_cama_dof.bbl) AS prime_bbl,
bsmnt_type,
bsmntgradient,
ROW_NUMBER() OVER (
PARTITION BY LEFT(bbl,10)
ORDER BY bsmnt_type DESC, bsmntgradient DESC
) AS row_number
FROM pluto_input_cama_dof LEFT JOIN dof_condo
ON pluto_input_cama_dof.bbl = dof_condo.condo_base
WHERE bsmnt_type <> '0' AND bldgnum = '1'
) x
LEFT JOIN pluto_input_bsmtcode b
ON x.bsmnt_type = b.bsmnt_type
AND x.bsmntgradient = b.bsmntgradient
WHERE x.row_number = 1;
Tables that has BBL or BillingBBL or PrimeBBL
- dcp_colp
- bbl
- mapbbl
- dof_condo
- condo_base
- condo_bill
- dof_dtm
- bbl
dof_dtm contains all the physical bbls and doesn't have billing bbls
select bbl from dof_dtm
where right(bbl, 4) like '75%'
-- 0 record
dof_condocondo_base represent the base (physical bbl), and is a subset of all bbls in dof_dtmdof_condorepresent the comprehensive lookup between base (physical bbl) and condo bbl (billing bbl)
SELECT bbl from dof_dtm WHERE bbl not in (select condo_base from dof_condo)
- pluto_input_cama_dof
- bbl
-- the longest bbl is 13, so we would have to do left(bbl, 10)
select max(length(bbl)) from pluto_input_cama_dof
- pluto_input_condolot_descriptiveattributes
- parid
- pluto_input_geocodes
- billingbbl
- bbl
- pluto_input_numbldgs
- bbl
- pluto_pts
- parid
- boro, block, lot
not that for pluto_pts, only the first 10 digit of parid is bbl
--- boro||block||lot = LEFT(parid, 10)
select boro||block||lot as bbl, parid
from pluto_pts
where boro||block||lot <> LEFT(parid, 10)
pluto_pts contains condo bbls (billing bbl)
select left(parid, 10) from pluto_pts
where right(left(parid, 10), 4) like '75%'
condo bbl, which is a type of billing bbl, is characterized by e.g.
RIGHT(bbl, 4) LIKE '75%', note that the base (physical) bbl would never have '75%' as lot number.
SELECT
LEFT(parid, 10),
dof_condo.condo_base,
dof_condo.condo_bill
FROM pluto_pts
JOIN dof_condo ON LEFT(parid, 10) = dof_condo.condo_base
WHERE RIGHT(LEFT(parid, 10), 4) LIKE '75%'
-- 0 records returned
Comparing PTS vs DTM
SELECT bbl, left(parid, 10)
FROM dof_dtm FULL OUTER JOIN pluto_pts
ON bbl = left(parid, 10)
WHERE bbl is null or left(parid, 10) is null;
There are bbls in DTM not in PTS and bbls in PTS but not in DTM (most of them have lot number starting with 1), so the final list of bbls should be a combination of the two
Comparing CAMA vs PTS
select LEFT(bbl, 10) as cama_bbl, left(parid, 10) as pts_bbl
from pluto_input_cama_dof FULL OUTER JOIN pluto_pts
ON LEFT(bbl, 10) = left(parid, 10)
WHERE LEFT(bbl, 10) is null or left(parid, 10) is null;
It seems like CAMA mostly align with PTS, however, there were two bbls not in PTS, and PTS has condo bbls not in CAMA
select LEFT(bbl, 10) from pluto_input_cama_dof
where right(LEFT(bbl, 10), 4) like '75%';
-- 0 records
CAMA only has actual base (physical) bbls not billing bbls
We have very few condo lots that have a record in pluto_input_cama_dof
157 condo lots have a record in CAMA, and there are 10349 condo lots in total
SELECT LEFT(bbl, 10) AS cama_bbl
from pluto_input_cama_dof
WHERE LEFT(bbl, 10) in (SELECT condo_base from dof_condo);
For condo unit records in PTS
e.g.
base_bbl = 4015070001
billing_bbl = 4015077502
unit_bbl = 4015071101 ~ 4015071140

the values would have to come from aggregation of condo units
but land area and gross sqft would come from the main record?
Issues that I noticed:
There are 3 kinds of BBLs
bblthe physical lot bbl, it's the same as billing bbl for non-condos- for condos, we call it
condo_base_bbl
- for condos, we call it
condo_billing_bblthe imaginary billing bbl for condos (think of condos as lots), they nest within acondo_base_bblunit_bbleach unit in a condo has a unit_bbl that nest on top of acondo_base_bbl
the same condo_base_bbl can correspond to multiple condo_billing_bbl
select * from dof_condo where condo_base = '4076210001'
the dof_condo table also has duplicated entries
in dof_condo there are new condo buildings that don't have a condo_billing_bbl assigned yet. but if they do, they will always follow the 75 convention
SELECT * FROM dof_condo WHERE RIGHT(condo_bill, 4) NOT like '75%' or condo_bill is null;
In this case, these lots are just treated as normal lots, instead of condo lots
SELECT * FROM pluto_pts WHERE left(parid, 10) in (SELECT condo_base from dof_condo where condo_bill is null);
SELECT
pts.prime_bbl,
coalesce(cama.residarea, pts.residarea) as residarea,
coalesce(cama.officearea, pts.officearea) as officearea,
coalesce(cama.retailarea, pts.retailarea) as retailarea,
coalesce(cama.garagearea, pts.garagearea) as garagearea,
coalesce(cama.storagearea, pts.storagearea) as storagearea,
coalesce(cama.factoryarea, pts.factoryarea) as factoryarea,
coalesce(cama.otherarea, pts.otherarea) as otherarea,
coalesce(cama.commercialarea, pts.commercialarea) as commercialarea
FROM (
SELECT
coalesce(
coalesce(dof_condo.condo_bill,dof_condo.condo_base),
LEFT(parid,10)
) as prime_bbl,
NULLIF(SUM(
office_area_gross::numeric+
retail_area_gross::numeric+
garage_area::numeric+
storage_area_gross::numeric+
factory_area_gross::numeric+
other_area_gross::numeric
), 0) as commercialarea,
NULLIF(SUM(residential_area_gross::numeric), 0) as residarea,
NULLIF(SUM(office_area_gross::numeric), 0) as officearea,
NULLIF(SUM(retail_area_gross::numeric), 0) as retailarea,
NULLIF(SUM(garage_area::numeric), 0) as garagearea,
NULLIF(SUM(storage_area_gross::numeric), 0) as storagearea,
NULLIF(SUM(factory_area_gross::numeric), 0) as factoryarea,
NULLIF(SUM(other_area_gross::numeric), 0) as otherarea
FROM pluto_pts LEFT JOIN dof_condo
ON appt_boro||appt_block||appt_lot = dof_condo.condo_base
WHERE RIGHT(LEFT(parid,10), 2) not like '75'
GROUP BY prime_bbl
) pts LEFT JOIN (
SELECT
COALESCE(
coalesce(dof_condo.condo_bill,dof_condo.condo_base),
cama.bbl
) as prime_bbl,
NULLIF(SUM(commercialarea::numeric), 0) as commercialarea,
NULLIF(SUM(residarea::numeric), 0) as residarea,
NULLIF(SUM(officearea::numeric), 0) as officearea,
NULLIF(SUM(retailarea::numeric), 0) as retailarea,
NULLIF(SUM(garagearea::numeric), 0) as garagearea,
NULLIF(SUM(storagearea::numeric), 0) as storagearea,
NULLIF(SUM(factoryarea::numeric), 0) as factoryarea,
NULLIF(SUM(otherarea::numeric), 0) as otherarea
FROM (
SELECT * FROM pluto_input_cama_dof
WHERE bldgnum = '1' AND LENGTH(bbl) = 10
) cama LEFT JOIN dof_condo
ON cama.bbl = dof_condo.condo_base
GROUP BY prime_bbl
) cama on pts.prime_bbl = cama.prime_bbl;

but land area and gross sqft would come from the main record?
