web3.storage
web3.storage copied to clipboard
PSA filtering by status is not exclusive
Bug description
If for a specific CID, 1 node reports Pinned
and one Pinning
, the same psa pin request shows up in both:
GET /?status=pinned GET /?status=pinning
Context
Currently, here's the logic we use to convert from Cluster to Psa status does not match the way we filter by those statuses.
We're currently using query.in('content.pins.status', opts.statuses)
to filter psa requests, but given we different nodes might report different statuses, the same requests might be filtered in different statuses not exclusively.
@mbommerez this doesn't fit cleanly in a PI but it's probably a quick fix, so would be good to include in this sprint!
Adding some notes from the investigation:
When we pin a cid, we get back a pin request containing among others a pin[Object]
which contains pins and their status. The pin request object also contains an overall status (i.e. pinning
), however the pins might have various statuses.
Take for example: request id ab62cf3c-c98d-494b-a756-b3a3fb6ddcab
has two pins – one with status pinning
, other with status pinned
.
Note that the second table is the
pin
array
When we query the db in listPsaPinRequests()
, we receive an array with one or multiple statuses. Per the example, if we filter by status=pinning
we’ll get back the request above and when we filter again by status=pinned
, we get the same one back. This is inconsistent and confusing on the user end, as to them the same pin request will appear as pinned
when filtered by pinned
and as pinning
when filtered by pinning
. This may also be true for other statuses such as queued
and failed
.
NB: The current convention is to always return
pinned
when at least one node is pinned
, ELSE
pinning
when at least one node is pinning
, ELSE
queued
when at least one node is queued
, ELSE
failed
when at least one node is failed
The issue lies in the way we apply the statuses filter:
query = query.in('content.pins.status', opts.statuses)
Taking the example above, when opts.statuses = [‘pinning’]
, the query will “match” the request id because at least one pin has status of pinning
and keep only the pin objects that match:
We need to update the way we filter by statuses to not only look for a status, but also make sure the other statuses are not present so that when a user filters by a single status, a pin request will be returned for the appropriate status only.
So far I tried:
-
Computing the status with
getEffectivePinStatus
and filtering with JS rather than on the DB. This would make counting rows not accurate. -
Moving the filtering to a stored function and replicating the logic of
getEffectivePinStatus
. I've had no success trying to join the tables and compute the effective pin status by filtering on an aggregate.
The third option is to add a pre-computed field:
- Add a
psa_status
column topsa_pin_request
table (enum with following possible valuesqueued
pinning
pinned
failed
) - I'm not sure about
psa_status
default value; I’m leaning onqueued
. - Update db client’s
listPsaPinRequests
to usepsa_status
for filtering by status. - Update
updatePinStatuses
CRON job to updatepsa_status
forpsa_pin_request
rows that share the same cid.
I'm not 100% sure if we can computer psa_pin
after db.upsertPins
in updatePinStatuses
. Because the CRON job works by batch, we can't know for sure if we are okay to calculate psa_pin
. The other option would be to add a new CRON job executed after every updatePinStatuses
job with the only purpose of updating psa_pin
. @alanshaw would love your opinion on this.
Can you do something like:
-- pinning
SELECT *
FROM pin
WHERE content_cid NOT IN (SELECT content_cid FROM pin WHERE status = 'Pinned')
AND status = 'Pinning'
-- failed
SELECT *
FROM pin
WHERE content_cid NOT IN (SELECT content_cid FROM pin WHERE status = 'Pinned' OR status = 'Pinning')
AND status = 'PinError'
Obviously you're selecting from a different table and joining etc. etc. but the idea of selecting the "higher" statuses and removing them from the results.
...or could you have a view on the psa_pins
table with an "effective status" column?
Something along these lines might also be a good alternative?
(example here is not doing the filtering but rather proving we can aggregate by effective status effectively and using a simplified table structure just for the sake of this example),
select
sq.id,
case
when min(sq.statusNumber) = 1 then 'pinned'
when min(sq.statusNumber) = 2 then 'pinning'
when min(sq.statusNumber) = 3 then 'queued'
when min(sq.statusNumber) = 4 then 'queued'
end as psaStatus
from(
select
a.id, b.status,
case
when b.status = 'Pinned' then 1 -- psa pinned
when b.status = 'Pinning' then 2 -- psa pinning
when b.status = 'PinQueued' then 3 -- psa queued
when b.status = 'Unpinned' then 3 -- psa queued
else 4 -- psa failed
end as statusNumber
from a LEFT JOIN b ON a.id=b.id
) sq group by sq.id, min(sq.statusNumber)
see example here
@francois-potato, I kind of taken what you show me, is this something that you already tried?