web3.storage icon indicating copy to clipboard operation
web3.storage copied to clipboard

PSA filtering by status is not exclusive

Open flea89 opened this issue 2 years ago • 5 comments

Bug description

If for a specific CID, 1 node reports Pinned and one Pinning, the same psa pin request shows up in both:

GET /?status=pinned GET /?status=pinning

Context

Currently, here's the logic we use to convert from Cluster to Psa status does not match the way we filter by those statuses.

We're currently using query.in('content.pins.status', opts.statuses) to filter psa requests, but given we different nodes might report different statuses, the same requests might be filtered in different statuses not exclusively.

flea89 avatar Mar 16 '22 18:03 flea89

@mbommerez this doesn't fit cleanly in a PI but it's probably a quick fix, so would be good to include in this sprint!

dchoi27 avatar Apr 04 '22 17:04 dchoi27

Adding some notes from the investigation:

When we pin a cid, we get back a pin request containing among others a pin[Object] which contains pins and their status. The pin request object also contains an overall status (i.e. pinning), however the pins might have various statuses.

Take for example: request id ab62cf3c-c98d-494b-a756-b3a3fb6ddcab has two pins – one with status pinning, other with status pinned.

tables displaying the example above Note that the second table is the pin array

When we query the db in listPsaPinRequests(), we receive an array with one or multiple statuses. Per the example, if we filter by status=pinning we’ll get back the request above and when we filter again by status=pinned, we get the same one back. This is inconsistent and confusing on the user end, as to them the same pin request will appear as pinned when filtered by pinned and as pinning when filtered by pinning. This may also be true for other statuses such as queued and failed.

NB: The current convention is to always return pinned when at least one node is pinned, ELSE pinning when at least one node is pinning, ELSE queued when at least one node is queued, ELSE failed when at least one node is failed

The issue lies in the way we apply the statuses filter:

query = query.in('content.pins.status', opts.statuses)

Taking the example above, when opts.statuses = [‘pinning’], the query will “match” the request id because at least one pin has status of pinning and keep only the pin objects that match:

tables displaying the example above

We need to update the way we filter by statuses to not only look for a status, but also make sure the other statuses are not present so that when a user filters by a single status, a pin request will be returned for the appropriate status only.

alexandrastoica avatar Apr 08 '22 14:04 alexandrastoica

So far I tried:

  • Computing the status with getEffectivePinStatus and filtering with JS rather than on the DB. This would make counting rows not accurate.

  • Moving the filtering to a stored function and replicating the logic of getEffectivePinStatus. I've had no success trying to join the tables and compute the effective pin status by filtering on an aggregate.

The third option is to add a pre-computed field:

  • Add a psa_status column to psa_pin_request table (enum with following possible values queued pinning pinned failed)
  • I'm not sure about psa_status default value; I’m leaning on queued.
  • Update db client’s listPsaPinRequests to use psa_status for filtering by status.
  • Update updatePinStatuses CRON job to update psa_status for psa_pin_request rows that share the same cid.

I'm not 100% sure if we can computer psa_pin after db.upsertPins in updatePinStatuses. Because the CRON job works by batch, we can't know for sure if we are okay to calculate psa_pin. The other option would be to add a new CRON job executed after every updatePinStatuses job with the only purpose of updating psa_pin. @alanshaw would love your opinion on this.

francois-potato avatar Apr 29 '22 12:04 francois-potato

Can you do something like:

-- pinning
SELECT *
  FROM pin
 WHERE content_cid NOT IN (SELECT content_cid FROM pin WHERE status = 'Pinned')
   AND status = 'Pinning'

-- failed
SELECT *
  FROM pin
 WHERE content_cid NOT IN (SELECT content_cid FROM pin WHERE status = 'Pinned' OR status = 'Pinning')
   AND status = 'PinError'

Obviously you're selecting from a different table and joining etc. etc. but the idea of selecting the "higher" statuses and removing them from the results.

...or could you have a view on the psa_pins table with an "effective status" column?

alanshaw avatar Apr 29 '22 13:04 alanshaw

Something along these lines might also be a good alternative?

(example here is not doing the filtering but rather proving we can aggregate by effective status effectively and using a simplified table structure just for the sake of this example),

select 
  sq.id, 
  case
    when min(sq.statusNumber) = 1 then 'pinned'
    when min(sq.statusNumber) = 2 then 'pinning'
    when min(sq.statusNumber) = 3 then 'queued'
    when min(sq.statusNumber) = 4 then 'queued'
   end as psaStatus
from(
  select
    a.id, b.status,
    case
      when b.status = 'Pinned' then 1 -- psa pinned
      when b.status = 'Pinning' then 2 -- psa pinning
      when b.status = 'PinQueued' then 3 -- psa queued
      when b.status = 'Unpinned' then 3 -- psa queued
      else 4 -- psa failed
    end as statusNumber
  from a LEFT JOIN b ON a.id=b.id
) sq group by sq.id, min(sq.statusNumber)

see example here

@francois-potato, I kind of taken what you show me, is this something that you already tried?

flea89 avatar Apr 29 '22 14:04 flea89