prime-reportstream

CA - Last Mile Failures

Open · Jcavallo7 opened this issue

Problem statement

Hello, there were a few last mile failures for California. Please see the screenshot for details.

What you need to know

[Screenshot: CA last mile failures]

Acceptance criteria

To do

  • Resubmit files

Jcavallo7 · Aug 13 '24

Hey @victor-chaparro, could you please take a look to see what is causing the failures?

MikaelahD13 · Aug 14 '24

Update: more last mile failures from CA have appeared.

[Screenshot: additional CA last mile failures]

Jcavallo7 · Aug 15 '24

These errors are showing up because the files we send are being flagged as duplicates. We recently upgraded CA to use REST instead of SFTP, and we were sending the same files through both receivers. I've turned off their SFTP receiver, so we should no longer see these errors. There's no need to resend the files since they are duplicates.

victor-chaparro · Aug 15 '24

@victor-chaparro, just to confirm, is this still the case? I see more last mile failures coming through (screenshot attached below). If these are still duplicates and no action is needed, can we close the ticket? Please provide the receiver information so we can ignore these failures going forward.

[Screenshot: more CA last mile failures]

Jcavallo7 · Aug 26 '24

We've determined that a sender in SimpleReport is sending duplicate reports. I've contacted SimpleReport about the issue so they can reach out to that sender. Here's the thread: https://nava.slack.com/archives/C0411VC78DN/p1725896603292609

victor-chaparro · Sep 09 '24

Leaving this open because we are still seeing duplicate errors for CA. We expected adjusting their limit to fix this, so the ticket stays open until the duplicate issue is resolved.

chris-kuryak · Mar 05 '25

Over the past week, the number of duplicates rejected by CA has dropped significantly (about 2-3 a day). We are still waiting for a response from Manifest about their investigation into this issue on their side.

For the time being, we have capped each send/file at 400 items. This appears to have resolved most of CA's high duplicate volume.

However, to fully resolve the issue, one of the following is needed:

  1. Manifest updates the connection to accept more than 400 items in a short time period
  2. OR ReportStream updates OUR settings to send less frequently (about once every 10 minutes)
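The two mitigations discussed here, capping each file at 400 items and spacing sends out, can be illustrated with a minimal Python sketch. This is not ReportStream's actual implementation: `chunk`, `send_in_batches`, `send_batch`, and the 600-second default delay are hypothetical, and the only figure taken from this thread is the 400-item cap (the delay loosely mirrors the "once per 10 min or so" option).

```python
import time
from typing import Callable, Iterator, List, Sequence, TypeVar

T = TypeVar("T")

# The 400-item cap comes from this thread; everything else is hypothetical.
MAX_ITEMS_PER_FILE = 400


def chunk(items: Sequence[T], size: int) -> Iterator[List[T]]:
    """Yield consecutive slices of at most `size` items."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def send_in_batches(
    items: Sequence[T],
    send_batch: Callable[[List[T]], None],
    max_items: int = MAX_ITEMS_PER_FILE,
    delay_seconds: float = 600.0,  # hypothetical: roughly "once per 10 min"
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Send items in capped batches, pausing between sends.

    Returns the number of batches sent. `sleep` is injectable so the
    pacing can be exercised without actually waiting.
    """
    sent = 0
    for batch in chunk(items, max_items):
        if sent > 0:
            sleep(delay_seconds)  # pause only between sends, not before the first
        send_batch(batch)
        sent += 1
    return sent
```

For example, 1,000 items with the default cap go out as three files of 400, 400, and 200 items, with a pause before the second and third sends. Whether a fixed pause like this actually prevents the duplicates depends on how Manifest's connection handles bursts, which this thread has not yet confirmed.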

chris-kuryak · Mar 17 '25

The items on this ticket were identified as duplicates and do not need to be resubmitted.

However, we will keep this ticket open until we have another conversation with CA to determine next steps.

@victor-chaparro will ping Manifest via email for an update on the issue from their side.

chris-kuryak · Mar 17 '25

Still no response from Manifest on this. Will continue to monitor.

chris-kuryak · Mar 20 '25

Still no response from Manifest on this issue. A ticket was created on the Platform board today for refinement: #17706

This will add a small timeout each time we send reports, which may solve this issue regardless of Manifest's response.

chris-kuryak · Apr 07 '25

@Jcavallo7 We received an email response from Manifest today. They are creating a script to help identify the problem, which should be completed today, and will then follow up with us on next steps.

chris-kuryak · Apr 10 '25

Received another response from Manifest today:

The test is still in progress. We’re currently working on a script to simulate Report Stream messages to SaPHIRE and are aiming to complete our troubleshooting by the end of the day today.

chris-kuryak · Apr 11 '25

Victor meeting with Manifest today to help reproduce the error

chris-kuryak · Apr 14 '25

ReportStream is working on #17744, which MAY temporarily reduce the potential for this error on our side. However, it is not a long-term solution, and we think the root cause is likely on Manifest's side. We will continue to work with Manifest to solve this issue and move toward a long-term solution.

chris-kuryak · Apr 14 '25

Manifest has tried to reproduce the error without success. Victor has confirmed that RS does produce an error during testing.

Victor to follow up with Manifest to let them know RS is receiving errors, and to ask whether they can share their script so we can check whether it is missing something that would trigger the error.

chris-kuryak · Apr 21 '25

Victor continuing to work with Manifest on reproducing the error via a script.

Platform team continuing to work on #17744

chris-kuryak · Apr 28 '25

Platform moving forward with deploying #17744 this week. Victor reaching back out to Manifest this week to work on their script.

chris-kuryak · May 05 '25