commerce icon indicating copy to clipboard operation
commerce copied to clipboard

Item adjustments all being duplicated against the order when loading stripe payment intents

Open matt-adigital opened this issue 1 year ago • 33 comments

Description

Line item adjustments are all being duplicated, this only happens in the final step when the payment part is loaded, this was never a problem until upgrading from v3 to v4, payment is taken at the correct amount for the original value but the order is then recorded as a partial payment with these extra adjustments remaining in the order details when completed. If you load any other page on the front end then the duplicates are removed, so we added some js to do this for us on the payment page, this then meant we hit a php error and the order was not completed. Payment taken successfully but completion of order failed, screenshot attached. We've had to revert back to v3 due to this issue as we can't have all orders showing as partial payments.

Steps to reproduce

  1. Load payment page on site using stripe gateway selected
  2. Check CMS active order details, line adjustments are all duplicated
  3. Compete order for original amount and it is successful but records as a partial payment

\\\\-- OR --

  1. Load payment page on site using stripe gateway selected
  2. Check CMS active order details, line adjustments are all duplicated
  3. Load any other frontend page
  4. Check CMS active order details, line adjustments are no longer duplicated
  5. Complete payment and encounter php error message attached below

Additional info

  • Craft CMS version: 4.6.0
  • Craft Commerce version: 4.4.1.1
  • Stripe for Craft Commerce version: 4.1.1
  • PHP version: 8.0

E8140A9F-DBA7-47C7-86C1-8586C7BF0341_1_201_a

matt-adigital avatar Jan 30 '24 11:01 matt-adigital

@matt-adigital Could you please let us know if you are using the paymentFormHtml() for the payment form?

I can't seem to reproduce this issue.

Also are you using any special mutex driver like redis?

lukeholder avatar Mar 06 '24 13:03 lukeholder

Hi @lukeholder thanks for responding.

We upgraded our stripe api version within the stripe account and yes we were using the following code for the payment gateway:

{% namespace cart.gateway.handle|commercePaymentFormNamespace %}
	{{ cart.gateway.getPaymentFormHtml(params)|raw }}
{% endnamespace %}

This site is hosted on servd so it is using redis but we've not defined redis as a mutex driver within the app.php config file.

matt-adigital avatar Mar 06 '24 13:03 matt-adigital

@matt-adigital

It turns out that default Redis doesn’t make a reliable globally distributed locking mechanism like we thought for ages without jumping through some hoops. https://redis.io/docs/manual/patterns/distributed-locks/

So much so that in Craft 4.6.0, we swapped the default mutex in Craft back to a new and improved database mutex.

It can manifest itself with race conditions and duplicated data like what you’re seeing.

Servd is currently using Redis, but we’ve been talking to Matt (the owner), and he’s got a plan for a fix he’s working on. You’ll need to reach out to him for timelines and specifics, but it looks like you’re already on 4.6.0, so at least you’ve got that part covered already.

angrybrad avatar Mar 06 '24 18:03 angrybrad

Hi @angrybrad thank you for that explanation, all makes sense! We run monthly updates on this site and try to keep everything up to date where possible. We will hold off on the stripe upgrade for now then and start a dialogue with Matt at Servd for an indication of timeline etc. Cheers!

matt-adigital avatar Mar 07 '24 09:03 matt-adigital

Hi @angrybrad ,

Servd applied the fix yesterday regarding the database mutex. We've tested the stripe upgrade on our staging site (Servd) and unfortunately are still experiencing the duplicate adjustments.

I spoke to Joe at Servd this morning and he confirmed it appears to be using the database mutex. He suggested the issue may not be down to the mutex.

damonadigital avatar Mar 08 '24 14:03 damonadigital

@damonadigital an you reliably reproduce it with steps you can share?

lukeholder avatar Mar 08 '24 16:03 lukeholder

@lukeholder yes

  1. Load payment page on site using stripe gateway selected
  2. Check CMS active order details, line adjustments are all duplicated
  3. Compete order for original amount and it is successful but records as a partial payment

-- OR --

  1. Load payment page on site using stripe gateway selected
  2. Check CMS active order details, line adjustments are all duplicated
  3. Load any other frontend page
  4. Check CMS active order details, line adjustments are no longer duplicated

I am no longer seeing the php error when completing a payment after following the second set of steps.

damonadigital avatar Mar 08 '24 16:03 damonadigital

@lukeholder I'm also experiencing this issue quite severely.

It was happening in Craft + Commerce + Stripe 3 just occasionally so @angrybrad suggested removing Redis and upgrading to C4 to get the improved database mutex. Now it's happening to roughly one in 4 orders.

I can reproduce it like @damonadigital can but not every time.

Here's an example I just made, with the same cart, front and back end, both pages refreshed before taking the screenshot. Once the cart is updated, the additional adjustments will be removed. In this case I just have one duplication but usually there are two.

checkout

pixelmachine avatar Mar 19 '24 16:03 pixelmachine

HI @pixelmachine & @damonadigital

I have just tried to reproduce this (using the steps stated) with the latest Craft, Commerce and Stripe using the example templates and no matter how much I refresh the payment page I am never seeing duplicated adjustments.

I go back and forth to the payment page, change what is in my cart etc and still only see the expected number of adjustments.

CleanShot 2024-03-19 at 16 19 00@2x

Would either of you be willing to create a fresh install with one product in and set up shipping/taxes in the same way you have it in your project. Then if you can replicate with the example templates, send us the database we might be able to figure out what is happening.

We are obviously keen to get to the bottom of this issue so we can get your projects running smoothly without any problems.

Thank you for all the information, time and feedback you have already put into this.

nfourtythree avatar Mar 19 '24 16:03 nfourtythree

Hey @nfourtythree seems like the issue is somehow related to the hosting environment, did you try it on Servd?

My live setup is using Linode instances provisioned by Ploi, with a load balancer. Bit of a hassle to setup so testing on Servd seems simpler.

The issue does not occur in my local dev and test environments (DDEV and Fortrabbit).

If you can't replicate it in a similar environment then I'll see about getting you an install. Also I've already discussed this a bit more with Brad via support if you want to check that.

pixelmachine avatar Mar 19 '24 18:03 pixelmachine

Hi All

I have managed to create a test case where I can reliably reproduce this issue. As expected, it is a case of a race condition, we will look to get a fix for this in place as soon as possible.

This is just a note to keep everyone updated with the situation.

Thanks!

nfourtythree avatar Mar 20 '24 08:03 nfourtythree

Hi @pixelmachine, @damonadigital & @matt-adigital

We have pushed a branch with fixes in for this issue and there is now a PR for it.

Before we merge the branch it would be great if you could give it a test on your local/staging environments to see if this fixes things for your project.

To try this fix, change your craftcms/commerce requirement in your project's composer.json to:

"require": {
  "craftcms/commerce": "dev-bugfix/race-conditions-duplicate-adjustments#1b2df66d0cc988b25d08211e2db1bad33a0a46ce as 4.5.2",
  "...": "..."
}

Then run composer update.

And then let us know how you get on.

Thanks!

nfourtythree avatar Mar 21 '24 15:03 nfourtythree

Thanks @nfourtythree testing now.

pixelmachine avatar Mar 21 '24 15:03 pixelmachine

Hi @nfourtythree ,

Thanks for the fix. Unfortunately it's not worked for us, we are still getting duplicate adjustments.

damonadigital avatar Mar 22 '24 09:03 damonadigital

Hi @damonadigital

Thank you for your response, sorry to hear this isn't working.

This feels like an environmental mutex related issue. In the branch I linked to we are now locking the saving of the order on requests to the payment controller pay action. This means (if the mutex lock is working correctly) that if a race condition of requests to that action that the saving (and therefore recalculation) of the order can only happen one at time.

Are you able to reproduce the issue with the following:

  • in a local development environment?
  • with all other custom modules and plugins disabled?
  • using the example templates?

Thanks

nfourtythree avatar Mar 22 '24 09:03 nfourtythree

Hey @nfourtythree I can also confirm that it hasn't fixed the issue for me.

I can only answer your first question at the moment: No I can't replicate the problem in my local or staging environment and never have been able to.

I'll try to test the others.

pixelmachine avatar Mar 22 '24 11:03 pixelmachine

Hi @nfourtythree so I've done a lot of testing over the last couple of days and I think I've found the culprit at least for my site – Cloudflare.

I'm using Blitz for the non-shop parts of my site and the instructions for Blitz suggest using a page rule in Cloudflare set to 'cache everything'. With a clone of my site on the exact same system I was only able to reproduce the issue if the page rule was enabled.

Other relevant things

I also setup a version of the Spoke & Chain site, again on the same system, but wasn't able to reproduce the problem. On my original site the checkout is a single page and each step is updated using a Sprig component, which in turn may refresh the Stripe payment form.

I'm not sure why this should make any difference and if I get the chance I'll modify the Spoke & Chain site to see if using Sprig to update the cart changes anything. Looking at the requests I can't see why it should but that's really the only difference I can see.

One thing that is quite interesting is that I can see the error happening in real time in the Control Panel, and it occurs almost every single time but then often self-corrects:

  1. I select a shipping method on the checkout (or any cart-update)
  2. Sprig submits it
  3. At the same time I refresh the active cart in the CP and see multiple adjustments
  4. Sprig completes and refreshes the payment form
  5. I refresh the active cart and either still see multiple adjustments or it's fixed and they're gone
  6. The front end always shows the correct information regardless of what's in the control panel

Hope that's all clear!

pixelmachine avatar Mar 26 '24 22:03 pixelmachine

It's worth mentioning that our site also uses Cloudflare and Blitz. Although we have only applied the "Cache everything" page rule to 'domain.co.uk/assets/*'

damonadigital avatar Mar 27 '24 11:03 damonadigital

@damonadigital yeah so I think it's probably still related to Cloudflare but I spoke too soon with the page rules. I could no longer replicate it yesterday but it's still happened at least once today. I may set Cloudflare to full bypass and see if that helps.

pixelmachine avatar Mar 27 '24 13:03 pixelmachine

Hi All

Thank you for your extra information.

In an attempt to keep things progression we have pushed up another fix which could solve the issue. Although it could solve it I don't believe it is the root cause.

If you have previously changed the composer requirement to point to the branch then you should be able to do a composer update.

With all the extra information provided it seems likely that the the issue here is two requests being made at the same time due to some kind of combination of Sprig, Cloudflare, Blitz etc.

But with this, it is still a little confusing as to why the mutex lock is not preventing the behaviour you are seeing from happening.

I understand everyone is super busy but it would be great if you do get a chance to try out a test controller I have put together.

<?php

namespace modules\controllers;

use craft\web\Controller;
use yii\web\Response;

class MutexController extends Controller
{
    protected array|bool|int $allowAnonymous = true;

    public function actionTest(): Response
    {
        $response = new Response();
        $content = 'Start: ' . floor(microtime(true)) . PHP_EOL;
        if (!\Craft::$app->getMutex()->acquire('test', 10)) {
            $content .= 'Failed: ' . floor(microtime(true)) . PHP_EOL;
        } else {
            $content .= 'Success: ' . floor(microtime(true)) . PHP_EOL;
            sleep(2);
        }

        $response->content = $content;
        \Craft::$app->getMutex()->release('test');

        return $response;
    }
}

With this you can then run the following command line command:

curl -s https://example.domain/actions/my-module/mutex/test & curl -s https://example.domain/actions/my-module/mutex/test; wait

You will need to replace the example.domain and my-module (depending on what you bootstrapped your custom module)

This test will output some micro times to show when the request started and succeeded. If everything is working correctly with the mutex we should see that the start times are the same (within tolerance) and the success times are 2 seconds apart. Here is an example output from my tests:

Start: 1711551857
Success: 1711551857
Start: 1711551857
Success: 1711551859

If there is no difference in success times this would suggest the mutex is not working correctly.

Please let me know how you get on it.

nfourtythree avatar Mar 27 '24 15:03 nfourtythree

Hi @nfourtythree ,

Just a quick update to let you know we've tried the latest fix and still seeing the issue. I will try the test controller when I have a chance and let you know.

Thanks

damonadigital avatar Mar 28 '24 14:03 damonadigital

Interestingly we are no longer seeing the issue with the fix in place, or at least I haven't been able to reproduce it and no live orders have encountered it.

pixelmachine avatar Mar 28 '24 14:03 pixelmachine

I've just done the mutex controller test and these are the results:

Start: 1711637547 Success: 1711637547 Start: 1711637547 Success: 1711637549

damonadigital avatar Mar 28 '24 14:03 damonadigital

Hi @damonadigital

Thank you for getting back to us with that. If the mutex is working and you have pulled the latest changes I am really stumped as to how this issue is still apparent.

Sorry to ask some simple questions:

  • Could you verify that these lines are definitely in the code that you are testing?
  • Do you have more than one payment form showing on the payment page?

Thanks

nfourtythree avatar Apr 02 '24 07:04 nfourtythree

Hi @nfourtythree ,

  • I can confirm those lines are present, so the bug fix is definitely being applied.
  • Just one payment form on the payment page.

Thanks

damonadigital avatar Apr 02 '24 14:04 damonadigital

Hey @nfourtythree

Staging:

Start: 1712100092
Success: 1712100092
Start: 1712100095
Success: 1712100095

Live:

Start: 1712100349
Success: 1712100349
Start: 1712100349
Success: 1712100351

pixelmachine avatar Apr 02 '24 23:04 pixelmachine

Hi

Just wanted to check in with how things are going?

@pixelmachine everything still working as expected?

@damonadigital still seeing issues?

Thanks

nfourtythree avatar Apr 04 '24 12:04 nfourtythree

Hi @nfourtythree ,

Yes unfortunately still seeing issues. I did another test after confirming those lines were in the vendor folder.

@pixelmachine Did you end up setting Cloudflare to full bypass in order to fix the issue?

Thanks

damonadigital avatar Apr 04 '24 12:04 damonadigital

@nfourtythree as far as I know we haven't had a single recurrence since your not-a-fix fix last week. I have my composer.json requiring "craftcms/commerce": "dev-bugfix/race-conditions-duplicate-adjustments as 4.5.2".

@damonadigital no we didn't, Nathaniel's fix last week essentially solved the problem as soon as it was applied.

We have Cloudflare proxy enabled and a pages rules set like:

*.domain.co.uk/* Cache Level: Cache Everything
*.domain.co.uk/shop/* Cache Level: Bypass

For Blitz we have the same sort of thing:

'includedUriPatterns' => [
    [
      'siteId' => 1,
      'uriPattern' => '.*',
    ],
  ],
  'excludedUriPatterns' => [
    [
      'siteId' => 1,
      'uriPattern' => '^shop',
    ],
  ],

I haven't changed any Sprig components in the checkout.

pixelmachine avatar Apr 04 '24 13:04 pixelmachine