tx-lcn

NullPointerException when integrating 5.0.1

Open linweifeng opened this issue 5 years ago • 12 comments

NullPointerException in 5.0.1

java.lang.NullPointerException: null
	at com.codingapi.txlcn.tc.core.checking.DefaultDTXExceptionHandler.handleNotifyGroupBusinessException(DefaultDTXExceptionHandler.java:97)
	at com.codingapi.txlcn.tc.core.template.TransactionControlTemplate.notifyGroup(TransactionControlTemplate.java:159)
	at com.codingapi.txlcn.tc.core.transaction.lcn.control.LcnStartingTransaction.postBusinessCode(LcnStartingTransaction.java:65)
	at com.codingapi.txlcn.tc.core.DTXServiceExecutor.transactionRunning(DTXServiceExecutor.java:109)
	at com.codingapi.txlcn.tc.aspect.weave.DTXLogicWeaver.runTransaction(DTXLogicWeaver.java:95)
	at com.codingapi.txlcn.tc.aspect.TransactionAspect.runWithLcnTransaction(TransactionAspect.java:93)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.springframework.aop.aspectj.AbstractAspectJAdvice.invokeAdviceMethodWithGivenArgs(AbstractAspectJAdvice.java:629)
	at org.springframework.aop.aspectj.AbstractAspectJAdvice.invokeAdviceMethod(AbstractAspectJAdvice.java:618)
	at org.springframework.aop.aspectj.AspectJAroundAdvice.invoke(AspectJAroundAdvice.java:70)
	at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:168)
	at org.springframework.aop.interceptor.ExposeInvocationInterceptor.invoke(ExposeInvocationInterceptor.java:92)

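This exception shows up intermittently (it is not always reproducible). Stepping through with a debugger shows that it is triggered by a deadlock: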

  • org.hibernate.exception.LockTimeoutException: could not execute statement
  • java.sql.SQLException: Lock wait timeout exceeded; try restarting transaction

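Could the author please look into this and help identify the cause? If any further information is needed, please reply to this issue.

TM component incompatibility

While trying different TM versions, I found that a TC cannot register with a TM of a different version. For example, a 5.0.1 TC cannot register with a 5.0.2 TM.

Is there a way to upgrade the TC/TM components smoothly?

linweifeng, Apr 18 '19 08:04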

This problem also occurs in 5.0.2.

lyings, May 08 '19 07:05

I ran into this too, also on 5.0.2.

quxiaoping, May 09 '19 09:05

I ran into this too. Has the original poster solved it?

mazhai, May 19 '19 20:05

Has the original poster solved this? I have run into the same problem.

happyTkod, May 24 '19 00:05

Has the original poster solved this?

haogewang, May 27 '19 09:05

I hit this on 5.0.2 as well. Has the original poster solved it?

anewoneday2019, Jun 11 '19 14:06

From testing, it looks like this NullPointerException appears once a debugging session pauses for more than about 4-5 s.

LmingXie, Jun 13 '19 06:06

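One reply for everyone: for the NullPointerException, check how your aspects (the configuration of the resource aspect and the annotation aspect) relate to the transaction aspect. Sorting that out resolves it.

For problem 2, the upgrade issue, there is no solution for now.

linweifeng, Jun 13 '19 06:06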

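Yes. When debugging, suitably increasing the total execution time of a distributed transaction (in ms, default 36000) avoids the NullPointerException during debugging, for example tx-lcn.manager.dtx-time=24000. For production, measure your workload to find an appropriate value.

LmingXie, Jun 13 '19 07:06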

I ran into this today as well. I am using version 5.0.2, with these distributed transaction annotations:

@LcnTransaction(propagation = DTXPropagation.REQUIRED)
@LcnTransaction(propagation = DTXPropagation.SUPPORTS)

Tracing the source, I found that in my case the NullPointerException is caused by a distributed transaction timeout. Analysis and fix:

tx-lcn:
  manager:
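    # Distributed transaction timeout (ms). It must be greater than [(microservice call chain length e) * (hystrix timeout) + N (multiple cross-service calls)].
    # Otherwise the timeout causes the Throwable ex argument of com.codingapi.txlcn.tc.core.checking.DefaultDTXExceptionHandler.handleNotifyGroupBusinessException to be null,
    # which throws a NullPointerException, so the "end transaction" step never runs. The transaction is never closed and subsequent requests stay stuck, even if the receiving service is restarted.
    # Steps to reproduce:
    #   1. In receiving service B, set ribbon.ReadTimeout to 10 seconds and make the receiving endpoint call Thread.sleep(9 * 1000) (9 seconds)
    #   2. Start requesting service A and receiving service B
    #   3. Make sure both services have started, are registered with Eureka, and that the Spring Gateway can forward requests
    #   4. Service A sends a request, calling service B's endpoint through FeignClient
    #   5. The call now waits longer than 8 seconds (the default distributed transaction timeout is 8 seconds), so the distributed transaction times out
    #   6. Trace the source code that handles the distributed transaction timeout:
    #       TransactionControlTemplate.notifyGroup {
    #           if (globalContext.isDTXTimeout()) {
    #               throw new LcnBusinessException("dtx timeout.");
    #           }
    #           ...
    #           catch (LcnBusinessException e) {
    #               // e.getCause() is null here, which makes dtxExceptionHandler.handleNotifyGroupBusinessException throw a NullPointerException
    #               dtxExceptionHandler.handleNotifyGroupBusinessException(Arrays.asList(groupId, state, unitId, transactionType), e.getCause());
    #           }
    #       }
    #       
    #       DefaultDTXExceptionHandler.handleNotifyGroupBusinessException {
    #           ...
    #           if ((ex.getCause() != null && ex.getCause() instanceof UserRollbackException)) // this line throws the NullPointerException
    #           ...
    #           transactionCleanTemplate.clean(groupId, unitId, transactionType, state); // so this line never runs and the transaction cannot end normally
    #       }
    #   7. The distributed transaction has now timed out and cannot end normally. All subsequent requests hang here and keep reporting this exception. Restarting service B does not help; service A has to be restarted.
    # 
    # Because the right distributed transaction timeout (ms) cannot be determined directly, this can be improved further:
    #   1. Set the distributed transaction timeout (ms) to at least [1 * (hystrix timeout)]
    #   2. Override DefaultDTXExceptionHandler.handleNotifyGroupBusinessException to null-check ex so that the transaction can end normally, then throw a distributed transaction timeout exception so that the timeout is visible
    #   3. Make sure the overriding DefaultDTXExceptionHandler is loaded ahead of the one in txlcn-tc-5.0.2.RELEASE.jar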
    dtx-time: 35000
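
The failure mode traced above can be reproduced in isolation. The following is a minimal sketch, not tx-lcn code: BusinessException, NullCauseDemo, unguardedHandler and notifyGroup are hypothetical stand-ins that only mirror the shape of the calls described in step 6. An exception created with just a message has no cause, so forwarding e.getCause() hands null to the handler, and the handler fails before it reaches its cleanup step.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for illustration only; this is NOT a tx-lcn class.
class BusinessException extends Exception {
    BusinessException(String message) {
        super(message); // message only, so getCause() returns null
    }
}

class NullCauseDemo {
    // Records the group ids for which the cleanup step actually ran.
    static final List<String> cleaned = new ArrayList<>();

    // Mirrors the unguarded handler: it dereferences ex before cleaning up.
    static void unguardedHandler(String groupId, Throwable ex) {
        if (ex.getCause() != null) { // throws NullPointerException when ex is null
            // a rollback decision would be made here
        }
        cleaned.add(groupId); // never reached when ex is null
    }

    // Mirrors the caller: it forwards e.getCause() instead of e.
    static String notifyGroup(String groupId) {
        try {
            throw new BusinessException("dtx timeout.");
        } catch (BusinessException e) {
            try {
                unguardedHandler(groupId, e.getCause());
                return "handled";
            } catch (NullPointerException npe) {
                return "NPE";
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(notifyGroup("group-1")); // prints "NPE"
        System.out.println(cleaned.isEmpty());      // prints "true": cleanup was skipped
    }
}
```

The point of the sketch is only the ordering: any dereference of a possibly null argument that sits before the cleanup call will skip the cleanup when it throws.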

The overriding code is as follows:

/*
 * Copyright 2017-2019 CodingApi .
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 *      http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
package com.codingapi.txlcn.tc.core.checking;

import com.codingapi.txlcn.common.exception.*;
import com.codingapi.txlcn.logger.TxLogger;
import com.codingapi.txlcn.tc.txmsg.TMReporter;
import com.codingapi.txlcn.tc.core.template.TransactionCleanTemplate;
import com.codingapi.txlcn.txmsg.params.TxExceptionParams;
import lombok.extern.slf4j.Slf4j;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Component;

import java.util.List;

/**
 * Description:
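 *      Overrides handleNotifyGroupBusinessException to null-check the Throwable ex parameter. This fixes the case where a
 *      distributed transaction timeout makes ex.getCause() throw a NullPointerException, so the transaction cannot end
 *      normally and subsequent requests hang and keep throwing NullPointerException.
 * Date: 2019/09/25
 *
 * @author shenjh
 *
 * Description: Exception handler for distributed transactions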
 * Date: 2018/12/20
 *
 * @author ujued
 * @see DTXExceptionHandler
 *
 */
@Component
@Slf4j
public class DefaultDTXExceptionHandler implements DTXExceptionHandler {

    private static final TxLogger txLogger = TxLogger.newLogger(DefaultDTXExceptionHandler.class);

    private final TransactionCleanTemplate transactionCleanTemplate;

    private final TMReporter tmReporter;

    @Autowired
    public DefaultDTXExceptionHandler(TransactionCleanTemplate transactionCleanTemplate, TMReporter tmReporter) {
        this.transactionCleanTemplate = transactionCleanTemplate;
        this.tmReporter = tmReporter;
    }

    @Override
    public void handleCreateGroupBusinessException(Object params, Throwable ex) throws TransactionException {
        throw new TransactionException(ex);
    }

    @Override
    public void handleCreateGroupMessageException(Object params, Throwable ex) throws TransactionException {
        throw new TransactionException(ex);
    }

    @Override
    public void handleJoinGroupBusinessException(Object params, Throwable ex) throws TransactionException {
        List paramList = (List) params;
        String groupId = (String) paramList.get(0);
        String unitId = (String) paramList.get(1);
        String unitType = (String) paramList.get(2);
        try {
            transactionCleanTemplate.clean(groupId, unitId, unitType, 0);
        } catch (TransactionClearException e) {
            txLogger.error(groupId, unitId, "join group", "clean [{}]transaction fail.", unitType);
        }
        throw new TransactionException(ex);
    }

    @Override
    public void handleJoinGroupMessageException(Object params, Throwable ex) throws TransactionException {
        throw new TransactionException(ex);
    }

    @Override
    public void handleNotifyGroupBusinessException(Object params, Throwable ex) {
        List paramList = (List) params;
        String groupId = (String) paramList.get(0);
        int state = (int) paramList.get(1);
        String unitId = (String) paramList.get(2);
        String transactionType = (String) paramList.get(3);

        if (ex == null) { // added by shenjh 20190925: null check
//            log.error("Distributed transaction timed out! Note by shenjh");
            /*
            See: LcnConnectionProxy.RpcResponseState notify(int state)
            if (state == 1) {
                log.debug("commit transaction type[lcn] proxy connection:{}.", this);
                connection.commit();
            } else {
                log.debug("rollback transaction type[lcn] proxy connection:{}.", this);
                connection.rollback();
            }
             */
            state = 0; // roll back the distributed transaction
        } else {
            // Rollback forced by the user.
            if (ex instanceof UserRollbackException) {
                state = 0;
            }
            if ((ex.getCause() != null && ex.getCause() instanceof UserRollbackException)) {
                state = 0;
            }
        }

        // End the transaction
        try {
            transactionCleanTemplate.clean(groupId, unitId, transactionType, state);
        } catch (TransactionClearException e) {
            txLogger.error(groupId, unitId, "notify group", "{} > clean transaction error.", transactionType);
        }

        if (ex == null) { // added by shenjh 20190925: surface the distributed transaction timeout
            throw new TxlcnTimeoutException("Distributed transaction timed out! Note by shenjh");
        }
    }

    @Override
    public void handleNotifyGroupMessageException(Object params, Throwable ex) {
        // When state is 0
        List paramList = (List) params;
        String groupId = (String) paramList.get(0);
        int state = (int) paramList.get(1);
        if (state == 0) {
            handleNotifyGroupBusinessException(params, ex);
            return;
        }

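        // End the transaction normally according to state (the aspect compensation record is kept).
        // With TxManager there are two cases: a request failure or a response failure. On a request failure the business logic here needs compensation; on a response failure the transaction must be cleaned up according to state.
        // On a request failure:
        //     the participants commit the transaction based on the reported compensation record.
        // On a response failure:
        //     the participants commit the transaction normally, and the local business logic reports the transaction.

        // In both cases the compensation info can be ignored and the local compensation record can be deleted directly.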


        String unitId = (String) paramList.get(2);
        String transactionType = (String) paramList.get(3);
        try {
            transactionCleanTemplate.cleanWithoutAspectLog(groupId, unitId, transactionType, state);
        } catch (TransactionClearException e) {
            txLogger.error(groupId, unitId, "notify group", "{} > cleanWithoutAspectLog transaction error.", transactionType);
        }

        // Report to the Manager, retrying until it succeeds.
        tmReporter.reportTransactionState(groupId, null, TxExceptionParams.NOTIFY_GROUP_ERROR, state);
    }
}

package com.codingapi.txlcn.common.exception;

import java.io.Serializable;

/**
 * Distributed transaction timeout exception
 * @author shenjh
 * @version 1.0
 * @since 2019-09-25 16:45
 */
public class TxlcnTimeoutException extends RuntimeException implements Serializable {
    private static final long serialVersionUID = 1L;

    public TxlcnTimeoutException(String message) {
        super(message);
    }

    public TxlcnTimeoutException(Throwable ex) {
        super(ex);
    }

    public TxlcnTimeoutException() {
    }
}
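
The state decision made by the overriding handler can be condensed into a small null-safe helper. This is only a sketch under assumptions: UserRollback is a hypothetical stand-in for UserRollbackException, and GuardedStateSketch, isUserRollback and resolveState are illustrative names that do not exist in tx-lcn. It relies on the fact that instanceof evaluates to false for null, so a missing cause needs no separate check.

```java
// Hypothetical stand-in for UserRollbackException; NOT the tx-lcn class.
class UserRollback extends RuntimeException {
}

class GuardedStateSketch {
    // True when ex, or its direct cause, is a user-forced rollback.
    // instanceof is false for null, so a null ex or a null cause is safe here.
    static boolean isUserRollback(Throwable ex) {
        return ex instanceof UserRollback
                || (ex != null && ex.getCause() instanceof UserRollback);
    }

    // Mirrors the override's decision: roll back (0) on a missing exception,
    // which the override treats as a timeout, or on a user-forced rollback.
    // Otherwise keep the state that was passed in.
    static int resolveState(int state, Throwable ex) {
        if (ex == null || isUserRollback(ex)) {
            return 0;
        }
        return state;
    }

    public static void main(String[] args) {
        System.out.println(resolveState(1, null));                                     // 0
        System.out.println(resolveState(1, new UserRollback()));                       // 0
        System.out.println(resolveState(1, new RuntimeException(new UserRollback()))); // 0
        System.out.println(resolveState(1, new IllegalStateException("boom")));        // 1
    }
}
```

Keeping the decision in a pure function like this makes it easy to unit test separately from the cleanup call, which still has to run in every branch.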

skyer83, Sep 25 '19 09:09

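On 5.0.2, viewing the exception records in the TM inserts two identical rows into the database, and the official demo also hits the NullPointerException described above.

iMikuX, Dec 04 '19 02:12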

Total execution time of a distributed transaction (ms). Default is 36000.

tx-lcn.manager.dtx-time=36000. After increasing this TM timeout, the NullPointerException went away.

iMikuX, Dec 04 '19 02:12
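
For choosing a dtx-time value, the rule of thumb quoted earlier in the thread (call chain length times the hystrix timeout, plus N for multiple cross-service calls) can be written as a small calculation. This is a rough sketch, not an official formula: DtxTimeSketch and minDtxTimeMs are hypothetical names, and treating N as an extra allowance in milliseconds is an assumption, since the thread does not define it precisely.

```java
class DtxTimeSketch {
    // Lower bound for dtx-time in ms; the configured value should be greater than this.
    // allowanceMs is an assumed interpretation of the "N" term from the thread.
    static long minDtxTimeMs(int chainLength, long hystrixTimeoutMs, long allowanceMs) {
        if (chainLength < 1 || hystrixTimeoutMs < 0 || allowanceMs < 0) {
            throw new IllegalArgumentException("chainLength must be >= 1 and timeouts non-negative");
        }
        // The exact-arithmetic helpers throw ArithmeticException on overflow instead of wrapping.
        return Math.addExact(Math.multiplyExact((long) chainLength, hystrixTimeoutMs), allowanceMs);
    }

    public static void main(String[] args) {
        // Two services in the chain, 10 s hystrix timeout, 2 s assumed allowance.
        System.out.println(minDtxTimeMs(2, 10_000L, 2_000L)); // 22000
    }
}
```

The result is only a floor; as noted above in the thread, measuring real call durations in the target environment is still needed to settle on a value.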