JVM内存分析:线程死锁

提要

线程转储可用于分析Java应用在某一运行时刻的内部线程的运行情况,包括线程数、线程状态(死锁、运行、等待等),并且可得到线程的执行轨迹,对于分析线程死锁十分有益。

通过JDK内置的工具jstack可转储Java线程:sudo -u tomcat jstack -l <java_pid> > jstack.dump,注意,<java_pid>为主进程ID,无法dump某个线程。

获取Java线程ID:ps aux | grep java
需确保转储用户与线程用户相同,否则,易出现Unable to open socket file: target process not responding or HotSpot VM not loaded的问题;
当出现死锁时,dump操作可能失败,可以通过kill -3 <java_pid>终止死锁(其不会杀死进程或线程!);

得到转储文件后,可将其上传到fastThread进行在线分析,该服务可提供直观的分析图表和相关报告。

也可以下载IBM提供的工具IBM Thread and Monitor Dump Analyze,其同样提供图表分析功能,但整体上没有fastThread的直观。其启动命令为:java -jar jca447.jar

案例分析

以下为某次测试环境出现线程死锁后通过fastThread得到的分析报告内容:

pool-297-thread-25 is in deadlock with http-nio-8080-exec-548
pool-297-thread-25 - priority:5 - threadId:0x00007fdf2784f800 - nativeId:0x1474 - state:BLOCKED
stackTrace:
java.lang.Thread.State: BLOCKED (on object monitor)
at com.mysql.jdbc.ResultSetImpl.fastTimestampCreate(ResultSetImpl.java:1062)
- waiting to lock <0x000000070618c2c0> (a com.mysql.jdbc.JDBC4Connection)
at com.mysql.jdbc.ResultSetRow.getTimestampFast(ResultSetRow.java:1393)
- locked <0x00000007061996b8> (a java.util.GregorianCalendar)
at com.mysql.jdbc.BufferRow.getTimestampFast(BufferRow.java:576)
at com.mysql.jdbc.ResultSetImpl.getTimestampInternal(ResultSetImpl.java:6588)
at com.mysql.jdbc.ResultSetImpl.getTimestamp(ResultSetImpl.java:6188)
at com.mysql.jdbc.ResultSetImpl.getTimestamp(ResultSetImpl.java:6226)
at org.apache.commons.dbcp.DelegatingResultSet.getTimestamp(DelegatingResultSet.java:300)
at org.apache.commons.dbcp.DelegatingResultSet.getTimestamp(DelegatingResultSet.java:300)
at org.apache.commons.dbcp.DelegatingResultSet.getTimestamp(DelegatingResultSet.java:300)
at org.hibernate.type.descriptor.sql.TimestampTypeDescriptor$2.doExtract(TimestampTypeDescriptor.java:67)
at ...
at org.hibernate.loader.Loader.loadFromResultSet(Loader.java:1673)
at ...
at org.hibernate.internal.SessionImpl.fireLoad(SessionImpl.java:1098)
at org.hibernate.internal.SessionImpl.immediateLoad(SessionImpl.java:1013)
at org.hibernate.proxy.AbstractLazyInitializer.initialize(AbstractLazyInitializer.java:173)
at org.hibernate.proxy.AbstractLazyInitializer.getImplementation(AbstractLazyInitializer.java:285)
at org.hibernate.proxy.pojo.javassist.JavassistLazyInitializer.invoke(JavassistLazyInitializer.java:185)
at com.xx.zz.model.User_$$_jvst8ac_ac.equals(User_$$_jvst8ac_ac.java)
at java.util.Vector.indexOf(Vector.java:408)
- locked <0x000000070624bc10> (a java.util.Stack)
at java.util.Vector.contains(Vector.java:367)
at com.xx.zz.json.JSONWriter.value(Unknown Source)
at com.xx.zz.json.JSONWriter.appendProp(Unknown Source)
at com.xx.zz.json.JSONWriter.bean(Unknown Source)
at com.xx.zz.json.JSONWriter.doValue(Unknown Source)
at com.xx.zz.json.JSONWriter.value(Unknown Source)
at com.xx.zz.json.JSONWriter.write(Unknown Source)
at com.xx.zz.json.JSONUtil.serialize(Unknown Source)
at com.xx.zz.slm.OlaEngine.getJsonObejct(OlaEngine.java:191)
at com.xx.zz.slm.OlaEngine.startMonitor(OlaEngine.java:61)
at com.xx.zz.slm.manager.XxxThreadPoolManager$1.run(XxxThreadPoolManager.java:68)
- locked <0x000000070624bcb0> (a com.xx.zz.slm.manager.XxxThreadPoolManager$1)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Locked ownable synchronizers:
- <0x0000000706073548> (a java.util.concurrent.ThreadPoolExecutor$Worker)

http-nio-8080-exec-548 - priority:5 - threadId:0x00007fe02c00c000 - nativeId:0x146e - state:BLOCKED
stackTrace:
java.lang.Thread.State: BLOCKED (on object monitor)
at com.mysql.jdbc.PreparedStatement.setTimestampInternal(PreparedStatement.java:4819)
- waiting to lock <0x00000007061996b8> (a java.util.GregorianCalendar)
- locked <0x000000070618c2c0> (a com.mysql.jdbc.JDBC4Connection)
at com.mysql.jdbc.PreparedStatement.setTimestamp(PreparedStatement.java:4786)
at org.apache.commons.dbcp.DelegatingPreparedStatement.setTimestamp(DelegatingPreparedStatement.java:147)
at org.apache.commons.dbcp.DelegatingPreparedStatement.setTimestamp(DelegatingPreparedStatement.java:147)
at org.apache.commons.dbcp.DelegatingPreparedStatement.setTimestamp(DelegatingPreparedStatement.java:147)
at org.hibernate.type.descriptor.sql.TimestampTypeDescriptor$1.doBind(TimestampTypeDescriptor.java:58)
at ...
at org.hibernate.persister.entity.AbstractEntityPersister.update(AbstractEntityPersister.java:3205)
at ...
at org.hibernate.event.internal.DefaultFlushEventListener.onFlush(DefaultFlushEventListener.java:52)
at org.hibernate.internal.SessionImpl.flush(SessionImpl.java:1240)
at com.xx.zz.StartProcessInstanceListener.onEvent(StartProcessInstanceListener.java:114)
at ...
at com.xx.zz.activiti.cmd.SuperCompleteTaskCmd.execute(SuperCompleteTaskCmd.java:68)
at ...
at com.xx.zz.activiti.BaseActivitiTaskService.completeTask(BaseActivitiTaskService.java:82)
at com.xx.zz.BusinessProcessServiceImpl.completeTask(BusinessProcessServiceImpl.java:313)
at ...
at com.sun.proxy.$Proxy1859.completeTask(Unknown Source)
at com.xx.zz.BaseController.completeTask(BaseController.java:216)
at com.xx.zz.CompleteTask.execute(CompleteTask.java:57)
at ...
at org.apache.coyote.AbstractProtocol$ConnectionHandler.process(AbstractProtocol.java:802)
at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun(NioEndpoint.java:1410)
at org.apache.tomcat.util.net.SocketProcessorBase.run(SocketProcessorBase.java:49)
- locked <0x000000070624fcb0> (a org.apache.tomcat.util.net.NioEndpoint$NioSocketWrapper)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
at java.lang.Thread.run(Thread.java:745)
Locked ownable synchronizers:
- <0x0000000705c194b0> (a java.util.concurrent.ThreadPoolExecutor$Worker)

报告第一行即说明了pool-297-thread-25http-nio-8080-exec-548两个线程出现了竞争死锁,前者已经锁住<0x00000007061996b8> (a java.util.GregorianCalendar)并正在尝试锁住<0x000000070618c2c0> (a com.mysql.jdbc.JDBC4Connection),而后者已经锁住相同(内存地址相同)的JDBC Connection,并试图对<0x00000007061996b8> (a java.util.GregorianCalendar)加锁,因此,两者持锁等待,结果出现了死锁!

下面来分析一下两个线程在何时因为什么而发生了死锁。

线程pool-297-thread-25正在对某个BO(业务模型)做与监控相关的处理,并使用JSONWriter序列化该BO,由于Hibernate的懒加载特性,AbstractLazyInitializer还需要从数据库中获取级联对象的数据,并最终在获取JDBC时间戳时出现了资源竞争。

而线程http-nio-8080-exec-548是一个HTTP请求/响应线程,其负责处理流程任务的完成操作,其在尝试更新JDBC时间戳时先于前一个线程对JDBC Connection加锁成功,却未能锁住GregorianCalendar

根据Hibernate的特性以及懒加载所带来的副作用不受重视的现实进行分析,可以大致还原出整个处理过程:

  • 用户点击了任务完成按钮,应用也已经完成了对该任务的处理,并正在创建下一个任务实例;
  • OLA(运营级别协议)的逻辑需要监听流程任务的创建事件,以便于及时得到相关的BO信息并做相关业务处理;
  • 流程任务创建事件触发后,OLA先在任务创建的Listener中获取到BO对象,再将该BO对象放到OLA监控器中;
  • OLA监控器会启动新的线程(由线程池分配和调度),并在线程中序列化该BO对象,进而做存储或其他事情;
  • 这里存在两个线程:任务创建线程(http-nio-8080-exec-548)和OLA监控线程(pool-297-thread-25);
  • 而由于懒加载的缘故,BO的级联对象将继续持有查询时所使用的Session实例,而该Session在HTTP请求线程中将被独占使用,相应的JDBC Connection也是被独占的;
  • 因此,JDBC Connection的共享导致了死锁;

解决方案

针对上述问题,根本的解决方案就是:

  • 流程任务创建时,仅将与BO相关的id和类型传递给OLA监控器;
  • OLA监控器在所创建的线程内独自创建Hibernate Session(注意用后释放),再根据BO的id和类型查询所需信息;

因此,需牢记接口数据的交互原则:

  • 拒绝共享;
  • 仅交换Plain数据(非对象、非结构),即核心基础数据,其他业务数据通过基础数据获取;
文章作者: flytreeleft
文章链接: https://flytreeleft.github.io/the-jvm-dump-analyse-for-the-thread-dead-lock/
版权声明: 本博客所有文章除特别声明外,均采用 知识共享署名 4.0 国际许可协议 许可协议。转载请注明来自 flytreeleft's Blog