TL; DR

  1. 如果索引列中有自增列,mysql 会使用自增锁,自增锁有不同的模式;
  2. INSERTINSERT ON DUPLICATE KEY UPDATE 属于不同类型的插入类型;
  3. 使用了自增锁之后,很有可能只用 record lock 不会用 gap lock 了;
  4. 本文其实是《解决死锁之道》的实例和注释,关于锁的类型,大家还是去看《解决死锁之道》系列文章比较好;

背景

现在在做促销的工作,上周某个需求需要我记录某个商家享受某个活动的折扣。我用一张表记录这个店家享受了折扣的次数:

CREATE TABLE `issued_history_tab` (
  `id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
  `create_time` int(10) unsigned NOT NULL DEFAULT '0',
  `update_time` int(10) unsigned NOT NULL DEFAULT '0',
  `event_id` bigint(64) NOT NULL DEFAULT '0',
  `dimension_key` varchar(128) NOT NULL DEFAULT '',
  `dimension_value` varchar(128) NOT NULL DEFAULT '',
  `dimension_count` int(10) unsigned NOT NULL DEFAULT '0',
  PRIMARY KEY (`id`) USING BTREE,
  UNIQUE KEY `uniq_dimension_idx` (`event_id`,`dimension_value`,`dimension_key`),
  KEY `idx_event_id` (`event_id`),
  KEY `idx_dimension_pair` (`dimension_value`,`dimension_key`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;

这里会有一个问题,正常情况下,我们只需要 update dimension_count 即可如果一个商家从来没有享受过活动优惠,那它不会在表里面有记录。对于这种商家,我们要先插入一条记录,再 update 都应的 dimension_count 的数量。在并发的情况下,就要考虑锁的问题。

我们最初考虑采用 ON DUPLICATE KEY UPDATE 这种方案,然后怀疑会有死锁的风险。当时做了几个测试,但是都不能确定会不会死锁,于是我们放弃了这种方案。现在有闲暇了,我决定研究一下,究竟会不会死锁,或者简单来说,本文其实是要探究在给定索引情况更新表的时候,innodb 会加什么锁1。就我们当时观察的情况来看,感觉不会出现死锁。

测试

测试环境innodb_version 5.7.29, 隔离级别 REPEATABLE-READ。

首先,我们需要打开监控2。注意下面这个监控只能在 root 下执行。

SET GLOBAL innodb_status_output=ON; 
SET GLOBAL innodb_status_output_locks=ON;

然后我们打开客户端,开三个 session,插入如下 SQL:

BEGIN;
INSERT INTO `issued_history_tab` (`event_id`, `dimension_key`, `dimension_value`, `dimension_count`) VALUE (10088, "merchant_id", "0080", 1) ON DUPLICATE KEY UPDATE `dimension_count` = `dimension_count` + 1;

注意不要 COMMIT 然后在 root 用户下,执行 show engine innodb status

你可以看到如下输出:

------------
TRANSACTIONS
------------
Trx id counter 30360
Purge done for trx's n:o < 30358 undo n:o < 0 state: running but idle
History list length 35
LIST OF TRANSACTIONS FOR EACH SESSION:
---TRANSACTION 421953960198600, not started
0 lock struct(s), heap size 1136, 0 row lock(s)
---TRANSACTION 421953960196792, not started
0 lock struct(s), heap size 1136, 0 row lock(s)
---TRANSACTION 30359, ACTIVE 6 sec inserting
mysql tables in use 1, locked 1
LOCK WAIT 2 lock struct(s), heap size 1136, 1 row lock(s), undo log entries 1
MySQL thread id 21, OS thread handle 123145566531584, query id 4060 localhost 127.0.0.1 promodev update
INSERT INTO `issued_history_tab` (`event_id`, `dimension_key`, `dimension_value`, `dimension_count`) VALUE (10088, "merchant_id", "0080", 1) ON DUPLICATE KEY UPDATE `dimension_count` = `dimension_count` + 1
------- TRX HAS BEEN WAITING 6 SEC FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 694 page no 4 n bits 72 index uniq_dimension_idx of table `beepay_txn_db`.`issued_history_tab` trx id 30359 lock_mode X waiting
Record lock, heap no 4 PHYSICAL RECORD: n_fields 4; compact format; info bits 0
 0: len 8; hex 8000000000002768; asc       'h;;
 1: len 4; hex 30303830; asc 0080;;
 2: len 11; hex 6d65726368616e745f6964; asc merchant_id;;
 3: len 8; hex 0000000000000004; asc         ;;

------------------
TABLE LOCK table `beepay_txn_db`.`issued_history_tab` trx id 30359 lock mode IX
RECORD LOCKS space id 694 page no 4 n bits 72 index uniq_dimension_idx of table `beepay_txn_db`.`issued_history_tab` trx id 30359 lock_mode X waiting
Record lock, heap no 4 PHYSICAL RECORD: n_fields 4; compact format; info bits 0
 0: len 8; hex 8000000000002768; asc       'h;;
 1: len 4; hex 30303830; asc 0080;;
 2: len 11; hex 6d65726368616e745f6964; asc merchant_id;;
 3: len 8; hex 0000000000000004; asc         ;;

---TRANSACTION 30358, ACTIVE 10 sec inserting
mysql tables in use 1, locked 1
LOCK WAIT 2 lock struct(s), heap size 1136, 1 row lock(s), undo log entries 1
MySQL thread id 18, OS thread handle 123145565417472, query id 4057 localhost 127.0.0.1 promodev update
INSERT INTO `issued_history_tab` (`event_id`, `dimension_key`, `dimension_value`, `dimension_count`) VALUE (10088, "merchant_id", "0080", 1) ON DUPLICATE KEY UPDATE `dimension_count` = `dimension_count` + 1
------- TRX HAS BEEN WAITING 10 SEC FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 694 page no 4 n bits 72 index uniq_dimension_idx of table `beepay_txn_db`.`issued_history_tab` trx id 30358 lock_mode X waiting
Record lock, heap no 4 PHYSICAL RECORD: n_fields 4; compact format; info bits 0
 0: len 8; hex 8000000000002768; asc       'h;;
 1: len 4; hex 30303830; asc 0080;;
 2: len 11; hex 6d65726368616e745f6964; asc merchant_id;;
 3: len 8; hex 0000000000000004; asc         ;;

------------------
TABLE LOCK table `beepay_txn_db`.`issued_history_tab` trx id 30358 lock mode IX
RECORD LOCKS space id 694 page no 4 n bits 72 index uniq_dimension_idx of table `beepay_txn_db`.`issued_history_tab` trx id 30358 lock_mode X waiting
Record lock, heap no 4 PHYSICAL RECORD: n_fields 4; compact format; info bits 0
 0: len 8; hex 8000000000002768; asc       'h;;
 1: len 4; hex 30303830; asc 0080;;
 2: len 11; hex 6d65726368616e745f6964; asc merchant_id;;
 3: len 8; hex 0000000000000004; asc         ;;

---TRANSACTION 30353, ACTIVE 17 sec
2 lock struct(s), heap size 1136, 1 row lock(s), undo log entries 1
MySQL thread id 16, OS thread handle 123145565696000, query id 4054 localhost 127.0.0.1 promodev
TABLE LOCK table `beepay_txn_db`.`issued_history_tab` trx id 30353 lock mode IX
RECORD LOCKS space id 694 page no 4 n bits 72 index uniq_dimension_idx of table `beepay_txn_db`.`issued_history_tab` trx id 30353 lock_mode X locks rec but not gap
Record lock, heap no 4 PHYSICAL RECORD: n_fields 4; compact format; info bits 0
 0: len 8; hex 8000000000002768; asc       'h;;
 1: len 4; hex 30303830; asc 0080;;
 2: len 11; hex 6d65726368616e745f6964; asc merchant_id;;
 3: len 8; hex 0000000000000004; asc         ;;

所以说,实际上三个 session 都加上了 record lock 。坦白说,我想到脑爆都想不到这样会死锁。

当然实际上,我找到了使用 ON DUPLICATE KEY UPDATE 导致死锁的例子。这里主要的原因是因为 gap lock。我尝试使用当前的表来还原一下。

mysql> SELECT * FROM issued_history_tab ORDER BY id;
+----+-------------+-------------+----------+---------------+-----------------+-----------------+
| id | create_time | update_time | event_id | dimension_key | dimension_value | dimension_count |
+----+-------------+-------------+----------+---------------+-----------------+-----------------+
|  1 |           0 |           0 |    10086 | merchant_id   | 0079            |               1 |
|  2 |           0 |           0 |    10087 | merchant_id   | 0079            |               1 |
|  4 |           0 |           0 |    10088 | merchant_id   | 0080            |               3 |
|  9 |           0 |           0 |    10100 | merchant_id   | 0080            |               1 |
+----+-------------+-------------+----------+---------------+-----------------+-----------------+

这是表里面现在的情况。

首先在三个不同的事务里面执行如下事务:

BEGIN;
INSERT INTO `issued_history_tab` (`id`, `event_id`, `dimension_key`, `dimension_value`, `dimension_count`) VALUE (10, 10089, "merchant_id", "0080", 1) 
ON DUPLICATE KEY UPDATE `dimension_count` = `dimension_count` + 1;
COMMIT;

BEGIN;
INSERT INTO `issued_history_tab` (`id`, `event_id`, `dimension_key`, `dimension_value`, `dimension_count`) VALUE (11, 10090, "merchant_id", "0080", 1) 
ON DUPLICATE KEY UPDATE `dimension_count` = `dimension_count` + 1;
COMMIT;

BEGIN;
INSERT INTO `issued_history_tab` (`id`, `event_id`, `dimension_key`, `dimension_value`, `dimension_count`) VALUE (12, 10099, "merchant_id", "0080", 1) 
ON DUPLICATE KEY UPDATE `dimension_count` = `dimension_count` + 1;
COMMIT;

但实际上,引擎只加了 IX 锁:

------------
TRANSACTIONS
------------
Trx id counter 30607
Purge done for trx's n:o < 30605 undo n:o < 0 state: running but idle
History list length 39
LIST OF TRANSACTIONS FOR EACH SESSION:
---TRANSACTION 421953960196792, not started
0 lock struct(s), heap size 1136, 0 row lock(s)
---TRANSACTION 421953960198600, not started
0 lock struct(s), heap size 1136, 0 row lock(s)
---TRANSACTION 30606, ACTIVE 8 sec
1 lock struct(s), heap size 1136, 0 row lock(s), undo log entries 1
MySQL thread id 27, OS thread handle 123145565696000, query id 5106 localhost 127.0.0.1 promodev
TABLE LOCK table `beepay_txn_db`.`issued_history_tab` trx id 30606 lock mode IX
---TRANSACTION 30605, ACTIVE 12 sec
1 lock struct(s), heap size 1136, 0 row lock(s), undo log entries 1
MySQL thread id 25, OS thread handle 123145566531584, query id 5103 localhost 127.0.0.1 promodev
TABLE LOCK table `beepay_txn_db`.`issued_history_tab` trx id 30605 lock mode IX
---TRANSACTION 30600, ACTIVE 16 sec
1 lock struct(s), heap size 1136, 0 row lock(s), undo log entries 1
MySQL thread id 23, OS thread handle 123145565417472, query id 5100 localhost 127.0.0.1 promodev
TABLE LOCK table `beepay_txn_db`.`issued_history_tab` trx id 30600 lock mode IX

没有观察到例子中出现的 gap lock。

于是我把 id 去掉。依然只有 IX 锁。我决定再给 event_id 加唯一索引。依然只有 IX 锁。

然后我再仔细读一遍例子。把多余的索引都去掉,然后重新在两个 session 中执行三个 SQL 语句,依然只有 IX 锁。换言之,我无法复现例子里面的死锁条件。

发生了什么事情

考虑我本来是想复现因为 gap lock 导致死锁问题,然后写一篇播客了事的。但是既然复现不了,那场面就不是一般尴尬了。

所以到底为什么折腾半天都没有搞出 gap lock 呢?

主要是因为我们设计表的时候用了自增的 primary key。参考《解决死锁之道》3,因为有自增列,所以我们前面的操作都会加上自增锁。

自增表是一种特殊类型的表锁,所以它会导致并发插入效率低下,所以5.1.22 版本之后引入了几种不同的模式。

mysql> SELECT @@innodb_autoinc_lock_mode;
+----------------------------+
| @@innodb_autoinc_lock_mode |
+----------------------------+
|                          1 |
+----------------------------+
1 row in set (0.00 sec)

我测试用的 mysql 实例对应的自增锁模式如上。mysql 会将插入语句分为 simple insert, bulk inserts 和 mixed-mode insert,具体可以参考《解决死锁之道》官方文档

这里只想说的一点是:ON DUPLICATE KEY UPDATE 属于 mixed-mode insert (主要是这种情况下你可能增加记录,也可能更新已有的记录),但是自增列一定会继续更新。比如说我更新 event_id 为 10100 的 dimension_count 三次,这三次都不会新增记录,但是我再插入新记录的时候,自增 id 就会变成13(10,11,12在之前三次更新的时候生成了)。

总的来说,因为插入的时候,每一行都获得了自增的主键,那后续的锁都是 record lock 就不足为奇了。

为了证明这个猜想,我把 id 的 auto increment 去掉,再做测试。

# session 1
BEGIN;
SELECT * FROM `issued_history_tab` WHERE `id` = 60 LOCK IN SHARE MODE;
COMMIT;

# sesssion 2
BEGIN;
INSERT INTO `issued_history_tab` (`id`, `event_id`, `dimension_key`, `dimension_value`, `dimension_count`) VALUE (60, 10060, "merchant_id", "0080", 1) ;
COMMIT;

这次终于可以搞出 gap lock 来了。

mysql> select * from information_schema.innodb_locks;
+--------------------------+-----------------+-----------+-----------+--------------------------------------+------------+------------+-----------+----------+-----------+
| lock_id                  | lock_trx_id     | lock_mode | lock_type | lock_table                           | lock_index | lock_space | lock_page | lock_rec | lock_data |
+--------------------------+-----------------+-----------+-----------+--------------------------------------+------------+------------+-----------+----------+-----------+
| 30804:695:3:11           | 30804           | X,GAP     | RECORD    | `beepay_txn_db`.`issued_history_tab` | PRIMARY    |        695 |         3 |       11 | 69        |
| 421953960197696:695:3:11 | 421953960197696 | S,GAP     | RECORD    | `beepay_txn_db`.`issued_history_tab` | PRIMARY    |        695 |         3 |       11 | 69        |
+--------------------------+-----------------+-----------+-----------+--------------------------------------+------------+------------+-----------+----------+-----------+
2 rows in set, 1 warning (0.00 sec)

脚注

  1. 其实我很好奇,到底哪里还有不用 innodb 的场景。 

  2. 参考文献:MySQL加锁分析 

  3. 其实本文就是《解决死锁之道》的一个注释,一个实例,本文没啥营养,去看《参考死锁之道》就好。