关于 CRUD 工程师日常遇到 mysql 锁问题的那些事
TL; DR
- 如果索引列中有自增列,mysql 会使用自增锁,自增锁有不同的模式;
INSERT
和INSERT ON DUPLICATE KEY UPDATE
属于不同类型的插入类型;- 使用了自增锁之后,很有可能只用 record lock 不会用 gap lock 了;
- 本文其实是《解决死锁之道》的实例和注释,关于锁的类型,大家还是去看《解决死锁之道》系列文章比较好;
背景
现在在做促销的工作,上周某个需求需要我记录某个商家享受某个活动的折扣。我用一张表记录这个店家享受了折扣的次数:
CREATE TABLE `issued_history_tab` (
`id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
`create_time` int(10) unsigned NOT NULL DEFAULT '0',
`update_time` int(10) unsigned NOT NULL DEFAULT '0',
`event_id` bigint(64) NOT NULL DEFAULT '0',
`dimension_key` varchar(128) NOT NULL DEFAULT '',
`dimension_value` varchar(128) NOT NULL DEFAULT '',
`dimension_count` int(10) unsigned NOT NULL DEFAULT '0',
PRIMARY KEY (`id`) USING BTREE,
UNIQUE KEY `uniq_dimension_idx` (`event_id`,`dimension_value`,`dimension_key`),
KEY `idx_event_id` (`event_id`),
KEY `idx_dimension_pair` (`dimension_value`,`dimension_key`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;
这里会有一个问题,正常情况下,我们只需要 update dimension_count
即可如果一个商家从来没有享受过活动优惠,那它不会在表里面有记录。对于这种商家,我们要先插入一条记录,再 update 都应的 dimension_count
的数量。在并发的情况下,就要考虑锁的问题。
我们最初考虑采用 ON DUPLICATE KEY UPDATE
这种方案,然后怀疑会有死锁的风险。当时做了几个测试,但是都不能确定会不会死锁,于是我们放弃了这种方案。现在有闲暇了,我决定研究一下,究竟会不会死锁,或者简单来说,本文其实是要探究在给定索引情况更新表的时候,innodb 会加什么锁1。就我们当时观察的情况来看,感觉不会出现死锁。
测试
测试环境innodb_version 5.7.29, 隔离级别 REPEATABLE-READ。
首先,我们需要打开监控2。注意下面这个监控只能在 root 下执行。
SET GLOBAL innodb_status_output=ON;
SET GLOBAL innodb_status_output_locks=ON;
然后我们打开客户端,开三个 session,插入如下 SQL:
BEGIN;
INSERT INTO `issued_history_tab` (`event_id`, `dimension_key`, `dimension_value`, `dimension_count`) VALUE (10088, "merchant_id", "0080", 1) ON DUPLICATE KEY UPDATE `dimension_count` = `dimension_count` + 1;
注意不要 COMMIT
然后在 root 用户下,执行 show engine innodb status
。
你可以看到如下输出:
------------
TRANSACTIONS
------------
Trx id counter 30360
Purge done for trx's n:o < 30358 undo n:o < 0 state: running but idle
History list length 35
LIST OF TRANSACTIONS FOR EACH SESSION:
---TRANSACTION 421953960198600, not started
0 lock struct(s), heap size 1136, 0 row lock(s)
---TRANSACTION 421953960196792, not started
0 lock struct(s), heap size 1136, 0 row lock(s)
---TRANSACTION 30359, ACTIVE 6 sec inserting
mysql tables in use 1, locked 1
LOCK WAIT 2 lock struct(s), heap size 1136, 1 row lock(s), undo log entries 1
MySQL thread id 21, OS thread handle 123145566531584, query id 4060 localhost 127.0.0.1 promodev update
INSERT INTO `issued_history_tab` (`event_id`, `dimension_key`, `dimension_value`, `dimension_count`) VALUE (10088, "merchant_id", "0080", 1) ON DUPLICATE KEY UPDATE `dimension_count` = `dimension_count` + 1
------- TRX HAS BEEN WAITING 6 SEC FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 694 page no 4 n bits 72 index uniq_dimension_idx of table `beepay_txn_db`.`issued_history_tab` trx id 30359 lock_mode X waiting
Record lock, heap no 4 PHYSICAL RECORD: n_fields 4; compact format; info bits 0
0: len 8; hex 8000000000002768; asc 'h;;
1: len 4; hex 30303830; asc 0080;;
2: len 11; hex 6d65726368616e745f6964; asc merchant_id;;
3: len 8; hex 0000000000000004; asc ;;
------------------
TABLE LOCK table `beepay_txn_db`.`issued_history_tab` trx id 30359 lock mode IX
RECORD LOCKS space id 694 page no 4 n bits 72 index uniq_dimension_idx of table `beepay_txn_db`.`issued_history_tab` trx id 30359 lock_mode X waiting
Record lock, heap no 4 PHYSICAL RECORD: n_fields 4; compact format; info bits 0
0: len 8; hex 8000000000002768; asc 'h;;
1: len 4; hex 30303830; asc 0080;;
2: len 11; hex 6d65726368616e745f6964; asc merchant_id;;
3: len 8; hex 0000000000000004; asc ;;
---TRANSACTION 30358, ACTIVE 10 sec inserting
mysql tables in use 1, locked 1
LOCK WAIT 2 lock struct(s), heap size 1136, 1 row lock(s), undo log entries 1
MySQL thread id 18, OS thread handle 123145565417472, query id 4057 localhost 127.0.0.1 promodev update
INSERT INTO `issued_history_tab` (`event_id`, `dimension_key`, `dimension_value`, `dimension_count`) VALUE (10088, "merchant_id", "0080", 1) ON DUPLICATE KEY UPDATE `dimension_count` = `dimension_count` + 1
------- TRX HAS BEEN WAITING 10 SEC FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 694 page no 4 n bits 72 index uniq_dimension_idx of table `beepay_txn_db`.`issued_history_tab` trx id 30358 lock_mode X waiting
Record lock, heap no 4 PHYSICAL RECORD: n_fields 4; compact format; info bits 0
0: len 8; hex 8000000000002768; asc 'h;;
1: len 4; hex 30303830; asc 0080;;
2: len 11; hex 6d65726368616e745f6964; asc merchant_id;;
3: len 8; hex 0000000000000004; asc ;;
------------------
TABLE LOCK table `beepay_txn_db`.`issued_history_tab` trx id 30358 lock mode IX
RECORD LOCKS space id 694 page no 4 n bits 72 index uniq_dimension_idx of table `beepay_txn_db`.`issued_history_tab` trx id 30358 lock_mode X waiting
Record lock, heap no 4 PHYSICAL RECORD: n_fields 4; compact format; info bits 0
0: len 8; hex 8000000000002768; asc 'h;;
1: len 4; hex 30303830; asc 0080;;
2: len 11; hex 6d65726368616e745f6964; asc merchant_id;;
3: len 8; hex 0000000000000004; asc ;;
---TRANSACTION 30353, ACTIVE 17 sec
2 lock struct(s), heap size 1136, 1 row lock(s), undo log entries 1
MySQL thread id 16, OS thread handle 123145565696000, query id 4054 localhost 127.0.0.1 promodev
TABLE LOCK table `beepay_txn_db`.`issued_history_tab` trx id 30353 lock mode IX
RECORD LOCKS space id 694 page no 4 n bits 72 index uniq_dimension_idx of table `beepay_txn_db`.`issued_history_tab` trx id 30353 lock_mode X locks rec but not gap
Record lock, heap no 4 PHYSICAL RECORD: n_fields 4; compact format; info bits 0
0: len 8; hex 8000000000002768; asc 'h;;
1: len 4; hex 30303830; asc 0080;;
2: len 11; hex 6d65726368616e745f6964; asc merchant_id;;
3: len 8; hex 0000000000000004; asc ;;
所以说,实际上三个 session 都加上了 record lock 。坦白说,我想到脑爆都想不到这样会死锁。
当然实际上,我找到了使用 ON DUPLICATE KEY UPDATE
导致死锁的例子。这里主要的原因是因为 gap lock。我尝试使用当前的表来还原一下。
mysql> SELECT * FROM issued_history_tab ORDER BY id;
+----+-------------+-------------+----------+---------------+-----------------+-----------------+
| id | create_time | update_time | event_id | dimension_key | dimension_value | dimension_count |
+----+-------------+-------------+----------+---------------+-----------------+-----------------+
| 1 | 0 | 0 | 10086 | merchant_id | 0079 | 1 |
| 2 | 0 | 0 | 10087 | merchant_id | 0079 | 1 |
| 4 | 0 | 0 | 10088 | merchant_id | 0080 | 3 |
| 9 | 0 | 0 | 10100 | merchant_id | 0080 | 1 |
+----+-------------+-------------+----------+---------------+-----------------+-----------------+
这是表里面现在的情况。
首先在三个不同的事务里面执行如下事务:
BEGIN;
INSERT INTO `issued_history_tab` (`id`, `event_id`, `dimension_key`, `dimension_value`, `dimension_count`) VALUE (10, 10089, "merchant_id", "0080", 1)
ON DUPLICATE KEY UPDATE `dimension_count` = `dimension_count` + 1;
COMMIT;
BEGIN;
INSERT INTO `issued_history_tab` (`id`, `event_id`, `dimension_key`, `dimension_value`, `dimension_count`) VALUE (11, 10090, "merchant_id", "0080", 1)
ON DUPLICATE KEY UPDATE `dimension_count` = `dimension_count` + 1;
COMMIT;
BEGIN;
INSERT INTO `issued_history_tab` (`id`, `event_id`, `dimension_key`, `dimension_value`, `dimension_count`) VALUE (12, 10099, "merchant_id", "0080", 1)
ON DUPLICATE KEY UPDATE `dimension_count` = `dimension_count` + 1;
COMMIT;
但实际上,引擎只加了 IX 锁:
------------
TRANSACTIONS
------------
Trx id counter 30607
Purge done for trx's n:o < 30605 undo n:o < 0 state: running but idle
History list length 39
LIST OF TRANSACTIONS FOR EACH SESSION:
---TRANSACTION 421953960196792, not started
0 lock struct(s), heap size 1136, 0 row lock(s)
---TRANSACTION 421953960198600, not started
0 lock struct(s), heap size 1136, 0 row lock(s)
---TRANSACTION 30606, ACTIVE 8 sec
1 lock struct(s), heap size 1136, 0 row lock(s), undo log entries 1
MySQL thread id 27, OS thread handle 123145565696000, query id 5106 localhost 127.0.0.1 promodev
TABLE LOCK table `beepay_txn_db`.`issued_history_tab` trx id 30606 lock mode IX
---TRANSACTION 30605, ACTIVE 12 sec
1 lock struct(s), heap size 1136, 0 row lock(s), undo log entries 1
MySQL thread id 25, OS thread handle 123145566531584, query id 5103 localhost 127.0.0.1 promodev
TABLE LOCK table `beepay_txn_db`.`issued_history_tab` trx id 30605 lock mode IX
---TRANSACTION 30600, ACTIVE 16 sec
1 lock struct(s), heap size 1136, 0 row lock(s), undo log entries 1
MySQL thread id 23, OS thread handle 123145565417472, query id 5100 localhost 127.0.0.1 promodev
TABLE LOCK table `beepay_txn_db`.`issued_history_tab` trx id 30600 lock mode IX
没有观察到例子中出现的 gap lock。
于是我把 id 去掉。依然只有 IX 锁。我决定再给 event_id 加唯一索引。依然只有 IX 锁。
然后我再仔细读一遍例子。把多余的索引都去掉,然后重新在两个 session 中执行三个 SQL 语句,依然只有 IX 锁。换言之,我无法复现例子里面的死锁条件。
发生了什么事情
考虑我本来是想复现因为 gap lock 导致死锁问题,然后写一篇播客了事的。但是既然复现不了,那场面就不是一般尴尬了。
所以到底为什么折腾半天都没有搞出 gap lock 呢?
主要是因为我们设计表的时候用了自增的 primary key。参考《解决死锁之道》3,因为有自增列,所以我们前面的操作都会加上自增锁。
自增表是一种特殊类型的表锁,所以它会导致并发插入效率低下,所以5.1.22 版本之后引入了几种不同的模式。
mysql> SELECT @@innodb_autoinc_lock_mode;
+----------------------------+
| @@innodb_autoinc_lock_mode |
+----------------------------+
| 1 |
+----------------------------+
1 row in set (0.00 sec)
我测试用的 mysql 实例对应的自增锁模式如上。mysql 会将插入语句分为 simple insert, bulk inserts 和 mixed-mode insert,具体可以参考《解决死锁之道》和官方文档。
这里只想说的一点是:ON DUPLICATE KEY UPDATE
属于 mixed-mode insert (主要是这种情况下你可能增加记录,也可能更新已有的记录),但是自增列一定会继续更新。比如说我更新 event_id
为 10100 的 dimension_count
三次,这三次都不会新增记录,但是我再插入新记录的时候,自增 id 就会变成13(10,11,12在之前三次更新的时候生成了)。
总的来说,因为插入的时候,每一行都获得了自增的主键,那后续的锁都是 record lock 就不足为奇了。
为了证明这个猜想,我把 id 的 auto increment 去掉,再做测试。
# session 1
BEGIN;
SELECT * FROM `issued_history_tab` WHERE `id` = 60 LOCK IN SHARE MODE;
COMMIT;
# sesssion 2
BEGIN;
INSERT INTO `issued_history_tab` (`id`, `event_id`, `dimension_key`, `dimension_value`, `dimension_count`) VALUE (60, 10060, "merchant_id", "0080", 1) ;
COMMIT;
这次终于可以搞出 gap lock 来了。
mysql> select * from information_schema.innodb_locks;
+--------------------------+-----------------+-----------+-----------+--------------------------------------+------------+------------+-----------+----------+-----------+
| lock_id | lock_trx_id | lock_mode | lock_type | lock_table | lock_index | lock_space | lock_page | lock_rec | lock_data |
+--------------------------+-----------------+-----------+-----------+--------------------------------------+------------+------------+-----------+----------+-----------+
| 30804:695:3:11 | 30804 | X,GAP | RECORD | `beepay_txn_db`.`issued_history_tab` | PRIMARY | 695 | 3 | 11 | 69 |
| 421953960197696:695:3:11 | 421953960197696 | S,GAP | RECORD | `beepay_txn_db`.`issued_history_tab` | PRIMARY | 695 | 3 | 11 | 69 |
+--------------------------+-----------------+-----------+-----------+--------------------------------------+------------+------------+-----------+----------+-----------+
2 rows in set, 1 warning (0.00 sec)