Wednesday, January 18, 2017

Why I wrote "please do not ignore warnings" and "to always investigate/fix warnings" (in MySQL/MariaDB)

In a last post, I wrote the two following sentences:
  • please do not ignore warnings
  • always investigate/fix warnings
I realized that without context, this might be hard to understand.  In this post, I want to give more background about these two sentences.

In my work, I have seen problems solved (wrongly) by ignoring warnings.  Some example that comes to my mind:
  • upgrading to a new MySQL version cause a new warning, let's silence warnings
  • this program does the expected work, but generates warnings, let's ignore warnings
  • (and I could  probably find more examples...)
I claim that silencing/ignoring warnings is bad: it increases technical debt (that will need to be repaid) and it arms traps for future generations.
I claim that silencing/ignoring warnings is bad:
it increases technical debt (that will need to be repaid)
and it arms traps for future generations.
If MySQL/MariaDB is showing a warning, it should not be ignored.  A way to do what you want without ignoring warnings should be found (ideally, without generating warnings).  If you cannot find such a way, bring this subject up with other developers or with DBAs so an acceptable solution can be found.

In my post about a Funny replication breakage, I showed how ignoring a warning on the master made replication break on slaves.  I published another post about this subject, its title is below.  In the current post, I  describe what happen last week that triggered my aversion to ignoring warnings to resurface.
So this week, I was upgrading one of our systems from MySQL 5.6 to MySQL 5.7.  This is not a critical system, but I still did the tests that I usually do in such situation:
  • a MySQL 5.7 slave was running for some time (it was replicating from the 5.6 master),
  • and I tested replication from 5.7 to 5.6 in case I need to rollback.
In this case, I did not point read to 5.7 because reads are only done on the master (the slaves are there only for high-availability).  I did not test the application with 5.7 because it would have been too much work (setup a full test environment).  I also have a contingency plan (I was ready to rollback) and we would have been able to re-inject the lost data if something went wrong.  But not testing writes on 5.7 left me with a surprise...
But not testing writes on 5.7 left me with a surprise...
After promoting a 5.7 slave as the new master (having the old 5.6 master as a slave of the new master), this warning began to show up:
Warning 1062 Duplicate entry '<a value>' for key '<an index>'
The (bad) way to solve this by the application developer was to stop reporting warnings for that statement.  I believe this is the wrong way to go.  But before getting in this good vs bad discussion, let's see what happened after upgrading to 5.7.

In MySQL 5.6, an INSERT IGNORE does not report unique key violations:
> SELECT VERSION();
+------------+
| VERSION()  |
+------------+
| 5.6.35-log |
+------------+
1 row in set (0.00 sec)

> CREATE TABLE test_jfg (
    ->   id BIGINT PRIMARY KEY NOT NULL AUTO_INCREMENT,
    ->   n_uniq TINYINT NOT NULL UNIQUE,
    ->   t VARCHAR(10) NOT NULL DEFAULT ''
    -> ) CHARACTER SET ASCII;
Query OK, 0 rows affected (0.00 sec)

> INSERT INTO test_jfg (n_uniq) values (1);
Query OK, 1 row affected (0.00 sec)

> INSERT INTO test_jfg (n_uniq) values (1);
ERROR 1062 (23000): Duplicate entry '1' for key 'n_uniq'
> INSERT IGNORE INTO test_jfg (n_uniq) values (1);
Query OK, 0 rows affected (0.00 sec)

> SHOW WARNINGS;
Empty set (0.00 sec)
This behavior is documented in the manual:
If you use the IGNORE keyword, errors that occur while executing the INSERT statement are ignored. For example, without IGNORE, a row that duplicates an existing UNIQUE index or PRIMARY KEY value in the table causes a duplicate-key error and the statement is aborted. With IGNORE, the row is discarded and no error occurs. Ignored errors may generate warnings instead, although duplicate-key errors do not.
However, this was changed in MySQL 5.7.5 (and I was upgrading to 5.7.17):
> SELECT VERSION();
+------------+
| VERSION()  |
+------------+
| 5.7.17-log |
+------------+
1 row in set (0.00 sec)

> CREATE TABLE test_jfg (
    ->   id BIGINT PRIMARY KEY NOT NULL AUTO_INCREMENT,
    ->   n_uniq TINYINT NOT NULL UNIQUE,
    ->   t VARCHAR(10) NOT NULL DEFAULT ''
    -> ) CHARACTER SET ASCII;
Query OK, 0 rows affected (0.01 sec)

> INSERT INTO test_jfg (n_uniq) values (1);
Query OK, 1 row affected (0.00 sec)

> INSERT INTO test_jfg (n_uniq) values (1);
ERROR 1062 (23000): Duplicate entry '1' for key 'n_uniq'
> INSERT IGNORE INTO test_jfg (n_uniq) values (1);
Query OK, 0 rows affected, 1 warning (0.00 sec)

> SHOW WARNINGS;
+---------+------+--------------------------------------+
| Level   | Code | Message                              |
+---------+------+--------------------------------------+
| Warning | 1062 | Duplicate entry '1' for key 'n_uniq' |
+---------+------+--------------------------------------+
1 row in set (0.00 sec)
Some time ago, the MySQL Server developers published an article about this (Improvements to the MySQL `IGNORE` Implementation) and the manual has been updated (removing the "although duplicate-key errors do not").

I share the pain of the people getting those warnings in their mailbox after the upgrade because there were many of them (it was very noisy), and they had other things to do than fixing them.  However, even if this is "by design" in the application (this is exactly why the IGNORE extension is used) those warnings must not reach a human.  But ignoring all warnings (or stop reporting them) is a bad idea.

The reason I believe ignoring warnings or to stop reporting them all is a bad idea is the following: silencing/ignoring warnings might hide important things.
Silencing/ignoring warnings might hide important things.
Said otherwise, it is not because your program now only generate "Duplicate entry" warning that it will only generate those warnings in the future.  Some example below (with 5.7.17 and the same table definition as above):
> INSERT IGNORE INTO test_jfg (n_uniq) values (1024); Query OK, 1 row affected, 1 warning (0.00 sec) > SHOW WARNINGS; +---------+------+-------------------------------------------------+ | Level | Code | Message | +---------+------+-------------------------------------------------+ | Warning | 1264 | Out of range value for column 'n_uniq' at row 1 | +---------+------+-------------------------------------------------+ 1 row in set (0.00 sec) > INSERT IGNORE INTO test_jfg (n_uniq) values (1025); Query OK, 0 rows affected, 2 warnings (0.00 sec) > SHOW WARNINGS; +---------+------+-------------------------------------------------+ | Level | Code | Message | +---------+------+-------------------------------------------------+ | Warning | 1264 | Out of range value for column 'n_uniq' at row 1 | | Warning | 1062 | Duplicate entry '127' for key 'n_uniq' | +---------+------+-------------------------------------------------+ 2 rows in set (0.00 sec) > INSERT IGNORE INTO test_jfg (n_uniq, t) values (2, '123456789ABCDEF'); Query OK, 1 row affected, 1 warning (0.00 sec) > SHOW WARNINGS; +---------+------+----------------------------------------+ | Level | Code | Message | +---------+------+----------------------------------------+ | Warning | 1265 | Data truncated for column 't' at row 1 | +---------+------+----------------------------------------+ 1 row in set (0.00 sec) > INSERT IGNORE INTO test_jfg (n_uniq, t) values (3, 'Gagné'); Query OK, 1 row affected, 1 warning (0.01 sec) > SHOW WARNINGS; +---------+------+------------------------------------------------------------+ | Level | Code | Message | +---------+------+------------------------------------------------------------+ | Warning | 1366 | Incorrect string value: '\xC3\xA9' for column 't' at row 1 | +---------+------+------------------------------------------------------------+ 1 row in set (0.00 sec) > INSERT IGNORE INTO test_jfg (n_uniq, t) values (4, NULL); Query OK, 1 row affected, 1 warning (0.00 sec) > SHOW WARNINGS; +---------+------+---------------------------+ | Level | Code | Message | +---------+------+---------------------------+ | Warning | 1048 | Column 't' cannot be null | +---------+------+---------------------------+ 1 row in set (0.00 sec) > INSERT IGNORE INTO test_jfg (t) values ('hello'); Query OK, 1 row affected, 1 warning (0.00 sec) > SHOW WARNINGS; +---------+------+---------------------------------------------+ | Level | Code | Message | +---------+------+---------------------------------------------+ | Warning | 1364 | Field 'n_uniq' doesn't have a default value | +---------+------+---------------------------------------------+ 1 row in set (0.00 sec) > INSERT IGNORE INTO test_jfg (t) values ('hello2'); Query OK, 0 rows affected, 2 warnings (0.00 sec) > SHOW WARNINGS; +---------+------+---------------------------------------------+ | Level | Code | Message | +---------+------+---------------------------------------------+ | Warning | 1364 | Field 'n_uniq' doesn't have a default value | | Warning | 1062 | Duplicate entry '0' for key 'n_uniq' | +---------+------+---------------------------------------------+ 2 rows in set (0.00 sec) > INSERT IGNORE INTO test_jfg (n_uniq) values ('16a'); Query OK, 1 row affected, 1 warning (0.00 sec) > SHOW WARNINGS; +---------+------+---------------------------------------------+ | Level | Code | Message | +---------+------+---------------------------------------------+ | Warning | 1265 | Data truncated for column 'n_uniq' at row 1 | +---------+------+---------------------------------------------+ 1 row in set (0.00 sec) > SELECT * FROM test_jfg; +----+--------+------------+ | id | n_uniq | t | +----+--------+------------+ | 1 | 1 | | | 4 | 127 | | | 6 | 2 | 123456789A | | 7 | 3 | Gagn? | | 8 | 4 | | | 9 | 0 | hello | | 11 | 16 | | +----+--------+------------+ 7 rows in set (0.00 sec)
So, if we do not look at warnings, we do not know that:
  • 1024 was truncated to 127 (because of SIGNED TINYINT)
  • 1025 was not inserted (because it was truncated which generated a duplicate entry)
  • 123456789ABCDEF was truncated to 123456789A (because VARCHAR(10))
  • 'Gagné' was replaced by 'Gagn?' (because of CHARACTER SET ASCII)
  • NULL was replaced by the empty string on n_uniq = 4 (because of t NOT NULL)
  • 'hello' was inserted with n_uniq = 0 (because no default)
  • 'hello2' was not inserted (because duplicate entry on no default)
  • and 16 was inserted instead of '16a' (because or type cast)
So what is the good way to fix our "Duplicate entry" warning here ?  My approach would be to ignore specific warnings (by code number): in this case, code #1062 should be ignored by the application.  However, this still needs a round-trip to the server (to call SHOW WARNINGS), which might be nice to avoid.  So I opened Bug#84557 and MDEV-11834 as a feature request to be able to Hide specific warnings from SHOW WARNINGS.

1 comment:

  1. WOW, that's a good article for sure and to take into account when upgrading mainly to 5.7, that will expose lots of differences between both 5.6 and 5.7. I recall some discussions about log_error_verbosity (5.7) to hide warnings and, IMHO as well, it shouldn't be used to hide the warning events because, as you said, the evil could be living right in there.

    ReplyDelete