Wednesday, December 19, 2018

MySQL InnoDB lost tables but files exist

PROBLEM
=====================================================================
I have a MySQL InnoDB database that still has all of its table files, but MySQL doesn't see the tables and isn't loading them.
The problem happened because I deleted these three files: ibdata1, ib_logfile0, and ib_logfile1.
I was having issues with mysqld starting up, and what I read said to remove them because MySQL will just regenerate them (I know I should have backed them up first, but didn't).
What can I do to get MySQL to see the tables again?

FIX:
======================================================================
Here is why MySQL cannot see those tables: the system tablespace (ibdata1) contains a storage-engine-specific data dictionary that InnoDB uses to map tablespace files to tables (see the InnoDB Architecture diagram).
Moving InnoDB tables from one place to another requires commands like
ALTER TABLE tblname DISCARD TABLESPACE;
ALTER TABLE tblname IMPORT TABLESPACE;
Here is a part of the MySQL 5.5 Documentation explaining what needs to be considered
Portability Considerations for .ibd Files
You cannot freely move .ibd files between database directories as you can with MyISAM table files. The table definition stored in the InnoDB shared tablespace includes the database name. The transaction IDs and log sequence numbers stored in the tablespace files also differ between databases.
To move an .ibd file and the associated table from one database to another, use a RENAME TABLE statement:
RENAME TABLE db1.tbl_name TO db2.tbl_name;
If you have a “clean” backup of an .ibd file, you can restore it to the MySQL installation from which it originated as follows:
1. The table must not have been dropped or truncated since you copied the .ibd file, because doing so changes the table ID stored inside the tablespace.
2. Issue this ALTER TABLE statement to delete the current .ibd file:
ALTER TABLE tbl_name DISCARD TABLESPACE;
3. Copy the backup .ibd file to the proper database directory.
4. Issue this ALTER TABLE statement to tell InnoDB to use the new .ibd file for the table:
ALTER TABLE tbl_name IMPORT TABLESPACE;
In this context, a “clean” .ibd file backup is one for which the following requirements are satisfied:
There are no uncommitted modifications by transactions in the .ibd file.
There are no unmerged insert buffer entries in the .ibd file.
Purge has removed all delete-marked index records from the .ibd file.
mysqld has flushed all modified pages of the .ibd file from the buffer pool to the file.
Given these caveats and protocols, here is a suggested course of action.
For this example, let's try to restore the tags table to the mydb database.

STEP #1

Make sure you have backups of those .frm and .ibd files in /tmp/innodb_data

STEP #2

Get the original CREATE TABLE tags statement and execute it as CREATE TABLE mydb.tags .... Make sure it has the exact same structure as the original tags.frm.
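For illustration only, a minimal sketch of what that statement might look like; the columns shown here are hypothetical, and yours must match the original tags.frm exactly:

CREATE TABLE mydb.tags (
    id INT UNSIGNED NOT NULL AUTO_INCREMENT,   -- hypothetical columns for illustration
    tag_name VARCHAR(64) NOT NULL,
    PRIMARY KEY (id)
) ENGINE=InnoDB;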

STEP #3

Delete the empty tags.ibd using MySQL
ALTER TABLE mydb.tags DISCARD TABLESPACE;

STEP #4

Bring in the backup copy of tags.ibd
cd /var/lib/mysql/mydb
cp /tmp/innodb_data/tags.ibd .
chown mysql:mysql tags.ibd

STEP #5

Add tags table to the InnoDB Data Dictionary
ALTER TABLE mydb.tags IMPORT TABLESPACE;

STEP #6

Test the table's accessibility
SHOW CREATE TABLE mydb.tags\G
SELECT * FROM mydb.tags LIMIT 10;
If you get normal results, congratulations, you have imported an InnoDB table.

STEP #7

In the future, please don't delete ibdata1 and its log files (ib_logfile0 and ib_logfile1).



ALT.
UPDATED
First of all, about the files:
  • .frm - table structure;
  • .myd - table data (MyISAM);
  • .myi - indexes (MyISAM).
To recover tables, you can try (make a backup first):
1) run CHECK TABLE tablename - for all db tables;
2) run REPAIR TABLE tablename - for the necessary tables.
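A minimal sketch of that pass against the tags table from the example above (note that REPAIR TABLE works for engines such as MyISAM; you cannot repair an InnoDB table):

CHECK TABLE mydb.tags;    -- reports corruption, if any
REPAIR TABLE mydb.tags;   -- MyISAM only; not supported for InnoDB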

Monday, December 17, 2018

MySQL Performance: MyISAM vs InnoDB

A major factor in database performance is the storage engine used by the database, and more specifically, by its tables. Different storage engines provide better performance in one situation over another. For general use, there are two contenders to consider: MyISAM, which is the default MySQL storage engine, and InnoDB, an alternative engine built into MySQL and intended for high-performance databases. Before we can understand the difference between the two storage engines, we need to understand the term “locking.”

What is locking in MySQL?

To protect the integrity of the data stored within databases, MySQL employs locking. Locking, simply put, means protecting data from being accessed or changed by other queries while it is being modified. When a lock is applied, the data cannot be modified except by the query that initiated the lock. Locking is a necessary component to ensure the accuracy of the stored information. Each storage engine uses a different method of locking. Depending on your data and query practices, one engine can outperform another. In this series, we will look at the two most common types of locking employed by our two storage engines.

Table locking:  The technique of locking an entire table when one or more cells within the table need to be updated or deleted. Table locking is the default method employed by the default storage engine, MyISAM.
Example: MyISAM Table Locking

                          |       | Column A | Column B | Column C |
  Query 1 UPDATE          | Row 1 | Writing  | data     | data     |
  Query 2 SELECT (Wait)   | Row 2 | data     | data     | data     |
  Query 3 UPDATE (Wait)   | Row 3 | data     | data     | data     |
  Query 4 SELECT (Wait)   | Row 4 | data     | data     | data     |
  Query 5 SELECT (Wait)   | Row 5 | data     | data     | data     |
The example illustrates how a single write operation locks the entire table, causing other queries to wait for the UPDATE query to finish.

Row-level locking: The act of locking an effective range of rows in a table while one or more cells within the range are modified or deleted. Row-level locking is the method used by the InnoDB storage engine and is intended for high-performance databases.
Example: InnoDB Row-Level Locking

                          |       | Column A | Column B | Column C |
  Query 1 UPDATE          | Row 1 | Writing  | data     | data     |
  Query 2 SELECT          | Row 2 | Reading  | data     | data     |
  Query 3 UPDATE          | Row 3 | data     | Writing  | data     |
  Query 4 SELECT          | Row 4 | Reading  | Reading  | Reading  |
  Query 5 SELECT          | Row 5 | Reading  | data     | Reading  |
The example shows how using row-level locking allows for multiple queries to run on individual rows by locking only the rows being updated instead of the entire table.
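To see the difference in practice, here is a sketch using two concurrent MySQL sessions against a hypothetical InnoDB table t; under MyISAM, the second session's statements would instead wait for the first to finish:

-- Session 1
BEGIN;
UPDATE t SET col_a = 'x' WHERE id = 1;    -- row 1 is now locked

-- Session 2, at the same time
SELECT * FROM t WHERE id = 2;             -- returns immediately; row 2 is not locked
UPDATE t SET col_b = 'y' WHERE id = 3;    -- also proceeds; it touches a different row

-- Session 1
COMMIT;                                   -- releases the row locks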

MyISAM vs. InnoDB

By comparing the two storage engines, we get to the crux of the argument for using InnoDB over MyISAM. An application or website that has a frequently used table works exceptionally well under the InnoDB storage engine because it resolves table-locking bottlenecks. However, the question of using one over the other is subjective, as neither of them is perfect in all situations. There are strengths and limitations to both storage engines. Intimate knowledge of the database structure and query practices is critical for selecting the best storage engine for your tables.
MyISAM will out-perform InnoDB on large tables that require vastly more read activity than write activity. MyISAM's read performance outshines InnoDB's because locking the entire table is quicker than determining which rows are locked within it. The more information in the table, the more time it takes InnoDB to work out which rows are not accessible. If your application relies on huge tables whose data does not change frequently, then MyISAM will out-perform InnoDB. Conversely, InnoDB outperforms MyISAM when the data within the table changes frequently, since such tables perform more writes than reads per second. In these situations, InnoDB can keep up with large volumes of requests more easily than a scheme that locks the entire table for each one.
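If you want to check which engine a table currently uses, and convert it, a quick sketch (mydb.mytable is a placeholder name):

SHOW TABLE STATUS FROM mydb LIKE 'mytable';   -- the Engine column shows MyISAM or InnoDB
ALTER TABLE mydb.mytable ENGINE=InnoDB;       -- rebuilds the table under the new engine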

MyISAM:
The MyISAM storage engine in MySQL.
  • Simpler to design and create, thus better for beginners. No worries about the foreign relationships between tables.
  • Faster than InnoDB on the whole as a result of the simpler structure thus much less costs of server resources. -- Mostly no longer true.
  • Full-text indexing. -- InnoDB has it now
  • Especially good for read-intensive (select) tables. -- Mostly no longer true.
  • Disk footprint is 2x-3x less than InnoDB's. -- As of Version 5.7, this is perhaps the only real advantage of MyISAM.
InnoDB:
The InnoDB storage engine in MySQL.
  • Support for transactions (giving you support for the ACID properties; see the sketch after this list).
  • Row-level locking. Having a more fine grained locking-mechanism gives you higher concurrency compared to, for instance, MyISAM.
  • Foreign key constraints. Allowing you to let the database ensure the integrity of the state of the database, and the relationships between tables.
  • InnoDB is more resistant to table corruption than MyISAM.
  • Support for large buffer pool for both data and indexes. MyISAM key buffer is only for indexes.
  • MyISAM is stagnant; all future enhancements will be in InnoDB. This was made abundantly clear with the roll out of Version 8.0.
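A short sketch of the transaction, foreign key, and cascade points above, using hypothetical parent/child tables:

CREATE TABLE parent (
    id INT NOT NULL PRIMARY KEY
) ENGINE=InnoDB;

CREATE TABLE child (
    id INT NOT NULL PRIMARY KEY,
    parent_id INT NOT NULL,
    FOREIGN KEY (parent_id) REFERENCES parent (id)
        ON DELETE CASCADE              -- cascading delete; not possible with MyISAM
) ENGINE=InnoDB;

BEGIN;                                 -- start a transaction
INSERT INTO parent (id) VALUES (1);
INSERT INTO child (id, parent_id) VALUES (10, 1);
ROLLBACK;                              -- undo both inserts; MyISAM has no rollback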
MyISAM Limitations:
  • No foreign keys and cascading deletes/updates
  • No transactional integrity (ACID compliance)
  • No rollback abilities
  • 4,294,967,296 row limit (2^32) -- This is an old default. The configurable limit (for many versions) has been 2^56 bytes.
  • Maximum of 64 indexes per table
InnoDB Limitations:
  • No full-text indexing (before MySQL 5.6)
  • Cannot be compressed for fast, read-only access (5.5.14 introduced ROW_FORMAT=COMPRESSED)
  • You cannot repair an InnoDB table

Friday, November 30, 2018

MySQL Replication: Crash-Safe Slaves Feature Introduced in 5.6


What to fix:
2014-07-21 06:29:58 3850 [Warning] Slave SQL: If a crash happens this configuration does not guarantee that the relay log info will be consistent, Error_code: 0
Current Settings
root@server1> select table_name,engine from information_schema.tables
    -> where table_name in ('slave_relay_log_info','slave_master_info');
+----------------------+--------+
| table_name           | engine |
+----------------------+--------+
| slave_master_info    | InnoDB |
| slave_relay_log_info | InnoDB |
+----------------------+--------+
2 rows in set (0.00 sec)

root@server1> show variables like '%info_repository';
+---------------------------+-------+
| Variable_name             | Value |
+---------------------------+-------+
| master_info_repository    | FILE  |
| relay_log_info_repository | FILE  |
+---------------------------+-------+
2 rows in set (0.00 sec)

Add these two lines to my.cnf on both server1 and server2:

master_info_repository=TABLE
relay_log_info_repository=TABLE

Restart mysqld
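Alternatively, both variables are dynamic in 5.6, so assuming the slave threads are stopped first, you can make the change without a full restart:

STOP SLAVE;
SET GLOBAL master_info_repository = 'TABLE';
SET GLOBAL relay_log_info_repository = 'TABLE';
START SLAVE;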
Verify the Setting
root@server1> show variables like '%info_repository';
+---------------------------+-------+
| Variable_name             | Value |
+---------------------------+-------+
| master_info_repository    | TABLE |
| relay_log_info_repository | TABLE |
+---------------------------+-------+
2 rows in set (0.00 sec)

root@server1> select * from mysql.slave_master_info \G
*************************** 1. row ***************************
       Number_of_lines: 23
       Master_log_name: bin-log.000008
        Master_log_pos: 120
                  Host: 127.0.0.1
             User_name: repl_user
         User_password: repl_pass
                  Port: 3307
         Connect_retry: 60
           Enabled_ssl: 0
                Ssl_ca:
            Ssl_capath:
              Ssl_cert:
            Ssl_cipher:
               Ssl_key:
Ssl_verify_server_cert: 0
             Heartbeat: 1800
                  Bind:
    Ignored_server_ids: 0
                  Uuid: 37f99013-0fa4-11e4-ae2f-e13c88f43118
           Retry_count: 86400
               Ssl_crl:
           Ssl_crlpath:
Enabled_auto_position: 0
1 row in set (0.00 sec)

root@server1> select * from mysql.slave_relay_log_info \G
*************************** 1. row ***************************
  Number_of_lines: 7
   Relay_log_name: /mysql56/mysql-server1/slave/relay-bin.000021
    Relay_log_pos: 281
  Master_log_name: bin-log.000008
   Master_log_pos: 120
        Sql_delay: 0
Number_of_workers: 0
               Id: 1
1 row in set (0.00 sec)

Friday, November 9, 2018

[Warning] Unsafe statement written to the binary log using statement format since BINLOG_FORMAT = STATEMENT

[Warning] Unsafe statement written to the binary log using statement format since BINLOG_FORMAT = STATEMENT. CREATE... IGNORE SELECT is unsafe because the order in which rows are retrieved by the SELECT determines which (if any) rows are ignored. This order cannot be predicted and may differ on master and the slave. Statement: create temporary table IF NOT EXISTS vt_tmp_u1098(id int(11) primary key) ignore (SELECT 1098 as id) UNION (SELECT vtiger_user2role.userid AS userid FROM vtiger_user2role INNER JOIN vtiger_users ON vtiger_users.id=vtiger_user2role.userid INNER JOIN vtiger_role ON vtiger_role.roleid=vtiger_user2role.roleid WHERE vtiger_role.parentrole like 'H1::H2::H31::H32::%') UNION (SELECT groupid FROM vtiger_groups where groupid in (777,778))



2018-11-09T16:42:03.193203-08:00 8747303 [Warning] Unsafe statement written to the binary log using statement format since BINLOG_FORMAT = STATEMENT. INSERT... ON DUPLICATE KEY UPDATE  on a table with more than one UNIQUE KEY is unsafe Statement: INSERT INTO demographics (pwsid, active, sex, bday, bday2, lat, lon, lastlogin, handle, race, body_type, sex_orient, marital, state, country, cupid_sex) VALUES ('2155832_48095', '1', '1', '1954-02-01', '1900-00-00', '25.7979', '-80.3071', '2018-11-09 16:41:16', 'e409', '4', '1', '0', '0', 'Florida', 'United States', '2') ON DUPLICATE KEY UPDATE active = '1',lastlogin = '2018-11-09 16:41:16' /* user:maintenance/update_mysql_from_logs.pl bt:main,486,445,364,164,79 */



The error message it gives you is actually pretty good. Basically, the statement it is warning you about could potentially end up with different results when it runs on the slave versus when it ran on the master. This is something a tool like Percona's pt-table-checksum can help detect; the condition is called slave drift or data drift, which is when the slave has different data than the master.

What to actually do about this (other than check your tables with the tool I mentioned) depends on the situation. The query it is warning you about may end up working perfectly fine, but you would not know for sure without checking, and even then it may work sometimes and not others.

The best option in my opinion is to switch to row-based replication to avoid this, as then MySQL replicates the data itself and not the statement. Row-based replication has some downsides, mainly that more data is transferred, which can make slave lag more of an issue in extreme cases, so you'll want to read up on that before making any changes.

The other option would be to re-write the query to avoid any functions / formatting that results in unsafe queries for statement-based replication.


Mixed format does pretty much what it sounds like, but I've never used it myself and have not seen many other people using it. So that choice is up to you, but I would make sure to research / test it first.

MIXED mode is a good compromise between STATEMENT and ROW, and will work for the statement you had in the warning message. Basically, for any data update that is "unsafe" from a replication point of view, MySQL will log the changed row values instead of the original statement. But it will keep logging other updates the same way as in statement mode.

In "MIXED" mode, replicated transactions are written in either of two formats: row based logging, or statement based login. MySQL decides to use statement or row based log entry if it considers a given statement is "safe" or not. So "safe" updates will be logged just like in the STATEMENT binlog format, and "not-safe" will be logged as row-based events. This way the MIXED mode is more safe then STATEMENT from data integrity point of view. It's just a compromise between pure STATEMENT or pure ROW modes.
Now, whether to use pure ROW format, we can argue, as there are many pros and cons of this solution

NOTE:
The binlog_format variable is dynamic, so you may change it at any point. The cleanest way would be to set it in your my.cnf on the master and restart the server; otherwise, you can stop replication on your slave, change the setting dynamically on the master, and then start replication on your slave so that it picks up the change, since a change to a global variable only applies to new connections.
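A minimal sketch of that no-restart sequence:

-- on the slave
STOP SLAVE;

-- on the master (applies to new connections only)
SET GLOBAL binlog_format = 'ROW';

-- on the slave
START SLAVE;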



To switch to "ROW"-based replication, all I need to do is set the variable globally, like this:
mysql> SET GLOBAL binlog_format = 'ROW';

Friday, November 2, 2018

SQL Skills Missing in Developers

Database Design and Support Skills

The biggest assumption is that just because a developer knows SQL, they have the skills to perform database design, data profiling or even data querying. Here are my thoughts on the typical skills needed in system development:

Database Modelling Skills

Mission-critical databases should be designed by data architects or at least trained data modellers. Many clients I have worked for used their application developers to model and build their databases. Database design is a skill. The just-build-it approach (just add a table or a column on the fly) is not proper database development – it is a data slum.

Database Design Skills

Too many databases are being designed by developers in an ad-hoc approach – missing a table, just add it; missing a column, just add it.  Designing databases is not about creating SQL.  The designer must know what makes a good model and must be knowledgeable and keen on data management best practises.  In short, good database design requires data architecture experience.
Moreover, it is about asking the right questions to come up with a design that will meet the business needs, and be performant and resilient to business change.  One should always use a data modelling tool to design a database and generate the SQL needed to create/modify the database – it should never be hand-crafted.
  • Avoid hard-coded SQL statements in application programs that can be used by developers to override expected system logic.
  • Design Referential Integrity (RI) into the model so that the database automatically creates the constraints to enforce RI.
  • Build all business rules into the tables thereby minimizing and preferably eliminating the hard-coded rules in the applications.

Data Profiling Skills

Just because a developer may know SQL does not mean that s/he knows how to perform data profiling to find all frequent data issues and determine their causes. S/he needs to have keen powers of observation and an unwillingness to ignore data issues.

Data Discovery Skills

Just because a developer may know SQL does not mean that s/he knows how to find patterns in the data and interpret their meaning. S/he may not understand the relationships between the tables.

Query Design Skills

It is not enough for a developer to know SQL to design queries and reports. S/he may not understand the business needs so s/he cannot possibly design the SQL queries needed to produce the outputs.

Database Indexing Knowledge

Just because a developer knows SQL does not mean s/he knows proper SQL construction. The following examples from an actual study show that many query performance problems are caused by a lack of SQL knowledge, poor design or poor SQL coding.
  • Ensure criteria in a WHERE clause are indexed in the database;
  • Avoid an equation on left side of a WHERE clause, such as WHERE Amount + X < 5000;
  • Avoid WHERE clauses that contain columns inside an equation such as WHERE account_name = bestValue (acc_name, account_name).

These kinds of queries cannot use indexes and may result in a full table scan, and thus very poor performance.
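For example, the equation-on-the-left problem can often be fixed by moving the arithmetic to the constant side so the indexed column stands alone; the table and column names below are illustrative:

-- Cannot use an index on Amount: the engine must compute Amount + 500 for every row
SELECT * FROM orders WHERE Amount + 500 < 5000;

-- Index-friendly rewrite: the bare column on the left lets the optimizer use its index
SELECT * FROM orders WHERE Amount < 5000 - 500;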

What Makes a Good Primary Key?

The following question was recently asked on Quora:
How do I choose the primary key when there are two unique identifiers?
I disagree that choosing an integer over a character field is the best answer for a Primary Key (PK). I am providing a synopsis of the standards I defined in my Data Architecture Handbook and Data Management Framework. There is additional detail and diagrams under the section Data Design Considerations below.

Data Type

When I first started my 35-year+ career as a data architect, it was true that integer fields were more performant than character fields.  This has not been true for a while, yet the belief still persists.
When I was building a major (multi-terabyte) Enterprise Data Warehouse (EDW) for a law enforcement system, I actually had to prove this to my DBAs via an extensive test, and the difference was nanoseconds. The test also proved that fixed-length character is superior to variable-length character.  Names and descriptions are not good candidates for a PK for many reasons.

Key Length

Length, not data type, has an impact on index performance.

Composite Keys

However, my DBs and EDWs often use composite PKs consisting of multiple columns, and there is an insignificant impact on performance. There is also the extra benefit of being able to select by the highest-order columns without having to know the exact values you are looking for in the lower-level columns; e.g. find all country subdivisions (state/province/canton…) for a selected country, as sketched below.
Using a composite key is far better than using junction tables (see Figure 2 below) between every two related parent and child tables. Increasing the number of tables joined in queries carries a definite performance hit and adds SQL coding complexity.
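A minimal sketch of the country / country-region case (hypothetical table and column names):

CREATE TABLE country (
    country_code CHAR(2) NOT NULL PRIMARY KEY   -- ISO two-character Country Code
);

CREATE TABLE country_region (
    country_code CHAR(2) NOT NULL,
    ctry_region  CHAR(3) NOT NULL,
    region_name  VARCHAR(60) NOT NULL,
    PRIMARY KEY (country_code, ctry_region),    -- composite primary key
    FOREIGN KEY (country_code) REFERENCES country (country_code)
);

-- Select by the highest-order column alone: all subdivisions of one country
SELECT ctry_region, region_name
FROM country_region
WHERE country_code = 'CA';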

Two Unique Identifiers

So back to the original question of which column to choose.  Generally, having two columns that each uniquely identify a row is against Third Normal Form (3NF) rules. 3NF is the basis on which relational databases were originally designed, as formulated by Codd.
First you need to identify the origin and purpose of each column.  Why was a second unique key added to the table?  I will give one example I have come across a lot.  Many governmental departments used to create a table of Countries identified by an arbitrary integer.  At a later date they added the ISO two-character Country Code.  In this case, the two-character code is superior to the four- or eight-byte integer, but not for reasons of length.

Reference Code Tables

The point of ISO and other international standards is to provide a set of code values that are independent of language and can be exchanged between all countries and companies.  Therefore, it is the ISO, IATA, ITU, United Nations, Microsoft… code that MUST be the primary key and must be exchanged between organizations.  Therefore, all Reference Code Tables should use the code values defined by the international or industry-specific standards (for instance, Microsoft defined the Language codes).

Operational Data Tables

This is where a judgement call needs to be made.  Look at the purpose of each column and choose the one that makes the most sense for your business.  So, for a Purchase Order table one would want to use the PO Number as defined by the business.  Generally, these keys are integer but not always.  For instance, the General Ledger usually looks like a number but is actually defined as character by most financial accounting systems.

Data Design Considerations

Primary Key Choices

Natural Key

A natural key is formed from attributes that already exist in the real world.[2]  Especially if it is alphabetic, it can indicate the meaning of the data held in a row.  An example in the figure below is Country, where a mnemonic acronym defined by ISO is used to identify each country.  The two-character ISO Country Code is pervasive, as it is used by the banking, travel, financial, foreign exchange (prefix for the ISO Currency Code), and internet industries (used in web domains).
While some of these attributes will be natural keys, many are auto-incrementing integer numbers such as Purchase Order Number, Transaction Id, Contract Id, Invoice Number, and Customer Id.

Surrogate Key

Surrogate Key is a unique identifier for an object or row in a table and is frequently a sequential auto-number.  It has no relationship to the real-world meaning of the data held in a row.  Surrogate Key is essential for EDWs, which integrate data into a table from multiple data sources.  One cannot use the key defined in the source system as it may conflict, be the same as a key in another system and/or may be changed by the source system.

Data Modelling

If a database has been designed properly, there is no problem changing a primary key. The problem is that many developers with limited data architecture experience are still designing databases.  In many instances, databases are not developed with industrial-strength Data Modelling (DM) tools.  If a DM tool is used, Referential Integrity business rules will be built right into the database. There will be no need to do your own “coding”.  The RI rules will take care of changes or deletions, and when they are allowed.

Composite Keys

When a PK consists of two or more columns it is called a composite primary key. A composite key is required when one table is dependent on another table. For instance, Country Region (province, state, canton, département…) is only unique within (i.e. dependent on) Country. As shown in the figure below, the Country Region Codes table has a primary key consisting of Country and Ctry Region.
Figure 1: Composite Primary Key (Country – Ctry Region)

Junction Tables

The following figure is an example of introducing an extraneous table into the model/database.[1]  It provides no added value and makes queries more complex and less performant due to the extra joins.  Junction tables lead to table proliferation.
Figure 2: Junction Table between every Parent & Child Table

Identifying or Dependent Relationship

Instead, directly link the tables using an identifying or dependent relationship; in this case, phones belong to or are used by an entity.  A real plus is that a query can find all phones used by an entity without having to know the phone numbers in advance.  This would be a single SELECT statement against a single table, as sketched after the figure below.
Figure 3: Child Table with Identifying Relationship
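A minimal sketch of that identifying relationship (hypothetical entity and phone tables):

CREATE TABLE entity (
    entity_id   INT NOT NULL PRIMARY KEY,
    entity_name VARCHAR(100) NOT NULL
);

CREATE TABLE entity_phone (
    entity_id    INT NOT NULL,               -- part of the key: an identifying relationship
    phone_number VARCHAR(20) NOT NULL,
    PRIMARY KEY (entity_id, phone_number),
    FOREIGN KEY (entity_id) REFERENCES entity (entity_id)
);

-- One SELECT against one table: all phones used by entity 42
SELECT phone_number FROM entity_phone WHERE entity_id = 42;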

SQL Coding


Poorly coded SQL probably has the most significant impact on query performance.