Friday, September 21, 2012

General limitations of MySQL


General limitations of MySQL


32-bit binaries cannot address more than 4 Gbyte of memory. This is not a MySQL limitation, this is a technical limitation.
BLOB's are limited to 1 Gbyte in size even thought you use LONGBLOB because of a limitation in the MySQL protocol: The protocol limit for max_allowed_packet is 1GB.


Limitations of MySQL 4.1

Limitations of Joins

In MySQL 4.1, the maximum number of tables that can be referenced in a single join is 61. This also applies to the number of tables that can be referenced in the definition of a view.

Limitations of the MyISAM storage engine

There is a limitation of 232 (~4.2 Mia) rows in a MyISAM table. You can increase this limitation if you build MySQL with the --with-big-tables option then the row limitation is increased to 264 (1.8 * 1019) rows.


Limitations of MySQL 5.0

Limitations of Joins

The maximum number of tables that can be referenced in a single join is 61. This also applies to the number of tables that can be referenced in the definition of a view. This also applies to LEFT and RIGHT OUTER JOINS.

Limitations of the MyISAM storage engine

Large files up to 63-bit file length are supported.
There is a limitation of 264 (1.8 * 1019) rows in a MyISAM table.
The maximum number of indexes per MyISAM table is 64. You can configure the build by invoking configure with the --with-max-indexes=N option, where N is the maximum number of indexes to permit per MyISAM table. N must be less than or equal to 128.
The maximum number of columns per index is 16.
The maximum key length is 1000 bytes. This can be changed by changing the source and recompiling.


Limitations of the InnoDB storage engine

A table cannot contain more than 1000 columns.
The internal maximum key length is 3500 bytes, but MySQL itself restricts this to 1024 bytes.
The maximum row length, except for VARCHAR, BLOB and TEXT columns, is slightly less than half of a database page. That is, the maximum row length is about 8000 bytes. LONGBLOB and LONGTEXT columns must be less than 4 Gbyte, and the total row length, including also BLOB and TEXT columns, must be less than 4 Gbyte.
Although InnoDB supports row sizes larger than 65535 internally, you cannot define a row containing VARCHAR columns with a combined size larger than 65535.
The maximum tablespace size is 4 Mia database pages (64 Tbyte). This is also the maximum size for a table.


Limitations of MySQL 5.1

Limitations of Joins

The maximum number of tables that can be referenced in a single join is 61. This also applies to the number of tables that can be referenced in the definition of a view.

Limitations of Partitions

The limitation of partitions with MySQL is 1024 (internal mail). But one have to increase open_files_limit. See also:


Limitations of MySQL Cluster

Max attributes/columns in an index: 32
Max number of attributes (columns and indexes) in a table: 128
Max number of table: 1792 (v5.0)
Max size in bytes of a row is 8052 byte, excluding blobs which are stored separately.
Max number of nodes in a cluster: 63, max. number of data nodes: 48 (in v5.0/5.1)
Max number of nodes in a cluster: 255 in CGE.
Max number of metadata objects: 20320.
Max attribute name length: 31 characters.
Max database + table name length: 122 characters.

Statement-based vs Row-based Replication

Statement-based vs Row-based Replication

Replication as most people know it, has mostly been SQL statement propagation from master to slave. This is known as “statement-based” replication. But there is also another kind of replication that is available, “the row-based replication” and that has quite a lot of benefits. In this post I intend on highlighting the advantages and disadvantages of both the types of replication to help you choose the best one. I also follow up with my own recommendation.
Let’s start off with discussing both the types.

Statement-based Replication

With statement-based replication, every SQL statement that could modify data is logged on the master. Then those SQL statements are replayed on the slaves against the same dataset and in the same context. The statement-based replication corresponds to the statement-based binary logging format.

Row-based Replication

With row-based replication, every “row modification” is logged on the master and is then applied on the slave. The keywords here are “row modification”, which implies that row-based replication is physical, in the sense that SQL statements that change the rows are not recorded, instead the entire updated row is written to the binary log. But there are a few exceptions, when a new table is created, dropped, or altered, the actual SQL statement is recorded. The row-based replication corresponds to row-based binary logging format.
Now that I have defined both the types of replication, let’s start having a look at the advantages and disadvantages of both the approaches.

Advantages of Statement-based Replication

Following are some of the advantages of statement-based replication. All of these advantages come down from the fact that the SQL statements are logged:
  • There is always less data that is to be transferred between the master and the slave.
  • There is less space taken up in the update logs.
  • There is no need to deal with the row format.
  • Also, auditing the database is easy, because statements that made any changes to the data are all logged in the binary log.

Disadvantages of Statement-based Replication

Following are some of the disadvantages of statement-based replication:
  • The single biggest disadvantage of statement-based replication is the data-inconsistency issue between the master and the slave that creeps up due to the way this kind of replication works. Because we are logging the SQL statements, it is also necessary to log context information, so that the updates produce the same results on the slave as they did originally on the master. But in some cases it is not possible to provide any such context. Any nondeterministic behavior, is not going to have any such context present and hence is difficult to replicate using statement-based replication.
    Let me quote an example here from the MySQL manual:
    “For example, for INSERT … SELECT with no ORDER BY, the SELECT may return rows in a different order (which results in a row having different ranks, hence getting a different number in the AUTO_INCREMENT column), depending on the choices made by the optimizers on the master and slave.”
  • With statement-based replication, you are bound to encounter issues with replicating stored routines or triggers, and hence this kind of replication does not always work with stored routines and triggers.
  • There is a performance penalty in the case of INSERT … SELECT, because in the case of statement-based replication this kind of statement requires a greater number of row-level locks as compared to row-based replication.
  • There is a lot of execution context information that is required in order for the updates to produce the same results on the slave as they did originally on the master.
  • A statement that depends on UDFs or stored routines that are nondeterministic, cannot be replicated properly, since the value returned by such a UDF or stored routine is not always the same, for similar parameters.

Advantages of Row-based Replication

Following are the major advantages of row-based replication:
  • With row-based replication, each and every change can be replicated and hence this is the safest form of replication.
  • Because every row update is physically logged, hence there no need for any context information. The only thing that is needed is to know which record is being updated and what is the update that is being written to that record.
  • There are fewer row locks required on the master, which thus achieves high concurrency.
  • The problems with auto_increment columns, timestamps, stored routines, and triggers don’t bother us with this kind of replication.
  • Statements that update very few rows are very fast.

Disadvantages of Row-based replication

Following are the disadvantages of row-based replication:
  • On a system that frequently updates large number of rows such as,
    UPDATE products set status='sold' where product_id BETWEEN 30000 and 50000;
    row-based replication produces very large update logs and generates a lot of network traffic between the master and the slave.
  • This kind of replication requires a lot of awareness of the internal row format.
  • In cases of very large updates, the performance overhead associated with the increased I/O required to write large update logs could become unacceptable.
  • You cannot examine the logs to audit changes to the database, because SQL statements are not logged, instead the data is logged in binary format.

Conclusion

Although both the types have advantages and disadvantages, but for me the advantage that row-based replication offers in terms of data consistency between master and slave, far outweighs any of the disadvantages. Though you might point out the fact that large updates produce large update logs in case of row-based replication, but those cases in real-world would be far and few and not very frequent. Also, stored routines and triggers are increasingly being used after their introduction in MySQL and row-based replication allows us to use them without thinking about them being unsafe for replication. Also everyone loves high-concurrency, don’t they and that is something that row-based replication achieves. And as far as auditing of data updates is concerned you could use “mysqlbinlog” to help you decode the binary logs and figure out the changes to data.
There are also other host of optimizations that are possible due to the way how this kind of replication works, which I will be discussing in a future post.
So my recommendation is go for row-based replication, you are going to love it!