General Overview of Multimaster Replication
Multimaster replication is a utility that allows data in multiple databases to be automatically kept
in sync. For example, in a multimaster replication system, if a row gets inserted into one of the
databases in the system, that row will be automatically propagated to all of the other databases in
that system. Updates and deletes to the data in any of the databases will be propagated in the
same way.
A multimaster replication environment is set up by configuring databases to be part of a
“replication group.” One of the databases in the group is defined as the “master definition site,”
and all of the other databases in the group are classified as “master sites.” The main difference
between the two types of sites is that most of the replication administration commands must be
invoked from the master definition site.
Transactions are propagated to remote databases in one of two ways: “synchronously” or
“asynchronously.” Synchronous replication applies each transaction to all of the master sites in a
group immediately. It does this with Oracle’s two-phase commit functionality, which ensures that
every database in the group can apply a given transaction. If any site in the group cannot accept
the transaction (for example, because the site’s database has crashed or the network connection
to it is down), then none of the master sites in the replication group can accept it, and the
transaction does not take place.
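The all-or-nothing behavior described above can be sketched as a toy model in Python. This is only an illustration of the two-phase idea, not Oracle's implementation; the `Site` class and the `prepare`, `commit`, and `replicate_synchronously` names are hypothetical.

```python
# Toy model of synchronous propagation with a two-phase commit.
# All names here are hypothetical; none of them are Oracle APIs.
from dataclasses import dataclass, field


@dataclass
class Site:
    name: str
    available: bool = True
    rows: list = field(default_factory=list)

    def prepare(self, row) -> bool:
        # Phase 1: the site votes on whether it can apply the change.
        return self.available

    def commit(self, row) -> None:
        # Phase 2: the site applies the change.
        self.rows.append(row)


def replicate_synchronously(sites, row) -> bool:
    """Apply the row at every site, or at none if any site votes no."""
    if not all(site.prepare(row) for site in sites):
        return False
    for site in sites:
        site.commit(row)
    return True


sites = [Site("sf"), Site("ny"), Site("chicago")]
ok_all_up = replicate_synchronously(sites, {"id": 1})

sites[1].available = False  # simulate a crash or a network failure
ok_one_down = replicate_synchronously(sites, {"id": 2})
```

With all three sites up, the first row is applied everywhere. Once one site is marked unavailable, the second row is rejected at every site, including the two that are still running.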
With asynchronous replication, the transactions that occur on a site are temporarily placed in a
buffer called the “deferred transaction queue,” or deftran queue. Periodically, such as once per
minute, “push” jobs send the transactions in a site’s deftran queue to all of the other sites. These
jobs are created by calling the “schedule_push” procedure. Finally, transactions in a deftran queue
that have already been sent to the other sites must be purged periodically to keep the queue from
growing too large.
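The queue, push, and purge cycle can be illustrated with a small in-memory model. This is a minimal sketch under simplifying assumptions (every peer is reachable, and a queue entry is just a dictionary); `AsyncSite`, `write`, `push`, and `purge` are hypothetical names and do not correspond to Oracle procedures.

```python
# Toy model of a deferred transaction queue with separate push and purge steps.
# All names here are hypothetical; none of them are Oracle APIs.
class AsyncSite:
    def __init__(self, name):
        self.name = name
        self.rows = []
        self.deftran_queue = []  # entries look like {"row": ..., "sent": bool}

    def write(self, row):
        # A local transaction commits at once and is queued for later delivery.
        self.rows.append(row)
        self.deftran_queue.append({"row": row, "sent": False})

    def push(self, peers):
        # Send every unsent entry to every peer as one batch.
        batch = [entry for entry in self.deftran_queue if not entry["sent"]]
        for peer in peers:
            peer.rows.extend(entry["row"] for entry in batch)
        for entry in batch:
            entry["sent"] = True
        return len(batch)

    def purge(self):
        # Drop entries that have already been sent so the queue stays small.
        before = len(self.deftran_queue)
        self.deftran_queue = [e for e in self.deftran_queue if not e["sent"]]
        return before - len(self.deftran_queue)


sf, ny = AsyncSite("sf"), AsyncSite("ny")
sf.write({"id": 1})
sf.write({"id": 2})

pushed = sf.push([ny])                  # both rows travel in one batch
still_queued = len(sf.deftran_queue)    # sent entries remain until purged
purged = sf.purge()
```

Keeping push and purge as separate steps mirrors the description above: sending a transaction does not remove it from the queue, so a purge has to run as well.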
The vast majority of customer sites that use multimaster replication use asynchronous replication
rather than synchronous. One reason is that asynchronous replication has been available for much
longer; the initial versions of multimaster replication allowed only asynchronous propagation. The
main reason, though, is that asynchronous replication has many advantages over synchronous
replication.
First of all, asynchronous replication uses much less network bandwidth and provides higher
performance than synchronous replication. The primary reason for this is that it is more efficient
to store multiple transactions and then propagate them all as a group, rather than to propagate
each transaction separately.
This is particularly important when the sites are geographically far apart (for example, one site
in San Francisco and another in New York). Synchronous replication also carries much more
overhead, because every transaction requires separate connections to be established to all of the
other sites in the replication group. With asynchronous replication, fewer connections need to be
established, since transactions are propagated as a group.
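A rough back-of-the-envelope count shows the size of the difference. The sketch below takes the description above at face value (one connection per remote site per synchronous transaction, and one per remote site per asynchronous push); the workload figures are invented for illustration and the function names are hypothetical.

```python
# Rough connection count under the simplified model described in the text.
# The workload numbers are assumptions chosen only for illustration.
def sync_connections(transactions, sites):
    # Each transaction connects to every other site in the group.
    return transactions * (sites - 1)


def async_connections(pushes, sites):
    # Each push job connects to every other site, whatever the batch size.
    return pushes * (sites - 1)


sites_in_group = 3
transactions_per_hour = 6000   # assumed workload
pushes_per_hour = 60           # one push per minute

sync_total = sync_connections(transactions_per_hour, sites_in_group)
async_total = async_connections(pushes_per_hour, sites_in_group)
```

Under these assumptions the synchronous setup opens 12,000 connections per hour from one site and the asynchronous setup opens 120, a ratio that depends only on how many transactions fall into each push interval.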
The biggest advantage of asynchronous replication, though, is that it provides for high availability
of the replication group. With asynchronous replication, if one of the sites in the replication group
crashes, all of the other sites will still be able to accept updates—the transactions that are made on
the remaining sites will just “stack up” in those sites’ deftran queues until the down site becomes
available.
With synchronous replication, on the other hand, if any one of the sites becomes unavailable (for
example, because of a database crash or a network failure), then none of the sites can be updated.
Every transaction must be applied immediately to all of the sites in the replication group, and no
transaction can be applied to a site that is unreachable. As a result, synchronous replication not
only fails to provide higher database availability; it can actually provide lower availability than
a single database, because the group accepts updates only while every site is reachable.
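The availability claim can be made concrete with a little arithmetic. The sketch below assumes that sites fail independently and ignores network failures between them; the 99% figure is an invented example, and `synchronous_group_availability` is a hypothetical helper.

```python
# Availability of a synchronous group, assuming independent site failures.
# The 0.99 figure is an illustrative assumption, not a measured value.
import math


def synchronous_group_availability(site_availabilities):
    # The group accepts updates only when every site is up at the same time,
    # so the individual availabilities multiply.
    return math.prod(site_availabilities)


single_site = 0.99
three_site_group = synchronous_group_availability([0.99, 0.99, 0.99])
# three_site_group is about 0.97, lower than the 0.99 of a single site
```

Each site added to a synchronous group multiplies in another factor below 1, so under these assumptions the group's update availability can only go down as the group grows.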