Do I need replication? And if so, what
level of replication? Probably the most important aspect of scalability is replication.
For read-only applications, the challenge is easily met by making a copy of your database
on a separate server. Some tweaking of the application is required to make it point to one
of several database servers, but that's a relatively simple issue. (This is how database
servers for large portal sites, such as Yahoo, are set up.) Just make sure that updates
are applied at the same time to all your database servers, and you should be set. A more
elaborate setup for more complex applications would be to have a separate server for
reporting. In this case, the main database server would be used in a read-write fashion
through the application interface, and the reporting server would be read-only through the
reporting interface. If your functional requirements allow it, you can simply run a
nightly batch process to synchronize the reporting database. Users should be made aware
that they may not be getting real-time information from the reports. For real-time
synchronization of the databases, you'll need to use transactional replication, in which
each database transaction is duplicated in real time. Unlike the other two approaches --
multiple read-only databases and separate reporting servers -- this form of replication
will rely heavily on the built-in support of the underlying RDBMS (relational database
management system), as not all databases provide such functionality. Microsoft SQL Server,
Oracle and DB2 are some of those that support transactional replication, though the
implementation varies widely. Most public-domain databases, including MySQL and mSQL, do
not support it; check with your vendor before you take the plunge.
In this mode, insert, update and delete operations are applied seamlessly to multiple
servers. Whenever one of those operations is applied to a database, the server propagates
the same request over the network to the other servers and applies the action in real time
to the respective databases. At any given time, all data would look the same regardless of
the database server to which the application connects. Thus, a successful scalable
application depends on replication. The form you use depends on your specific needs and
budget.
Why do I need clusters? Clusters are used primarily for performance and fault
tolerance. Performance enhancements are achieved via load-balancing across multiple
servers in the cluster or by pooling the cluster's resources. Fault tolerance is achieved
via failover. A cluster of nodes can enhance performance considerably, but the application
software must be aware of the clustering and make use of it. Luckily, most middle-tier
application servers, such as Apple Computer's WebObjects, BEA Systems' WebLogic and IBM's
WebSphere, relieve you of this chore and manage both failover and load-balancing.
At the database level, clusters are represented by one of two software models --
shared-disk and shared-nothing. A shared-disk cluster has several servers sharing the same
physical disks and data but not memory. A software layer manages locking the data to
serialize access to the hard disks. These clusters can be complex and expensive, but they
require little intervention from the DBAs or programmers. A shared-nothing cluster,
meanwhile, is basically a network of separate servers, each holding a portion of the data.
For example, a table may be partitioned on primary keys, with each database server
holding a partition of the table based on the primary keys. A distributed SQL view is then
created, reflecting the aggregate data across multiple physical servers and shielding the
application from the underlying complexity. The DBA must set up the various database
servers and handle the configuration and sizing of those servers.
What about fault tolerance? A failover cluster uses some form of replication to keep a
relatively up-to-date copy of the data on a separate server. Replication and failover are
usually managed by a low-level software layer that works very closely with the operating
system in a fashion that's transparent to the application servers. All major operating
systems (AIX, HP-UX, Solaris, VMS and Windows NT) support failover via vendor-supplied
failover software.
The failover machine does not participate in the application functionality, but once
disaster strikes, it takes over and becomes the main database server. A transaction log
from the main server is then transported to the failover server, and the log is applied to
the failover database to bring it up to date. Your DBA should ensure that such a log
exists, and that it is accessible even when the server holding the database and the logs
crashes. And that brings us to our next point.
Do I need any special backup procedures? Definitely. Backups are the most basic
requirement of all successful software deployments, regardless of scalability. The twist
here is that in most scalable environments, applications can't be taken offline for backup
procedures, so backups are performed online on production databases. However, not all
database systems support online backups. Those that do generally require backing up redo
and transaction log files, because backups of open database files aren't completely
reliable. In contrast, an online backup in combination with a redo log file preserves an
exact image of the data. This issue comes into play when the DBA is required to perform a
recovery of a failed system. Not only will you want a restored database from a backup tape
(or another device), but the redo log files will also have to be restored and then applied
to the database. Keep in mind that most databases do not archive redo log files before
they're periodically overwritten. Backing up the files without configuring the database
system to archive them may be of little use.
Armed with this information, you should be able to engage in a smooth, productive
relationship with your DBA and constructively enlist that person's help in scaling your
e-business.
Ahmad Abualsamid is the founder of Apical Consulting, a Chicago-based software
consulting and contract programming firm. Send your comments on this article to him at
ahmad@apicalconsulting.com.