The modern database era began in 1970, when E.F. Codd published his paper "A Relational
Model of Data for Large Shared Data Banks." His ideas enabled the logical manipulation of data
to be independent of its physical location, greatly simplifying the work of application developers.
Now we are poised for another leap forward. Databases will scale to gargantuan proportions,
span multiple locations and maintain information in heterogeneous formats. And they will be
autonomous and self-tuning. The major database vendors are pursuing these goals in different
Thirty years ago, IBM researcher Patricia Selinger invented "cost-based" query optimization, by
which searches against relational databases such as IBM's DB2 minimize their use of computer
resources by finding the most efficient access methods and paths. Now Selinger is leading an
effort at IBM called Leo (for Learning Optimizer) that she says will push DB2 optimization into a new
Rather than optimizing a query once, when it's compiled, Leo will watch production queries
as they run and fine-tune them as it learns about data relationships and user needs. For example,
Leo would come to realize that a ZIP code can be associated with only one state, or that a Camry
is made only by Toyota, even if those rules aren't specified in advance.
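The kind of rule Leo is described as learning (one ZIP code, one state) is what database texts call a functional dependency. The Python sketch below is only an illustration of checking such a rule against observed rows; it is not IBM's algorithm, and the function name, column names and sample rows are all assumptions made up for the example.

```python
def holds_functional_dependency(rows, determinant, dependent):
    """Return True if every value of `determinant` maps to exactly one
    value of `dependent` in the observed rows."""
    seen = {}
    for row in rows:
        key = row[determinant]
        value = row[dependent]
        if key in seen and seen[key] != value:
            return False  # same key, two different values: the rule is violated
        seen[key] = value
    return True


# Hypothetical sample of rows observed while production queries run.
observed_rows = [
    {"zip": "95120", "state": "CA", "model": "Camry", "make": "Toyota"},
    {"zip": "95120", "state": "CA", "model": "Corolla", "make": "Toyota"},
    {"zip": "10001", "state": "NY", "model": "Camry", "make": "Toyota"},
    {"zip": "10001", "state": "NY", "model": "Civic", "make": "Honda"},
]

zip_implies_state = holds_functional_dependency(observed_rows, "zip", "state")
model_implies_make = holds_functional_dependency(observed_rows, "model", "make")
make_implies_model = holds_functional_dependency(observed_rows, "make", "model")
```

In this sample the first two rules hold and the third does not, since Toyota makes more than one model. A rule that holds on a sample is only a hypothesis; an optimizer relying on it would have to keep checking it as new data arrives.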
Selinger says Leo will be most helpful in large and complex databases, and in databases
where interdata relationships exist but aren't explicitly declared by database designers. Leo is
likely to be included in commercial releases of DB2 in about three years, she says.
Whether the future of databases is the traditional, relational and SQL model with XML
technologies incorporated into it or a new XML-based model is a matter of debate. XML will
become the dominant format for data interchange because of its flexibility and its ability to
describe itself, according to Don Chamberlin, a database technology researcher at IBM.
Relational databases, he said, will be fitted with front ends to support XML and process
queries based on the XQuery standard. XML will become the "lingua franca" for exchange of
data. "We'll also see some large relational systems adapt to XML as a native format," Chamberlin
said. Technologists are in the early stages of developing XML technologies. SQL will not go
away, but there are new data formats for which it just was not designed, he said.
Sun's Rick Cattell, a distinguished engineer at the company, was less convinced that XML would
dominate, saying very few people are going to store XQuery data in an XML format. "I think the
momentum behind relational databases is insurmountable," Cattell said, adding that he was
drawing on his experience with object-oriented databases, which were unable to unseat relational
databases in enterprise IT shops. Developers, Cattell said, will need tools to convert relational
data to XML and vice versa.
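As a rough illustration of the conversion tools Cattell has in mind, the Python sketch below turns relational-style rows into XML and back using the standard library's xml.etree.ElementTree. The element layout (one "row" element per record, one child element per column) and the sample table are assumptions for the example, not a standard mapping.

```python
import xml.etree.ElementTree as ET


def rows_to_xml(table_name, rows):
    """Serialize a list of row dictionaries as an XML string."""
    root = ET.Element(table_name)
    for row in rows:
        row_element = ET.SubElement(root, "row")
        for column, value in row.items():
            ET.SubElement(row_element, column).text = str(value)
    return ET.tostring(root, encoding="unicode")


def xml_to_rows(xml_text):
    """Parse XML produced by rows_to_xml back into row dictionaries.
    Values come back as strings, because this XML carries no type information."""
    root = ET.fromstring(xml_text)
    return [{cell.tag: cell.text for cell in row} for row in root.findall("row")]


# Hypothetical table contents.
customers = [
    {"id": 1, "name": "Ada", "state": "CA"},
    {"id": 2, "name": "Grace", "state": "NY"},
]

xml_text = rows_to_xml("customers", customers)
round_trip = xml_to_rows(xml_text)
```

Note that the integer id values return as strings after the round trip. Preserving column types is one of the things a real conversion tool would need a schema for.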
Currently, performance on the Web is hindered because of translations between Java and
XML data formats. Eventually, an extension of XQuery will replace both Java and SQL,
according to some experts.
The next step in the evolution of databases is to provide a more powerful way to query them
than what is being done on search sites such as Google today.
Experts are expecting to see tuple space technology, which is intended to make it easier to store
and fetch data by recognizing patterns. And in-memory database technology is a "no-brainer," but
there is not yet enough memory available to accommodate it.
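In a tuple space, data is fetched by giving a pattern (a template) rather than an address or a key. The Python sketch below shows the idea in its simplest form; the class, its method names and the use of None as a wildcard are assumptions made for this example and do not follow any particular product's API.

```python
class TupleSpace:
    """A tiny in-memory tuple space. None in a pattern acts as a wildcard."""

    def __init__(self):
        self._tuples = []

    def put(self, item):
        """Store a tuple in the space."""
        self._tuples.append(tuple(item))

    @staticmethod
    def _matches(pattern, item):
        # A pattern matches when lengths agree and every non-wildcard field is equal.
        return len(pattern) == len(item) and all(
            want is None or want == have for want, have in zip(pattern, item)
        )

    def read(self, pattern):
        """Return the first matching tuple without removing it, or None."""
        for item in self._tuples:
            if self._matches(pattern, item):
                return item
        return None

    def take(self, pattern):
        """Return the first matching tuple and remove it, or None."""
        item = self.read(pattern)
        if item is not None:
            self._tuples.remove(item)
        return item


# Hypothetical contents.
space = TupleSpace()
space.put(("trade", "IBM", 120))
space.put(("trade", "SUNW", 5))
space.put(("hit", "/index.html", 1))

first_trade = space.read(("trade", None, None))
sun_trade = space.take(("trade", "SUNW", None))
gone = space.read(("trade", "SUNW", None))
```

Here read leaves the matching tuple in place, while take removes it, so the final lookup finds nothing.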
Microsoft Corp. says users will never be persuaded to dump everything (e-mail, documents,
audio/video, pictures, spreadsheets and so on) into one gigantic database. Therefore, the
software vendor is developing technology that will allow a user to seamlessly reach across
multiple, heterogeneous data stores with a single query.
Microsoft's Unified Data project involves three steps. First, the company will devise
"schemas" based on XML that define data types. Then it will develop methods for relating
different data types to each other, and finally it will develop a common query mechanism for
distributed databases. For example, a user might search for a document that references Microsoft, and the
document "tells" the query that there's also a media file in another place that references
The technology will appear in 18 months in SQL Server. It will be added to other Microsoft
products in ensuing years.
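The three steps described for the Unified Data project (a common schema, a mapping from each data type to it, and one query mechanism) can be pictured with a small Python sketch. This is not Microsoft's design; the store names, record shapes and the two-field common schema ("title" and "text") are all assumptions invented for the illustration.

```python
def search_everywhere(stores, term):
    """Run one query against every store and merge the hits."""
    term = term.lower()
    hits = []
    for store_name, (records, to_common) in stores.items():
        for record in records:
            common = to_common(record)  # map the native record to the common schema
            if term in common["text"].lower():
                hits.append((store_name, common["title"]))
    return hits


# Hypothetical stores, each with its own native record shape.
emails = [
    {"subject": "Lunch", "body": "Meet at noon"},
    {"subject": "Licensing", "body": "Microsoft renewal is due"},
]
documents = [("report.doc", "Quarterly results for Microsoft and Oracle")]
media = [{"file": "keynote.mpg", "caption": "Oracle keynote"}]

# Each store is paired with a function that maps its records to the common schema.
stores = {
    "email": (emails, lambda e: {"title": e["subject"], "text": e["body"]}),
    "documents": (documents, lambda d: {"title": d[0], "text": d[1]}),
    "media": (media, lambda m: {"title": m["file"], "text": m["caption"]}),
}

hits = search_everywhere(stores, "Microsoft")
```

The data stays where it is; only the mapping functions know each store's native format, which is why one query can reach all of them.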
Oracle Corp. says its customers are moving toward data stores of huge size and complexity,
spread over multiple locations. The company says its products will not only evolve to handle
those kinds of jobs, but will also do them extraordinarily well. "Over the next couple of releases,
we'll see essentially fully autonomous databases," says Robert Shimp, vice president of database
Oracle also wants to facilitate collaboration for people in different companies with widely
varying information types. "What doesn't exist today is the underlying infrastructure, or plumbing,
that's capable of managing all these diverse types of data," Shimp says. "What you need is the
ability to link all these clustered databases around the globe into a single, unified view for the
Elsewhere, researchers are finding that the best design for some database applications isn't a
traditional database at all, but rather data streams. Researchers at Stanford University are working
on ways that continuous flows of information—such as Web site hits, stock trades or
telecommunications traffic—can be passed through queries and then archived or discarded. A
query might, for example, be written to look continuously for suspicious patterns in network
traffic and then spit out an alert.
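A continuous query of this kind can be sketched in Python as a generator that consumes packets one at a time and emits an alert as soon as a pattern appears. The pattern chosen here (one source contacting several distinct ports), the threshold and the sample traffic are assumptions for the example, not the Stanford group's method.

```python
def suspicious_sources(packets, threshold):
    """Scan a stream of (source, port) packets and yield an alert the moment
    a source has touched `threshold` distinct ports."""
    ports_by_source = {}
    alerted = set()
    for source, port in packets:
        ports = ports_by_source.setdefault(source, set())
        ports.add(port)
        if len(ports) >= threshold and source not in alerted:
            alerted.add(source)  # alert only once per source
            yield f"ALERT: {source} probed {len(ports)} ports"


def packet_stream():
    """Hypothetical traffic; a real stream would never end."""
    yield ("10.0.0.5", 80)
    yield ("10.0.0.9", 22)
    yield ("10.0.0.9", 23)
    yield ("10.0.0.5", 80)
    yield ("10.0.0.9", 25)
    yield ("10.0.0.5", 443)


alerts = list(suspicious_sources(packet_stream(), threshold=3))
```

The query never loads the data into a table first; each packet is examined as it passes and then dropped. This sketch keeps per-source state forever, so a real system would also need a rule for discarding old state.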
The difficulty in handling some kinds of problems with a traditional database management
system is one of timeliness, says Jennifer Widom, a computer science professor at Stanford. "If
you want to put a stream of data into a DBMS, you have to at some point stop, create a load file,
load the data and then query it," she says. "Data stream queries are continuous; they just sit there
and give you new answers automatically."
Widom and her colleagues are developing algorithms for stream queries, and she says her
group will develop a comprehensive data stream management system. A prototype of such a
system will take a number of years to develop, and the underlying technology will then be either
licensed or offered as freeware, she says.
1, poise [pɔiz]
2, leap [li:p]
3, gargantuan [gɑ:'gæntjuən]
4, proportions [prə'pɔ:ʃəns]
5, insurmountable [,insə'mauntəbl]
6, momentum [məu'mentəm]
n. impetus; [physics] momentum; driving force; impulse
7, unseat [,ʌn'si:t]
8, persuade [pə'sweid]
9, gigantic [dʒai'gæntik]
10, heterogeneous [,hetərəu'dʒi:njəs]
adj. [chemistry] multiphase; of a different kind; [chemistry] non-uniform; made up of different components
11, ensue [in'sju:]
12, facilitate [fə'siliteit]
13, diverse [dai'və:s, di-]