Wednesday, October 6, 2010

Solving problems with relational databases

Data Independence

A large enterprise holds a very large body of crucial information: the “crown jewels” of the company’s information technology. This information lasts for the whole lifetime of the enterprise. But applications come and go, like migrating birds. The next application to come along might want to access the data in a different way, for important reasons. The structure of the database must adapt well to these new and changing demands.
With the older styles of data organization (called “network” or “CODASYL”, roughly speaking), sometimes the new application could not be written efficiently. Many times, for all practical purposes, it was impossible to write the application with acceptable performance. You can find the details in many books, but to give just one analogy: suppose you have a program with nested loops. In many cases (not 2D arrays), it’s pretty obvious which loop ought to be on the outside. Well, imagine if you were forced to do it the other way, even if it made the program very much slower. And that’s just one example.
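To make the nested-loop analogy concrete, here is a minimal Python sketch. The data and the function names (customers_outer, orders_outer) are made up for illustration and do not come from any real system. Both functions compute the same answer; only the loop order, and therefore the amount of work, differs.

```python
def customers_outer(wanted, orders_by_customer):
    """Natural order: loop over the few wanted customers, then over their orders."""
    steps = 0
    found = []
    for customer in wanted:  # small outer loop
        for order in orders_by_customer.get(customer, []):  # only that customer's orders
            steps += 1
            found.append((customer, order))
    return sorted(found), steps


def orders_outer(wanted, all_orders):
    """Forced order: loop over every order, then scan the wanted customers."""
    steps = 0
    found = []
    for customer, order in all_orders:  # large outer loop over everything
        for w in wanted:  # inner scan of the wanted list
            steps += 1
            if w == customer:
                found.append((customer, order))
    return sorted(found), steps


# Hypothetical data: 1,000 customers with 5 orders each.
orders_by_customer = {
    f"c{i}": [f"c{i}-o{j}" for j in range(5)] for i in range(1000)
}
all_orders = [(c, o) for c, orders in orders_by_customer.items() for o in orders]
wanted = ["c3", "c42"]

fast_result, fast_steps = customers_outer(wanted, orders_by_customer)
slow_result, slow_steps = orders_outer(wanted, all_orders)

# Same answer either way, but very different amounts of work.
print(fast_result == slow_result, fast_steps, slow_steps)  # True 10 10000
```

If the data were stored so that only the second traversal was available, an application that needed the first one would be stuck with the slow version.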
To solve this, we want a data organization that can do two things. First, give every application a view of the database that doesn’t change over time, so that the application keeps working. Second, provide a way to change the physical organization of the data without changing any of the software that uses the database system. Such changes may be needed to make new applications faster without slowing the old ones down, or at least not by enough to matter. This property is called “data independence.”
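As a rough illustration of both requirements, here is a small sketch using Python’s built-in sqlite3 module. The table, view, and column names (employee_data, staff, person, dept) are invented for this example. The “application” only ever queries the view staff; underneath it, we first add an index and then reorganize the stored tables, and the application code does not change.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee_data (
        id INTEGER PRIMARY KEY, name TEXT, dept TEXT, salary INTEGER);
    INSERT INTO employee_data VALUES
        (1, 'Ann', 'Sales', 50), (2, 'Bob', 'Sales', 60), (3, 'Cy', 'Ops', 55);
    -- The stable view that the application is written against.
    CREATE VIEW staff AS SELECT name, dept FROM employee_data;
""")


def names_in_dept(conn, dept):
    """The 'application': it knows only about the view, not the stored tables."""
    rows = conn.execute(
        "SELECT name FROM staff WHERE dept = ? ORDER BY name", (dept,)
    )
    return [row[0] for row in rows]


before = names_in_dept(conn, "Sales")

# Change 1: a purely physical change, adding an index for faster lookups.
conn.execute("CREATE INDEX employee_dept_idx ON employee_data (dept)")

# Change 2: reorganize the stored tables, then redefine the view
# so that it presents the same columns as before.
conn.executescript("""
    DROP VIEW staff;
    CREATE TABLE dept (dept_id INTEGER PRIMARY KEY, dept_name TEXT UNIQUE);
    INSERT INTO dept (dept_name) SELECT DISTINCT dept FROM employee_data;
    CREATE TABLE person (
        id INTEGER PRIMARY KEY, name TEXT,
        dept_id INTEGER REFERENCES dept (dept_id), salary INTEGER);
    INSERT INTO person
        SELECT e.id, e.name, d.dept_id, e.salary
        FROM employee_data e JOIN dept d ON d.dept_name = e.dept;
    DROP TABLE employee_data;
    CREATE VIEW staff AS
        SELECT p.name AS name, d.dept_name AS dept
        FROM person p JOIN dept d ON d.dept_id = p.dept_id;
""")

after = names_in_dept(conn, "Sales")
print(before, after)  # ['Ann', 'Bob'] ['Ann', 'Bob']
```

This is only a toy, and it says nothing about how well any particular product delivers on the idea, but it shows the shape of the promise: names_in_dept is untouched while the storage underneath it changes twice.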

The Relational Model

A novel and effective solution to data independence, the “relational model”, was created by E. F. Codd in 1970. By representing data in relations, in normalized form, you can solve both of the above problems. I won’t go over all that here; I recommend “An Introduction to Database Systems” by C. J. Date.
(By the way, notice that the name of the book isn’t “… to Relational Database Systems,” even though that’s what the book is. Why bother with a superfluous adjective, when “everybody knows” that all database systems, other than ancient ones, are relational?)
The relational model, as an abstract concept, is an excellent and brilliant solution to the data independence problem. Later we’ll see that data independence is not the only problem for which people want to store data. But in the next post, I’ll look into how well actual relational database systems implement the concept.