DigitalFriend Blog

September 2007

Structured Data versus Free-text

Mon, 17 Sep 2007 17:37:09 +1000

By: gosh'at' (Steve Goschnick)

It's interesting to reflect on the history of both structured databases (predominantly represented by SQL-oriented systems) and the current generation of tag languages (predominantly XML). Before tag languages took the corner opposite structured database management systems (relational DBMS), the so-called free-text searching DBMSs filled that chair - represented by products on mainframes a generation ago with names like STAIRS and BASIS, upon which vast collections of textual information were stored, indexed and searched. These were the precursors to Google and the other Internet search engines. Technically oriented libraries (the sort that hold books and have readers visit them) used the likes of STAIRS and BASIS to index and retrieve free-text documents - files with large text fields such as a 'whole magazine article'. These technical librarians and their information analyst associates, and their needs, led eventually to the first great tag language, SGML, the mother-language of both HTML and XML, but more so XML. Users of both SGML and XML can create their own tags via a separate schema file, much as one does in the internal schema of a relational DBMS.

So, prior to XML we had two disparate but highly professional languages in the information management realm, serving two distinct markets: SQL (for structured DBMS) and SGML (for free-text search DBMS). Then along came XML, and in particular XML Schema, the schema language one uses to define one's own tags (rather than the earlier alternative schema language, the XML DTD). XML together with XML Schema effectively represents a merging of the two previously divergent mainstreams of DBMS - structured data and free-text. But do we always want or need a language that merges both structured and free-text data? Will a language that merges structured data and free-text data be the most efficient, or the most user-friendly, one to use? Granted, the free-text world certainly needed sorting out, and SGML (via its derived children, the HTML and XML tag languages), Google and the like have indeed sorted it out.

When you discard all of the technicalities, it all comes down to sets (flat files) and hierarchies (tree-structured files), really. Some things naturally require or use hierarchies - the textual data in a book, as described by the index of that book, is the sort of information hierarchy that XML can easily accommodate. Some applications thrive on sets - regular, rectangular arrangements of data, with a fixed number of columns (attributes) and a variable number of rows (records) - for example the sheets in that great invention by the humble Dan Bricklin (co-inventor) - the spreadsheet.
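
To make the sets-versus-hierarchies distinction a little more concrete, here is a minimal sketch in Python using only the standard library. The table name, column names, sample figures and the book outline are all made up for illustration; the point is only that the rectangular data suits a set-oriented SQL query, while the nested outline suits a tree walk over XML.

```python
import sqlite3
import xml.etree.ElementTree as ET

# A "set": rectangular data with fixed columns and a variable number of rows.
# The sales table and its contents are hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, quarter TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("North", "Q1", 120), ("North", "Q2", 150), ("South", "Q1", 90)],
)
# One declarative query summarises the whole set at once.
totals = dict(
    conn.execute("SELECT region, SUM(amount) FROM sales GROUP BY region")
)

# A hierarchy: nesting of arbitrary depth, like the outline of a book.
# The element and attribute names here are hypothetical as well.
book = ET.fromstring("""
<book title="Example Book">
  <chapter title="Databases">
    <section title="Relational"/>
    <section title="Free-text">
      <subsection title="Indexing"/>
    </section>
  </chapter>
  <chapter title="Markup"/>
</book>
""")

def outline(node, depth=0):
    """Return the tree as a list of indented titles, one per element."""
    lines = ["  " * depth + node.get("title")]
    for child in node:
        lines.extend(outline(child, depth + 1))
    return lines

print(totals)
print("\n".join(outline(book)))
```

Either shape of data can be forced into the other tool (a tree can be flattened into rows with parent keys, a table can be written out as repeated XML elements), but each then loses some of the convenience shown above.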

Clearly, there are things that XML shines at and things that SQL shines at, but often they are different things. In a few forthcoming blog entries here I'll describe a few of them, and why we should consider backing the right horse for a chosen course...

[See the October 2007 post for the release of the open source software SQL+PaWS - SQL and People as Web Services]

©2007 Solid Software Pty Ltd