| To: jon_udell@infoworld.com |
The following is the text of my original letter, sent Jan 7, 2004 to Jon Udell at InfoWorld (jon_udell@infoworld.com) in follow-up to "A Tale of Two Cultures", InfoWorld, Dec 31, 2003 http://www.infoworld.com/article/03/12/31/01OPstrategic_1.html
Hello Jon -
I sincerely believe that you could not be more right about the convergence of traditions between Unix scripting and Xerox/Apple/Microsoft windowing. The comment about Unix suffering from multiple legacy "mini-parsers" is well taken, even in the view of this Unix sympathizer. And as discussed later, I also subscribe to the notion that the GUI is a side effect of the application, not the other way around, as it was treated in the first generation of commercial windowing systems, basically left over from the 1980s.
But I would suggest taking your analysis a bold step further: the change is much more profound than the intersection of two cultures, and goes to the core of what distributed systems mean and how they are built. What is happening is no less than a major aspect-oriented interweaving of the three elements that have existed since von Neumann's 1946 paper pretty much defined the modern computer. While von Neumann et al introduced the now standard model of memory, which freely intermixes program and data, very little use has been made of self-modifying software, and almost none commercially. For nearly fifty years data and executable code have been rigidly separated in all commonly accepted methodologies. One consequence of this separation is the notorious "impedance mismatch" between the currently dominant OO programming model and the currently very dominant Relational Data Model.
I believe your column is a piece of a bigger puzzle - the relationship between data and computation in networked environments. I would suggest there are three units from which any distributed system is made: the units of persistence, the units of computation, and the units of communication.
Within the total system, all knowledge content represented, stored, manipulated and distributed must ultimately map to these three units and combinations of these three units. These basic units have always been mixed together to limited degrees; for instance, program code in a file can be viewed as an executable unit (the object module after loading) contained in a unit of persistence (the file). Obviously the role of units of communication is to deliver the information content of the other two types to wherever they need to go.
In nearly all situations permitting modification of code, the modification is between levels, as with compilers, which mediate between source code and executable code. But internal modification within a level is strictly forbidden. For example, the act of compilation generally does not result in modification of the compiler itself. Interpreters and code generators have much the same relationship between their inputs and outputs.
But starting with HTML and accelerating with XML, units of communication, persistence, and computation have become intermingled on a massive scale in fairly radical new ways. It is standard fare now for webservers to take units of persistence such as .jsp and .asp files, compose and combine them with units of computation such as JavaScript for subsequent execution by the browser, and then transmit the resulting page as a single unit of communication to the browser. While HTML is rather limited to GUI functionality and JavaScript is not very powerful in a general sense, there has been a distinct increase in the use of code generation on the fly, which was previously encountered relatively rarely, and a very great increase in the delivery of units of communication which contain a mixture of data and computation. It is the last part that is particularly significant, and your article touched on it. In many ways, the intentional combining of data and computation is a new paradigm (to use an overextended term).
In traditional data processing, reaching its zenith with the Relational Data Model, the content of interest is oriented very heavily towards passive units of persistence. In traditional programming, reaching its zenith in OO design and programming, the emphasis is on units of communication that carry only data (message oriented "signature" parameters) to be delivered to combined units of computation and persistence ("objects" composed jointly of methods and attributes) which intentionally hide both their internal data and algorithms.
While the Relational Data Model has no natural extension into the realm of computations (relational algebra et al aside), OO has trapped itself by two rigidities. First, the messages which an object accepts and emits tend to be very brittle in format, and the resulting protocols spoken between objects tend to be very complex, usually not documented, and often simply not understood. Capturing behavior adequately has always been a weak part of the OO paradigm.
More fundamentally, however, OO made a deal with the devil by willfully accepting "information hiding". Most of the benefit of information hiding in OO is simply suppressing the details of the lower level(s) of implementation used to define the internals of the object. But that throws the baby out with the bath water. The complete useful semantics of the object is almost certainly lost behind the facade of syntactic sugar provided by the method signatures and exposed attributes. Worse than that, this kind of unhealthy information hiding prevents what I believe will become the next dominant design in distributed system design and implementation: the ability of a computation to dynamically combine both data and "rules" (small units of computation) within itself and within other units on the fly, and then make use of the new combination(s) directly, or indirectly by using them as units of communication.
The importance of dynamic composition (of data, computation, and communication) for addressing complex and evolving systems can hardly be overemphasized. As your article noted, keeping things "upstairs" and "downstairs" is hardly an advantage, but combining them opens up whole new vistas.
This became apparent to me while analyzing a large billing system (hundreds of millions of invoices per month), only to realize that the latest fad of "componentization" did not address and would not solve the core problem: the need to flow billing rules as well as data between major subsystems. My (proposed) solution was to treat contracts, user agreements, pricing plans and the like as XML documents that contained both parametric data and pricing rules, since we had proven previously that application-specific scripting languages to define the rules were possible. The documents could thus be presented to humans via style sheets and the like, and at the same time be used for system configuration and execution.
From a computational perspective, the rules were combined on the fly from the multiple documents related by a common account number, and the rules could reference data in any of the associated documents. A framework for combining the rules into one virtual set of rules per transaction step was part of the architecture of the application-specific scripting language(s). Even in complex billing systems, the number of related documents needed per transaction step is only 3 or 4 at a time, and almost certainly fewer than 10. Besides producing a simplified high level data model, this approach produced the architectural foundation of a dynamically assembling transaction system (or at least of each step of the transaction). In fact, I believe this architectural pattern is applicable to the vast majority of commercial transaction systems.
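The pattern described above (documents that carry both parametric data and rules, merged per account into one virtual rule set for a transaction step) can be sketched roughly as follows. This is only an illustration in Python, not the scripting language or framework from the billing project: the element names, attribute names, the `OPS` table, and the sample accounts and amounts are all hypothetical, and the sketch assumes rules can simply be applied in the order the documents are supplied.

```python
import xml.etree.ElementTree as ET

# Hypothetical documents: each carries parametric data AND rules.
PRICING_PLAN = """
<document type="pricing_plan" account="A-1001">
  <param name="rate_per_minute">0.05</param>
  <rule name="usage_charge" op="mul" left="minutes" right="rate_per_minute"/>
</document>
"""

USER_AGREEMENT = """
<document type="user_agreement" account="A-1001">
  <param name="discount">0.10</param>
  <rule name="discounted_charge" op="discount" left="usage_charge" right="discount"/>
</document>
"""

# A document for a different account; it must not take part in the step.
OTHER_ACCOUNT = """
<document type="pricing_plan" account="B-2002">
  <param name="rate_per_minute">0.09</param>
</document>
"""

# Stand-in for an application-specific rule language: a tiny fixed
# vocabulary of binary operations (an assumption made for this sketch).
OPS = {
    "mul": lambda a, b: a * b,
    "add": lambda a, b: a + b,
    "discount": lambda amount, fraction: amount * (1 - fraction),
}


def assemble(documents, account):
    """Merge params and rules from every document tied to one account."""
    params, rules = {}, []
    for text in documents:
        root = ET.fromstring(text.strip())
        if root.get("account") != account:
            continue
        for param in root.iter("param"):
            params[param.get("name")] = float(param.text)
        rules.extend(root.iter("rule"))
    return params, rules


def run_step(documents, account, transaction):
    """Apply the combined rule set to one transaction step."""
    params, rules = assemble(documents, account)
    values = {**params, **transaction}
    for rule in rules:
        # A rule may reference data from any associated document,
        # or a value produced by an earlier rule.
        operation = OPS[rule.get("op")]
        left = values[rule.get("left")]
        right = values[rule.get("right")]
        values[rule.get("name")] = operation(left, right)
    return values


result = run_step(
    [PRICING_PLAN, USER_AGREEMENT, OTHER_ACCOUNT],
    "A-1001",
    {"minutes": 200.0},
)
print(result)
```

Note that the rule in the user agreement consumes a value computed by the rule in the pricing plan, which is the cross-document referencing described above. A real framework would need an explicit ordering or dependency scheme rather than relying on document order.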
It could be argued that web services define the units of communication and the transaction steps visible between computational and storage elements in the network. So ultimately both data and rules need to be accessible and open to manipulation. This certainly encourages very high level knowledge representations that are very close to the problem domain. The notion of "composition" within such a language becomes front and center. In the limit, this approach literally allows domain experts to "program" the system at the highest level using their own native notation.
Of course, the enabling software and hardware to support this knowledge payload still requires a great deal of hard slogging. But application-specific knowledge moves out of these lower layers, which become purely part of the infrastructure to support the high level applications. If your application logic is hidden inside of objects and buried within the coding of (low level) general purpose implementation languages, such as C/C++/Java/C#, or stuck in unfathomable data schemas, something has gone seriously wrong.
If still desired, "information hiding" should be employed for reasons that inherently stem from the problem domain, generally involving security and privacy, and not from extraneous obscurities thrown in by the solution architecture. As a practical matter, at least for XML, the necessarily "hidden" parts can simply be encrypted in order to make them inaccessible to undesirable eyes. This still allows them to be used as units of persistence and units of communication. ("I don't know all of what I sent you, but my source says it's really good!")
Well, this got a little longer than expected, but I hope it was of some interest. And once again, good article.
Thanks,
Rob DuWors |
| Subject: Re: Tale of Two Cultures |