Tuesday, February 26, 2008

Great new book for ESB users

I've recently had the opportunity to look over a yet-to-be-published book from Manning titled "Open Source ESBs in Action". If you're an ESB user, you owe it to yourself to check this one out as soon as it's available.

In an unusual twist, the authors of this book present ESB best practices and techniques using *two* open source ESBs, Mule and ServiceMix. I'm already a Mule user, so I was pretty happy to see several new ideas to employ in my Mule use. I haven't tried ServiceMix yet, but may someday, time allowing. (I'm confident I can run several scenarios right out of the box, armed with this book.) The authors cover more than a few patterns straight out of "Enterprise Integration Patterns". Kudos to Manning for allowing their authors to acknowledge this excellent resource, even if it comes from a different publisher. I really do appreciate that!

I liked several ideas I think I'll apply somewhere down the line-- including use of XML on Mule to facilitate validation and transformation, use of JiBX to help with object-to-XML transformations, and use of a BPM product to oversee non-integration parts of my ESB. (I suspect that'll de-tangle the configuration a great deal, or at least make it easier to understand.)

By using two ESBs to implement every use case, I think the authors give you some insights you might otherwise have missed if you saw just one side of the story. As I said previously, unusual, but interesting.

I won't go on and on-- suffice it to say this book will have a spot on my bookshelf when it becomes available in the near future. If you use an ESB, or think you might like to, you might consider looking at this informative work.

Wednesday, February 20, 2008

If it ain't got "Hello World", and it ain't got a cookbook...

It might as well not be downloadable. Open source projects need to be user-friendly to gain acceptance, and if they don't gain acceptance they're doomed to be buried by a rival that is. (Notable exception: if the project is alone in the workspace. Doesn't seem to be many of those around, though.)

Case in point-- the ESB market. Near as I can tell, Mule is running away with that game, largely because of its ease-of-use model. Anyone reading the help docs can run about a half dozen excellent example configurations not long after downloading Mule. Compare that to some of the other ESBs out there-- they may or may not be able to compete on other merits, but they may never get the chance because they don't get a second look after the initial few hours of frustration. (Another exception is anything with enough critical mass that it's on everybody's short list sight unseen.)

Examples of doing it right:
Hibernate
Tomcat
Terracotta

Examples of not doing it right:
JBoss WS
Jess
Spring Batch

I'd love to see some of those 'not right' projects do better (especially Spring Batch). Let's see how they fare in the long run-- check back in a year or so.

Sunday, February 17, 2008

The Rules and Data shortcut-- doing things the ELT way

Lately I've been working on a very data-intensive project, one that needs to run hundreds of business rules against millions (maybe billions) of rows of data kept in a database. There's a constraint issue here-- our rule engine likes all the facts (data) in 'Working Memory' before it gives you the results you want, so we can't run the rules against *all* the data all at once. This means either serializing the process (which would take much too long, and would leave us vulnerable to changes to the database as time goes by) or distributing the task (which leaves us to work out recovery and restart schemes and to look for ways to manage timing issues, as we'll probably do this asynchronously.)

An idea has emerged, though, that might help us in many ways. If we resort to doing 'ELT' processing (Extract, Load, Transform-- *not* ETL), we can use the database to do our heavy lifting for us. A really great side-benefit is that we can do this with a small number of SQL statements instead of a whole bunch of data access classes and a bunch of business rules. We also get the benefit of having the database state much less vulnerable to partially successful operations, as the SQL can easily be made transactional.

Here's a quick 'method 1 vs. method 2' comparison:

Method 1:
Determine what kind of data the rules require.
Pull the data (using Hibernate generated DAOs), put it in Working Memory.
Fire the Rules (which have to know the particulars of the Entity objects).
Put the rule results back in the database.
*Note: The data is 'pulled out' of the db, then the results are 'put back in'.
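
To make Method 1 concrete, here is a minimal sketch of the pull-evaluate-push round trip. It is only a stand-in: Python's built-in sqlite3 module replaces the Hibernate DAOs, a one-line comparison replaces the rule engine, and the schema (transactions, rule_results), the 'over limit' rule, and the function names are all hypothetical.

```python
import sqlite3
from collections import defaultdict


def make_demo_db():
    """Build a tiny in-memory database (hypothetical schema, for illustration only)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE transactions (account_id INTEGER, amount INTEGER);
        CREATE TABLE rule_results (account_id INTEGER PRIMARY KEY, over_limit INTEGER);
        INSERT INTO transactions VALUES (1, 100), (1, 50), (2, 900), (2, 200);
        INSERT INTO rule_results VALUES (2, 0);  -- stale result from an earlier run
        """
    )
    return conn


def run_rule_in_client(conn, limit):
    """Method 1: pull rows out, evaluate in application memory, write results back."""
    # Pull the facts into 'working memory' (here just a dict in the client process).
    totals = defaultdict(int)
    for account_id, amount in conn.execute("SELECT account_id, amount FROM transactions"):
        totals[account_id] += amount

    # Fire the 'rule' in application code (assumed rule: total exceeds the limit).
    results = [(account_id, int(total > limit)) for account_id, total in totals.items()]

    # Put the rule results back in the database, one statement per entity.
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT OR REPLACE INTO rule_results (account_id, over_limit) VALUES (?, ?)",
            results,
        )
    return len(results)
```

Every row crosses the wire to the client and every result crosses back, which is the traffic the note above is pointing at.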

Method 2:
Using temporary tables and SQL, do as much of the rule work as possible in the database.
Do simple calculations, putting results in temporary tables that are created and destroyed as often as needed. Make small jumps in state from 'step' to 'step'. This will make more SQL, but simpler SQL.
After an 'End state' temp table is populated with the results of the rule evaluation, put the rule results back in the database. (In this case, an update OR insert, depending on whether we already had a result for some of the Entities under evaluation.) Some databases (including MySQL, with INSERT ... ON DUPLICATE KEY UPDATE) give you a convenience syntax that does an insert (if no row with this key) or an update (if the key exists).
*Note: The data never 'leaves' the database. The database doesn't necessarily work any harder, because it's doing a few compute-intensive large joins, selects, etc. instead of a whole lot more run-of-the-mill DAO operations (gets).
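
The Method 2 steps can be sketched the same way. This is an illustration under stated assumptions, not the project's actual SQL: it runs against SQLite through Python's sqlite3 module, the schema and the 'over limit' rule are hypothetical, and SQLite's INSERT OR REPLACE stands in for the insert-or-update convenience syntax mentioned above.

```python
import sqlite3


def make_demo_db():
    """Build a tiny in-memory database (hypothetical schema, for illustration only)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE transactions (account_id INTEGER, amount INTEGER);
        CREATE TABLE rule_results (account_id INTEGER PRIMARY KEY, over_limit INTEGER);
        INSERT INTO transactions VALUES (1, 100), (1, 50), (2, 900), (2, 200);
        INSERT INTO rule_results VALUES (2, 0);  -- stale result from an earlier run
        """
    )
    return conn


def run_rule_in_database(conn, limit):
    """Method 2 (ELT): small SQL steps through temp tables; rows never reach the client."""
    with conn:  # commits the data changes on success, rolls them back on error
        # Clear any scratch tables left behind by an interrupted earlier run.
        conn.execute("DROP TABLE IF EXISTS temp.step_totals")
        conn.execute("DROP TABLE IF EXISTS temp.end_state")

        # Step 1: a simple calculation, parked in a temp table.
        conn.execute("CREATE TEMP TABLE step_totals (account_id INTEGER, total INTEGER)")
        conn.execute(
            "INSERT INTO step_totals "
            "SELECT account_id, SUM(amount) FROM transactions GROUP BY account_id"
        )

        # Step 2: the 'End state' table holding the rule outcome (1 = over the limit).
        conn.execute("CREATE TEMP TABLE end_state (account_id INTEGER, over_limit INTEGER)")
        conn.execute(
            "INSERT INTO end_state SELECT account_id, total > ? FROM step_totals",
            (limit,),
        )

        # Step 3: insert new results or overwrite existing ones, keyed on account_id.
        conn.execute(
            "INSERT OR REPLACE INTO rule_results (account_id, over_limit) "
            "SELECT account_id, over_limit FROM end_state"
        )

        # Scratch tables are destroyed as soon as they are no longer needed.
        conn.execute("DROP TABLE temp.step_totals")
        conn.execute("DROP TABLE temp.end_state")
```

Calling run_rule_in_database(make_demo_db(), 1000) leaves one result row per account in rule_results, including the account that already had a stale row. Each step is a short statement on its own, which is the 'more SQL, but simpler SQL' trade described above.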

Since the db is doing all the work, all we need is a single client to make the SQL requests instead of the army of distributed clients in Method 1. This makes for much simpler recovery and restart in the event something goes wrong (and fewer machines and instructions working, so fewer opportunities for things to go wrong.)

I'm much more of an application developer than a DBA, so this logic-in-SQL idea is not my first choice, but I think I like this ELT stuff. I'll just have to be careful to keep the SQL as simple as possible, as this seems to be the strongest negative that I can find in this situation.