Thursday, August 4, 2011

Filesystem in User space (FUSE) and CMS

One of the challenges that a serious, enterprise-level CMS will inevitably encounter is the need to publish content in a way that allows for the best possible performance on the final web page.

For "pure" content - that is, markup that is ready to be sent to the browser as-is, right out of the CMS - a well-optimized database-retrieval system is probably quite sufficient. But for content that doesn't fall neatly into that category, other approaches need to be available. Consider that many (most?) websites these days would benefit from having publishable items that can then be run in an application space: things that require database or service interaction, and that cannot easily or effectively be pulled out of that same database as ready-made markup.

Most web-application languages provide *some* sort of inclusion mechanism: ASP.NET has its control mechanisms (though they are a bit more than "just" an inclusion), PHP has include and require, ColdFusion has CFINCLUDE, and JSP has its include directive and jsp:include action. Variations aside, these are just ways of telling the application server to include and execute some external file resource. In general, in order to execute such an item, it has to be something that can be read through the file-system. It cannot be text (or whatever) residing in a database. At best, that text has to be read from the database, then explicitly executed - a process that can have security and/or performance implications.
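As a rough illustration of that difference, here is a minimal sketch in Python (standing in for whichever application language is actually in play). The `content` table, its `name` and `body` columns, and the function names are all made up for the example. The same snippet is run once as a file, where the runtime does the loading, and once from a database row, where the application has to fetch the text and execute it explicitly.

```python
import runpy
import sqlite3
import tempfile
from pathlib import Path

# A tiny "executable content item"; it expects a variable named `source`.
SNIPPET = "greeting = 'Hello from ' + source\n"


def run_from_file(path, source):
    """File-system inclusion: hand the runtime a path and let it load the code."""
    return runpy.run_path(str(path), init_globals={"source": source})


def run_from_database(conn, name, source):
    """Database-resident code: fetch the text, then compile and execute it by hand."""
    row = conn.execute(
        "SELECT body FROM content WHERE name = ?", (name,)
    ).fetchone()
    namespace = {"source": source}
    # This explicit exec step is where the security/performance concerns live.
    exec(compile(row[0], f"<db:{name}>", "exec"), namespace)
    return namespace


def demo():
    # Route 1: the snippet lives in a file.
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "snippet.py"
        path.write_text(SNIPPET)
        from_file = run_from_file(path, "a file")

    # Route 2: the snippet lives in a (hypothetical) content table.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE content (name TEXT PRIMARY KEY, body TEXT)")
    conn.execute("INSERT INTO content VALUES (?, ?)", ("snippet", SNIPPET))
    from_db = run_from_database(conn, "snippet", "a database row")
    conn.close()

    return from_file["greeting"], from_db["greeting"]
```

Both routes end up running the same code, but only the first one is something an ordinary include mechanism can do on its own.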

In many cases, this can force separate content/executable storage and display processes and mechanisms - something that would be better avoided if at all possible.

It would certainly be possible to wire things together so that each different content-item being published by a CMS would somehow "know" at publish-time where its content was coming from. But it would be better if there were one mechanism that would handle both simple text-retrieval (usually handled by calling to a database) and executable item-retrieval (file-system inclusion). There is one that is fairly obvious - writing every content-item out to its own file, and including each applicable item-file on a page-by-page basis. It works, but it can lead to synchronization issues if the CMS is generating content for many servers, or even on a single server if the underlying file(s) get locked.
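The file-per-item approach might look something like the sketch below. The function names and the directory layout are hypothetical. Writing to a temporary file and renaming it over the target is an addition of this sketch, not something the approach requires; it only narrows the window on a single server, and does nothing for the many-servers problem.

```python
import os
import tempfile
from pathlib import Path


def publish_item(directory, name, body):
    """Write one content item to its own file, replacing any older copy."""
    directory = Path(directory)
    directory.mkdir(parents=True, exist_ok=True)
    target = directory / name
    # Write a temporary file in the same directory, then rename it over the
    # target, so a reader sees either the old item or the new one.
    fd, tmp_name = tempfile.mkstemp(dir=directory, prefix=".publish-")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            handle.write(body)
        os.replace(tmp_name, target)
    except BaseException:
        # Clean up the temporary file if the write or the rename failed.
        os.unlink(tmp_name)
        raise
    return target


def render_page(directory, item_names):
    """Assemble a page by reading each applicable item-file, in order."""
    directory = Path(directory)
    return "".join(
        (directory / name).read_text(encoding="utf-8") for name in item_names
    )
```

Every server that renders pages still needs its own up-to-date copy of every file, which is exactly where the synchronization trouble comes from.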

So, ideally, a CMS needs the single-source capabilities of a database-retrieval process, and the automatic executability of a file-system inclusion. That is not a mix that most CMSs support, for reasons that would be obvious after some tinkering.

But Linux/Python offers a solution: a filesystem-in-user-space (FUSE) based "filesystem" attached to a database. In that scenario, each row in a given table or view essentially has a "file" associated with it automatically: the file appears or changes as the row is created or altered, and vanishes when the row is deleted. If the database driving the virtual file-system is peppy and/or has replication capabilities, the synchronization of items is as simple as an insert or update in the database that the virtual file-system is attached to, or replicating the database as needed to machines that other virtual filesystem instances are pointed at.
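To give a feel for how little is involved, here is a minimal, read-only sketch of the row-to-file mapping. It is deliberately not wired to a real FUSE library: the method names (`getattr`, `readdir`, `read`) and their arguments are modeled on the kind of operations interface that Python FUSE bindings such as fusepy expose, and that resemblance is an assumption of the sketch rather than a drop-in implementation. The `content` table with `name` and `body` columns is likewise hypothetical, and SQLite stands in for whatever database the CMS actually uses.

```python
import errno
import sqlite3
import stat


class RowBackedFiles:
    """Read-only, FUSE-style view where each row of `content` looks like a file."""

    def __init__(self, conn):
        self.conn = conn

    def _body(self, path):
        # "/about.html" maps to the row whose name is "about.html".
        row = self.conn.execute(
            "SELECT body FROM content WHERE name = ?", (path.lstrip("/"),)
        ).fetchone()
        if row is None:
            # No row, no file. A real binding may want its own error type.
            raise OSError(errno.ENOENT, "no such row", path)
        return row[0].encode("utf-8")

    def getattr(self, path, fh=None):
        """Return stat-like attributes for the root directory or one row-file."""
        if path == "/":
            return {"st_mode": stat.S_IFDIR | 0o755, "st_nlink": 2}
        return {
            "st_mode": stat.S_IFREG | 0o444,
            "st_nlink": 1,
            "st_size": len(self._body(path)),
        }

    def readdir(self, path, fh=None):
        """List one directory entry per row in the table."""
        rows = self.conn.execute("SELECT name FROM content ORDER BY name")
        return [".", ".."] + [row[0] for row in rows]

    def read(self, path, size, offset, fh=None):
        """Return up to `size` bytes of the row's body, starting at `offset`."""
        return self._body(path)[offset:offset + size]
```

Because every call goes straight to the database, an insert, update, or delete shows up in the "filesystem" on the very next read; there is no separate publishing step to keep in sync. Actually mounting something like this takes a real FUSE binding, plus write support if the CMS is meant to publish through the filesystem as well.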

On the development side of the equation, this sort of set-up would also allow "normal" versioning systems (CVS, Subversion, Git, etc.) to work with database-managed content/files. A development-side file-system (that also points at a database) would look like a normal, native filesystem as far as the version-control system was concerned, so checkouts, branching/merging, and all the other goodness that comes with a good version-control system would be available.

Some Fuse/Python resources:
