Monday, August 22, 2011

Lesson for the day: Check your assumptions!

So I mentioned earlier that I had some concerns about the interrelationship of some of the classes I'd defined - enough so that I felt I needed to take some time and draw up a UML-ish diagram of them.

In retrospect, I don't know where I got the assumption that there was a problem. The concern I had was that the BaseFilesystemItemCache and IFilesystemItem classes (respectively, a nominal abstract class and a nominal interface) both needed to know about each other in order for a filesystem item to be deleted both as the object-instance representing the actual item, and within the item-cache that is being provided. That would require each IFilesystemItem-derived instance to know what a BaseFilesystemItemCache is (in order to accept one in the constructor and type-check it at construction), and to be able to pass a "remove" call on to its cache. At the same time, any BaseFilesystemItemCache instance would need to know what an IFilesystemItem is, in order to type-check that items being added conform to interface expectations.

Simplified, the relevant functionality would look something like this:

class BaseFilesystemItemCache( object ):
    def __init__( self ):
        if self.__class__ == BaseFilesystemItemCache:
            raise NotImplementedError( 'BaseFilesystemItemCache is nominally an abstract class, and should not be instantiated.' )
        self.Cache = []
    def Add( self, item ):
        if not isinstance( item, IFilesystemItem ):
            raise TypeError( '%s.Add error: Expected an instance implementing IFilesystemItem.' % ( self.__class__.__name__ ) )
        self.Cache.append( item )
        item.Cache = self
    def Remove( self, item ):
        if item in self.Cache:
            self.Cache.remove( item )

class IFilesystemItem( object ):
    def __init__( self, cache=None ):
        self.Cache = None
        if self.__class__ == IFilesystemItem:
            raise NotImplementedError( 'IFilesystemItem is nominally an interface, and should not be instantiated.' )
        if cache != None and not isinstance( cache, BaseFilesystemItemCache ):
            raise TypeError( '%s error: Expected a BaseFilesystemItemCache instance for cache' % ( self.__class__.__name__ ) )
        if cache:
            self.Cache = cache
    def __del__( self ):
        if self.Cache:
            self.Cache.Remove( self )

class TestCache( BaseFilesystemItemCache ):
    def __init__( self ):
        BaseFilesystemItemCache.__init__( self )

class TestItem( IFilesystemItem ):
    def __init__( self, cache=None ):
        IFilesystemItem.__init__( self, cache )

Given what I thought I knew, my expectation was that an error would be raised somewhere along the line, just because of the interpretation/compilation sequence that I thought Python used.

But this code runs (it doesn't do anything, but it raises no errors, either).

I added a few more lines to create a TestCache and a couple of TestItems:

cache = TestCache()
item1 = TestItem()
item2 = TestItem()

# At this point, nothing's been added to the cache:
print "Empty Cache"
print cache.Cache
print

print "Caching item2", item2
cache.Add( item2 )
print cache.Cache
print

print "Caching item1", item1
cache.Add( item1 )
print cache.Cache
print

print "Removing item2"
cache.Remove( item2 )
print cache.Cache

and it shows that the cache add/remove cycle works as I'd originally wanted:

Empty Cache
[]

Caching item2 <__main__.TestItem object at 0xb769f2ac>
[<__main__.TestItem object at 0xb769f2ac>]

Caching item1 <__main__.TestItem object at 0xb769f28c>
[<__main__.TestItem object at 0xb769f2ac>, <__main__.TestItem object at 0xb769f28c>]

Removing item2
[<__main__.TestItem object at 0xb769f28c>]

In retrospect, I think that I was thinking of object inheritance restrictions (a class cannot derive from another class unless that class is defined earlier in the same file or imported from elsewhere), though I'm really not certain how I came to the assumption I did.
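A small sketch of that distinction, as I understand it: a base class is looked up when the class statement itself executes, while names used inside a method body are only looked up when the method is called. The class names here (Early, Late, Derived, NotYetDefined) are made up for the illustration, and the sketch uses print() as a function so that it runs on current Python versions.

```python
# Names inside a method body are resolved when the method runs,
# so Early can refer to Late even though Late is defined afterwards.
class Early(object):
    def accepts(self, item):
        return isinstance(item, Late)


class Late(object):
    pass


print(Early().accepts(Late()))   # True

# A base class, on the other hand, is resolved when the class
# statement executes, so it has to exist already.
try:
    class Derived(NotYetDefined):
        pass
except NameError as error:
    message = str(error)
    print("NameError: %s" % message)
```

That would explain why the cache and item classes above can type-check against each other without complaint: neither isinstance call runs until both classes exist.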

So: Check your assumptions... Lesson learned...

Saturday, August 20, 2011

A JavaScript break - Custom Data Attribute use

OK, so this one isn't Python at all, but the idea intrigued me enough to hammer out some code, and see what I could come up with.

HTML5 (and possibly earlier versions of HTML, though browser support may be spotty) allows for the creation and use of custom data-attributes in page-markup. I agree with John Resig that it seems somewhat bizarre a thing to allow/provide, but when I'd first heard about them, I noted it and set it aside for future consideration.

Then, this week, I was tasked with rewriting/converting some of the legacy content and templating in Tridion at work. One of the main things that needed to be converted was the established (and functional, but not optimal) mechanism for generating ads on pages. The current mechanisms we have in place across the site are different from one application language to another (we have 2 different languages we're using at present), somewhat less than optimal (or so I gather), and not really conducive to the current stated goal(s) for the CMS.

So I started looking at ways that presentations for these ads (basically just snippets of JavaScript) could be generated in an HTML-friendly fashion (making them language- and platform-agnostic), relying only on the end-user's browser's capabilities. It occurred to me that every ad on every page had some sort of surrounding element already anyway (a convention we'd established for CMS content early on). If only there were some way to store the data for that ad as part of the markup being generated, it would probably be pretty easy to whack out a JavaScript that would go back through the page-DOM after the page was done loading, and populate those ads.

Something like:

<div data-adsource="[URL for ad-script]"></div>

...and there is a use for a custom data attribute...

Then, all that's left is to create some script that would facilitate reading those data-attribute values, and generate the "real" script-reference to pull them in. Here's what I came up with to actually read the data-attributes:

document.dataElements = null;
document.getDataElements = function( dataElementName )
{
  /*
    Returns a *non-distinct* list of all elements in the DOM that have a "data-[name]" attribute, or an empty list if none exist.

    dataElementName ...... [string, optional] The complete data-attribute name that will be used for the search. May be supplied as a partial name, ending with "*" to wildcard-match multiple elements with similar data-names.
  */
  if( document.dataElements == null )
  {
    //  Build the document's collection of elements with "data-*" attributes first
    document.dataElements = {};
    var allElements = document.getElementsByTagName( "*" );
    //  With each element...
    for( var el=0; el < allElements.length; el++ )
    {
      var element = allElements[ el ];
      //  check each attribute...
      for( var at=0; at < element.attributes.length; at++ )
      {
        var attribute = element.attributes.item( at );
        if( attribute.specified && attribute.name.indexOf( "data-" ) == 0 )
        {
          //  the element has a data-attribute, so add it to the document.dataElements collection
          if( document.dataElements[ attribute.name ] == null )
          {
            //  The array doesn't exist, so create it
            document.dataElements[ attribute.name ] = [];
          }
          document.dataElements[ attribute.name ][ document.dataElements[ attribute.name ].length ] = element;
        }
      }
    }
  }
  //  Get the data-elements
  if( dataElementName != null )
  {
    if ( dataElementName.substr( dataElementName.length-1, 1 ) != "*" )
    {
      //  Exact name: return the matching list, or an empty list if there is none
      return document.dataElements[ dataElementName ] || [];
    }
    else
    {
      var keyName = dataElementName.substr( 0, dataElementName.length-1 );
      var results = [];
      for( var dataName in document.dataElements )
      {
        if ( dataName.substr( 0, keyName.length ) == keyName )
        {
          //  Collect the elements stored under the matching name (dataName), not under the wildcard prefix
          results = results.concat( document.dataElements[ dataName ] );
        }
      }
      return results;
    }
  }
  else
  {
    return [];
  }
}

This provides a dataElements property on the document: a named collection of data-attribute names that each resolve to a (non-distinct) list of elements on the page that have that data-attribute. It also allows (as the comment in the script notes) a "wildcard" search and retrieval, for scenarios like "get every element on the page that has a data-attribute that starts with 'data-ad'" (document.getDataElements( "data-ad*" )).

Saturday, August 13, 2011

High-level DBFSpy UML diagram

Today I took a break from writing code, but decided that it would probably be a good idea to have at least a high-level UML diagram for DBFSpy's object-structure.

So here it is:

[Image: high-level DBFSpy UML diagram]

Though there is some detail available for a couple of the classes, I didn't want to try and arrange the entire thing with the available detail just yet... This at least shows the inheritance and implementation structure I'm anticipating, though.

A good part of the reason for my taking the time to draw this out was that I started working on the BaseFilesystemItemCache nominal abstract class, and it suddenly occurred to me that I wasn't sure how I was going to handle the potential circular dependencies between it and the IFilesystemItem nominal interface. The logic went something like this: When a filesystem item is deleted at the filesystem/database level, unless something is done to update (remove) that item from the cache, it will still be present. Fairly obvious. However, since Python won't allow circular class dependencies (each of these two objects needs to know about the other; one has to be defined and/or compiled before the other can use that definition), that poses a problem. Only one of the two classes can directly know that the item's been deleted, and without some way of connecting the two instances, I'd end up with either an invalid cache (which could be rebuilt, but that could be expensive), or an item that's still in memory but can't be accessed, wasting memory.

Observer pattern to the rescue, though, I think! Several basic structures have been documented on the web, so there's a handful of implementation-examples to choose from.
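For reference, here's a minimal sketch of the kind of Observer structure I have in mind. None of this is DBFSpy code: Observable, Item and ItemCache are hypothetical stand-ins, and the actual deletion of the underlying item is left as a comment. It is written with print() as a function so that it runs on current Python versions.

```python
class Observable(object):
    """Keeps a list of callbacks and calls each one when notify() is called."""
    def __init__(self):
        self._observers = []

    def subscribe(self, callback):
        if callback not in self._observers:
            self._observers.append(callback)

    def unsubscribe(self, callback):
        if callback in self._observers:
            self._observers.remove(callback)

    def notify(self, *args):
        # Iterate over a copy so observers may unsubscribe while being notified
        for callback in list(self._observers):
            callback(*args)


class Item(object):
    """Hypothetical stand-in for a filesystem item."""
    def __init__(self, name):
        self.name = name
        self.deleted = Observable()

    def delete(self):
        # ...the real filesystem/database deletion would happen here...
        self.deleted.notify(self)


class ItemCache(object):
    """Hypothetical stand-in for an item-cache."""
    def __init__(self):
        self.items = []

    def add(self, item):
        self.items.append(item)
        item.deleted.subscribe(self._on_item_deleted)

    def _on_item_deleted(self, item):
        if item in self.items:
            self.items.remove(item)
        item.deleted.unsubscribe(self._on_item_deleted)


cache = ItemCache()
first, second = Item("first"), Item("second")
cache.add(first)
cache.add(second)
first.delete()
print([item.name for item in cache.items])   # ['second']
```

The point of the arrangement is that the item only knows it has observers, not what they are, so it never needs a reference to the cache class itself.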

Friday, August 12, 2011

The BaseDataObject abstract class

See My Coding Style for explanation of, well, my coding style...

BaseDataObject is a nominal abstract class that is intended to provide a common mechanism for object state-persistence to and from a back-end database, where the object itself is responsible for keeping track of whether its state needs to be saved. Like the BaseDatabaseConnection nominal abstract class, it is intended to be a building-block for other, more complicated objects, not an instantiable class in its own right.

In and of itself, it will add the following members to derived objects:

Datasource (Property):
The BaseDatabaseConnection-derived object that the object will use to save its state to the database.
Deleted (Property):
A flag indicating that the object's state-record has been flagged for deletion.
Dirty (Property):
A flag indicating that the object's state-data has been modified since its initial creation or retrieval, and needs to be saved.
Id (Property):
An abstract property that provides the unique identifier in the database of the object's state-data record.
New (Property):
A flag indicating that the object's state-data has been created outside of the database, and needs to be saved.
_Create (Method):
An abstract method that inserts the object's state-data into the database. Not intended to be called directly (see Save, below).
_Delete (Method):
An abstract method that flags the object's existing state-data in the database as deleted or deletable. It may also perform an actual deletion if necessary at an object-by-object level. Not intended to be called directly (see Save, below).
Fetch (Static Method):
A method that allows easy retrieval of a single, identified object of the type from the database.
FetchAll (Static Method):
A method that allows easy retrieval of all objects of the type from the database.
Save (Method):
Saves the object's state-data according to its Dirty, Deleted or New status.
_Update (Method):
An abstract method that updates the object's existing state-data in the database. Not intended to be called directly (see Save, above).

Ultimately, BaseDataObject functionality is more or less intended to provide CRUD mechanisms on an object-by-object basis, as well as a way of retrieving all available instances.

It feels to me like the approach that I'm taking (specifically, providing a number of abstract classes to define common functionality, or at least functional intents) teeters on the edge of acceptability from a Single Responsibility Principle standpoint. "Responsibility" in this context feels a bit nebulous to me. For example, objects derived from BaseDataObject (and other definitions to come) may have a lot of functionality that originates from any number of external definitions, but all of that functionality still feels like it's part of that object's responsibilities: To represent [whatever], and be able to persist state-data for [whatever] makes the "responsibility" of a given object broader, but not... diffuse, maybe? It doesn't feel bad, at least not yet, so I'm going to continue down this path.

import types

class BaseDataObject( object ):
    """Nominal abstract class, provides baseline functionality and interface requirements for objects whose state data is persisted in a back-end data-store."""

    ###########################
    # Class Attributes        #
    ###########################

    ###########################
    # Class Property Getters  #
    ###########################

    def _GetDatasource( self ):
        """Gets or sets the BaseDatabaseConnection object that will handle this object's database interaction.
Raises ConnectionChangeError if an attempt is made to change an already-set value.
Raises TypeError if the set value is not an object implementing BaseDatabaseConnection."""
        return self._datasource

    def _GetDeleted( self ):
        """Gets or sets the object's "deleted" flag, indicating that the object's state has been or should be flagged as deleted when it is next saved."""
        return self._deleted

    def _GetDirty( self ):
        """Gets or sets the object's "dirty" flag, indicating that the object's state has been changed since its initial load."""
        return self._dirty

    def _GetId( self ):
        """Abstract property - Gets or sets the unique identifier of the record where the object's state is stored.
Implementations should not allow the value, once set, to be changed."""
        raise NotImplementedError( '%s.Id error: Id has not been implemented as defined by BaseDataObject.' % ( self.__class__.__name__ ) )

    def _GetNew( self ):
        """Gets or sets the object's "new" flag, indicating that the object's state has been created outside the database, and needs to be saved."""
        return self._new

    ###########################
    # Class Property Setters  #
    ###########################

    def _SetDatasource( self, value ):
        if self._datasource != None:
            raise ConnectionChangeError( '%s.Datasource error: Changes to the Datasource property after it has been set are not allowed.' % ( self.__class__.__name__ ) )
        if not isinstance( value, BaseDatabaseConnection ):
            raise TypeError( '%s.Datasource error: %s is not an object implementing BaseDatabaseConnection' % ( self.__class__.__name__, value ) )
        # Store the validated connection
        self._datasource = value

    def _SetDeleted( self, value ):
        if type( value ) != types.BooleanType:
            raise TypeError( '%s.Deleted error: %s is not a boolean value' % ( self.__class__.__name__, value ) )
        self._deleted = value

    def _SetDirty( self, value ):
        if type( value ) != types.BooleanType:
            raise TypeError( '%s.Dirty error: %s is not a boolean value' % ( self.__class__.__name__, value ) )
        self._dirty = value

    def _SetId( self, value ):
        raise NotImplementedError( '%s.Id error: Id has not been implemented as defined by BaseDataObject.' % ( self.__class__.__name__ ) )

    def _SetNew( self, value ):
        if type( value ) != types.BooleanType:
            raise TypeError( '%s.New error: %s is not a boolean value' % ( self.__class__.__name__, value ) )
        self._new = value

    ###########################
    # Class Property Deleters #
    ###########################

    def _DelDatasource( self ):
        raise NotImplementedError( '%s.Datasource error: the Datasource property cannot be deleted.' % ( self.__class__.__name__ ) )

    def _DelDeleted( self ):
        raise NotImplementedError( '%s.Deleted error: the Deleted property cannot be deleted.' % ( self.__class__.__name__ ) )

    def _DelDirty( self ):
        raise NotImplementedError( '%s.Dirty error: the Dirty property cannot be deleted.' % ( self.__class__.__name__ ) )

    def _DelId( self ):
        raise NotImplementedError( '%s.Id error: the Id property cannot be deleted.' % ( self.__class__.__name__ ) )

    def _DelNew( self ):
        raise NotImplementedError( '%s.New error: the New property cannot be deleted.' % ( self.__class__.__name__ ) )

    ###########################
    # Class Properties        #
    ###########################

    Datasource = property( _GetDatasource, _SetDatasource, _DelDatasource, _GetDatasource.__doc__ )
    Deleted = property( _GetDeleted, _SetDeleted, _DelDeleted, _GetDeleted.__doc__ )
    Dirty = property( _GetDirty, _SetDirty, _DelDirty, _GetDirty.__doc__ )
    Id = property( _GetId, _SetId, _DelId, _GetId.__doc__ )
    New = property( _GetNew, _SetNew, _DelNew, _GetNew.__doc__ )

    ###########################
    # Object Constructor      #
    ###########################

    def __init__( self, datasource=None ):
        """Object constructor.

datasource ... [BaseDatabaseConnection instance, optional, default None] 
               The datasource (a BaseDatabaseConnection instance) that 
               the object will use to save its state-data.

Raises NotImplementedError if an attempt is made to instantiate the class.
Raises TypeError if a datasource is supplied but does not implement BaseDatabaseConnection.
Raises RuntimeError if any other exception is raised by the creation process.
"""
        if self.__class__ == BaseDataObject:
            raise NotImplementedError( 'BaseDataObject is nominally an abstract class, and cannot be instantiated' )
        self._datasource = None
        self._deleted = False
        self._dirty = False
        self._new = True
        if datasource != None:
            try:
                self._SetDatasource( datasource )
            except TypeError as error:
                raise TypeError( '%s Error: Could not create an instance of %s: %s' % ( self.__class__.__name__, self.__class__.__name__, error ) )
            except Exception as error:
                raise RuntimeError( '%s Error: Could not create an instance of %s: %s' % ( self.__class__.__name__, self.__class__.__name__, error ) )

    ###########################
    # Object Destructor       #
    ###########################

    def __del__( self ):
        """Object destructor. Assures that any changes to an object's state-data are saved before the object is destroyed."""
        if self._dirty or self._new:
            self.Save()

    ###########################
    # Class Methods           #
    ###########################

    def _Create( self ):
        """Abstract method - Inserts a state-record into the database, using whatever mechanism(s) are needed."""
        raise NotImplementedError( '%s._Create Error: _Create has not been overridden from its definition in BaseDataObject' % ( self.__class__.__name__ ) )

    def _Delete( self ):
        """Abstract method - Deletes the object's state-record from the database, using whatever mechanism(s) are needed."""
        raise NotImplementedError( '%s._Delete Error: _Delete has not been overridden from its definition in BaseDataObject' % ( self.__class__.__name__ ) )

    def Save( self ):
        """Saves the object's state to the database."""
        if self._deleted:
            self._Delete()
        if self._new:
            self._Create()
        if self._dirty:
            self._Update()

    def _Update( self ):
        """Abstract method - Updates the object's state-record in the database, using whatever mechanism(s) are needed."""
        raise NotImplementedError( '%s._Update Error: _Update has not been overridden from its definition in BaseDataObject' % ( self.__class__.__name__ ) )

    ###########################
    # Static Class Methods    #
    ###########################

    @staticmethod
    def Fetch( datasource, id ):
        """Returns a single instance of the object-type from the database.
datasource ... [BaseDatabaseConnection instance] The
               BaseDatabaseConnection-derived object that the method will
               use to retrieve the specific instance.
id ........... The unique identifier for the instance to retrieve."""
        # A static method receives no instance or class, so the message cannot name the derived class
        raise NotImplementedError( 'Fetch Error: Fetch has not been overridden from its definition in BaseDataObject' )

    @staticmethod
    def FetchAll( datasource ):
        """Returns a list of all available instances of the object-type from the database.
datasource ... [BaseDatabaseConnection instance] The
               BaseDatabaseConnection-derived object that the method will
               use to retrieve all available objects."""
        # A static method receives no instance or class, so the message cannot name the derived class
        raise NotImplementedError( 'FetchAll Error: FetchAll has not been overridden from its definition in BaseDataObject' )

__all__ += [ 'BaseDataObject' ]

Commentary

The provision of properties by an abstract class feels... risky, in the sense that it would not be difficult for someone (even myself) to, say, define their own Datasource property on a derived object that would break the functionality implied or provided. I don't know that I'm completely comfortable with trying to come up with ways to mitigate that risk in the code, however. Most of the approaches that I can think of feel like kludges, at best, and would likely require a lot more maintenance/upkeep as changes are inevitably needed and made.
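To make that risk concrete, here is a small sketch of how little it takes. Base and Careless are hypothetical classes, and the dict type-check is only a stand-in for the real check against BaseDatabaseConnection. It is written with print() as a function so that it runs on current Python versions.

```python
class Base(object):
    def __init__(self):
        self._datasource = None

    def _get_datasource(self):
        return self._datasource

    def _set_datasource(self, value):
        # Stand-in for the real type-check against BaseDatabaseConnection
        if not isinstance(value, dict):
            raise TypeError("Datasource must be a dict in this sketch")
        self._datasource = value

    Datasource = property(_get_datasource, _set_datasource)


class Careless(Base):
    # Redefining the name in a subclass replaces the inherited property
    # outright; the base class's setter (and its checks) is never called.
    Datasource = None


checked = Base()
try:
    checked.Datasource = "not a connection"
except TypeError as error:
    print("Base rejected the value: %s" % error)

unchecked = Careless()
unchecked.Datasource = "not a connection"   # silently accepted
print(unchecked.Datasource)                 # not a connection
print(unchecked._datasource)                # None
```

Anything in the base class that reads self._datasource would then see None, even though the derived object appears to have a Datasource.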

At the same time, setting that concern aside, the approach feels fairly elegant to me - it keeps the implementation of specific interfaces restricted to as few places as I can manage (a good thing), and allows a lot of re-use (potentially - this project is pretty small).

Edited: Made Fetch and FetchAll into static methods - 8/22/11

Thursday, August 11, 2011

The BaseDatabaseConnection class

See My Coding Style for explanation of, well, my coding style...

BaseDatabaseConnection is a nominal abstract class that defines standard functionality for objects that can connect to and execute queries against a back-end database system (MySQL, PostgreSQL, etc.). As such, it has to encapsulate or provide the following items:
Connection (Property):
The current active connection to the back-end database;
Server (Property):
The location of the database server (machine-name or -address);
Database (Property):
The name of the database that the connection will be made to on the specified server;
User (Property):
The user-name that will be used to connect to the specified database;
Password (Property):
The password that will be used to connect to the specified database;
Execute (Method):
A mechanism that allows the execution of queries against the database-connection; and
Call (Method):
A mechanism that allows the execution of stored procedures against the database connection.

The separation of general query and stored-procedure call may not be needed in many cases, though the apparent leading Python/MySQL library (MySQLdb) provides separate mechanisms for each, presumably for a good reason. I'm going to assume that there's a good reason, at any rate, and maintain that separation.

The Execute and Call methods, assuming that they return anything from the database, will return a list of result-sets, each of which is in turn a list of row-dictionaries. Each of those will have a key/value pair where the key is the field-name, and the value is the value from the row for that field. Though creating another class to handle the result-sets, rows and fields/values is certainly possible, it feels a bit like sandblasting a soup-cracker for the purposes of the current project.
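A rough sketch of that return structure, using the standard-library sqlite3 module purely as a stand-in for a real back-end (the project itself targets MySQL). The rows_as_dicts helper, the items table and its contents are all invented for the example.

```python
import sqlite3


def rows_as_dicts(cursor):
    """Returns the cursor's current result-set as a list of row-dictionaries."""
    # cursor.description holds one tuple per column; the first entry is the field-name
    field_names = [column[0] for column in cursor.description]
    return [dict(zip(field_names, row)) for row in cursor.fetchall()]


connection = sqlite3.connect(":memory:")
cursor = connection.cursor()
cursor.execute("CREATE TABLE items (id INTEGER, name TEXT)")
cursor.executemany(
    "INSERT INTO items VALUES (?, ?)",
    [(1, "readme.txt"), (2, "notes.txt")],
)
cursor.execute("SELECT id, name FROM items ORDER BY id")

# sqlite3 only ever yields one result-set per query, so wrap it in a list
# to get the "list of result-sets" shape described above.
results = [rows_as_dicts(cursor)]
connection.close()

print(results[0][1]["name"])   # notes.txt
```

With that shape, results[ set ][ row ][ field-name ] gets at any single value without needing a dedicated result-set class.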

Perhaps doubly so, in that I'm intending to cache results in memory anyway (BaseFilesystemItemCache-derived objects will handle that).

import re
import types

class BaseDatabaseConnection( object ):
    """Nominal abstract class, provides baseline functionality and interface requirements for objects that can connect to and execute queries, procedures, and/or functions against a back-end data-source."""

    ###########################
    # Class Attributes        #
    ###########################

    _databaseNameCheck = re.compile( '[- \\.A-Za-z0-9]+' )
    _passwordCheck = re.compile( '[- \\.A-Za-z0-9]+' )
    _serverNameCheck = re.compile( '[- \\.A-Za-z0-9]+' )
    _userNameCheck = re.compile( '[- \\.A-Za-z0-9]+' )

    ###########################
    # Class Property Getters  #
    ###########################

    def _GetConnection( self ):
        """Abstract property - Gets a usable connection to the database using the object's connection properties.
Raises RuntimeError if the connection fails."""
        raise NotImplementedError( '%s.Connection error: Connection has not been implemented as defined by BaseDatabaseConnection.' % ( self.__class__.__name__ ) )

    def _GetDatabase( self ):
        """Gets or sets the name of the database that the connection will be made to.
Raises ConnectionChangeError if the property is changed after it has been set.
Raises TypeError if the property value being set is not a string.
Raises ValueError if the property value being set contains invalid characters (\\n, \\t, \\r).
"""
        return self._database

    def _GetPassword( self ):
        """Gets or sets the password for the user under whose authentication/authorization credentials the connection will run.
Raises TypeError if the value being set is not a string.
"""
        return self._password

    def _GetServer( self ):
        """Gets or sets the server that the connection will be made to.
Raises ConnectionChangeError if the property is changed after it has been set.
Raises TypeError if the property value being set is not a string.
"""
        return self._server

    def _GetUser( self ):
        """Gets or sets the user under whose authentication/authorization credentials the connection will run.
Raises TypeError if the value being set is not a string.
Raises ValueError if the property value being set contains invalid characters (line-breaking characters, \\t, anything outside the [ \\.a-zA-Z0-9] range).
"""
        return self._user

    ###########################
    # Class Property Setters  #
    ###########################

    def _SetDatabase( self, value ):
        if self._database != None:
            raise ConnectionChangeError( '%s.Database error: Changes to the Database property after it has been set are not allowed.' % ( self.__class__.__name__ ) )
        if type( value ) != types.StringType:
            raise TypeError( '%s.Database error: %s is not a string.' % ( self.__class__.__name__, value ) )
        checkedValue = self._databaseNameCheck.sub( '', value )
        if checkedValue != '':
            raise ValueError( '%s.Database error: "%s" contains invalid/illegal characters (%s).' % ( self.__class__.__name__, value, checkedValue ) )
        self._database = value

    def _SetPassword( self, value ):
        if type( value ) != types.StringType:
            raise TypeError( '%s.Password error: %s is not a string.' % ( self.__class__.__name__, value ) )
        checkedValue = self._passwordCheck.sub( '', value )
        if checkedValue != '':
            raise ValueError( '%s.Password error: "%s" contains invalid/illegal characters (%s).' % ( self.__class__.__name__, value, checkedValue ) )
        self._password = value

    def _SetServer( self, value ):
        if self._server != None:
            raise ConnectionChangeError( '%s.Server error: Changes to the Server property after it has been set are not allowed.' % ( self.__class__.__name__ ) )
        if type( value ) != types.StringType:
            raise TypeError( '%s.Server error: %s is not a string.' % ( self.__class__.__name__, value ) )
        checkedValue = self._serverNameCheck.sub( '', value )
        if checkedValue != '':
            raise ValueError( '%s.Server error: "%s" contains invalid/illegal characters (%s).' % ( self.__class__.__name__, value, checkedValue ) )
        self._server = value

    def _SetUser( self, value ):
        if type( value ) != types.StringType:
            raise TypeError( '%s.User error: %s is not a string.' % ( self.__class__.__name__, value ) )
        checkedValue = self._userNameCheck.sub( '', value )
        if checkedValue != '':
            raise ValueError( '%s.User error: "%s" contains invalid/illegal characters (%s).' % ( self.__class__.__name__, value, checkedValue ) )
        self._user = value

    ###########################
    # Class Property Deleters #
    ###########################

    def _DelDatabase( self ):
        raise NotImplementedError( '%s.Database error: the Database property cannot be deleted.' % ( self.__class__.__name__ ) )

    def _DelPassword( self ):
        raise NotImplementedError( '%s.Password error: the Password property cannot be deleted.' % ( self.__class__.__name__ ) )

    def _DelServer( self ):
        raise NotImplementedError( '%s.Server error: the Server property cannot be deleted.' % ( self.__class__.__name__ ) )

    def _DelUser( self ):
        raise NotImplementedError( '%s.User error: the User property cannot be deleted.' % ( self.__class__.__name__ ) )

    ###########################
    # Class Properties        #
    ###########################

    Connection = property( _GetConnection, None, None, _GetConnection.__doc__ )
    Database = property( _GetDatabase, _SetDatabase, _DelDatabase, _GetDatabase.__doc__ )
    Password = property( _GetPassword, _SetPassword, _DelPassword, _GetPassword.__doc__ )
    Server = property( _GetServer, _SetServer, _DelServer, _GetServer.__doc__ )
    User = property( _GetUser, _SetUser, _DelUser, _GetUser.__doc__ )

    ###########################
    # Object Constructor      #
    ###########################

    def __init__( self, **kwargs ):
        """Object constructor.

Recognized keywords are:
   database ... The name of the database the connection will be made to.
   password ... The password to use for the database connection.
   server ..... The server (name or IP-address) that the connection will be made to.
   user ....... The user to use for the database connection.

Raises NotImplementedError if an attempt is made to instantiate the class.
Raises TypeError if any keyword-value is of an invalid type.
Raises ValueError if any keyword-value is invalid.
Raises RuntimeError if any other exception is raised by the creation process.
"""
        if self.__class__ == BaseDatabaseConnection:
            raise NotImplementedError( 'BaseDatabaseConnection is nominally an abstract class, and cannot be instantiated' )
        # Set default values for attributes:
        self._connection = None
        self._database = None
        self._password = None
        self._server = None
        self._user = None
        kwargKeys = kwargs.keys()
        # Set database value, if one was supplied
        if 'database' in kwargKeys:
            try:
                self._SetDatabase( kwargs[ 'database' ] )
            except TypeError as error:
                raise TypeError( '%s Error: Could not create an instance of %s: %s' % ( self.__class__.__name__, self.__class__.__name__, error ) )
            except ValueError as error:
                raise ValueError( '%s Error: Could not create an instance of %s: %s' % ( self.__class__.__name__, self.__class__.__name__, error ) )
            except Exception as error:
                raise RuntimeError( '%s Error: Could not create an instance of %s: %s' % ( self.__class__.__name__, self.__class__.__name__, error ) )
        # Set password value, if one was supplied
        if 'password' in kwargKeys:
            try:
                self._SetPassword( kwargs[ 'password' ] )
            except TypeError as error:
                raise TypeError( '%s Error: Could not create an instance of %s: %s' % ( self.__class__.__name__, self.__class__.__name__, error ) )
            except ValueError as error:
                raise ValueError( '%s Error: Could not create an instance of %s: %s' % ( self.__class__.__name__, self.__class__.__name__, error ) )
            except Exception as error:
                raise RuntimeError( '%s Error: Could not create an instance of %s: %s' % ( self.__class__.__name__, self.__class__.__name__, error ) )
        # Set server value, if one was supplied
        if 'server' in kwargKeys:
            try:
                self._SetServer( kwargs[ 'server' ] )
            except TypeError as error:
                raise TypeError( '%s Error: Could not create an instance of %s: %s' % ( self.__class__.__name__, self.__class__.__name__, error ) )
            except ValueError as error:
                raise ValueError( '%s Error: Could not create an instance of %s: %s' % ( self.__class__.__name__, self.__class__.__name__, error ) )
            except Exception as error:
                raise RuntimeError( '%s Error: Could not create an instance of %s: %s' % ( self.__class__.__name__, self.__class__.__name__, error ) )
        # Set user value, if one was supplied
        if 'user' in kwargKeys:
            try:
                self._SetUser( kwargs[ 'user' ] )
            except TypeError as error:
                raise TypeError( '%s Error: Could not create an instance of %s: %s' % ( self.__class__.__name__, self.__class__.__name__, error ) )
            except ValueError as error:
                raise ValueError( '%s Error: Could not create an instance of %s: %s' % ( self.__class__.__name__, self.__class__.__name__, error ) )
            except Exception as error:
                raise RuntimeError( '%s Error: Could not create an instance of %s: %s' % ( self.__class__.__name__, self.__class__.__name__, error ) )

    ###########################
    # Object Destructor       #
    ###########################

    ###########################
    # Class Methods           #
    ###########################

    def Call( self, procedureName, *args ):
        """Abstract method - Calls a stored procedure on the connected database and returns the results, if any.

Returns a list of lists of dictionary objects, where each dictionary represents a single row in the results, and each list of dictionaries represents a single result-set from the results."""
        raise NotImplementedError( '%s.Call Error: Call has not been overridden from its definition in BaseDatabaseConnection' % ( self.__class__.__name__ ) )

    def Execute( self, sqlString ):
        """Abstract method - Executes a query against the connected database and returns the results, if any.

Returns a list of lists of dictionary objects, where each dictionary represents a single row in the results, and each list of dictionaries represents a single result-set from the results."""
        raise NotImplementedError( '%s.Execute Error: Execute has not been overridden from its definition in BaseDatabaseConnection' % ( self.__class__.__name__ ) )

    ###########################
    # Static Class Methods    #
    ###########################

    pass
__all__ += [ 'BaseDatabaseConnection' ]

Commentary

Most of this, I would hope, is fairly straightforward - the interface provided is fairly simple, with only five public property and two public method members that need to be dealt with. Of those, four of the property-members are concrete, and are simple string-value storage items, used to generate a connection to the back-end database. It is worth noting, though, that the Server and Database properties can only be set once. This restriction is in place to prevent accidental runtime changes for a database connection that would change which database the connection is pointed at - the risk, if those could be changed arbitrarily at runtime, is that the connection might read from one database and write to another, making the virtual filesystem's data invalid.
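
As a minimal sketch of how a set-once restriction like this could be enforced (not the actual setter, which is defined elsewhere in the class): the SetOnceExample class and the wording of its error messages are assumptions made for illustration, and ConnectionChangeError is redefined here only so that the example stands alone.

```python
class ConnectionChangeError( Exception ):
    """Stand-in for the package's exception, redefined so this sketch is self-contained."""
    pass

class SetOnceExample( object ):
    """Hypothetical class showing a Server property that can only be set once."""

    def _GetServer( self ):
        """Gets the server (name or IP-address) that the connection will be made to."""
        return self._server

    def _SetServer( self, value ):
        if not isinstance( value, str ):
            raise TypeError( '%s.Server error: expected a string value.' % ( self.__class__.__name__ ) )
        if self._server is not None:
            # A value has already been set: refuse to re-point the connection.
            raise ConnectionChangeError( '%s.Server error: the Server property cannot be changed once it has been set.' % ( self.__class__.__name__ ) )
        self._server = value

    def _DelServer( self ):
        raise NotImplementedError( '%s.Server error: the Server property cannot be deleted.' % ( self.__class__.__name__ ) )

    Server = property( _GetServer, _SetServer, _DelServer, _GetServer.__doc__ )

    def __init__( self ):
        self._server = None
```

The first assignment succeeds; any later assignment raises ConnectionChangeError and leaves the original value in place.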

The Connection property is abstract, and so needs to be implemented on each derived class. Unlike the concrete properties, there is no associated internal storage attribute defined. The design/intention behind the structure of Connection is that derived classes would lazily instantiate a connection when the property is requested, using the other (concrete) properties to create the applicable connection object. I also expect that once a connection is instantiated behind the Connection property, the logic of the property-getter will be written in such a way as to check for an existing and usable connection before creating a new one. On the surface, that probably sounds like just another way of saying it's lazily instantiated, but there is a potentially significant (and subtle) difference. My expectation is that a connection object can be created and used, but may be left in a state where it cannot be re-used. Assuming that there is a way to detect the "usability" of the underlying connection object, if it's still usable, it can just be returned and reused. If it's not reusable for whatever reason, a new one would be created and returned instead.
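
A rough sketch of that getter logic, under stated assumptions: FakeConnection is a hypothetical stand-in for a real connection object, and its "open" flag stands in for whatever usability check the real database library might offer. Neither is part of DBFSpy or of any database package.

```python
class FakeConnection( object ):
    """Hypothetical stand-in for a real database connection object."""
    def __init__( self, server, database ):
        self.server = server
        self.database = database
        self.open = True

    def close( self ):
        self.open = False

class LazyConnectionExample( object ):
    """Hypothetical derived class with a lazily-instantiated Connection property."""

    def _IsUsable( self, connection ):
        # Assumption: the underlying object offers some way to tell whether
        # it can still be used; here that is simply the "open" flag.
        return connection is not None and connection.open

    def _GetConnection( self ):
        """Gets a usable connection, creating a new one only when needed."""
        if not self._IsUsable( self._connection ):
            self._connection = FakeConnection( self._server, self._database )
        return self._connection

    Connection = property( _GetConnection, None, None, _GetConnection.__doc__ )

    def __init__( self, server, database ):
        self._server = server
        self._database = database
        self._connection = None
```

Nothing is created until Connection is first requested, repeated requests return the same object, and a closed (unusable) connection is replaced with a fresh one on the next request.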

I also toyed with the idea of adding a Queue property, that would allow queries that don't need to be executed until object-destruction occurs to be batched together for one run against the database. For this project, that feels like it might be overkill, so I'm leaving it out of the mix, at least for the time being.

One of the reasons that I'm setting the Queue aside for now is that the Call and Execute methods are intended to actually run the back-end database-calls/queries when they are called. If the Queue property were implemented, those methods would potentially have to return something - the SQL to be executed, or some sort of function/argument reference-set - in order to be queue-able. More complexity than feels necessary at this time.
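
For what it's worth, the "function/argument reference-set" idea could look something like the following sketch. Everything in it (the QueueExample class, its Defer and Flush methods, and the Executed list that stands in for a real database) is hypothetical and is not part of the current design.

```python
class QueueExample( object ):
    """Hypothetical sketch of deferring calls as function/argument pairs."""

    def __init__( self ):
        self.Queue = []
        self.Executed = []

    def Execute( self, sqlString ):
        # Stand-in for running a query: it only records the SQL.
        self.Executed.append( sqlString )

    def Defer( self, method, *args ):
        # Store a reference to the method and its arguments instead of running it.
        self.Queue.append( ( method, args ) )

    def Flush( self ):
        # Run every queued call in the order it was added, emptying the queue.
        while self.Queue:
            method, args = self.Queue.pop( 0 )
            method( *args )
```

Even in this small form, the caller has to choose between Execute and Defer for every query, which is the kind of extra complexity that makes leaving the Queue out feel reasonable for now.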

Wednesday, August 10, 2011

My Coding Style

I don't know if my Python coding style is "Pythonic" enough for some, and there may be conventions that I use that readers have issues with or questions about, so I'll document some of my conventions here, and update them if/as needed.

My code-template

First: I like to have things organized in a specific way. When I write a class, the structure of the code starts like so:

# Comment(s) for the file, may or may not include 
# source-control macros, but should include the file-name, at the very 
# least: 

# Namespace.Path (file-name.py)
"""Doc-string providing a summary of the package/module"""

# Define __all__ list, so that we can use "from MODULE/PACKAGE import *" 
# syntax elsewhere.
__all__ = []

########################################################################
# Required imports                                                     #
########################################################################

# import os, sys
# from MODULE import ITEMS

########################################################################
# Package constants                                                    #
########################################################################

# Any package- or module-level constants that need to be defined.
# Constants that need to be accessed outside the scope of the file they 
# are defined in should have an __all__ entry as well:

# Summary of constant
CONSTANTNAME = 'CONSTANT_VALUE'
__all__ += [ 'CONSTANTNAME' ]

########################################################################
# Custom Exceptions                                                    #
########################################################################

class CustomException( Exception ):
    """Custom exception to be raised when SOME_CONDITION_OCCURS."""
    pass
__all__ += [ 'CustomException' ]

########################################################################
# Definitions of (nominal) interfaces and abstract classes             #
########################################################################

class IInterfaceName( object ):
    """Nominal interface, provides functional requirements for objects that CAN_DO_WHATEVER"""

    ###########################
    # Class Attributes        #
    ###########################

    ###########################
    # Class Property Getters  #
    ###########################

    ###########################
    # Class Property Setters  #
    ###########################

    ###########################
    # Class Property Deleters #
    ###########################

    ###########################
    # Class Properties        #
    ###########################

    ###########################
    # Object Constructor      #
    ###########################

    ###########################
    # Object Destructor       #
    ###########################

    ###########################
    # Class Methods           #
    ###########################

    ###########################
    # Static Class Methods    #
    ###########################

    pass
__all__ += [ 'IInterfaceName' ]

class BaseClassName( object ):
    """Nominal abstract class, provides baseline functionality and interface requirements for objects that CAN_DO_WHATEVER"""

    ###########################
    # Class Attributes        #
    ###########################

    ###########################
    # Class Property Getters  #
    ###########################

    ###########################
    # Class Property Setters  #
    ###########################

    ###########################
    # Class Property Deleters #
    ###########################

    ###########################
    # Class Properties        #
    ###########################

    ###########################
    # Object Constructor      #
    ###########################

    ###########################
    # Object Destructor       #
    ###########################

    ###########################
    # Class Methods           #
    ###########################

    ###########################
    # Static Class Methods    #
    ###########################

    pass
__all__ += [ 'BaseClassName' ]

########################################################################
# Instantiable classes                                                 #
########################################################################

class ClassName( object ):
    """CLASS_SUMMARY"""

    ###########################
    # Class Attributes        #
    ###########################

    ###########################
    # Class Property Getters  #
    ###########################

    ###########################
    # Class Property Setters  #
    ###########################

    ###########################
    # Class Property Deleters #
    ###########################

    ###########################
    # Class Properties        #
    ###########################

    ###########################
    # Object Constructor      #
    ###########################

    ###########################
    # Object Destructor       #
    ###########################

    ###########################
    # Class Methods           #
    ###########################

    ###########################
    # Static Class Methods    #
    ###########################

    pass
__all__ += [ 'ClassName' ]

I generally try to keep class attributes, property-getter/-setter and -deleter functions, properties, and class methods alphabetized within their commented sections, if only because I haven't really learned how to use Git yet, and alphabetizing them allows changes to be more easily understood in other source-control systems.

Use of "nominal" interfaces and abstract classes

I know that Python provides a package that allows for the implementation of interfaces and abstract classes (the abc package), but I'm unconvinced that it's the way to go. Since the base Python language-structure doesn't really have a formal interface or abstract-class mechanism, and I discovered the abc package late, I'd more or less gotten into the habit of just writing classes that behaved somewhat like an interface or abstract class would:
  • Instantiation of a nominal interface or abstract class would raise an error (I typically use a NotImplementedError);
  • Nominally-abstract methods would be defined to raise an error (again, a NotImplementedError) if they were called from a derived-class instance that did not override them;
  • Concrete methods in a nominal abstract class would have actual implementation, and would rarely (never, in my experience) require an active check to see if they were being called on an instance of the abstract class, since it couldn't be instantiated.
Ultimately, the abc package does much of the same sort of thing, though in a different manner (and one that I'll freely admit I don't understand the inner workings of). The primary difference between the two is that my approach doesn't require that a derived class have all of the required methods during development, which I find to be less troublesome.
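
A small side-by-side sketch of that difference. The class names are made up for illustration, and the abc half assumes Python 3.4 or later, where abc.ABC is available (older versions set the abc.ABCMeta metaclass instead).

```python
import abc

class NominalBase( object ):
    """Nominal abstract class: refuses direct instantiation."""
    def __init__( self ):
        if self.__class__ == NominalBase:
            raise NotImplementedError( 'NominalBase is nominally an abstract class, and should not be instantiated.' )

    def Execute( self, sqlString ):
        raise NotImplementedError( '%s.Execute has not been overridden.' % ( self.__class__.__name__ ) )

class NominalPartial( NominalBase ):
    """Work-in-progress derived class that has not overridden Execute yet."""
    pass

class ABCBase( abc.ABC ):
    """The same requirement expressed with the abc package."""
    @abc.abstractmethod
    def Execute( self, sqlString ):
        pass

class ABCPartial( ABCBase ):
    """Work-in-progress derived class that has not overridden Execute yet."""
    pass

# The nominal version can be instantiated mid-development; the error only
# appears if the missing method is actually called.
partial = NominalPartial()

# The abc version cannot be instantiated at all until Execute exists.
try:
    ABCPartial()
except TypeError as error:
    print( 'abc refused instantiation: %s' % ( error ) )
```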

At the same time, my approach does potentially require more unit-testing, but since most of the unit-tests that I've found necessary using my approach cover cases that I would also be testing using the abc package's capabilities, it feels like a wash to me.

    Property Getters, Setters and Deleters, and Properties

    There is a standard (presumably "Pythonic") way of using @property decorators to define object properties - see docs.python.org/.../functions.html#property. My experience with this decorator structure has caused minor but annoying issues in my code in the past, when I have needed to implement additional decoration on a getter/setter function. As a result, I tend to prefer breaking out the property-getter/-setter/-deleter functions separately (which I'd do in either case), and explicitly set the property using the built-in property function, rather than decorating the individual functions.
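
A short sketch of the explicit form with an extra decorator on the getter. The counted decorator and the PropertyExample class are hypothetical, chosen only to show that the getter can be decorated like any other function before it is handed to property.

```python
import functools

def counted( function ):
    """Hypothetical extra decorator: counts calls to the wrapped function."""
    @functools.wraps( function )
    def wrapper( *args, **kwargs ):
        wrapper.calls += 1
        return function( *args, **kwargs )
    wrapper.calls = 0
    return wrapper

class PropertyExample( object ):
    """Hypothetical class using the built-in property function explicitly."""

    @counted
    def _GetName( self ):
        """Gets the name of the instance."""
        return self._name

    def _SetName( self, value ):
        self._name = value

    def _DelName( self ):
        raise NotImplementedError( '%s.Name error: the Name property cannot be deleted.' % ( self.__class__.__name__ ) )

    # functools.wraps keeps the getter's doc-string, so it can still be
    # passed along as the property's documentation.
    Name = property( _GetName, _SetName, _DelName, _GetName.__doc__ )

    def __init__( self ):
        self._name = None
```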

    Documentation decorators

    Though I haven't used them here (because I'm going to blog about them later), I have a set of documentation-decorators that I use with some frequency that will eventually allow generation of HTML or LaTeX (and thus PDF) API-documentation.

    DBFSpy Structure

    So, the first step in the DBFSpy project is probably to think out the design and object-structure, with a few specific goals in mind:
    • Initially, I'm only concerned with using a MySQL back-end, so that's going to be the primary focus for the database side of things.
    • It will use the current version of Fuse for Python available on my Ubuntu installations (0.2 or so).
    The DBFSpy package

    This should provide any global constants, definitions of interfaces and the like, and anything else that the various database-specific modules would require.

    There are a couple of custom exceptions that are already defined (since they are very basic): ConnectionChangeError and FilesystemChangeError. They will serve to provide specific exception-throwing capabilities in cases where the filesystem or the database underlying the filesystem is being changed. This, in my mind, would be something to avoid, since the underlying filesystem data could then be left in a corrupted or unusable state.

    There are also a couple of constants defined here:

    GUIDCHECKER
    A regular expression that will be used elsewhere to determine whether a string is a GUID/UUID-formatted value (which I'll be using extensively to provide unique identifiers in the filesystem database); and
    LOGFORMAT
    A string that defines the format for an output line in the system's log-file, if any is generated.

    With no other implementation details, then, the package-header will look something like this:

    # DBFSpy package header (__init__.py)
    """Provides global constants, interfaces and functionality for the database-specific DBFSpy implementations in various member modules."""
    
    # Define __all__ list, so that we can use "from DBFSpy import *" 
    # syntax elsewhere.
    __all__ = []
    
    ########################################################################
    # Required imports                                                     #
    ########################################################################
    
    import fuse, logging, os, re, stat, types, uuid
    
    ########################################################################
    # Package constants                                                    #
    ########################################################################
    
    # A regular expression that we can use to verify GUID/UUID structure.
    GUIDCHECKER = re.compile( r'^[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}$' )
    __all__ += [ 'GUIDCHECKER' ]
    
    # logging output format:
    LOGFORMAT = '%(name)s:%(asctime)s:%(levelname)s:%(message)s'
    __all__ += [ 'LOGFORMAT' ]
    
    ########################################################################
    # Custom Exceptions                                                    #
    ########################################################################
    
    class ConnectionChangeError( Exception ):
        """Custom exception to be raised when a change is made to connection properties of an IDatabaseConnected instance that would break or corrupt the virtual filesystem if the change were allowed."""
        pass
    __all__ += [ 'ConnectionChangeError' ]
    
    class FilesystemChangeError( Exception ):
        """Custom exception to be raised when a change is made to an IFilesystem instance that would break or corrupt the virtual filesystem if the change were allowed."""
        pass
    __all__ += [ 'FilesystemChangeError' ]
    
    ########################################################################
    # Definitions of (nominal) interfaces and abstract classes             #
    ########################################################################
    
    class BaseDatabaseConnection( object ):
        """Nominal abstract class, provides baseline functionality and interface requirements for objects that can connect to and execute queries, procedures, and/or functions against a back-end data-source"""
    
        ###########################
        # Class Attributes        #
        ###########################
    
        ###########################
        # Class Property Getters  #
        ###########################
    
        ###########################
        # Class Property Setters  #
        ###########################
    
        ###########################
        # Class Property Deleters #
        ###########################
    
        ###########################
        # Class Properties        #
        ###########################
    
        ###########################
        # Object Constructor      #
        ###########################
    
        ###########################
        # Object Destructor       #
        ###########################
    
        ###########################
        # Class Methods           #
        ###########################
    
        ###########################
        # Static Class Methods    #
        ###########################
    
        pass
    __all__ += [ 'BaseDatabaseConnection' ]
    
    class BaseDataObject( object ):
        """Nominal abstract class, provides baseline functionality and interface requirements for objects whose state data is persisted in a back-end data-store."""
        pass
    __all__ += [ 'BaseDataObject' ]
    
    class IFilesystemItem( object ):
        """Nominal interface, provides functional requirements for objects that can represent items in the virtual filesystem."""
        pass
    __all__ += [ 'IFilesystemItem' ]
    
    class BaseFilesystemItemCache( object ):
        """Nominal abstract class, provides functional requirements for objects that can cache virtual filesystem items."""
        pass
    __all__ += [ 'BaseFilesystemItemCache' ]
    
    class IFilesystemOperators( object ):
        """Nominal interface, provides functional requirements for objects that can respond to filesystem commands."""
        pass
    __all__ += [ 'IFilesystemOperators' ]
    
    class IFilesystem( object ):
        """Nominal interface, provides functional requirements for objects that provide the logic-bridge between the virtual filesystem and the back-end data-store."""
        pass
    __all__ += [ 'IFilesystem' ]
    
    ########################################################################
    # Instantiable classes                                                 #
    ########################################################################
    
    # No instantiable classes are anticipated at this level of the package...
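
As a quick illustration of how the two constants might be used elsewhere, here is a sketch. The IsGUID helper and the 'DBFSpy.example' logger name are hypothetical, and the pattern assumes that a valid value has exactly 8-4-4-4-12 hexadecimal digits with nothing before or after it.

```python
import logging, re, uuid

# Assumption: exactly 8-4-4-4-12 hexadecimal digits, anchored at both ends.
GUIDCHECKER = re.compile( r'^[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}$' )
LOGFORMAT = '%(name)s:%(asctime)s:%(levelname)s:%(message)s'

def IsGUID( value ):
    """Hypothetical helper: returns True if value is a GUID/UUID-formatted string."""
    return GUIDCHECKER.match( value ) is not None

# A logger that writes its lines using LOGFORMAT (to stderr, in this sketch).
logger = logging.getLogger( 'DBFSpy.example' )
handler = logging.StreamHandler()
handler.setFormatter( logging.Formatter( LOGFORMAT ) )
logger.addHandler( handler )
logger.setLevel( logging.INFO )

itemId = str( uuid.uuid4() )
logger.info( 'Generated item id %s (valid: %s)', itemId, IsGUID( itemId ) )
```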
    

    The DBFSpy.MySQL module

    The main DBFSpy package-header is not expected to provide any instantiable classes, just nominal interface and abstract-class definitions. The database-specific implementations will be broken out into separate modules (one for each back-end database-connection type), in order to allow for as much separation of code as we can manage across the different back-end implementations. Since I'm planning on starting with a MySQL back-end, that will be the first concrete-implementation module. Others should be similarly structured.

    # DBFSpy.MySQL module (MySQL.py)
    """Provides instantiable classes that can be used to create a virtual 
    filesystem that uses a MySQL back-end to store filesystem information."""
    
    # Define __all__ list, so that we can use "from DBFSpy.MySQL import *" 
    # syntax elsewhere.
    __all__ = []
    
    ########################################################################
    # Required imports                                                     #
    ########################################################################
    
    import fuse, MySQLdb
    
    from DBFSpy import GUIDCHECKER, LOGFORMAT, ConnectionChangeError, FilesystemChangeError, BaseDatabaseConnection, BaseDataObject, BaseFilesystemItemCache, IFilesystemItem, IFilesystemOperators, IFilesystem
    
    ########################################################################
    # Package constants                                                    #
    ########################################################################
    
    ########################################################################
    # Custom Exceptions                                                    #
    ########################################################################
    
    ########################################################################
    # Definitions of (nominal) interfaces and abstract classes             #
    ########################################################################
    
    ########################################################################
    # Instantiable classes                                                 #
    ########################################################################
    
    class MySQLFileItem( BaseDataObject, IFilesystemItem, object ):
        """Represents an item in a virtual filesystem, whose state-information is persisted in a MySQL back-end data-store."""
        pass
    __all__ += [ 'MySQLFileItem' ]
    
    class MySQLDatasource( BaseDatabaseConnection, BaseFilesystemItemCache, IFilesystemOperators, object ):
        """Represents a connection to a MySQL back-end data-store, with an in-memory cache of filesystem objects, and actual implementations of filesystem operations to return item-data from the cached or live-data state of filesystem objects."""
        pass
    __all__ += [ 'MySQLDatasource' ]
    
    class MySQLFilesystem( IFilesystem, IFilesystemOperators, fuse.Fuse, object ):
        """Provides the virtual filesystem, and provides filesystem operations (which are delegated to a MySQLDatasource instance)."""
        pass
    __all__ += [ 'MySQLFilesystem' ]
    
    

    Other database modules

    Ultimately, as the project grows, there should be other database-specific modules added to provide support for other back-end database systems (PostgreSQL and ODBC connections, at the very least, possibly others as well). They should provide a very similar set of concrete classes.

    The main executable

    Though the final codebase is mainly intended to support a CMS that I'm thinking through (PicaCMS), there is no reason that it couldn't be used for other purposes as well. With that goal in mind, however, the typical use-case would involve the main CMS starting one or more instances of the database filesystem, probably as a service on the machine(s) it is installed on. To that end, the invocation to start the filesystem would likely look something like this:

    dbfspy -m mountpoint -t database-type -s server -d database -u user -p password -l log-file -g logging-level

    where:
    mountpoint
    is the "real" filesystem mount-point for the virtual filesystem;
    database-type
    is the type of back-end database (e.g., MySQL, PostgreSQL, ODBC, etc.) that the filesystem's data will be stored in;
    server
    is the machine (by name or IP-address) that the virtual filesystem database resides on;
    database
    is the name of the database that the virtual filesystem's data is stored in;
    user
    is the user-name that will be used to connect to the specified database;
    password
    is the password that will be used to connect to the specified database;
    log-file
    is the file-path to the virtual filesystem's log-file (if any); and
    logging-level
    is the logging-level for the system.
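
A sketch of how those arguments could be parsed with the standard argparse module. The BuildParser function, the destination names, and the defaults (no log-file, a WARNING logging-level) are assumptions for illustration; which arguments end up being required has not been decided.

```python
import argparse

def BuildParser():
    """Hypothetical parser for the dbfspy command line described above."""
    parser = argparse.ArgumentParser( prog='dbfspy' )
    parser.add_argument( '-m', dest='mountpoint', required=True, help='mount-point for the virtual filesystem' )
    parser.add_argument( '-t', dest='databaseType', required=True, help='type of back-end database (e.g., MySQL)' )
    parser.add_argument( '-s', dest='server', required=True, help='database server (name or IP-address)' )
    parser.add_argument( '-d', dest='database', required=True, help='name of the database' )
    parser.add_argument( '-u', dest='user', required=True, help='user-name for the database connection' )
    parser.add_argument( '-p', dest='password', required=True, help='password for the database connection' )
    # Assumed defaults: no log-file, and WARNING as the logging-level.
    parser.add_argument( '-l', dest='logFile', default=None, help='file-path to the log-file, if any' )
    parser.add_argument( '-g', dest='loggingLevel', default='WARNING', help='logging-level for the system' )
    return parser

if __name__ == '__main__':
    arguments = BuildParser().parse_args()
    print( 'Mounting %s database "%s" at %s' % ( arguments.databaseType, arguments.database, arguments.mountpoint ) )
```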

    There is also a strong possibility that separate executables for each database-type would be of some use (e.g., a "mysqlfspy," "pgsqlfspy," and "odbcfspy"). If I decide to develop those as well, I would expect them to use the same command-line arguments as the base dbfspy command shown above, without the database-type specifier.

    That should cover all of the expected code, at least until/unless something else comes to mind.

    I'll cover each class-definition in its own post, but since there may be a lot of interrelationship, particularly early on, these links resolve to all items tagged as related to the class:

    Thursday, August 4, 2011

    Content Management: The Good, the Bad, and the Ugly

    A good portion of the last several years of my career has been focused around web content management using SDL Tridion. I have to admit that I like Tridion, overall, though there are a number of things I'd change if it were up to me. At the same time, Tridion is a very high-end web CMS - it's expensive, and the learning-curve for it is very steep, to say the least.

    I've looked around at other (free and commercial) content-management solutions, and while many of them have a lot to offer, there weren't any that I really liked that I could afford for my own uses, so I added building my own web CMS tool to my list of projects, figuring I'd use or adapt concepts and practices that I liked, discard the ones that I thought were of little or no use, or that felt problematic, and see what I could come up with.

    The list of CMSes that I looked over included:
    • BrightSpot (a proprietary CMS that I encountered at work)
    • Django (a Python application)
    • Drupal
    • Joomla
    • Sitecore
    • Tridion
    • WordPress
    There are a whole slew of web CMS applications out in the wild, though - too many to evaluate all of them and still have time to actually do anything fun or useful on my own.

    Filesystem in User space (FUSE) and CMS

    One of the challenges that a serious, enterprise-level CMS will inevitably encounter is the need to publish content in a way that allows for the best possible performance on the final web page.

    For "pure" content - that is, markup that is ready to be sent to the browser as-is, right out of the CMS - a well-optimized database-retrieval system is probably quite sufficient. But for content that doesn't fall neatly into that category, other approaches need to be available. Consider that any (most?) websites these days would benefit from having publishable items that can then be run in an application space: things that require database or service interaction that cannot easily or effectively be extracted from that same database.

    But most web-application languages provide *some* sort of inclusion mechanism: ASP.NET has its control mechanisms (though they are a bit more than "just" an inclusion), PHP has include and require functions, ColdFusion has CFINCLUDE, and JSP has some similar mechanism. Variations aside, these are just ways of telling the application server to include and execute some external file resource. In general, in order to execute such an item, it has to be something that can be read through the file-system. It cannot be text (or whatever) residing in a database. At best, it has to be read from that database, then explicitly executed - a process that can have security and/or performance implications.

    In many cases, this can force separate content/executable storage and display processes and mechanisms - something that would be better avoided if at all possible.

    It would certainly be possible to wire things together so that each different content-item being published by a CMS would somehow "know" at publish-time where its content was coming from. But it would be better if there were one mechanism that would handle both simple text-retrieval (usually handled by calling to a database) and executable item-retrieval (file-system inclusion). There is one that is fairly obvious - writing every content-item out to its own file, and including each applicable item-file on a page-by-page basis. It works, but it can lead to synchronization issues if the CMS is generating content for many servers, or even on a single server if the underlying file(s) get locked.

    So, ideally, a CMS needs the single-source capabilities of a database-retrieval process, and the automatic executability of a file-system inclusion. That is not, typically, a mix supported by most CMSes, for reasons that would be obvious after some tinkering.

    But Linux/Python offers a solution: a filesystem-in-user-space (FUSE) based "filesystem" attached to a database. In that scenario, each row in a given table or view essentially has a "file" associated with it automatically, as the row/file is created or altered, and that would vanish when its row is deleted. If the database driving the virtual file-system is peppy and/or has replication capabilities, the synchronization of items is as simple as an insert or update in the database that the virtual file-system is attached to, or replicating the database as needed to machines that other virtual filesystem instances are pointed at.
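
To make the row-equals-file idea concrete, here is a sketch that leaves FUSE out entirely. It uses an in-memory SQLite database (a stand-in for the real back-end, chosen because sqlite3 ships with Python), and the RowFilesExample class, its method names, and the "items" table are all hypothetical. A real implementation would presumably expose similar logic through the FUSE filesystem operations, which are not shown here.

```python
import sqlite3

class RowFilesExample( object ):
    """Hypothetical sketch: presents the rows of a table as named "files"."""

    def __init__( self ):
        self._db = sqlite3.connect( ':memory:' )
        self._db.execute( 'CREATE TABLE items ( name TEXT PRIMARY KEY, content TEXT )' )

    def Write( self, name, content ):
        # Creating or altering a row creates or alters the "file".
        self._db.execute( 'INSERT OR REPLACE INTO items ( name, content ) VALUES ( ?, ? )', ( name, content ) )

    def Delete( self, name ):
        # Deleting the row makes the "file" vanish.
        self._db.execute( 'DELETE FROM items WHERE name = ?', ( name, ) )

    def ListDirectory( self ):
        # The directory listing is just the set of row names.
        return [ row[ 0 ] for row in self._db.execute( 'SELECT name FROM items ORDER BY name' ) ]

    def Read( self, name ):
        row = self._db.execute( 'SELECT content FROM items WHERE name = ?', ( name, ) ).fetchone()
        if row is None:
            raise IOError( 'No such file: %s' % ( name ) )
        return row[ 0 ]
```

Here the "directory listing" and the "file contents" are nothing more than queries, so a file exists exactly as long as its row does.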

    On the development side of the equation, this sort of set-up would also allow "normal" versioning systems (cvs, subversion, git, etc.) to work with database-managed content/files. A development-side file-system (that also points at a database) would look like a normal, native filesystem as far as the version-control system was concerned, so checkouts, branching/merging, and all the other version-control goodness that comes with a good version-control system would be available.

    Some Fuse/Python resources: