Wednesday, December 14, 2011

The DataConnectors module (part 4)

So, before I start digging into the code for the concrete classes of DataConnectors, I want to do a quick review of how they are all supposed to fit together:

  • An instance of BaseDataConnector (which implements IsDataConnector) will provide the basic connectivity to the back-end data-store, as well as the mechanism(s) for executing queries against that data-store;
  • Queries are objects implementing IsQuery; they store the SQL to be executed and, once it's been run, the results of that execution in one or more ResultSet objects;
  • The ResultSet instances (which implement IsResultSet, and are TypedLists) keep track of the fields returned, row-by-row.
There's a piece missing from the most recent class-diagram: how the records are going to be represented... We need to add a Record class to handle that, like so:

Ideally, what I'd prefer from a code-structure standpoint, is to be able to iterate over a ResultSet instance, and use object-/dot-notation for the fields. For example, something like:

for record in results:
    print record.FirstName, record.LastName
Providing that capability is easy, since Python's objects can have dynamically-assigned attributes and values. The downside to taking that very basic approach, though, is that those records, and the fields within them, are mutable in the code, and for a database-record that's read-only, that's not a good thing. There is a way to work around that, though, and I'll demonstrate it a bit later.
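Just to illustrate what I mean by dynamically-assigned attributes (the SimpleRecord name here is purely a throwaway, for illustration), the naive approach would look something like this:

class SimpleRecord( object ):
    """A bare-bones object to hang record-fields on."""
    pass

record = SimpleRecord()
# Attributes can be attached on the fly...
record.FirstName = 'John'
record.LastName = 'Smith'
print record.FirstName, record.LastName
# ...but they can be changed (or even deleted) just as easily, which is the problem:
record.FirstName = 'Jane'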

Another thing that hasn't really been thrashed out yet is when and where query-execution occurs. I've been contemplating two basic approaches (there's a quick call-pattern sketch in code after the comparison below), and trying to weigh their merits and drawbacks:

Execution happens at the Query:
In this model, the Query uses its Datasource to connect to the back-end database, executes the SQL, and generates the result-set.
Advantages
  • It feels to me like query-objects preserve the single responsibility principle better this way - really, at least in theory, all they need is a connection-object, supplied by the data-connector, and away they go.
  • I suspect that this will allow for easier implementation of "one-off" queries - I don't know how many times in my career a project has crossed my desk where the work needed essentially boiled down to "run another query, and add the results to the display." I believe that optimizing the data-connector approach for this sort of one-off case wouldn't be difficult, but I'm not completely sure about that at this point.
  • It feels like this approach would lend itself well (or at least better) to lazy instantiation of query-results: that is, create as many query-objects as might ever be necessary, but don't actually get the results from the database until (or even unless) they are specifically requested. At least for basic fetch-and-display processes, this could be very advantageous.
Drawbacks
  • There is a strong potential, when (not if) this module is expanded to include back-end databases other than MySQL, that query-objects may need to be customized to each individual database-engine. I don't have a strong feeling at this point for just how likely this really is, but since the logical point of encapsulation for any database-engine-specific capabilities is at the data-connector level, not at the query level, this feels like it would be less than optimal the first time that this kind of database-engine variation happens. It's not future-proof, and could easily require a lot of new work or rework, and I feel that's a major concern.
  • If the assumption that one-off queries would be easier to implement in this model holds true, that could tend to reinforce that sort of design/solution in applications. On the surface, this doesn't sound all that bad, but if each one-off query represents some (hopefully small) amount of database-connection time, the more this sort of design is propagated, the more performance could suffer as a result. I'm not a big fan of making things easier in the shorter term at the expense of longer-term stability or maintainability.
  • It doesn't feel like this approach would facilitate building up a list of queries, then executing all of them at once - at least not as easily.
Execution happens at the BaseDataConnector:
In this model, the connector (a derivative of BaseDataConnector) connects to the datasource, then runs the query or queries specified, handing the results back to the Query object(s) that supplied the SQL in the first place so that they can keep track of them.
Advantages
  • The major advantage is almost certainly that keeping the actual execution of the query encapsulated in the data-connector (which is the place where it might vary) will protect us from the potential need to have to spin off a Query-variant for each and every data-connector-type that we might eventually want. As I noted above, I don't have a strong feeling for how likely this really is, so more research on that topic is probably in order. It should be noted, however, that even when that research is complete, and even if it proves that the connector-types that we're planning aren't going to be affected, that does not mean that they won't change, or that some new, hot-topic database-engine won't come out that would require distinct Query classes...
  • A major advantage to this approach is, I think, that queries could be queued up and deferred until the developer knows that the code will need them.
Drawbacks
  • There might be significant design changes needed for BaseDataObject-derived objects in order to take advantage of query-queueing capabilities. Or it may not even be feasible to reconcile those objects' data-needs with a queueable query-stack, in which case a mixed model might be necessary (which is more up-front work).
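To make that call-pattern difference more concrete, here's a rough, purely hypothetical sketch of the two models from an application's point of view. None of these class- or method-names are final (SketchQuery, SketchConnector, AddQuery and ExecuteQueued are just placeholders), and the actual database-work is faked out so that the control-flow is all that's being compared:

class SketchQuery( object ):
    """Stand-in query: just holds SQL and (eventually) results."""
    def __init__( self, sql ):
        self.SQL = sql
        self.Results = None
    # Model 1 (execution at the Query) would add something like an Execute
    # method here, which asks the data-connector for a connection and runs
    # self.SQL against it directly.

class SketchConnector( object ):
    """Stand-in connector for model 2: queues queries and executes them in one pass."""
    def __init__( self ):
        self.QueuedQueries = []
    def AddQuery( self, query ):
        self.QueuedQueries.append( query )
    def ExecuteQueued( self ):
        for query in self.QueuedQueries:
            # A real implementation would open one connection and run each
            # query's SQL against it; here the results are just faked.
            query.Results = [ 'results for: %s' % query.SQL ]

theConnector = SketchConnector()
theQuery = SketchQuery( 'SELECT FirstName, LastName FROM People' )
theConnector.AddQuery( theQuery )   # Queries can pile up without touching the database...
theConnector.ExecuteQueued()        # ...and all run only when they're actually needed
print theQuery.Results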

So, it feels like more research is needed, and possibly some rework on the design of BaseDataConnector, and maybe some minor changes to Query will be needed as well. In the interests of moving this post along (while I do my research in the background), I'm not going to work on Query just yet. That class, and any changes to BaseDataConnector, will be the topic of the next post, and in the meantime I'll attack the two remaining general-purpose concrete classes: Record and ResultSet.

Instances of the Record class represent single records/rows returned from a query executed against the database. They are the core member-type for a ResultSet instance (which in turn represents the collection of records from a query). Record-objects should (for the time being) be immutable once instantiated: we don't want to allow changes to a record's data inside an application without having some way to re-synchronize that data with the original record, which is why the BaseDataObject abstract class exists. For similar reasons, ResultSet instances should also be immutable once instantiated. Both are variations of a "dumb" data-object concept (though ResultSet is not completely dumb, since it has all of the functionality of a TypedList): objects whose sole purpose is to represent some data-structure.

Since I'm expecting a Record instance to be generated from a dictionary data-structure, where the dictionary's keys are the column-names and the values are the field-values, the first challenge to overcome is to find a way to pass a dictionary structure into an object at creation time, without having to go through the typical property1=value1, property2=value2 structure that we might expect given the code I've shown so far. These names/values will vary from one record-structure to another, and having to code that structure into a record's constructor would mean having to generate a Record-derived class for every data-type that needs to be represented. That, obviously, is not sustainable. Fortunately, Python provides a mechanism to pass a dictionary as a keyword-argument list. We've already seen the receiving-end structure (see BaseDataConnector.__init__ for an example), though I don't see that I've provided an example of the "sending" structure yet. To illustrate that, here's a quick little script:

class DictProperties( object ):
    """A test-class, created to see how dictionary arguments can be 
pushed into an object's properties."""
    def __init__( self, **properties ):
        """Object constructor."""
        for theKey in properties:
            self.__setattr__( theKey, properties[ theKey ] )

properties = {
    'Prop1':1,
    'Prop2':'ook',
    'Prop3':True,
    }

testObject = DictProperties( **properties )

print 'testObject.Prop1 ... %s' % ( testObject.Prop1 )
print 'testObject.Prop2 ... %s' % ( testObject.Prop2 )
print 'testObject.Prop3 ... %s' % ( testObject.Prop3 )

When this script is run, it prints out:

testObject.Prop1 ... 1
testObject.Prop2 ... ook
testObject.Prop3 ... True
This is what we want: The elements of the dictionary (properties) are being set as "real" attributes in the object-instance. But how does it work? The secret is in lines 4, 6-7, and 15:

Line(s)
4
The **properties argument is a keyword-argument list. Internally, that is a dictionary that can be iterated over in line 6.
6-7
As the properties keyword/dictionary is being iterated over (line 6), we take advantage of a Python built-in: __setattr__. When present, __setattr__ is called when an attribute assignment is attempted, instead of simply storing the value in the object's instance dictionary. Remember this, because it's going to be a key part of the immutability of a Record later on...
15
This is the part that allows an arbitrary dictionary to be passed into the object's constructor (it would work with any other method as well). Instead of supplying Prop1=1, Prop2='ook', Prop3=True as arguments to the constructor, which would tie each instantiation to a known structure, we pass what is essentially a keyword-argument-reference to a dictionary (**properties) instead. The constructor recognizes it as a keyword-arguments structure (which it's expecting), and handles it just like it would if each keyword/value were supplied directly.
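In other words (assuming the DictProperties class and the properties dictionary from the script above), these two calls end up being handled identically by the constructor:

explicitCall = DictProperties( Prop1=1, Prop2='ook', Prop3=True )
dictionaryCall = DictProperties( **properties )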

So an arbitrary data-structure can be passed to an object during creation. That's the first hurdle passed.

The next hurdle is to make those values immutable after they've been set. Getting that to work takes a bit of modification that's easier to explain by showing the code, so here's a modified copy of the same script above:

class DictProperties( object ):
    """A test-class, created to see how dictionary arguments can be 
pushed into an object's properties."""
    # Reserved attributes
    __objectIsLocked = False
    # Object constructor
    def __init__( self, **properties ):
        """Object constructor."""
        for theKey in properties:
            self.__setattr__( theKey, properties[ theKey ] )
        self.__objectIsLocked = True
    def __setattr__( self, name, value ):
        """Object-attribute setter."""
        if name == '_DictProperties__objectIsLocked' and self.__objectIsLocked:   # the private name arrives here in its name-mangled form
            raise AttributeError( 'The object is locked, and cannot be unlocked' )
        if self.__objectIsLocked:
            raise AttributeError( 'The object has been locked, and the "%s" attribute cannot be set.' % ( name ) )
        return object.__setattr__( self, name, value )

properties = {
    'Prop1':1,
    'Prop2':'ook',
    'Prop3':True,
    }

testObject = DictProperties( **properties )

print 'testObject.Prop1 ... %s' % ( testObject.Prop1 )
print 'testObject.Prop2 ... %s' % ( testObject.Prop2 )
print 'testObject.Prop3 ... %s' % ( testObject.Prop3 )

try:
    testObject.Prop3 = False
except AttributeError:
    print 'testObject.Prop3 ... Reset failed as expected'
try:
    testObject.Ook = False
except AttributeError:
    print 'testObject.Ook ..... New-attribute set failed as expected'

When this is run, the results are:

testObject.Prop1 ... 1
testObject.Prop2 ... ook
testObject.Prop3 ... True
testObject.Prop3 ... Reset failed as expected
testObject.Ook ..... New-attribute set failed as expected

The magic that makes this work is in the override of __setattr__ (lines 12-18). Since __setattr__ is being overridden, the local method gets called instead of the built-in that exists in object. It first checks whether the private __objectIsLocked attribute is being set after it has already been set to True (which, as we'll see shortly, is what locks the object's attributes), and raises an error if that's the case. If the attribute being set is not __objectIsLocked, control passes to the next if on line 16, where a general check of __objectIsLocked is performed. If __objectIsLocked is True, then the object's attributes are locked, and an error is raised; otherwise control passes on, and the result of the base object's __setattr__, called with the same arguments, is returned.
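And, just as a quick sanity-check (assuming the testObject from the script above), the lock itself can't be switched back off once it's been set, either:

try:
    testObject._DictProperties__objectIsLocked = False
except AttributeError:
    print 'testObject ......... Unlock attempt failed as expected'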

That's pretty much all we need, really, for Record:

Record (Nominal final class):

class Record( object ):
    """Represents an immutable record as retrieved from a query against a database."""

    ##################################
    # Class Attributes               #
    ##################################

    __fieldNames = []
    __recordIsLocked = False

    ##################################
    # Class Property-Getter Methods  #
    ##################################

    def _GetFieldNames( self ):
        """Gets the field-names of the Record object created at instantiation."""
        return self.__fieldNames

    ##################################
    # Class Property-Setter Methods  #
    ##################################

    ##################################
    # Class Property-Deleter Methods #
    ##################################

    ##################################
    # Class Properties               #
    ##################################

    FieldNames = property( _GetFieldNames, None, None, 'Gets the field-names of the Record object created at instantiation.' )

    ##################################
    # Object Constructor             #
    ##################################

    @DocumentArgument( 'argument', 'self', None, SelfDocumentationString )
    @DocumentArgument( 'keyword', 'properties', 'variable', 'Key/value pairs providing the record\'s field-names and values of those fields.' )
    def __init__( self, **properties ):
        """Object constructor."""
        # Nominally final: Don't allow any class other than this one
        if self.__class__ != Record:
            raise NotImplementedError( 'Record is (nominally) a final class, and is not intended to be derived from.' )
        for theKey in properties:
            self.__setattr__( theKey, properties[ theKey ] )
            if not theKey in self.__fieldNames:
                self.__fieldNames.append( theKey )
        self.__recordIsLocked = True

    ##################################
    # Object Destructor              #
    ##################################

    ##################################
    # Class Methods                  #
    ##################################

    @DocumentArgument( 'argument', 'self', None, SelfDocumentationString )
    @DocumentArgument( 'argument', 'name', None, '(String, required) The name of the attribute to be set.' )
    @DocumentArgument( 'argument', 'value', None, '(Any type, required) The value to be set in the named attribute.' )
    def __setattr__( self, name, value ):
        """Object-attribute setter."""
        if name == '_Record__recordIsLocked' and self.__recordIsLocked != False:
            raise AttributeError( 'The record is locked, and cannot be unlocked' )
        if self.__recordIsLocked:
            raise AttributeError( 'The record has been locked, and the "%s" attribute cannot be set.' % ( name ) )
        if hasattr( self, name ) and name != '_%s__recordIsLocked' % ( self.__class__.__name__ ):
            raise AttributeError( 'The "%s" attribute already exists, and cannot be reset.' % ( name ) )
        return object.__setattr__( self, name, value )

__all__ += [ 'Record' ]
Line(s)
8-9
Since Record is nominally a final class, and cannot be extended, the __fieldNames and __recordIsLocked attributes can safely be made private (their names are mangled to _Record__fieldNames and _Record__recordIsLocked internally, which helps protect them from being clobbered if a Record object happens to be created with "__fieldNames" or "__recordIsLocked" fields). They are also defined as class-level attributes so that they will already exist the first time __setattr__ is called during construction.
15-17, 31
A typical, though read-only property for the record, providing the field-names of the record.
37-48
The object constructor:
42-43
My standard check to assure that any class extending Record will raise an error when instantiated, so long as it calls Record.__init__.
44-47
Reads in the keyword arguments, and calls __setattr__ to set the supplied property names and values in the object.
One caveat about this format is that construction of a Record must use a syntax like aRecord = Record( **fields ). If a call like aRecord = Record( fields ) is attempted, a TypeError will be raised, even if the argument is a dictionary (there's a short usage sketch after this line-by-line breakdown).
48
Locks the record, preventing addition or modification of properties in the object.
58-69
Overrides the default __setattr__ provided by the base object to prevent modification of the object's state-data once the object's been locked:
63-64
Allows the record to be locked, but not unlocked, by specifically checking for the private attribute __recordIsLocked, and allowing it to be changed only if it's False.
This could be changed, if extension of Record were needed, by changing the check to look something like this:
if name == '_%s__recordIsLocked' % ( self.__class__.__name__ ) and self.__recordIsLocked != False:
    # do the rest
65-66
Checks to see whether the object is locked, regardless of the attribute-modification being attempted, and raises an error if it is.
67-68
Checks specifically for attempts to re-set an existing attribute. I don't think this would ever come into play, since duplicated key-names in a keyword-argument list will raise an error outright, and duplicated keys in a dictionary simply collapse down to a single entry before the constructor ever sees them, but since I can't categorically rule the possibility out, I checked for it anyway. I cannot think of a way to test it, though...
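For completeness, here's that construction-syntax caveat in code form (assuming the Record class above, and the decorators it depends on from the earlier posts, have been loaded; the field-names and values are just placeholders):

fields = { 'FirstName':'John', 'LastName':'Smith' }
aRecord = Record( **fields )                # The supported syntax...
print aRecord.FirstName, aRecord.LastName   # ...prints: John Smith

try:
    badRecord = Record( fields )            # Missing the **: not a keyword-argument list
except TypeError:
    print 'Passing the dictionary directly raises a TypeError'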

Unit-tests
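One note before the tests themselves: testFinal below expects a RecordDerived class to exist. I haven't shown it here, but (following the same pattern as the other nominally-final class tests) it's presumably nothing more than a trivial stub defined alongside the test-suite, something like:

class RecordDerived( Record ):
    """A throwaway subclass, used only to verify that Record is nominally final."""
    pass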

    class testRecord( unittest.TestCase ):
        """Unit-tests the Record class."""
    
        def setUp( self ):
            pass
    
        def tearDown( self ):
            pass
        
        def testFinal( self ):
            """Testing final nature of the Record class."""
            try:
                testObject = RecordDerived()
                self.fail( 'Record is nominally a final class, and should not be extendable.' )
            except NotImplementedError:
                pass
            except Exception, error:
                self.fail( 'Instantiating an object derived from Record should raise NotImplementedError, but %s was raised instead:\n  %s' % ( error.__class__.__name__, error ) )
    
        def testConstruction( self ):
            """Testing construction of the Record class."""
            testObject = Record()
            self.assertTrue( isinstance( testObject, Record ), 'Instances of Record should be instances of Record... Duh!' )
            testProperties = { 'Property1':1, 'Property2':'ook', 'Property3':True }
            testObject = Record( **testProperties )
            for property in testProperties:
                testValue = getattr( testObject, property )
                self.assertEquals( testValue, testProperties[ property ], '' )
    
        def testPropertyCountAndTests( self ):
            """Testing the properties of the Record class."""
            items = getMemberNames( Record )[0]
            actual = len( items )
            expected = 1
            self.assertEquals( expected, actual, 'Record is expected to have %d properties to test, but %d were discovered by inspection.' % ( expected, actual ) )
            for item in items:
                self.assertTrue( HasTestFor( self, item ), 'There should be a test for the %s property (test%s), but none was identifiable.' % ( item, item ) )

        def testMethodCountAndTests( self ):
            """Testing the methods of the Record class."""
            items = getMemberNames( Record )[1]
            actual = len( items )
            expected = 0
            self.assertEquals( expected, actual, 'Record is expected to have %d methods to test, but %d were discovered by inspection.' % ( expected, actual ) )
            for item in items:
                self.assertTrue( HasTestFor( self, item ), 'There should be a test for the %s method (test%s), but none was identifiable.' % ( item, item ) )

        # Test properties

        def testFieldNames( self ):
            """Unit-tests the FieldNames property of the Record class."""
            testProperties = { 'Property1':1, 'Property2':'ook', 'Property3':True }
            testObject = Record( **testProperties )
            self.assertEquals( testObject.FieldNames, testProperties.keys() )
            try:
                testObject.FieldNames = 'ook'
                self.fail( 'A Record\'s FieldNames cannot be altered.' )
            except AttributeError:
                pass
            except Exception, error:
                self.fail( 'Changes to a record\'s FieldNames should raise AttributeError, but %s was raised instead:\n  %s ' % ( error.__class__.__name__, error ) )
                

        def testImmutability( self ):
            """Unit-tests the immutability of records."""
            testProperties = { 'Property1':1, 'Property2':'ook', 'Property3':True }
            testObject = Record( **testProperties )
            try:
                testObject.NewProperty = 'ook'
                self.fail( 'Once created, a record should not allow the creation of a new attribute/field.' )
            except AttributeError:
                pass
            except Exception, error:
                self.fail( 'Changes to a record\'s data should raise AttributeError, but %s was raised instead:\n  %s ' % ( error.__class__.__name__, error ) )
            try:
                testObject.Property1 = 'ook'
                self.fail( 'Once created, a record should not allow the alteration of an existing attribute/field.' )
            except AttributeError:
                pass
            except Exception, error:
                self.fail( 'Changes to a record\'s data should raise AttributeError, but %s was raised instead:\n  %s ' % ( error.__class__.__name__, error ) )

    testSuite.addTests( unittest.TestLoader().loadTestsFromTestCase( testRecord ) )

Most of the unit-tests are pretty typical, so there's not a lot to comment on:

Line(s)
25
Note the object-construction argument: **testProperties, not testProperties. This is an example of the syntax noted above for creating a Record.
64-81
Tests the immutability of a Record:
68-74
Testing that creation of a new attribute on a Record instance isn't allowed.
75-81
Testing that alteration of an existing attribute is not allowed.

Though this is only one of the two concrete classes I said I was going to hit, this post is pretty long, so I'll pick up next time with ResultSet.
