Monday, December 19, 2011

The DataConnectors module (part 6)

So, the last generic-but-concrete class that I was working on, you may remember, was the Query class. It was stalled for a time because I wasn't sure how I really wanted the process of executing a query against a database to work. My primary concern was which object was going to be responsible for what part of the result-acquisition.

In retrospect, this feels to me now like the entire question is tied to making a decision about where the line should be drawn for the logical encapsulation-points for the relevant classes, and to some extent, it's also partly a question of how tightly coupled the classes themselves are. Actually, there are several principles that I'm trying to find a good balance between for this part of the object-stack, so I'll list them out, and examine different scenarios against them:

Encapsulate what varies
"When designing software, look for the portions most likely to change and prepare them for future expansion by shielding the rest of the program from that change. Hide the potential variation behind an interface. Then, when the implementation changes, software written to the interface doesn't need to change. This is called encapsulating variation." (from blogs.msdn.com/b/steverowe). There may be other methods than hiding things behind an interface, but that's likely the most common (and best?) practice.
Program to an interface, not to an implementation
Any two objects that conform to the same interface should be functionally interchangeable, so that other objects that care only about that interface then don't have to know or care about the implementation underneath it.
The Single Responsibility Principle
Any given object should have a single responsibility, that responsibility should be entirely encapsulated within the class, and it's properties and methods should be narrowly aligned with that responsibility.
I've listed these in the order of priority that I think makes the most sense for this particular problem, my argument being that of all of these, the Single Responsibility Principle is the most flexible and I'm all but certain that encapsulation of what varies the least flexible. Programming to an interface rather than to an implementation will, I think, be implicit with any reasonable design for the problem, but should be kept in mind as a higher priority than the class' responsibility.

Scenario 1: Query handles everything (almost)

The first scenario centers around the concept that the Query is responsible for the storage of the SQL to be executed, the execution of that SQL against the data-source, and the handling and storage of results returned from that execution. In this scenario, the Query must have (or be provided) a data-source object (derived from BaseDataConnector) which must have a method to execute an arbitrary SQL string against the database it represents, and that returns results to the Query. In keeping with the idea that the Query holds as much relevant responsibility as possible, those results would probably not be in ResultSet/Record instances.

Encapsulating what varies
Since the most likely point of variation is the specific mechanism that executes a SQL string against a data-source, and what structure/format those results come back in, encapsulating that in the BaseDataConnector-derived data-source feels pretty good.
Programming to an interface, not to an implementation
This scnario also feels pretty good in this respect, since Query doesn't need to know (or care) what kind of BaseDataConnector it's data-source is, just that it is a BaseDataConnector.
The Single Responsibility Principle
Assuming that the BaseDataConnector returns the simplest possible structure/format of it's results (most likely a list of dictionary-instances), and that Query is responsible for taking that raw result-set and converting them into ResultSet/Record structures, the responsibilities for each class might be stated something like so:
BaseDataConnector
Manages internal and/or low-level connection to and interaction with a back-end data-source.
Query
Provides user/developer-level mechanisms to interact with a back-end database.
Other Considerations
I'm pretty sure that I noted, some few posts ago, that for performance reasons it would be desirable to be able to queue up queries in order to minimize the number of connections made against the back-end database. This approach doesn't prevent that sort of capability, but I suspect that it would be more difficult to implement. It would almost certainly require a change to the responsibility of the BaseDataConnector-derived classes, if only to provide some mechanism to be able to associate an arbitary number of ResultSet instances with a particular Query object, but since creation of a ResultSet and it's associated Record instances to indicate the beginning of a query-association can be done in code, not as part of the "real" data-access, this may not be too onerous.
An example of code using this structure would look something like this:
# Create the data-source, where DataConnector is derived from BaseDataConnector
dataSource = DataConnector( host='host_name', database='database_name', user='user_name', password='password' )
# Create the Query instance, with SQL that returns two result-sets
sql = """SELECT somefield 
    FROM sometable 
    ORDER BY somefield;
SELECT someotherfield
    FROM sometable
    ORDER BY someotherfield;"""
myQuery = Query( dataSource, sql )
someResults, someOtherResults = myQuery.Results
On reflection, the idea of queueing up multiple queries is, perhaps, not as critical as I'd thought. Consider that this example returns multiple result-sets, and is managed in a single Query instance. The performance concerns about multiple database-connections could be mitigated easily enough by taking this sort of approach, provided that the number of result-sets returned by any given chunk of SQL were known and didn't change based on input parameters. This approach is feeling better and better to me the more I think on it.

Scenario 2: BaseDataConnector handles everything (almost)

This scenario's concept is diametrically opposed from the first: Here, the BaseDataConnector-derived objects handle as much of the interaction as possible, while Query becomes little more than a dumb data-object, containing the SQL to be executed, if it exists at all.

Encapsulating what varies
The encapsulation of implementation-variations is not significantly different here from the first scenario - it's still centered around the potential differences between different database-engines, how their connection and query-execution works, and what the results-structure looks like.
Programming to an interface, not to an implementation
Again, not significantly different from the first scenario - the specifics of what the interface (at the BaseDataConnector level) provides change, but we're still keeping the implementation hidden from the rest of the world.
The Single Responsibility Principle
This scenario does make drastic changes to the responsibilities of the objects, though. Fundamentally, there's no real need for any Query objects at all, since the actual SQL could just be passed to a Query method in a data-source object, and the results retrieved directly from there. If we assume that there's no need to have a Query object at all, the responsibilities break down to:
BaseDataConnector
Manages all interaction with a back-end data-aource, including execution of queries against it.
Other Considerations
This scenario, assuming that there is no Query object in the mix at all, pretty much prohibits any sort of queueing of queries. If we assume that a single SQL execution can be unpacked to a number of result-set variables in a manner similar to the example code for the first scenario, maybe this consideration is moot. The only thing we lose in this scenario is the ability to have state-data associated with the SQL of the query.
An example of code using this structure would look something like this:
# Create the data-source, where DataConnector is derived from BaseDataConnector
dataSource = DataConnector( host='host_name', database='database_name', user='user_name', password='password' )
# execute a query against the data-source
sql = """SELECT somefield 
    FROM sometable 
    ORDER BY somefield;
SELECT someotherfield
    FROM sometable
    ORDER BY someotherfield;"""
someResults, someOtherResults = dataSource.Query( sql )

I'm going to stop with these two scenarios, at least for now, because the desire for state-data/SQL association is a point woth more detailed consideration. When I originally conceived the DataConnectors module, it was, admittedly, with a specific eye towards a structure more along the lines of the structure that I'd gotten accustomed to as a CFML programmer. That included some query- or result-set-specific properties being available, including:

  1. Data-source type (through a data-source association);
  2. Authentication and authorization (also through a data-source association, but easily converted into a host/database/user/password structure);
  3. Timeout control (though I don't know that I ever used this functionality);
  4. Caching control;
  5. Association of results to variables within the code;
  6. Execution-time information; and
  7. Paging of results;
Of these, the first two can be converted into object-constructor arguments easily enough, by passing the authentication/authorization values as simple values, and the data-source type as a class derived from BaseDataConnector. They don't require any sort of state-data at a Query-object level at all. Timeout control, if it's even something that can be usefully implemented, doesn't feel like it would require any local state-data.

Caching control might, though, depending on how that caching were implemented. If it's implemented outside the database (say, by picking a result-set and writing it to a file), the query would potentially need to be able to keep track of how long the cached data-sets were valid. Since data-source-side caching may or may not be available for any given database-engine, that's a point of variation that should be encapsulated somewhere (though where is still an open question).

So long as we assume that the unpacking of results (as shown in the previous code-examples) will work, the association of results to variables is moot.

Execution-time information would require state-data somewhere, and the Query object seems like the only place that makes sense. It can't be usefully associated as a ResultSet level, since the number of result-sets from a given query will vary - particularly if multiple result-sets are expected with any frequency.

Paging of results is probably something that makes more sense at the ResultSet level, so it can be safely disregarded for the purpose of discussion of BaseDataConnector and Query responsibilities.

With all of this in mind, my decision is to make the Query object as smart as possible, and the BaseDataConnector-derived objects pretty simple (though not completely dumb). I believe that this will afford the best development experience in the long run, and it keeps the possibility of query-state-dependent capabilities open on a longer-term basis, should there be a need for them somewhere down the line. With all of that in mind, then, the IsDataConnector interface loses it's Query method in favor of an Execute method, with changes made accordingly to it's unit-tests:

    @DocumentArgument( 'argument', 'self', None, SelfDocumentationString )
    @DocumentArgument( 'argument', 'query', None, 'The IsQuery object that will be executed against the object\'s Connection.' )
    def Execute( self, query ):
        """Executes the provided IsQuery object's query against the object's connection, returning one or more results."""
        raise NotImplementedError( '%s.Query is not implemented as required by IsDataConnector.' % ( self.__class__.__name__ ) )
        def testExecute( self ):
            """Unit-tests the Execute method of the IsDataConnector interface."""
            testObject = IsDataConnectorDerived()
            try:
                testObject.Execute( None )
                self.fail( 'Execute should raise a NotImplementedError.' )
            except NotImplementedError:
                pass
            except Exception, error:
                self.fail( 'Execute should raise a NotImplementedError, but %s was raised instead:\n  %s' % ( error.__class__.__mame__, error ) )

The BaseDataConnector inherits that abstract Execute method, so it's unit-tests need to be updated accordingly:

        def testExecute( self ):
            """Unit-tests the Execute method of the BaseDataConnector abstract class."""
            self.assertEquals( BaseDataConnector.Execute, IsDataConnector.Execute, 'BaseDataConnector should not define the Execute method.' )

...and now Query can be defined. There are some commented-out properties and methods in this code that I've left in place for later, mostly centering around caching of query results in the future.

Query (Nominal final class):

class Query( IsQuery, object ):
    """Represents a query (and it's results) against a back-end data-source."""

    ##################################
    # Class Attributes               #
    ##################################

    ##################################
    # Class Property-Getter Methods  #
    ##################################

#    def _GetCacheExpiresAt( self ):
#        """Gets the time at which the query's cached results will expire."""
#        return self._cacheExpiresAt

    def _GetDatasource( self ):
        """Gets the Datasource used to execute the query's SQL."""
        return self._datasource

    def _GetResults( self ):
        """Gets the results from the execution of the query's SQL against the data-source."""
        if not self._results:
            if self._sql:
                self.__Execute()
        return self._results

    def _GetSql( self ):
        """Gets the SQL to be executed against the data-source to generate the results."""
        return self._sql

    ##################################
    # Class Property-Setter Methods  #
    ##################################

#    def _SetCacheExpiresAt( self, value ):
#        """Sets the time at which the query's cached results will expire."""
#        self.self._cacheExpiresAt = value

    def _SetDatasource( self, value ):
        """Sets the Datasource used to execute the query's SQL."""
        if not isinstance( value, IsDataConnector ):
            raise TypeError( '%s.Datasource expects an instance of IsDataConnector.' % ( self.__class__.__name__ ) )
        if self._results != None:
            raise AttributeError( '%s.Datasource cannot be reset once results have been retrieved.' % ( self.__class__.__name__ ) )
        self._datasource = value

    def _SetResults( self, value ):
        """Sets the results from the execution of the query's SQL against the data-source."""
        self._results = value

    def _SetSql( self, value ):
        """Sets the SQL to be executed against the data-source to generate the results."""
        if type( value ) != types.StringType:
            raise TypeError( '%s.Sql expects a non-empty string value.' % ( self.__class__.__name__ ) )
        if value == '':
            raise ValueError( '%s.Sql expects a non-empty string value.' % ( self.__class__.__name__ ) )
        if self._results != None:
            raise AttributeError( '%s.Sql cannot be reset once results have been retrieved.' % ( self.__class__.__name__ ) )
        self._sql = value

    ##################################
    # Class Property-Deleter Methods #
    ##################################

#    def _DelCacheExpiresAt( self ):
#        """Deletes the time at which the query's cached results will expire - uncaches the results in the process."""
#        self._cacheExpiresAt = 0

    def _DelDatasource( self ):
        """Deletes the Datasource used to execute the query's SQL."""
        self._datasource = None

    def _DelResults( self ):
        """Deletes the results from the execution of the query's SQL against the data-source."""
        self._results = None

    def _DelSql( self ):
        """Deletes the SQL to be executed against the data-source to generate the results."""
        self._sql = None

    ##################################
    # Class Properties               #
    ##################################

#    CacheExpiresAt = property( _GetCacheExpiresAt, _SetCacheExpiresAt, None, IsQuery.CacheExpiresAt.__doc__ )
    Datasource = property( _GetDatasource, _SetDatasource, None, IsQuery.Datasource.__doc__ )
    Results = property( _GetResults, None, None, IsQuery.Results.__doc__ )
    Sql = property( _GetSql, _SetSql, None, IsQuery.Sql.__doc__ )

    ##################################
    # Object Constructor             #
    ##################################

    @DocumentArgument( 'argument', 'self', None, SelfDocumentationString )
    @DocumentArgument( 'argument', 'datasource', None, 'The Datasource used to execute the query\'s SQL.' )
    @DocumentArgument( 'argument', 'sql', None, 'The SQL to be executed against the data-source to generate the results.' )
#    @DocumentArgument( 'keyword', 'parameters', 'cacheFor', '(Non-negative integer, optional) The number of seconds that the Query\'s results should be cached for.' )
#    @DocumentArgument( 'keyword', 'parameters', 'cacheUntil', '(Time, optional) A time that the Query\'s results will be cached until.' )
#    def __init__( self, datasource, sql, **parameters ):
    def __init__( self, datasource, sql ):
        """Object constructor."""
        # Nominally final: Don't allow any class other than this one
        if self.__class__ != Query:
            raise NotImplementedError( 'Query is (nominally) a final class, and is not intended to be derived from.' )
        # Set default values:
        self._DelDatasource()
        self._DelResults()
        self._DelSql()
        # Set properties from arguments
        self._SetDatasource( datasource )
        self._SetSql( sql )

    ##################################
    # Object Destructor              #
    ##################################

    ##################################
    # Class Methods                  #
    ##################################

#    def Cache( self ):
#        """Caches the results of the Query in a manner yet to be determined."""
#        raise NotImplementedError( 'Query.Cache has not been implemented yet.' )

    def __Execute( self ):
        """Executes the query's SQL against it's specified Datasource, retrieves the results, and builds out a list of ResultSet objects in the Query's Results."""
        self._results = TypedList( [ ResultSet ] )
        rawResults = self.Datasource.Execute( self )
        for theResult in rawResults:
            self._results.append( ResultSet( theResult ) )

#    def Uncache( self ):
#        """Uncaches the results of the Query."""
#        raise NotImplementedError( 'Query.Uncache has not been implemented yet.' )

__all__ += [ 'Query' ]

Given my typical patterns, and assuming that you've been following along for any length of time, there's not a whole lot out of the ordinary here. The Datasource and Sql properties are the exception, in that they allow modification until the query's results are retrieved.

Similarly, with the exception of the FixedResultsConnector class, the unit-tests are pretty typical, though again, they are explicitly testing the mutability of the Datasource and Sql properties before and after results are acquired. The Results property tests may feel a bit on the sparse side also, but bear in mind that all we care about at this level is that we get back a TypedList of ResultSet instances - ResultSet is already tested, so that's all we need to know...

    class QueryDerived( Query ):
        def __init__( self, datasource, sql ):
            Query.__init__( self, datasource, sql )
    
    class FixedResultsConnector( BaseDataConnector ):
        def __init__( self, **parameters ):
            BaseDataConnector.__init__( self, **parameters )
        def Execute( self, query ):
            return [
                    # First result-set
                    [
                        {'results1row':1, 'results1field1':'results1row1field1', 'results1field2':'results1row1field2' },
                        {'results1row':2, 'results1field1':'results1row2field1', 'results1field2':'results1row2field2' },
                    ],
                    # Second result-set
                    [
                        {'results2row':1, 'results2field1':'results2row1field1'},
                    ],
                ]
        def ResultsCount( self ):
            return length( self.Execute( self, '' ) )
        def ResultsFields( self ):
            results = self.Execute( self, '' )
            keySets = []
            for theResult in results:
                keysets.append( theResult[0].keys() )
            return keySets
            

    class testQuery( unittest.TestCase ):
        """Unit-tests the Query class."""
    
        def setUp( self ):
            pass
    
        def tearDown( self ):
            pass
        
        def testFinal( self ):
            """Testing final nature of the Query class."""
            testDataSource = FixedResultsConnector()
            testSql = 'SELECT * FROM sometable;'
            try:
                testObject = QueryDerived( testDataSource, testSql )
                self.fail( 'Query is nominally a final class, and should not be instantiable.' )
            except NotImplementedError:
                pass
            except Exception, error:
                self.fail( 'Attempting to instantiate an object derived from Query should raise NotImplemetedError, but %s was raised instead:\n  %s' % ( error.__class__.__name__, error ) )
            testObject = Query( testDataSource, testSql )
            self.assertTrue( isinstance( testObject, Query ), 'Creating a Query instance should result in a Query instance.' )
    
        def testConstruction( self ):
            """Testing construction of the Query class."""
            pass
    
        def testPropertyCountAndTests( self ):
            """Testing the properties of the Query class."""
            items = getMemberNames( Query )[0]
            actual = len( items )
            expected = 3
            self.assertEquals( expected, actual, 'Query is expected to have %d properties to test, but %d were dicovered by inspection.' % ( expected, actual ) )
            for item in items:
                self.assertTrue( HasTestFor( self, item ), 'There should be a test for the %s property (test%s), but none was identifiable.' % ( item, item ) )

        def testMethodCountAndTests( self ):
            """Testing the methods of the Query class."""
            items = getMemberNames( Query )[1]
            actual = len( items )
            expected = 0
            self.assertEquals( expected, actual, 'Query is expected to have %d methods to test, but %d were dicovered by inspection.' % ( expected, actual ) )
            for item in items:
                self.assertTrue( HasTestFor( self, item ), 'There should be a test for the %s method (test%s), but none was identifiable.' % ( item, item ) )

        # Test properties

        def testDatasource( self ):
            """Tests the Datasource property of the Query class."""
            testDataSource = FixedResultsConnector()
            testOtherDataSource = FixedResultsConnector()
            testSql = 'SELECT * FROM sometable;'
            testObject = Query( testDataSource, testSql )
            self.assertEquals( testObject.Datasource, testDataSource, 'The Datasource property should be set and gettable after creation.' )
            testObject.Datasource = testOtherDataSource
            self.assertEquals( testObject.Datasource, testOtherDataSource, 'The Datasource property should be settabe and gettable after creation if results haven\'t been retrieved.' )
            testResults = testObject.Results
            try:
                testObject.Datasource = testDataSource
                self.fail( 'A Query\'s Datasource property should not be modifiable after results have been retrieved' )
            except AttributeError:
                pass
            except Exception, error:
                self.fail( 'Attempting to change a Query\'s Datasource should raise AttributeError, but %s was raised instead:\n  %s' % ( error.__class__.__name__, error ) )

        def testResults( self ):
            """Tests the Results property of the Query class."""
            testDataSource = FixedResultsConnector()
            testSql = 'SELECT * FROM sometable;'
            testObject = Query( testDataSource, testSql )
            queryResults1, queryResults2 = testObject.Results
            resultSet1, resultSet2 = testDataSource.Execute( '' )
            self.assertEquals( resultSet1[0].keys(), queryResults1.FieldNames )
            self.assertEquals( len( resultSet1 ), len( queryResults1 ) )
            self.assertEquals( resultSet2[0].keys(), queryResults2.FieldNames )
            self.assertEquals( len( resultSet2 ), len( queryResults2 ) )

        def testSql( self ):
            """Tests the Sql property of the Query class."""
            testDataSource = FixedResultsConnector()
            testSql = 'SELECT * FROM sometable;'
            testOtherSql = 'SELECT * FROM someothertable;'
            testObject = Query( testDataSource, testSql )
            self.assertEquals( testObject.Sql, testSql, 'The Sql property should be set and gettable after creation.' )
            testObject.Sql = testOtherSql
            self.assertEquals( testObject.Sql, testOtherSql, 'The Sql property should be settabe and gettable after creation if results haven\'t been retrieved.' )
            testResults = testObject.Results
            try:
                testObject.Sql = testSql
                self.fail( 'A Query\'s Sql property should not be modifiable after results have been retrieved' )
            except AttributeError:
                pass
            except Exception, error:
                self.fail( 'Attempting to change a Query\'s Sql should raise AttributeError, but %s was raised instead:\n  %s' % ( error.__class__.__name__, error ) )

    testSuite.addTests( unittest.TestLoader().loadTestsFromTestCase( testQuery ) )

No comments:

Post a Comment