So, the last generic-but-concrete class that I was working on, you may remember,
was the Query
class. It was stalled for a time because I wasn't sure
how I really wanted the process of executing a query against a database
to work. My primary concern was which object was going to be responsible for what
part of the result-acquisition.
In retrospect, this feels to me now like the entire question is tied to making a decision about where the line should be drawn for the logical encapsulation-points for the relevant classes, and to some extent, it's also partly a question of how tightly coupled the classes themselves are. Actually, there are several principles that I'm trying to find a good balance between for this part of the object-stack, so I'll list them out, and examine different scenarios against them:
- Encapsulate what varies
- "When designing software, look for the portions most likely to change and prepare them for future expansion by shielding the rest of the program from that change. Hide the potential variation behind an interface. Then, when the implementation changes, software written to the interface doesn't need to change. This is called encapsulating variation." (from blogs.msdn.com/b/steverowe). There may be other methods than hiding things behind an interface, but that's likely the most common (and best?) practice.
- Program to an interface, not to an implementation
- Any two objects that conform to the same interface should be functionally interchangeable, so that other objects that care only about that interface then don't have to know or care about the implementation underneath it.
- The Single Responsibility Principle
- Any given object should have a single responsibility, that responsibility should be entirely encapsulated within the class, and it's properties and methods should be narrowly aligned with that responsibility.
Scenario 1: Query
handles everything (almost)
The first scenario centers around the concept that the Query
is
responsible for the storage of the SQL to be executed, the execution of that SQL
against the data-source, and the handling and storage of results returned from
that execution. In this scenario, the Query
must have (or be provided)
a data-source object (derived from BaseDataConnector
) which must
have a method to execute an arbitrary SQL string against the database it represents,
and that returns results to the Query
. In keeping with the idea that
the Query
holds as much relevant responsibility as possible, those
results would probably not be in ResultSet
/Record
instances.
- Encapsulating what varies
- Since the most likely point of variation is the specific mechanism that
executes a SQL string against a data-source, and what structure/format
those results come back in, encapsulating that in the
BaseDataConnector
-derived data-source feels pretty good. - Programming to an interface, not to an implementation
- This scnario also feels pretty good in this respect, since
Query
doesn't need to know (or care) what kind ofBaseDataConnector
it's data-source is, just that it is aBaseDataConnector
. - The Single Responsibility Principle
- Assuming that the
BaseDataConnector
returns the simplest possible structure/format of it's results (most likely a list of dictionary-instances), and thatQuery
is responsible for taking that raw result-set and converting them intoResultSet
/Record
structures, the responsibilities for each class might be stated something like so:BaseDataConnector
- Manages internal and/or low-level connection to and interaction with a back-end data-source.
Query
- Provides user/developer-level mechanisms to interact with a back-end database.
- Other Considerations
- I'm pretty sure that I noted, some few posts ago, that for performance
reasons it would be desirable to be able to queue up queries in order to
minimize the number of connections made against the back-end database.
This approach doesn't prevent that sort of capability, but I
suspect that it would be more difficult to implement. It would almost
certainly require a change to the responsibility of the
BaseDataConnector
-derived classes, if only to provide some mechanism to be able to associate an arbitary number ofResultSet
instances with a particularQuery
object, but since creation of aResultSet
and it's associatedRecord
instances to indicate the beginning of a query-association can be done in code, not as part of the "real" data-access, this may not be too onerous.
# Create the data-source, where DataConnector is derived from BaseDataConnector dataSource = DataConnector( host='host_name', database='database_name', user='user_name', password='password' ) # Create the Query instance, with SQL that returns two result-sets sql = """SELECT somefield FROM sometable ORDER BY somefield; SELECT someotherfield FROM sometable ORDER BY someotherfield;""" myQuery = Query( dataSource, sql ) someResults, someOtherResults = myQuery.ResultsOn reflection, the idea of queueing up multiple queries is, perhaps, not as critical as I'd thought. Consider that this example returns multiple result-sets, and is managed in a single
Query
instance. The performance concerns about
multiple database-connections could be mitigated easily enough by taking this
sort of approach, provided that the number of result-sets returned by any given
chunk of SQL were known and didn't change based on input parameters. This approach
is feeling better and better to me the more I think on it.
Scenario 2: BaseDataConnector
handles everything (almost)
This scenario's concept is diametrically opposed from the first: Here, the
BaseDataConnector
-derived objects handle as much of the interaction
as possible, while Query
becomes little more than a dumb data-object,
containing the SQL to be executed, if it exists at all.
- Encapsulating what varies
- The encapsulation of implementation-variations is not significantly different here from the first scenario - it's still centered around the potential differences between different database-engines, how their connection and query-execution works, and what the results-structure looks like.
- Programming to an interface, not to an implementation
- Again, not significantly different from the first scenario - the specifics
of what the interface (at the
BaseDataConnector
level) provides change, but we're still keeping the implementation hidden from the rest of the world. - The Single Responsibility Principle
- This scenario does make drastic changes to the
responsibilities of the objects, though. Fundamentally, there's no real
need for any
Query
objects at all, since the actual SQL could just be passed to aQuery
method in a data-source object, and the results retrieved directly from there. If we assume that there's no need to have aQuery
object at all, the responsibilities break down to:BaseDataConnector
- Manages all interaction with a back-end data-aource, including execution of queries against it.
- Other Considerations
- This scenario, assuming that there is no
Query
object in the mix at all, pretty much prohibits any sort of queueing of queries. If we assume that a single SQL execution can be unpacked to a number of result-set variables in a manner similar to the example code for the first scenario, maybe this consideration is moot. The only thing we lose in this scenario is the ability to have state-data associated with the SQL of the query.
# Create the data-source, where DataConnector is derived from BaseDataConnector dataSource = DataConnector( host='host_name', database='database_name', user='user_name', password='password' ) # execute a query against the data-source sql = """SELECT somefield FROM sometable ORDER BY somefield; SELECT someotherfield FROM sometable ORDER BY someotherfield;""" someResults, someOtherResults = dataSource.Query( sql )
I'm going to stop with these two scenarios, at least for now, because the desire for state-data/SQL association is a point woth more detailed consideration. When I originally conceived the DataConnectors module, it was, admittedly, with a specific eye towards a structure more along the lines of the structure that I'd gotten accustomed to as a CFML programmer. That included some query- or result-set-specific properties being available, including:
- Data-source type (through a data-source association);
- Authentication and authorization (also through a data-source association, but easily converted into a host/database/user/password structure);
- Timeout control (though I don't know that I ever used this functionality);
- Caching control;
- Association of results to variables within the code;
- Execution-time information; and
- Paging of results;
BaseDataConnector
. They
don't require any sort of state-data at a Query
-object level at all.
Timeout control, if it's even something that can be usefully implemented,
doesn't feel like it would require any local state-data.
Caching control might, though, depending on how that caching were implemented. If it's implemented outside the database (say, by picking a result-set and writing it to a file), the query would potentially need to be able to keep track of how long the cached data-sets were valid. Since data-source-side caching may or may not be available for any given database-engine, that's a point of variation that should be encapsulated somewhere (though where is still an open question).
So long as we assume that the unpacking of results (as shown in the previous code-examples) will work, the association of results to variables is moot.
Execution-time information would require state-data somewhere, and
the Query
object seems like the only place that makes sense. It
can't be usefully associated as a ResultSet
level, since the number
of result-sets from a given query will vary - particularly if multiple result-sets
are expected with any frequency.
Paging of results is probably something that makes more sense at the
ResultSet
level, so it can be safely disregarded for the purpose of
discussion of BaseDataConnector
and Query
responsibilities.
With all of this in mind, my decision is to make the Query
object
as smart as possible, and the BaseDataConnector
-derived objects
pretty simple (though not completely dumb). I believe that this will afford the
best development experience in the long run, and it keeps the possibility of
query-state-dependent capabilities open on a longer-term basis, should there be
a need for them somewhere down the line. With all of that in mind, then, the
IsDataConnector
interface loses it's Query
method in
favor of an Execute
method, with changes made accordingly to it's
unit-tests:
@DocumentArgument( 'argument', 'self', None, SelfDocumentationString ) @DocumentArgument( 'argument', 'query', None, 'The IsQuery object that will be executed against the object\'s Connection.' ) def Execute( self, query ): """Executes the provided IsQuery object's query against the object's connection, returning one or more results.""" raise NotImplementedError( '%s.Query is not implemented as required by IsDataConnector.' % ( self.__class__.__name__ ) )
def testExecute( self ): """Unit-tests the Execute method of the IsDataConnector interface.""" testObject = IsDataConnectorDerived() try: testObject.Execute( None ) self.fail( 'Execute should raise a NotImplementedError.' ) except NotImplementedError: pass except Exception, error: self.fail( 'Execute should raise a NotImplementedError, but %s was raised instead:\n %s' % ( error.__class__.__mame__, error ) )
The BaseDataConnector
inherits that abstract Execute
method, so it's unit-tests need to be updated accordingly:
def testExecute( self ): """Unit-tests the Execute method of the BaseDataConnector abstract class.""" self.assertEquals( BaseDataConnector.Execute, IsDataConnector.Execute, 'BaseDataConnector should not define the Execute method.' )
...and now Query
can be defined. There are some commented-out
properties and methods in this code that I've left in place for later, mostly
centering around caching of query results in the future.
Query (Nominal final class):
class Query( IsQuery, object ): """Represents a query (and it's results) against a back-end data-source.""" ################################## # Class Attributes # ################################## ################################## # Class Property-Getter Methods # ################################## # def _GetCacheExpiresAt( self ): # """Gets the time at which the query's cached results will expire.""" # return self._cacheExpiresAt def _GetDatasource( self ): """Gets the Datasource used to execute the query's SQL.""" return self._datasource def _GetResults( self ): """Gets the results from the execution of the query's SQL against the data-source.""" if not self._results: if self._sql: self.__Execute() return self._results def _GetSql( self ): """Gets the SQL to be executed against the data-source to generate the results.""" return self._sql ################################## # Class Property-Setter Methods # ################################## # def _SetCacheExpiresAt( self, value ): # """Sets the time at which the query's cached results will expire.""" # self.self._cacheExpiresAt = value def _SetDatasource( self, value ): """Sets the Datasource used to execute the query's SQL.""" if not isinstance( value, IsDataConnector ): raise TypeError( '%s.Datasource expects an instance of IsDataConnector.' % ( self.__class__.__name__ ) ) if self._results != None: raise AttributeError( '%s.Datasource cannot be reset once results have been retrieved.' % ( self.__class__.__name__ ) ) self._datasource = value def _SetResults( self, value ): """Sets the results from the execution of the query's SQL against the data-source.""" self._results = value def _SetSql( self, value ): """Sets the SQL to be executed against the data-source to generate the results.""" if type( value ) != types.StringType: raise TypeError( '%s.Sql expects a non-empty string value.' % ( self.__class__.__name__ ) ) if value == '': raise ValueError( '%s.Sql expects a non-empty string value.' % ( self.__class__.__name__ ) ) if self._results != None: raise AttributeError( '%s.Sql cannot be reset once results have been retrieved.' % ( self.__class__.__name__ ) ) self._sql = value ################################## # Class Property-Deleter Methods # ################################## # def _DelCacheExpiresAt( self ): # """Deletes the time at which the query's cached results will expire - uncaches the results in the process.""" # self._cacheExpiresAt = 0 def _DelDatasource( self ): """Deletes the Datasource used to execute the query's SQL.""" self._datasource = None def _DelResults( self ): """Deletes the results from the execution of the query's SQL against the data-source.""" self._results = None def _DelSql( self ): """Deletes the SQL to be executed against the data-source to generate the results.""" self._sql = None ################################## # Class Properties # ################################## # CacheExpiresAt = property( _GetCacheExpiresAt, _SetCacheExpiresAt, None, IsQuery.CacheExpiresAt.__doc__ ) Datasource = property( _GetDatasource, _SetDatasource, None, IsQuery.Datasource.__doc__ ) Results = property( _GetResults, None, None, IsQuery.Results.__doc__ ) Sql = property( _GetSql, _SetSql, None, IsQuery.Sql.__doc__ ) ################################## # Object Constructor # ################################## @DocumentArgument( 'argument', 'self', None, SelfDocumentationString ) @DocumentArgument( 'argument', 'datasource', None, 'The Datasource used to execute the query\'s SQL.' ) @DocumentArgument( 'argument', 'sql', None, 'The SQL to be executed against the data-source to generate the results.' ) # @DocumentArgument( 'keyword', 'parameters', 'cacheFor', '(Non-negative integer, optional) The number of seconds that the Query\'s results should be cached for.' ) # @DocumentArgument( 'keyword', 'parameters', 'cacheUntil', '(Time, optional) A time that the Query\'s results will be cached until.' ) # def __init__( self, datasource, sql, **parameters ): def __init__( self, datasource, sql ): """Object constructor.""" # Nominally final: Don't allow any class other than this one if self.__class__ != Query: raise NotImplementedError( 'Query is (nominally) a final class, and is not intended to be derived from.' ) # Set default values: self._DelDatasource() self._DelResults() self._DelSql() # Set properties from arguments self._SetDatasource( datasource ) self._SetSql( sql ) ################################## # Object Destructor # ################################## ################################## # Class Methods # ################################## # def Cache( self ): # """Caches the results of the Query in a manner yet to be determined.""" # raise NotImplementedError( 'Query.Cache has not been implemented yet.' ) def __Execute( self ): """Executes the query's SQL against it's specified Datasource, retrieves the results, and builds out a list of ResultSet objects in the Query's Results.""" self._results = TypedList( [ ResultSet ] ) rawResults = self.Datasource.Execute( self ) for theResult in rawResults: self._results.append( ResultSet( theResult ) ) # def Uncache( self ): # """Uncaches the results of the Query.""" # raise NotImplementedError( 'Query.Uncache has not been implemented yet.' ) __all__ += [ 'Query' ]
Given my typical patterns, and assuming that you've been following along for
any length of time, there's not a whole lot out of the ordinary here. The
Datasource
and Sql
properties are the exception, in that
they allow modification until the query's results are retrieved.
Similarly, with the exception of the FixedResultsConnector
class,
the unit-tests are pretty typical, though again, they are explicitly testing the
mutability of the Datasource
and Sql
properties before
and after results are acquired. The Results
property tests may feel
a bit on the sparse side also, but bear in mind that all we care about at this
level is that we get back a TypedList
of ResultSet
instances - ResultSet
is already tested, so that's all we need to
know...
class QueryDerived( Query ): def __init__( self, datasource, sql ): Query.__init__( self, datasource, sql ) class FixedResultsConnector( BaseDataConnector ): def __init__( self, **parameters ): BaseDataConnector.__init__( self, **parameters ) def Execute( self, query ): return [ # First result-set [ {'results1row':1, 'results1field1':'results1row1field1', 'results1field2':'results1row1field2' }, {'results1row':2, 'results1field1':'results1row2field1', 'results1field2':'results1row2field2' }, ], # Second result-set [ {'results2row':1, 'results2field1':'results2row1field1'}, ], ] def ResultsCount( self ): return length( self.Execute( self, '' ) ) def ResultsFields( self ): results = self.Execute( self, '' ) keySets = [] for theResult in results: keysets.append( theResult[0].keys() ) return keySets class testQuery( unittest.TestCase ): """Unit-tests the Query class.""" def setUp( self ): pass def tearDown( self ): pass def testFinal( self ): """Testing final nature of the Query class.""" testDataSource = FixedResultsConnector() testSql = 'SELECT * FROM sometable;' try: testObject = QueryDerived( testDataSource, testSql ) self.fail( 'Query is nominally a final class, and should not be instantiable.' ) except NotImplementedError: pass except Exception, error: self.fail( 'Attempting to instantiate an object derived from Query should raise NotImplemetedError, but %s was raised instead:\n %s' % ( error.__class__.__name__, error ) ) testObject = Query( testDataSource, testSql ) self.assertTrue( isinstance( testObject, Query ), 'Creating a Query instance should result in a Query instance.' ) def testConstruction( self ): """Testing construction of the Query class.""" pass def testPropertyCountAndTests( self ): """Testing the properties of the Query class.""" items = getMemberNames( Query )[0] actual = len( items ) expected = 3 self.assertEquals( expected, actual, 'Query is expected to have %d properties to test, but %d were dicovered by inspection.' % ( expected, actual ) ) for item in items: self.assertTrue( HasTestFor( self, item ), 'There should be a test for the %s property (test%s), but none was identifiable.' % ( item, item ) ) def testMethodCountAndTests( self ): """Testing the methods of the Query class.""" items = getMemberNames( Query )[1] actual = len( items ) expected = 0 self.assertEquals( expected, actual, 'Query is expected to have %d methods to test, but %d were dicovered by inspection.' % ( expected, actual ) ) for item in items: self.assertTrue( HasTestFor( self, item ), 'There should be a test for the %s method (test%s), but none was identifiable.' % ( item, item ) ) # Test properties def testDatasource( self ): """Tests the Datasource property of the Query class.""" testDataSource = FixedResultsConnector() testOtherDataSource = FixedResultsConnector() testSql = 'SELECT * FROM sometable;' testObject = Query( testDataSource, testSql ) self.assertEquals( testObject.Datasource, testDataSource, 'The Datasource property should be set and gettable after creation.' ) testObject.Datasource = testOtherDataSource self.assertEquals( testObject.Datasource, testOtherDataSource, 'The Datasource property should be settabe and gettable after creation if results haven\'t been retrieved.' ) testResults = testObject.Results try: testObject.Datasource = testDataSource self.fail( 'A Query\'s Datasource property should not be modifiable after results have been retrieved' ) except AttributeError: pass except Exception, error: self.fail( 'Attempting to change a Query\'s Datasource should raise AttributeError, but %s was raised instead:\n %s' % ( error.__class__.__name__, error ) ) def testResults( self ): """Tests the Results property of the Query class.""" testDataSource = FixedResultsConnector() testSql = 'SELECT * FROM sometable;' testObject = Query( testDataSource, testSql ) queryResults1, queryResults2 = testObject.Results resultSet1, resultSet2 = testDataSource.Execute( '' ) self.assertEquals( resultSet1[0].keys(), queryResults1.FieldNames ) self.assertEquals( len( resultSet1 ), len( queryResults1 ) ) self.assertEquals( resultSet2[0].keys(), queryResults2.FieldNames ) self.assertEquals( len( resultSet2 ), len( queryResults2 ) ) def testSql( self ): """Tests the Sql property of the Query class.""" testDataSource = FixedResultsConnector() testSql = 'SELECT * FROM sometable;' testOtherSql = 'SELECT * FROM someothertable;' testObject = Query( testDataSource, testSql ) self.assertEquals( testObject.Sql, testSql, 'The Sql property should be set and gettable after creation.' ) testObject.Sql = testOtherSql self.assertEquals( testObject.Sql, testOtherSql, 'The Sql property should be settabe and gettable after creation if results haven\'t been retrieved.' ) testResults = testObject.Results try: testObject.Sql = testSql self.fail( 'A Query\'s Sql property should not be modifiable after results have been retrieved' ) except AttributeError: pass except Exception, error: self.fail( 'Attempting to change a Query\'s Sql should raise AttributeError, but %s was raised instead:\n %s' % ( error.__class__.__name__, error ) ) testSuite.addTests( unittest.TestLoader().loadTestsFromTestCase( testQuery ) )
No comments:
Post a Comment