Unit testing for Python database applications
Problem
You are building an application that uses a database in Python. For example, you might have created the following function, which uses pyodbc to insert a list of rows into a database table. Each row is a dictionary.
def insert_rows(rows, table_name, dbc):
    field_names = rows[0].keys()
    field_names_str = ', '.join(field_names)
    placeholder_str = ','.join('?'*len(field_names))
    insert_sql = f'INSERT INTO {table_name}({field_names_str}) VALUES ({placeholder_str})'
    saved_autocommit = dbc.autocommit
    with dbc.cursor() as cursor:
        try:
            dbc.autocommit = False
            tuples = [ tuple((row[field_name] for field_name in field_names)) for row in rows ]
            cursor.executemany(insert_sql, tuples)
            cursor.commit()
        except Exception as exc:
            cursor.rollback()
            raise exc
        finally:
            dbc.autocommit = saved_autocommit
How can I unit test such a function?
Solution
Use unittest.mock
to create a mock database connection. It is as simple as:
dbc = mock.MagicMock()
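If MagicMock is new to you, this quick standalone snippet (not part of the recipe) shows why a bare MagicMock is enough: it accepts any attribute access or method call, returns further mocks, and records everything for later inspection.

from unittest import mock

dbc = mock.MagicMock()
cursor = dbc.cursor()                # any method call succeeds and returns another mock
cursor.executemany('SQL', [(1,)])    # nothing is executed, the call is only recorded
print(dbc.cursor.called)             # True
print(cursor.executemany.call_args)  # call('SQL', [(1,)])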
The very first test could be to verify that our function calls the cursor()
method of the database connection.
import unittest
from unittest import mock

# insert_rows from the Problem section above (here assumed to live in the same file as the tests)

class Test_insert_rows(unittest.TestCase):

    def fix_dbc(self):
        dbc = mock.MagicMock(spec=['cursor'])
        dbc.autocommit = True
        return dbc

    def fix_rows(self):
        rows = [{'id':1, 'name':'John'},
                {'id':2, 'name':'Jane'},]
        return rows

    def test_insert_rows_calls_cursor_method(self):
        dbc = self.fix_dbc()
        rows = self.fix_rows()
        insert_rows(rows, 'users', dbc)
        self.assertTrue(dbc.cursor.called)

if __name__ == '__main__':
    unittest.main(argv=['', '-v'])
Some highlights on what I have done in this test:
- The database connection used for testing is created in a helper method, because I am going to need it again and again in the test methods I am about to write (an equivalent approach using setUp is sketched below).
- The rows fixture is also created in a method, for the same reason.
- At the end there is if __name__ == '__main__': .... This will run unittest if I execute the file as a Python script.
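As a side note, unittest's setUp hook is another common place for such shared fixtures. The sketch below is my own variation on the recipe, not part of the original code; it creates a fresh connection mock before every test method.

class Test_insert_rows(unittest.TestCase):

    def setUp(self):
        # Runs before every test method, so each test gets a fresh mock.
        self.dbc = mock.MagicMock(spec=['cursor'])
        self.dbc.autocommit = True

    def test_insert_rows_calls_cursor_method(self):
        rows = [{'id': 1, 'name': 'John'}]
        insert_rows(rows, 'users', self.dbc)
        self.assertTrue(self.dbc.cursor.called)

The recipe sticks with explicit fixture methods, which keeps each test's dependencies a little more visible.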
Here is the result of executing the test file above:
(.venv37) sandbox>python test_insert_rows.py -v
test_insert_rows_calls_cursor_method (test_insert_rows.Test_insert_rows) ... ok
----------------------------------------------------------------------
Ran 1 test in 0.002s
OK
The test I have created so far is not very exciting. To create a better test, let's look closely at what this function does:
- generates the INSERT statement
- generates a list of tuples from the rows list
- calls the executemany method of the database cursor
- commits the transaction

So my test could verify that the function calls the executemany method with the correct arguments and commits the transaction.
Let's implement this test.
Let's implement this test.
    def fix_tuples(self):
        tuples = [(1,'John'),
                  (2,'Jane'),]
        return tuples

    def test_insert_rows_calls_executemany_and_commit_passing_correct_arguments(self):
        dbc = self.fix_dbc()
        rows = self.fix_rows()
        insert_rows(rows, 'users', dbc)
        with dbc.cursor() as cursor:
            expect_sql = 'INSERT INTO users(id, name) VALUES (?,?)'
            expect_tuples = self.fix_tuples()
            calls = [mock.call.executemany(expect_sql, expect_tuples),
                     mock.call.commit(),]
            cursor.assert_has_calls(calls)
            self.assertTrue(dbc.autocommit)
In this test I use the assert_has_calls assertion of the mock object to verify that specific calls have been made with the expected arguments and in the expected order.
At the end of the test I verify that the autocommit property of the database connection has been restored to True.
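If the semantics of assert_has_calls are unfamiliar, this tiny standalone example (not part of the recipe) shows how it behaves: the assertion passes as long as the listed calls appear consecutively in the recorded calls, in the given order, with extra calls allowed before or after.

from unittest import mock

m = mock.MagicMock()
m.executemany('SQL', [(1,)])
m.commit()
m.close()

# Passes: executemany and commit were recorded consecutively, in this order.
m.assert_has_calls([mock.call.executemany('SQL', [(1,)]), mock.call.commit()])

# Would raise AssertionError: wrong order.
# m.assert_has_calls([mock.call.commit(), mock.call.executemany('SQL', [(1,)])])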
Ok. Great. So far we tested the happy path. What happens if something fails?
In the following test I am going to make the database connection raise an exception to test the behavior of my function. We have to verify:
- The transaction is rolled back after executemany is called.
- The database connection's autocommit property has been restored.
- The exception is propagated.
    def test_insert_rows_rolls_back_transaction_on_database_exception(self):
        dbc = self.fix_dbc()
        rows = self.fix_rows()
        with dbc.cursor() as cursor:
            cursor.executemany.side_effect = Exception('Some DB error')
            with self.assertRaises(Exception) as exc:
                insert_rows(rows, 'users', dbc)
            calls = [mock.call.executemany(mock.ANY, mock.ANY),
                     mock.call.rollback(),]
            cursor.assert_has_calls(calls)
            self.assertTrue(dbc.autocommit)
            self.assertEqual('Some DB error', str(exc.exception))
Discussion
It might seem easier to use a real database, but that approach has drawbacks:
- The actual database might not be available, or you might not have connectivity to it.
- You are not testing your code in isolation. If, for example, the database connection is unstable, you will start getting weird, unpredictable results.
- Database requests add significant latency.
- You need to reset the database to a fixed, well-known state before each test. This might be challenging, especially if the database is shared.
Looking at our function, we can see a code smell. It does more than one thing:
- Generate the INSERT statement
- Generate the list of tuples
- Insert the tuples into the database

There are also some conditions that might have been handled better. What if I pass an empty list of rows?
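For example, a test for the empty-list case might look like the sketch below. Note that this is my own illustration of a behaviour we might want, not something the current function guarantees: as written, insert_rows would raise an IndexError on rows[0] before ever touching the connection.

    def test_insert_rows_with_empty_list_does_not_touch_database(self):
        # Hypothetical expectation: an empty list should be a no-op.
        # The current implementation raises IndexError on rows[0] instead,
        # so this test documents behaviour we might add while refactoring.
        dbc = self.fix_dbc()
        insert_rows([], 'users', dbc)
        self.assertFalse(dbc.cursor.called)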
The good news is that we can refactor our code and be sure it still works properly. We will know we haven't broken anything, because we have thorough unit tests.
Refactoring - Extract function
Let's move the code for generating the INSERT statement into a new function. We are going to follow these steps:
- Run our tests and make sure they pass (green).
- Create a test for the new function.
- Run our tests and make sure they fail (red).
- Create a new function and copy the code we want to extract.
- Run our tests and make sure they pass (green).
- Replace the old code with a call to the new function.
- Run our tests and make sure they still pass (green).
Let's implement the test:
    def test_make_insert_produces_correct_statement(self):
        fields = ['id', 'name']
        actual = make_insert_statement(fields, 'users')
        expected = 'INSERT INTO users(id, name) VALUES (?,?)'
        self.assertEqual(expected, actual)
Tests are failing now:
(.venv37) sandbox>python test_insert_rows.py
test_insert_rows_calls_cursor_method (__main__.Test_insert_rows) ... ok
test_insert_rows_calls_executemany_and_commit_passing_correct_arguments (__main__.Test_insert_rows) ... ok
test_insert_rows_rolls_back_transaction_on_database_exception (__main__.Test_insert_rows) ... ok
test_make_insert_produces_correct_statement (__main__.Test_insert_rows) ... ERROR
======================================================================
ERROR: test_make_insert_produces_correct_statement (__main__.Test_insert_rows)
----------------------------------------------------------------------
Traceback (most recent call last):
File "c:/Sandbox/Learn/Python/TestDrivenPythonDevelopment/play/sandbox/test_insert_rows.py", line 42, in test_make_insert_produces_correct_statement
actual = make_insert_statement(fields, 'users')
NameError: name 'make_insert_statement' is not defined
----------------------------------------------------------------------
Ran 4 tests in 0.017s
FAILED (errors=1)
Now we implement the function:
def make_insert_statement(field_names, table_name):
    field_names_str = ', '.join(field_names)
    placeholder_str = ','.join('?'*len(field_names))
    insert_sql = f'INSERT INTO {table_name}({field_names_str}) VALUES ({placeholder_str})'
    return insert_sql
Tests are passing.
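Just to make the helper's behaviour concrete, calling it directly with the same arguments the test uses should give:

>>> make_insert_statement(['id', 'name'], 'users')
'INSERT INTO users(id, name) VALUES (?,?)'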
We update the insert_rows
function to call the new make_insert_statement
function and run the tests to see them passing.
def insert_rows(rows, table_name, dbc):
    field_names = rows[0].keys()
    insert_sql = make_insert_statement(field_names, table_name)
    saved_autocommit = dbc.autocommit
    with dbc.cursor() as cursor:
        try:
            dbc.autocommit = False
            tuples = [ tuple((row[field_name] for field_name in field_names)) for row in rows ]
            cursor.executemany(insert_sql, tuples)
            cursor.commit()
        except Exception as exc:
            cursor.rollback()
            raise exc
        finally:
            dbc.autocommit = saved_autocommit
The new version of the function is not much shorter, but it has some advantages:
- improved readability - it is much easier to understand what the code is doing
- improved reusability - it is very likely that we will need the INSERT statement generation in another situation
- better testability - we can test the generation of the INSERT statement in isolation. Introducing new test cases for this functionality is easy (see the sketch below), and it doesn't require a database connection, for example.
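To illustrate that last point, here is one extra test case we might add for make_insert_statement. The single-column expectation is my own example, derived from the same string-building logic rather than taken from the original code:

    def test_make_insert_handles_single_field(self):
        actual = make_insert_statement(['id'], 'logs')
        expected = 'INSERT INTO logs(id) VALUES (?)'
        self.assertEqual(expected, actual)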
If we follow the principles of test-driven development (TDD), we should no longer verify the exact generated statement in the call to executemany - that is now covered by the make_insert_statement test. We can achieve this by patching the make_insert_statement function.
    @mock.patch('__main__.make_insert_statement')
    def test_insert_rows_calls_executemany_and_commit_passing_correct_arguments(self, make_insert_mock):
        dbc = self.fix_dbc()
        rows = self.fix_rows()
        make_insert_mock.return_value = 'MY PRECIOUS'
        insert_rows(rows, 'users', dbc)
        with dbc.cursor() as cursor:
            expect_sql = 'MY PRECIOUS'
            expect_tuples = self.fix_tuples()
            calls = [mock.call.executemany(expect_sql, expect_tuples),
                     mock.call.commit(),]
            cursor.assert_has_calls(calls)
            self.assertTrue(dbc.autocommit)
Wait! What is going on here?
I used the patch decorator to replace make_insert_statement with a mock object. The mock object is automatically passed as a second argument to my test method.
I have also defined that the make_insert_statement mock returns the fixed value MY PRECIOUS. It is not valid SQL, but our mock database connection doesn't care. The important thing is that we see the result of make_insert_statement passed to the executemany method.
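A natural companion test, sketched below as my own addition rather than part of the original recipe, is to verify that insert_rows forwards the right arguments to make_insert_statement. Since the patched function is a mock, it records the call and we can assert on it directly:

    @mock.patch('__main__.make_insert_statement')
    def test_insert_rows_passes_fields_and_table_to_make_insert_statement(self, make_insert_mock):
        dbc = self.fix_dbc()
        rows = self.fix_rows()
        insert_rows(rows, 'users', dbc)
        # insert_rows passes rows[0].keys() and the table name straight through;
        # dict view comparison is set-like, so this matches the recorded call.
        make_insert_mock.assert_called_once_with(rows[0].keys(), 'users')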