The Basics of Unit Testing #

Unit Testing #

In Reading 1, we saw how docstrings can be used to describe what a function does, what inputs it takes, and what it returns. However, it is imperative to make sure your function actually does what your docstring says it does. Testing the behavior of code in a systematic way allows you to provide evidence to others, and to yourself, that your code works in the way that you think it does.

One way of doing this is to use unit tests, which are test programs designed to check that a small, specific piece of code (a “unit”) behaves in a specific way. In this reading, you will see a few techniques for thinking through unit tests, with some sample implementations. The sample code we use is written for the Pytest testing framework, which is a library for Python that allows you to write unit tests. The techniques we describe, however, can be applied in any mainstream testing framework.

Testing with Pytest #

This section contains specific details to get started working with Pytest.

The Structure and Meaning of a Unit Test #

In Pytest and many other frameworks, a unit test is simply a function that tests another function by calling it and checking that its behavior or output in response to a specific input is as expected.

For example, let’s say we have a function called average_value, which takes a list of integers as input and returns the mean value as a float. We can write a test case for average_value like this:

def test_average_value_123():
    """
    Check that the average of the simple list [1, 2, 3] is 2.0.
    """
    assert average_value([1, 2, 3]) == 2.0

Unit test functions in Pytest should generally have a name that follows a specific format; names starting with test_ are automatically interpreted as being unit test functions in Pytest.

You will also notice that we use a new keyword, assert. This can be used anywhere in Python code, but is usually used within unit tests. The assert keyword checks that what follows it evaluates to True and causes an error if it does not. The Pytest framework makes sure to handle this error so that your unit testing code does not crash if a function is not working properly.
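To see what assert does on its own, outside of any testing framework, here is a minimal sketch. The variable name message is only for illustration; the optional text after the comma is a standard part of the assert statement.

```python
# A passing assertion does nothing; execution simply continues.
assert 1 + 1 == 2

# A failing assertion raises AssertionError. The optional text after the
# comma is attached to the error as its message.
try:
    assert 1 + 1 == 3, "1 + 1 should not equal 3"
except AssertionError as error:
    message = str(error)
else:
    message = None

print(message)
```

In a Pytest test function you would not write the try/except yourself; Pytest catches the AssertionError and reports it as a failed test.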

The unit test function test_average_value_123 does not take any inputs or return anything. For now, if you write your own unit test functions, you should follow this format, though we will later see examples of unit test functions that do take input.

It is also worth noting what this unit test means. If a function fails a unit test, and the test has been written correctly, then you know with certainty that the function is behaving incorrectly.

However, if this unit test passes, it does not necessarily mean that average_value is implemented correctly for all cases - it does not even mean that average_value is implemented correctly for lists of integers. In fact, the only thing that you can conclude with absolute certainty from this unit test is that average_value works as intended specifically for the list [1, 2, 3]. So why are unit tests useful if they only tell us that a function works for a specific individual case?

By using a diverse range of unit test cases, you can gain more confidence that the function works as intended in general. For example, you could test that average_value works for lists of all negative numbers, lists where all numbers are the same, an empty list, and many more. These are all common patterns to test for that you can use to systematically design test cases. More on this later.
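As a rough sketch of what a few of these cases could look like, the block below assumes a straightforward implementation of average_value that divides by the length of the list. Both that implementation and the test names are illustrative assumptions, not the only reasonable choices.

```python
def average_value(numbers):
    """Return the mean of a non-empty list of numbers (assumed implementation)."""
    return sum(numbers) / len(numbers)


def test_average_value_all_negative():
    """Check that the average of [-1, -2, -3] is -2.0."""
    assert average_value([-1, -2, -3]) == -2.0


def test_average_value_all_same():
    """Check that the average of a list of identical numbers is that number."""
    assert average_value([4, 4, 4, 4]) == 4.0


def test_average_value_mixed_signs():
    """Check that the average of [-5, 5] is 0.0."""
    assert average_value([-5, 5]) == 0.0
```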

Coming up with these tests may even help you think through your expectation of what a function should do in a specific condition - for example, should average_value([]) crash with an error, and if not, what should it return?
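One possible answer, shown here purely as an assumption, is to decide that an empty list is an error and to write a test that pins that decision down. The sketch uses a plain try/except so that it runs without any extra libraries; in a Pytest test file, the pytest.raises context manager is the more common way to express the same check.

```python
def average_value(numbers):
    """Return the mean of a list of numbers.

    Raises ValueError for an empty list (a design choice assumed for
    this sketch, not something the reading prescribes).
    """
    if not numbers:
        raise ValueError("cannot take the average of an empty list")
    return sum(numbers) / len(numbers)


def test_average_value_empty_list_raises_value_error():
    """Check that averaging an empty list raises a ValueError."""
    try:
        average_value([])
    except ValueError:
        return  # The expected error was raised, so the test passes.
    assert False, "expected average_value([]) to raise ValueError"
```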

Unit testing is a valuable tool in any programmer’s arsenal, but it also draws on a skillset that many software designers do not practice often enough. In fact, software companies often hire engineers specifically for software testing who are experts in finding ways that code can break. As you get more comfortable writing code and making mistakes in Python, you will gain some intuition for the errors that Python programmers make, and this will allow you to write unit tests that specifically check that these errors were not made. We hope that you get the chance to practice this craft in this course.

Test Files #

Let’s examine the average_value function a little closer. average_value is defined in a file called average.py. It takes a list of numbers (named numbers) and returns their average as a float. You could write it (incorrectly) like this:

# This is defined in a file called `average.py`
def average_value(numbers):
    """
    Return the average of a list of numbers.

    Args:
        numbers: A list of numbers (ints or floats).

    Returns:
        A float representing the average value of the numbers in the list.
    """
    return sum(numbers) / 3

This function incorrectly divides the sum of the list numbers by 3 instead of by the length of numbers. So how would we go about testing it?

Typically, the way to run unit tests is to create a separate testing file. This helps you keep your code and tests separate, which makes your files more readable. Similarly to functions, your unit test files in Pytest should have names that start with test_. For example, since the average_value function above is in a file called average.py, the unit tests should be in a file called test_average.py. Pytest assumes that all files whose names start with test_ or end with _test.py contain unit tests.
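To get a feel for which file names fit this convention, here is a small sketch that checks names against the two patterns using Python's standard fnmatch module. The helper looks_like_test_file is hypothetical and only imitates the file name check; Pytest's real test collection also takes directories and configuration into account.

```python
from fnmatch import fnmatchcase

# The two default file name patterns described in this reading.
TEST_FILE_PATTERNS = ("test_*.py", "*_test.py")


def looks_like_test_file(filename):
    """Return True if filename matches either test file pattern."""
    return any(fnmatchcase(filename, pattern) for pattern in TEST_FILE_PATTERNS)


for name in ["test_average.py", "average_test.py", "average.py"]:
    print(name, looks_like_test_file(name))
```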

Let’s say you wrote the following in test_average.py:

"""
test_average.py: Test cases for `average_value`.
"""

from average import average_value


def test_average_value_single_one():
    """
    Check that the average of a list of a single 1 is 1.0.
    """
    assert average_value([1]) == 1.0


def test_average_value_123():
    """
    Check that the average of the simple list [1, 2, 3] is 2.0.
    """
    assert average_value([1, 2, 3]) == 2.0


def test_average_value_triple_ones():
    """
    Check that the average of a list of three 1s is 1.0.
    """
    assert average_value([1, 1, 1]) == 1.0

In your unit test file, you need to be able to access the functions that you defined. You can do this by importing the function into your file. Importing is a concept that we will see in more detail later. In the example above, the line from average import average_value does the trick (assuming that average.py and test_average.py are in the same directory).

The from average indicates that you are accessing names defined in average.py, and import average_value indicates that you are specifically accessing average_value. In the rest of this file, you can use average_value as defined in average.py.
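The same import syntax works for modules in Python's standard library, which makes it easy to try out without creating any files. The sketch below uses the math module only as a stand-in for average.py.

```python
# Import one name directly: it can then be used without a prefix.
from math import sqrt

# Import the whole module: its names are then accessed through the module.
import math

direct_result = sqrt(16)
prefixed_result = math.sqrt(16)

print(direct_result, prefixed_result)
```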

Running Unit Tests #

To run unit tests in Pytest, run the command pytest from a command line (NOT a Python REPL) in the directory that contains your test files.

Pytest looks at all files that match the pattern test_*.py or *_test.py. This is why we defined our tests in a test file earlier (in addition to separating tests and code for clarity). It then automatically detects and runs all relevant tests for you. With the example function and three tests above, running pytest gives you the following output (your Pytest and Python versions will likely be different):

============================= test session starts =============================
platform linux -- Python 3.10.5, pytest-7.1.2, pluggy-1.0.0
rootdir: /home/softdes/pytest-example
plugins: anyio-3.6.1
collected 3 items

test_average.py F..                                                     [100%]

================================== FAILURES ===================================
________________________ test_average_value_single_one ________________________

    def test_average_value_single_one():
        """
        Check that the average of a list of a single 1 is 1.0.
        """
>       assert average_value([1]) == 1.0
E       assert 0.3333333333333333 == 1.0
E        +  where 0.3333333333333333 = average_value([1])

test_average.py:12: AssertionError
=========================== short test summary info ===========================
FAILED test_average.py::test_average_value_single_one - assert 0.33333333333...
========================= 1 failed, 2 passed in 0.01s =========================

Don’t be discouraged when you see failed tests; rather, it is a good thing to see failed tests during the development process. That means you have found bugs in your program. In a class setting, failed tests are a learning opportunity. In the professional world, they save you headaches and trouble. It is much better to find bugs now rather than have angry customers complain to customer support that a certain feature doesn’t work.

Reading Pytest’s Output #

Let’s walk through this output. Pytest splits its output into sections. The main sections are delimited by lines of = signs and subsections are delimited by lines of _.

Test Session #

The first section is produced live as the tests are running. It starts with some information about the system you are running on (operating system, Python and Pytest versions, the directory you are running pytest from, etc.).

After it gathers information about the system, Pytest collects all tests in your test files. In this example, collected 3 items tells us that there is a total of 3 test cases.

It then runs the test cases one by one for every test file in the directory. In this case, we only have test_average.py. It prints indicators showing the status of tests as it runs them. Failures are indicated by an F, while passing tests are indicated by a dot (.). The percentage on the right side is not a score; it is simply a progress marker. These are better illustrated in an example with more tests:

test_bar.py ..F..........F..F....F...F..F.F............F...... [ 34%]
..F..F..                                                       [ 39%]
test_foo.py .................................................. [ 73%]
................................                               [ 95%]
test_hello.py ......                                           [100%]

These indicators and the progress marker are useful because software projects can have thousands upon thousands of test cases. It’s nice to see how many still need to run (in other words, whether you have time to get a coffee).

FAILURES #

The FAILURES section contains the verbose details about each failing test case. Every failed case has its own subsection named after the failing test function. It prints the test function up to the failed assertion. Pytest indicates the line of failure with a > sign. In this case the failed line is assert average_value([1]) == 1.0.

The next lines (starting with E) show you additional details about the error. In this case, average_value([1]) evaluates to 0.3333.... Thus, the assertion that average_value([1]) == 1.0 is false because 0.3333... == 1.0 is false.

Finally, the failure has a summary test_average.py:12: AssertionError. In plain English, this would read: “in test_average.py at line 12 there is a false assertion.”
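The numbers in the failure report can be reproduced by hand. The sketch below places the buggy function next to one possible fix; the names buggy_average_value and fixed_average_value are made up for this comparison.

```python
def buggy_average_value(numbers):
    """Divide by a hard-coded 3, as in the incorrect version above."""
    return sum(numbers) / 3


def fixed_average_value(numbers):
    """Divide by the actual number of elements in the list."""
    return sum(numbers) / len(numbers)


# A single-element list exposes the bug: 1 / 3 instead of 1 / 1.
print(buggy_average_value([1]))
print(fixed_average_value([1]))

# Lists of length 3 hide the bug, which is why the other two tests passed.
print(buggy_average_value([1, 2, 3]))
```

This also shows why a single passing test proves so little: the buggy version is correct for every list of exactly three numbers.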

Summary #

The last section provides a brief summary of the results. It names the specific tests that failed and the file they are in, then tells you how many tests failed, how many passed, and how long the run took.

Unit Testing Tips #

These are tips that work with any test framework for any programming language.

As we mentioned previously, the process of designing unit tests might feel unintuitive at first. Luckily, there are general guidelines you can follow to write good test cases. In addition, writing basic test cases can often be systematized. Following these tips does not guarantee that your code will be bug free, but it provides a solid start for writing good tests.

Clarity is Everything #

When writing unit tests, clarity is everything: the point and meaning of each test should be understandable at a glance.

Use Descriptive Function Names #

When writing code, you typically try to keep your function names short while remaining fairly descriptive. This keeps your code readable when you call the function; you can get an idea of what the function does while not making your lines too long.

When you write unit tests, you shouldn’t worry about writing functions with long names. Test functions aren’t called elsewhere, so their names can be as long and descriptive as you want - so long as each line still fits within the maximum line width of your style guide.

When naming test functions, it is good practice to include the name of the function you are testing and the “single conceptual fact” that the test covers. This is best illustrated through some examples. Using a function called average that behaves like average_value from earlier (and, as one possible design choice, returns 0.0 for an empty list):

def test_average_empty_list_is_zero():
    """
    Test that the average of the empty list is zero.
    """
    assert average([]) == 0.0


def test_average_singleton_list_is_singleton():
    """
    Test that the average of a list containing X is X.
    """
    assert average([5]) == 5.0


def test_average_pair_is_midpoint():
    """
    Test that the average of a list [X, Y] is the midpoint of X and Y.
    """
    assert average([1, 9]) == 5.0


def test_average_list_is_float():
    """
    Test that the return value of average on a list is a float.
    """
    result = average([3, 1, 4, 1, 5, 9, 2])
    assert isinstance(result, float)

In the above examples, the docstrings are written out in full sentences, but certain words like “of”, “and”, and “the” are omitted from the function names.

A Little bit of Redundancy is Okay in Tests #

When writing code, you’re often told “don’t repeat yourself” (DRY). This is to keep production code cleaner and easier to maintain.

For example, let’s say a streaming service frequently appends a video ID to a base URL. It would make much more sense to turn that into a function watch_url(id) as opposed to manually writing BASE_URL + "watch?v=" + id every single time. Consider what would happen if the company wanted to change the name of the web request parameter from v to vid: with a function, that change is made in one place instead of everywhere the URL is built.
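A minimal sketch of such a helper might look like the following. The base URL is a placeholder and the constant QUERY_PARAMETER is a name invented for this example; the parameter is called video_id here to avoid shadowing Python's built-in id function.

```python
# Placeholder base URL, assumed purely for illustration.
BASE_URL = "https://video.example.com/"

# Renaming the parameter from "v" to "vid" would only require editing this line.
QUERY_PARAMETER = "v"


def watch_url(video_id):
    """Return the URL of the watch page for the given video ID."""
    return f"{BASE_URL}watch?{QUERY_PARAMETER}={video_id}"


print(watch_url("abc123"))
```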

Repetition tends to be less problematic in unit tests than in production code. This is because a unit test is “set and forget”; the test should always pass unless the actual behavior of the function is accidentally changed (e.g. after a refactoring with a minor oversight). In some cases, removing repetition actually makes tests harder to understand.

What a unit test does should be obvious at a glance. Overly abstracting the code using functions and constants defined in the test file makes it difficult to understand why an assertion was False without following a chain of function definitions.
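The contrast can be sketched as follows, assuming a straightforward average_value implementation. The table EXPECTED_RESULTS and the helper check_case are hypothetical and exist only to show what an over-abstracted test looks like.

```python
def average_value(numbers):
    """Return the mean of a non-empty list of numbers (assumed implementation)."""
    return sum(numbers) / len(numbers)


# Over-abstracted style: to understand a failure in test_pair, you have to
# look up check_case and then find the right entry in EXPECTED_RESULTS.
EXPECTED_RESULTS = {
    "pair": ([1, 9], 5.0),
    "triple": ([1, 2, 3], 2.0),
}


def check_case(case_name):
    numbers, expected = EXPECTED_RESULTS[case_name]
    assert average_value(numbers) == expected


def test_pair():
    check_case("pair")


# Explicit style: the input and the expected output sit on one line, even
# though similar lines are repeated from test to test.
def test_average_value_pair_is_midpoint():
    """Check that the average of [1, 9] is 5.0."""
    assert average_value([1, 9]) == 5.0
```

Both tests check the same fact, but only the second one can be understood without reading any other part of the file.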

Systematize Writing Tests: Consider Your Inputs and Outputs #

It is impossible to test your function for every possible case. However, input cases can be restricted to certain patterns. It is then feasible to test these patterns of basic cases.

For example, let’s say your function takes a string as input. Every function will be different, but if your input is a string, more often than not your basic cases will include: the empty string "", a string of length 1 "a", a string of length 2 "ab", and a longer string "abcd...". When viewing the input cases this way, "abcd" and "1234" count as the same case, since both are simply longer strings. Depending on your function, there may be more or fewer basic cases, but this is a good place to start.

It’s okay if your function doesn’t work in a case like this, but if so, you should make that expectation clear in your docstring by saying something like “takes a nonempty string” or “results in an error if given an empty string”.
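As an illustration, here are the four basic string cases written out for a hypothetical function reverse_string. The function is an assumption chosen because its expected output is easy to verify by eye.

```python
def reverse_string(text):
    """Return text with its characters in reverse order (hypothetical example)."""
    return text[::-1]


def test_reverse_string_empty_is_empty():
    """Check the empty string case."""
    assert reverse_string("") == ""


def test_reverse_string_single_character_is_unchanged():
    """Check the length 1 case."""
    assert reverse_string("a") == "a"


def test_reverse_string_two_characters_are_swapped():
    """Check the length 2 case."""
    assert reverse_string("ab") == "ba"


def test_reverse_string_longer_string_is_reversed():
    """Check the longer string case."""
    assert reverse_string("abcd") == "dcba"
```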

As an exercise for the reader, try to think of general basic cases if your function takes a bool, an int, a float, and a list.

Multiple Arguments #

If your function takes multiple arguments, consider the combinations of these basic cases. You might be thinking, “wait, if I do this, doesn’t the number of test cases increase exponentially?” You would be right, but you probably don’t have to cover every combination.

For example, let’s say your function takes 4 strings as arguments (referred to as string_1, string_2, string_3, and string_4). You might be worried that with the 4 basic cases for string inputs described earlier, that’s \( 4^4 = 256 \) different cases!

Thankfully, you don’t need to write 256 cases. Let’s say you are testing that the function has the correct behavior if string_1 is the empty string. If you considered every combination of the other three arguments, you would have to write \(4^3 = 64 \) test cases to cover all the possibilities. Fortunately, chances are that if your code works with string_1 being empty and the other 3 strings being longer strings, the behavior regarding string_1 will still be correct if string_1 is empty and the other 3 strings are any combination of the basic string cases. This means you only need to write 1 test case where string_1 is empty, not 64.

Looking at it this way, you’ve reduced the number of test cases required from 256 to 16 (4 basic cases for each of the 4 arguments)! Of course, there still might be some special cases. For example, the function might have a special behavior if string_3 is empty and string_4 is not. In those cases, you will need to test more combinations, but you will more likely end up with 20 or 30 cases instead of 200.
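The counting argument can be checked with a short sketch. It assumes the four basic string cases from earlier and uses the longer string as the default value for the arguments that are not being varied; the names BASIC_CASES, DEFAULT, and one_at_a_time are invented for this example.

```python
from itertools import product

BASIC_CASES = ["", "a", "ab", "abcd"]
DEFAULT = "abcd"  # Value used for the arguments that are not being varied.

# Every combination of basic cases across 4 arguments: 4 ** 4 of them.
all_combinations = list(product(BASIC_CASES, repeat=4))

# Vary one argument at a time while holding the other three at the default.
one_at_a_time = []
for position in range(4):
    for case in BASIC_CASES:
        arguments = [DEFAULT] * 4
        arguments[position] = case
        one_at_a_time.append(tuple(arguments))

print(len(all_combinations))    # 256
print(len(one_at_a_time))       # 16
print(len(set(one_at_a_time)))  # 13, since the all-default case repeats
```

With this particular choice of default, the case where every argument is the longer string is generated once per argument, so only 13 of the 16 cases are distinct.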

Another useful trick to avoid writing excess unit tests is to look at checks in the code. For example, if there is an if statement that checks whether string_1 is empty, then tests checking the other strings can assume string_1 is non-empty.

Outputs #

Let’s say you have a function that takes a string for input and returns a list of strings. Consider what type of outputs the function should have. Maybe the basic output cases are an empty list, a list of one string, a list of two strings, and a list of many strings. Considering output cases can help you find input cases you might have missed.
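Here is a sketch of output-driven test cases for a hypothetical function split_words, which splits a string on whitespace. Thinking about which inputs produce an empty list suggests an input case that the basic string cases do not include: a string containing only spaces.

```python
def split_words(text):
    """Return the whitespace-separated words in text (hypothetical example)."""
    return text.split()


def test_split_words_empty_string_gives_empty_list():
    assert split_words("") == []


def test_split_words_only_spaces_gives_empty_list():
    # Found by asking "which other inputs should produce an empty list?"
    assert split_words("   ") == []


def test_split_words_one_word_gives_one_item():
    assert split_words("hello") == ["hello"]


def test_split_words_two_words_give_two_items():
    assert split_words("hello world") == ["hello", "world"]


def test_split_words_many_words_give_many_items():
    assert split_words("a b c d") == ["a", "b", "c", "d"]
```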

Conditional Coverage and Boundaries #

If your function has conditionals (if/elif/else), you should try to run tests so that every “path” through these conditionals is executed. In other words, write at least one test that causes the function to go through each of the if, elif, and else blocks. This helps you to check that there are likely no glaring errors in any of those blocks that would cause your function to only fail for some inputs.
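For example, a function with three branches needs at least three tests to execute every path. The function sign_label below is a hypothetical example written for this sketch.

```python
def sign_label(number):
    """Return "negative", "zero", or "positive" (hypothetical example)."""
    if number < 0:
        return "negative"
    elif number == 0:
        return "zero"
    else:
        return "positive"


def test_sign_label_negative_number():
    """Exercise the if block."""
    assert sign_label(-7) == "negative"


def test_sign_label_zero():
    """Exercise the elif block."""
    assert sign_label(0) == "zero"


def test_sign_label_positive_number():
    """Exercise the else block."""
    assert sign_label(7) == "positive"
```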

Finally, if your function has a for or while loop, write tests that go through the loop as well as tests that do not, if possible. For example, if a while loop begins with while x > 10: and x is less than or equal to 10 when the program reaches the loop, the body of the loop will not execute at all. It is good practice to ensure that the function still works in this case.
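As a sketch, consider a hypothetical function count_halvings built around the same while x > 10 condition. One test skips the loop entirely, one runs it exactly once, and one runs it several times.

```python
def count_halvings(x):
    """Return how many integer halvings bring x to 10 or below (hypothetical)."""
    count = 0
    while x > 10:
        x = x // 2
        count += 1
    return count


def test_count_halvings_loop_never_runs():
    """With x equal to 10, the loop body does not execute at all."""
    assert count_halvings(10) == 0


def test_count_halvings_loop_runs_once():
    """With x equal to 11, one halving gives 5."""
    assert count_halvings(11) == 1


def test_count_halvings_loop_runs_many_times():
    """100 -> 50 -> 25 -> 12 -> 6 takes four halvings."""
    assert count_halvings(100) == 4
```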

Boundaries #

In one of your high school math classes, you might have been frustrated that you lost points on a homework problem for writing something like [0, 10] instead of (0, 10]. “Does not including 0 really make a difference?”

The answer is yes. One of the most common bugs in software is writing <= instead of < and vice versa. Boundaries are where if/else logic switches which block of code is run. Your code is most likely to be buggy when a parameter lies on the boundary.

It is arguably more important to write test cases at boundaries than to test what occurs in-between them.

As an example, let’s say you’re writing a function for a guess the number game. The goal of the game is to have the closest guess to the secret number without going over it. For example, if the number was 10 and the guesses were 5, 7, and 11, then 7 would be the winner as it is the closest to 10 without going over it. The function would take two arguments, an int representing a secret number, and a list of ints representing guesses. The boundary would be defined by the number. You would want to write test cases to ensure the boundary behavior is correct. For example:

# Test the last case that lies before the switch.
def test_number_game_on_boundary():
    assert number_game(10, [2, 10, 12]) == 10


# Test the first case that lies after the switch.
def test_number_game_after_boundary():
    assert number_game(10, [2, 11, 12]) == 2

These two test cases check that the function number_game uses the correct logic to calculate the winning guess. If the person who wrote the function accidentally used < instead of <=, the on-boundary test would catch it, because a guess equal to the secret number would be wrongly rejected.
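For reference, one possible implementation that satisfies both tests is sketched below. The reading does not specify how number_game is written, so the body, and in particular the choice to return None when every guess is too high, is an assumption.

```python
def number_game(secret, guesses):
    """Return the guess closest to secret without going over it.

    Returns None if every guess is above secret (an assumed convention).
    """
    # The <= here is the boundary: a guess equal to the secret is allowed.
    valid_guesses = [guess for guess in guesses if guess <= secret]
    if not valid_guesses:
        return None
    return max(valid_guesses)


print(number_game(10, [2, 10, 12]))
print(number_game(10, [2, 11, 12]))
```

Replacing <= with < in the list comprehension would make the first call return 2 instead of 10, which is exactly the mistake the on-boundary test is designed to detect.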

Test Individual Behaviors, not Functions #

Let’s say you have a function which prints a message and has a return value. For example:

def verbose_adder(num_1, num_2):
    """
    Add two numbers while printing a message.
    """
    print(f"Summing {num_1} and {num_2}")
    return num_1 + num_2

You might be tempted to test the function as follows:

def test_verbose_adder(capsys):
    result = verbose_adder(2, 3)
    assert result == 5

    captured = capsys.readouterr()
    assert captured.out == "Summing 2 and 3\n"

This has a few consequences. First, the purpose of this test is not entirely clear at a glance because, in reality, it is two separate tests: (1) does it return the correct value, and (2) does it print the correct message?

Second, let’s say that the first assertion proved to be false. Let’s pretend that verbose_adder was implemented incorrectly and uses * instead of +. You would get the following pytest output:

_____________________________ test_verbose_adder ______________________________

capsys = <_pytest.capture.CaptureFixture object at 0x7f422d384700>

    def test_verbose_adder(capsys):
        result = verbose_adder(2, 3)
>       assert result == 5
E       assert 6 == 5

test_verbose_add.py:6: AssertionError
---------------------------- Captured stdout call -----------------------------
Summing 2 and 3

Notice that the second assertion does not appear in the Pytest output. The test fails as soon as the first assertion fails, so the rest of the test function never runs. In this example, it works out fine because the second assertion would turn out True. However, if it were also False, you wouldn’t find out until you fixed the first error.

The Pytest output also shows what capsys captured from standard out. This could mislead you into thinking your error is related to the printing, when it is in fact the return value.

Instead of writing test cases for a function as a whole, you should write tests for each behavior the function has. This keeps tests clear. It especially becomes important when you start writing object-oriented code and are testing methods (you will learn about these in Reading 5). A method can have many side effects, and thus testing each behavior of the method instead of the entire method becomes even more important.

Coming back to our verbose_adder function, here is an example of how to split it into two test cases:

def test_verbose_adder_returns_sum():
    """
    Test that verbose_adder returns the sum of two numbers.
    """
    assert verbose_adder(2, 3) == 5


def test_verbose_adder_prints_message(capsys):
    """
    Test that verbose_adder prints a message when summing.
    """
    verbose_adder(2, 3)
    captured = capsys.readouterr()
    assert captured.out == "Summing 2 and 3\n"
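
The capsys fixture is specific to Pytest. To try the same split without Pytest installed, a rough equivalent can be sketched with redirect_stdout from Python's standard contextlib module. This is a substitute technique for experimentation, not how the tests above work internally.

```python
import io
from contextlib import redirect_stdout


def verbose_adder(num_1, num_2):
    """Add two numbers while printing a message."""
    print(f"Summing {num_1} and {num_2}")
    return num_1 + num_2


def test_verbose_adder_returns_sum():
    """Test that verbose_adder returns the sum of two numbers."""
    # The printed message is redirected and ignored; only the return
    # value is checked here.
    with redirect_stdout(io.StringIO()):
        assert verbose_adder(2, 3) == 5


def test_verbose_adder_prints_message():
    """Test that verbose_adder prints a message when summing."""
    buffer = io.StringIO()
    with redirect_stdout(buffer):
        verbose_adder(2, 3)
    assert buffer.getvalue() == "Summing 2 and 3\n"
```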