The Basics of Unit Testing #
Unit Testing #
In Reading 1, we saw how docstrings can be used to describe what a function does, what inputs it takes, and what it returns. However, it is imperative to make sure your function actually does what your docstring says it does. Testing the behavior of code in a systematic way allows you to provide evidence to others, and to yourself, that your code works in the way that you think it does.
One way of doing this is to use unit tests, which are test programs designed to check that a small, specific piece of code (a “unit”) behaves in a specific way. In this reading, you will see a few techniques for thinking through unit tests, along with some sample implementations. The sample code is written for the Pytest testing framework, a Python library for writing and running unit tests. The techniques we describe, however, can be applied in any mainstream testing framework.
Testing with Pytest #
This section contains specific details to get started working with Pytest.
The Structure and Meaning of a Unit Test #
In Pytest and many other frameworks, a unit test is simply a function that tests another function by calling it and checking that its behavior or output in response to a specific input is as expected.
For example, let’s say we have a function called average_value, which takes a list of integers as input and returns the mean value as a float. We can write a test case for average_value like this:
def test_average_value_123():
    """
    Check that the average of the simple list [1, 2, 3] is 2.0.
    """
    assert average_value([1, 2, 3]) == 2.0
Unit test functions in Pytest should generally have a name that follows a specific format; names starting with test_ are automatically interpreted as being unit test functions in Pytest.
You will also notice that we use a new keyword, assert. It can be used anywhere in Python code, but it is most common in unit tests. The assert keyword checks that the expression following it evaluates to True and raises an AssertionError if it does not. Pytest handles this error and reports the test as failed, so that your unit testing code does not crash if a function is not working properly.
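To make the behavior of assert concrete, here is a small sketch that runs outside of Pytest. The helper name check_is_two is made up for illustration; the point is only that a false condition raises an AssertionError, which is the error Pytest catches and reports as a failed test.

```python
def check_is_two(value):
    """
    Hypothetical helper: assert that value equals 2.0.
    """
    # Does nothing if the condition is true; raises AssertionError otherwise.
    assert value == 2.0


# The condition holds, so this call passes silently.
check_is_two(2.0)

# The condition is false, so this call raises an AssertionError.
try:
    check_is_two(3.0)
except AssertionError:
    failed = True
else:
    failed = False
```

In a real test file you would not write the try/except yourself; Pytest does the equivalent bookkeeping for every test function it runs.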
This unit test function does not take any inputs or return anything. For now, if you write your own unit test functions, you should follow this format, though we will later see examples of unit test functions that do take input.
It is also worth noting what this unit test means. If a function fails a unit test, and the test has been written correctly, then you know with certainty that the function is behaving incorrectly.
However, if this unit test passes, it does not necessarily mean that average_value is implemented correctly for all cases - it does not even mean that average_value is implemented correctly for lists of integers. In fact, the only thing that you can conclude with absolute certainty from this unit test is that average_value works as intended specifically for the list [1, 2, 3]. So why are unit tests useful if they only tell us that a function works for a specific individual case?
By using a diverse range of unit test cases, you can gain more confidence that the function works as intended in general. For example, you could test that average_value works for lists of all negative numbers, lists where all numbers are the same, an empty list, and many more. These are all common patterns to test for that you can use to systematically design test cases. More on this later.
Coming up with these tests may even help you think through your expectation of what a function should do in a specific condition - for example, should average_value([]) crash with an error, and if not, what should it return?
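A sketch of what such a range of test cases might look like follows. To keep the block self-contained, it includes a stand-in implementation of average_value that divides by the list length. The test names are illustrative, and the decision that an empty list should raise ZeroDivisionError is an assumption, not a requirement stated in this reading.

```python
def average_value(numbers):
    """
    Stand-in implementation (assumed): the mean of a list of numbers.
    """
    return sum(numbers) / len(numbers)


def test_average_value_all_negative():
    """
    Check that the average of [-1, -2, -3] is -2.0.
    """
    assert average_value([-1, -2, -3]) == -2.0


def test_average_value_all_same():
    """
    Check that the average of a list of identical numbers is that number.
    """
    assert average_value([4, 4, 4, 4]) == 4.0


def test_average_value_empty_list_raises():
    """
    Check that an empty list raises ZeroDivisionError (an assumed behavior).
    """
    try:
        average_value([])
    except ZeroDivisionError:
        return
    assert False, "expected ZeroDivisionError for an empty list"
```

Pytest also provides pytest.raises for checking expected errors, which is the more idiomatic choice in a real test file; the plain try/except here only avoids an extra import.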
Unit testing is a valuable tool in any programmer’s arsenal, but it also draws on a skillset that many software designers do not practice often enough. In fact, software companies often hire engineers specifically for software testing who are experts in finding ways that code can break. As you get more comfortable writing code and making mistakes in Python, you will gain some intuition for the errors that Python programmers make, and this will allow you to write unit tests that specifically check that these errors were not made. We hope that you get the chance to practice this craft in this course.
Test Files #
Let’s examine the average_value function a little more closely. average_value is defined in a file called average.py. It takes a list of numbers, numbers, and returns their average as a float. You could write it (incorrectly) like this:
# This is defined in a file called `average.py`
def average_value(numbers):
    """
    Return the average of a list of numbers.

    Args:
        numbers: A list of numbers (ints or floats).

    Returns:
        A float representing the average value of the numbers in the list.
    """
    return sum(numbers) / 3
This function incorrectly divides the total sum of the list numbers by 3 instead of by the length of numbers. So how would we go about testing it?
Typically, the way to run unit tests is to create a separate testing file. This helps you keep your code and tests separate, which makes your files more readable. As with test functions, your unit test files in Pytest should have names that start with test_. For example, if the average_value function above were in a file called average.py, the unit tests should be in a file called test_average.py. Pytest assumes that all files whose names start with test_ or end with _test.py contain unit tests.
Let’s say you wrote the following in test_average.py:
"""
test_average.py: Test cases for `average_value`.
"""

from average import average_value


def test_average_value_single_one():
    """
    Check that the average of a list of a single 1 is 1.0.
    """
    assert average_value([1]) == 1.0


def test_average_value_123():
    """
    Check that the average of the simple list [1, 2, 3] is 2.0.
    """
    assert average_value([1, 2, 3]) == 2.0


def test_average_value_triple_ones():
    """
    Check that the average of a list of three 1s is 1.0.
    """
    assert average_value([1, 1, 1]) == 1.0
In your unit test file, you need to be able to access the functions that you defined. You can do this by importing the function into your file. Importing is a concept that we will see in more detail later. In the example above, the line from average import average_value does the trick (assuming that average.py and test_average.py are in the same directory).
The from average indicates that you are accessing names defined in average.py, and import average_value indicates that you are specifically accessing average_value. In the rest of this file, you can use average_value as defined in average.py.
Running Unit Tests #
To run unit tests in Pytest, run the command pytest from a command line (NOT a Python REPL) in the directory that contains your test files.
Pytest looks at all files that match the pattern test_*.py or *_test.py. This is why we defined our tests in a test file earlier (in addition to separating tests and code for clarity). It then automatically detects and runs all relevant tests for you. With the example function and three tests above, running pytest gives you the following output (your Pytest and Python versions will likely be different):
============================= test session starts =============================
platform linux -- Python 3.10.5, pytest-7.1.2, pluggy-1.0.0
rootdir: /home/softdes/pytest-example
plugins: anyio-3.6.1
collected 3 items

test_average.py F..                                                      [100%]

================================== FAILURES ===================================
________________________ test_average_value_single_one ________________________

    def test_average_value_single_one():
        """
        Check that the average of a list of a single 1 is indeed 1.0.
        """
>       assert average_value([1]) == 1.0
E       assert 0.3333333333333333 == 1.0
E        +  where 0.3333333333333333 = average_value([1])

test_average.py:12: AssertionError
=========================== short test summary info ===========================
FAILED test_average.py::test_average_value_single_one - assert 0.33333333333...
========================= 1 failed, 2 passed in 0.01s =========================
Don’t be discouraged when you see failed tests; rather, it is a good thing to see failed tests during the development process. That means you have found bugs in your program. In a class setting, failed tests are a learning opportunity. In the professional world, they save you headaches and trouble. It is much better to find bugs now rather than have angry customers complain to customer support that a certain feature doesn’t work.
Reading Pytest’s Output #
Let’s walk through this output. Pytest splits its output into sections. The main sections are delimited by lines of = signs and subsections are delimited by lines of _.
Test Session #
The first section is produced live as the tests are running. It starts with some information about the system you are running on (operating system, Python and Pytest version, directory you are running pytest from, etc.).
After it gathers information about the system, Pytest collects all tests in your test files. In this example, collected 3 items tells us that there is a total of 3 test cases.
It then runs the test cases one by one for every test file in the directory. In this case, we only have test_average.py. It prints indicators showing the status of tests as it runs them. Failures are indicated by an F, while passing tests are indicated by a period (.). The percentage on the right side is not a score; it is simply a progress marker. These are better illustrated in an example with more tests:
test_bar.py ..F..........F..F....F...F..F.F............F...... [ 34%]
..F..F.. [ 39%]
test_foo.py .................................................. [ 73%]
................................ [ 95%]
test_hello.py ...... [100%]
These indicators and the progress marker are helpful because software projects can have thousands upon thousands of test cases. It’s nice to see how many still need to run (that is, whether you have time to get a coffee).
FAILURES #
The FAILURES section contains the verbose details about each failing test case. Every failed case has its own subsection named after the failing test function. It prints the test function up to the failed assertion. Pytest indicates the line of failure with a > sign. In this case the failed line is assert average_value([1]) == 1.0.
The next lines (starting with E) show you additional details about the error. In this case, average_value([1]) evaluates to 0.3333.... Thus, the assertion that average_value([1]) == 1.0 is false because 0.3333... == 1.0 is false.
Finally, the failure has a summary test_average.py:12: AssertionError. In plain English, this would read: “in test_average.py at line 12 there is a false assertion.”
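The value 0.3333... in the output comes from computing 1 / 3, which points at the hard-coded divisor in average_value. One possible fix, sketched here under the assumption that the list is never empty, divides by the length of the list instead:

```python
def average_value(numbers):
    """
    Return the average of a nonempty list of numbers.

    Args:
        numbers: A nonempty list of numbers (ints or floats).

    Returns:
        A float representing the average value of the numbers in the list.
    """
    # Divide by the actual number of elements rather than a fixed 3.
    return sum(numbers) / len(numbers)
```

With a version like this, all three tests in test_average.py would be expected to pass.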
Summary #
The last section provides a brief summary of the results. It tells you which tests failed, how many failed, how many passed, and how long it took to run. It then names the specific tests that failed and what file they are in.
Unit Testing Tips #
These are tips that work with any test framework for any programming language.
As we mentioned previously, the process of designing unit tests might feel unintuitive at first. Luckily, there are general guidelines you can follow to write good test cases. In addition, writing basic test cases can often be systematized. Following these tips does not guarantee that your code will be bug free, but it provides a solid start for writing good tests.
Clarity is Everything #
When writing unit tests, clarity is everything. By clarity, we mean that the point and meaning of a test should be understandable at a glance.
Use Descriptive Function Names #
When writing code, you typically try to keep your function names short while remaining fairly descriptive. This keeps your code readable when you call the function; you can get an idea of what the function does while not making your lines too long.
When you write unit tests, you shouldn’t worry about writing functions with long names. Test functions aren’t used again elsewhere, so their names can be as long and descriptive as you want - so long as the lines are still brief enough to match the max width of your style guide.
When naming test functions, it is good practice to include the name of the function you are testing and the “single conceptual fact” that the test covers. This is best illustrated through some examples. Using a function called average that behaves like average_value from earlier:
def test_average_empty_list_is_zero():
    """
    Test that the average of the empty list is zero.
    """
    assert average([]) == 0.0


def test_average_singleton_list_is_singleton():
    """
    Test that the average of a list containing X is X.
    """
    assert average([5]) == 5.0


def test_average_pair_is_midpoint():
    """
    Test that the average of a list [X, Y] is the midpoint of X and Y.
    """
    assert average([1, 9]) == 5.0


def test_average_list_is_float():
    """
    Test that the return value of average on a list is a float.
    """
    result = average([3, 1, 4, 1, 5, 9, 2])
    assert isinstance(result, float)
In the examples above, the docstrings are written out in full sentences, but certain words like “of”, “and”, and “the” are omitted from the function names.
A Little bit of Redundancy is Okay in Tests #
When writing code, you’re often told “don’t repeat yourself” (DRY). This is to keep production code cleaner and easier to maintain.
For example, let’s say a streaming service frequently appends a video ID to a base URL. It would make much more sense to turn that into a function watch_url(id) as opposed to manually writing BASE_URL + "watch?v=" + id every single time. That way, if the company wanted to change the name of the web request parameter from v to vid, only one line would need to change.
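As a rough sketch of that idea, the URL construction can live in one function. The value of BASE_URL here is a placeholder, and the parameter is named video_id instead of id only to avoid shadowing Python's built-in id function.

```python
# Placeholder base URL, assumed for illustration only.
BASE_URL = "https://example.com/"


def watch_url(video_id):
    """
    Return the watch page URL for the video with the given ID string.
    """
    # The request parameter name appears in exactly one place, so renaming
    # it from "v" to "vid" would be a one-line change.
    return BASE_URL + "watch?v=" + video_id
```

Every caller then writes watch_url("abc123") and never needs to know how the URL is assembled.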
Unlike in production code, repetition tends to be less problematic in unit tests. In some cases, removing repetition actually makes tests harder to understand. This is because a unit test is “set and forget”; the test should always pass unless the actual behavior of the function is accidentally changed (e.g., after a refactoring with a minor oversight).
What a unit test does should be obvious at a glance. Overly abstracting the code using functions and constants defined in the test file makes it difficult to understand why an assertion was False without following a chain of function definitions.
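The contrast might look like the following sketch, which uses a stand-in average_value so that it runs on its own. The helper check_average and the two constants are hypothetical; they exist only to show how indirection hides what is being asserted.

```python
def average_value(numbers):
    """
    Stand-in implementation (assumed): the mean of a nonempty list.
    """
    return sum(numbers) / len(numbers)


# Over-abstracted: the reader must look up two constants and a helper
# to learn which input and expected value are being compared.
SAMPLE_LIST = [1, 2, 3]
SAMPLE_MEAN = 2.0


def check_average(numbers, expected):
    assert average_value(numbers) == expected


def test_average_value_sample():
    check_average(SAMPLE_LIST, SAMPLE_MEAN)


# Direct: the input and the expected output are visible on one line.
def test_average_value_123():
    assert average_value([1, 2, 3]) == 2.0
```

Both tests check the same fact, but a failure in the second one can be understood from the failing line alone.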
Systematize Writing Tests: Consider Your Inputs and Outputs #
It is impossible to test your function for every possible case. However, input cases can be restricted to certain patterns. It is then feasible to test these patterns of basic cases.
For example, let’s say your function takes a string as input. Every function will be different, but if your input is a string, more often than not your basic cases will include: the empty string "", a string of length 1 "a", a string of length 2 "ab", and a longer string "abcd...". When viewing the input cases this way, "abcd" is equivalent to "1234". Depending on your function, there may be more or fewer basic cases, but this is a good place to start.
It’s okay if your function doesn’t work in a case like this, but if so, you should make that expectation clear in your docstring by saying something like “takes a nonempty string” or “results in an error if given an empty string”.
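As a sketch, the four basic string cases could be written like this for a made-up function count_vowels. Both the function and the test names are assumptions chosen for illustration; your own function will have its own expected values.

```python
def count_vowels(text):
    """
    Hypothetical example function: count the lowercase vowels in a string.
    """
    return sum(1 for character in text if character in "aeiou")


def test_count_vowels_empty_string():
    assert count_vowels("") == 0


def test_count_vowels_single_character():
    assert count_vowels("a") == 1


def test_count_vowels_two_characters():
    assert count_vowels("ab") == 1


def test_count_vowels_longer_string():
    assert count_vowels("abcde") == 2
```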
As an exercise for the reader, try to think of general basic cases if your function takes a bool, an int, a float, and a list.
Multiple Arguments #
If your function takes multiple arguments, consider the combinations of these basic cases. You might be thinking, “wait, if I do this, doesn’t the number of test cases increase exponentially?” You would be right, but you probably don’t have to cover every combination.
For example, let’s say your function takes 4 strings as arguments (referred to as string_1, string_2, string_3, and string_4). You might be worried that with the 4 basic cases for string inputs described earlier, that’s \( 4^4 = 256 \) different cases!
Thankfully, you don’t need to write 256 cases. Let’s say you are testing that the function has the correct behavior if string_1 is the empty string. If you considered every combination from earlier, you would have to write \( 4^3 = 64 \) test cases to cover all the possibilities for the other 3 strings. Fortunately, chances are that if your code works with string_1 being empty and the other 3 strings being ordinary strings of some length N, the behavior regarding string_1 will still be correct if string_1 is empty and the other 3 strings are any combination of the basic string cases. This means you only need to write 1 test case where string_1 is empty, not 64.
Looking at it this way, you’ve reduced the number of test cases required from 256 to 16 (4 basic cases for each of the 4 arguments)! Of course, there still might be some special cases. For example, the function might have a special behavior if string_3 is empty and string_4 is not. In those cases, you will need to test more combinations, but you will more likely end up with 20 or 30 cases instead of 200.
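The two counts can be checked with a short sketch. The list BASIC_CASES and the choice of "abcd" as the ordinary string held fixed in the other positions are assumptions made for this illustration.

```python
from itertools import product

# The four basic string cases described earlier; "abcd" stands in for
# any longer string.
BASIC_CASES = ["", "a", "ab", "abcd"]
NUM_ARGUMENTS = 4

# Every combination of basic cases across all four arguments.
all_combinations = list(product(BASIC_CASES, repeat=NUM_ARGUMENTS))

# One argument varied at a time: each argument takes each basic case once
# while the other three are held at an ordinary longer string.
one_at_a_time = []
for position in range(NUM_ARGUMENTS):
    for case in BASIC_CASES:
        arguments = ["abcd"] * NUM_ARGUMENTS
        arguments[position] = case
        one_at_a_time.append(tuple(arguments))

print(len(all_combinations))  # 256
print(len(one_at_a_time))     # 16
```

In this particular sketch the all-"abcd" call is generated once per argument, so only 13 of the 16 argument tuples are distinct.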
Another useful trick to avoid writing excess unit tests is to look at checks in the code. For example, if there is an if statement that checks whether string_1 is empty, then tests checking the other strings can assume string_1 is non-empty.
Outputs #
Let’s say you have a function that takes a string as input and returns a list of strings. Consider what types of outputs the function should have. Maybe the basic output cases are an empty list, a list of one string, a list of two strings, and a list of many strings. Considering output cases can help you find input cases you might have missed.
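For instance, here is a sketch built around a made-up function split_words, organized by output case. The function and test names are assumptions for illustration. Asking "which inputs produce an empty list?" suggests the whitespace-only input, which is easy to miss when thinking only about input lengths.

```python
def split_words(text):
    """
    Hypothetical example function: split a string into a list of words.
    """
    return text.split()


def test_split_words_empty_string_returns_empty_list():
    assert split_words("") == []


def test_split_words_whitespace_only_returns_empty_list():
    # Found by asking which other inputs should give an empty list.
    assert split_words("   ") == []


def test_split_words_returns_one_word():
    assert split_words("hello") == ["hello"]


def test_split_words_returns_two_words():
    assert split_words("hello world") == ["hello", "world"]


def test_split_words_returns_many_words():
    assert split_words("a b c d e") == ["a", "b", "c", "d", "e"]
```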
Conditional Coverage and Boundaries #
If your function has conditionals (if/elif/else), you should try to run tests so that every “path” through these conditionals is executed. In other words, write at least one test that causes the function to go through each of the if, elif, and else blocks. This helps you check that none of those blocks contains a glaring error that would cause your function to fail only for some inputs.
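A minimal sketch of one test per branch, using a made-up function describe_sign (the function and test names are assumptions for illustration):

```python
def describe_sign(number):
    """
    Hypothetical example function: describe the sign of a number.
    """
    if number > 0:
        return "positive"
    elif number < 0:
        return "negative"
    else:
        return "zero"


def test_describe_sign_positive():
    # Exercises the if block.
    assert describe_sign(5) == "positive"


def test_describe_sign_negative():
    # Exercises the elif block.
    assert describe_sign(-5) == "negative"


def test_describe_sign_zero():
    # Exercises the else block.
    assert describe_sign(0) == "zero"
```

If any one of these tests were missing, a typo in the corresponding block could go unnoticed.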
Finally, if your function has a for or while loop, write tests that go through the loop as well as tests that do not, if possible. For example, if a while loop begins with while x > 10: and x is less than or equal to 10 when the program reaches the loop, the body of the loop will not execute at all. It is good practice to ensure that the function still works in this case.
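The sketch below applies this to a made-up function halve_until_small whose loop uses the same while x > 10: condition. The function and test names are assumptions for illustration.

```python
def halve_until_small(x):
    """
    Hypothetical example function: halve x until it is at most 10.
    """
    while x > 10:
        x = x // 2
    return x


def test_halve_until_small_skips_loop():
    # 10 is not greater than 10, so the loop body never runs.
    assert halve_until_small(10) == 10


def test_halve_until_small_runs_loop_once():
    # 16 > 10, so the body runs once: 16 // 2 == 8.
    assert halve_until_small(16) == 8


def test_halve_until_small_runs_loop_many_times():
    # 100 -> 50 -> 25 -> 12 -> 6
    assert halve_until_small(100) == 6
```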
Boundaries #
In one of your high school math classes, you might have been frustrated that you lost points on a homework problem for writing something like [0, 10] instead of (0, 10]. “Does not including 0 really make a difference?”
The answer is yes. One of the most common bugs in software is writing <= instead of < and vice versa. Boundaries are where if/else logic switches which block of code is run. Your code is most likely to be buggy when a parameter lies on the boundary.
It is arguably more important to write test cases at boundaries than to test what occurs in-between them.
As an example, let’s say you’re writing a function for a guess-the-number game. The goal of the game is to have the closest guess to the secret number without going over it. For example, if the number was 10 and the guesses were 5, 7, and 11, then 7 would be the winner as it is the closest to 10 without going over it. The function would take two arguments: an int representing a secret number, and a list of ints representing guesses. The boundary would be defined by the number. You would want to write test cases to ensure the boundary behavior is correct. For example:
# Test the last case that lies before the switch.
def test_number_game_on_boundary():
    assert number_game(10, [2, 10, 12]) == 10


# Test the first case that lies after the switch.
def test_number_game_after_boundary():
    assert number_game(10, [2, 11, 12]) == 2
These two test cases check that the function number_game is using the correct logic to calculate the winning guess. If the person who wrote the function accidentally used < instead of <=, the first test would catch it.
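To make the boundary concrete, here is one way number_game could be written so that both tests run. This implementation is an assumption: the reading does not specify one, and returning None when every guess is too high is a choice made only for this sketch.

```python
def number_game(secret, guesses):
    """
    Assumed implementation: return the guess closest to secret without
    going over it, or None if every guess is too high.
    """
    # A guess is valid when it is less than or equal to the secret number.
    valid_guesses = [guess for guess in guesses if guess <= secret]
    if not valid_guesses:
        return None
    return max(valid_guesses)


def test_number_game_on_boundary():
    assert number_game(10, [2, 10, 12]) == 10


def test_number_game_after_boundary():
    assert number_game(10, [2, 11, 12]) == 2
```

If the comparison were written as guess < secret, the first test would fail because 10 would be excluded and the function would return 2.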
Test Individual Behaviors, not Functions #
Let’s say you have a function which prints a message and has a return value. For example:
def verbose_adder(num_1, num_2):
    """
    Add two numbers while printing a message.
    """
    print(f"Summing {num_1} and {num_2}")
    return num_1 + num_2
You might be tempted to test the function as follows:
def test_verbose_adder(capsys):
    result = verbose_adder(2, 3)
    assert result == 5
    captured = capsys.readouterr()
    assert captured.out == "Summing 2 and 3\n"
This has a few drawbacks. First, the purpose of this test is not entirely clear at a glance because, in reality, it is two separate tests: (1) does it return the correct value, and (2) does it print the correct message?
Second, let’s say that the first assertion proved to be false. Let’s pretend that verbose_adder was implemented incorrectly and uses * instead of +. You would get the following pytest output:
_____________________________ test_verbose_adder ______________________________

capsys = <_pytest.capture.CaptureFixture object at 0x7f422d384700>

    def test_verbose_adder(capsys):
        result = verbose_adder(2, 3)
>       assert result == 5
E       assert 6 == 5

test_verbose_add.py:6: AssertionError
---------------------------- Captured stdout call -----------------------------
Summing 2 and 3
Notice how the second assertion does not appear in the Pytest output. The test fails as soon as the first assertion is false, so the rest of the test never runs. In this example, it works out fine because the second assertion would turn out True. However, if it were also False, you wouldn’t find out until you fixed the first error.
The Pytest output also shows what capsys captured from standard out. This could mislead you into thinking your error is related to the printing, when it is in fact the return value.
Instead of writing test cases for a function as a whole, you should write tests for each behavior the function has. This keeps tests clear. It especially becomes important when you start writing object-oriented code and are testing methods (you will learn about these in Reading 5). A method can have many side effects, and thus testing each behavior of the method instead of the entire method becomes even more important.
Coming back to our verbose_adder function, here is an example of how to split it into two test cases:
def test_verbose_adder_returns_sum():
    """
    Test that verbose_adder returns the sum of two numbers.
    """
    assert verbose_adder(2, 3) == 5


def test_verbose_adder_prints_message(capsys):
    """
    Test that verbose_adder prints a message when summing.
    """
    verbose_adder(2, 3)
    captured = capsys.readouterr()
    assert captured.out == "Summing 2 and 3\n"