2. Cleaner Code, Data Structures, and Functions

Reading 2: Cleaner Code, Data Structures, and Functions #

The first part of this reading introduces a few new features that will make it easier to write some types of code in Python. In Reading 1, we avoided introducing these to let you get used to a simpler syntax, but now that you have some experience, we will explain how to simplify some of your common coding patterns.

The rest of this reading focuses on data structures and functions, two important building blocks that are part of nearly every Python program in common use. In short, data structures are a way of packaging data, while functions are a way of packaging code. Both of these make it easier to reason about and write programs.

Writing Cleaner Code #

Arithmetic Assignment Operators #

You may have noticed that in most while loops you write, some variable needs to be updated on each pass through the loop. For example:

>>> i = 3
>>> while i > 0:
...    print(i)
...    i = i - 1

Instead of writing i = i - 1, you can write i -= 1. In fact, you can do this with any arithmetic operator on a numeric variable:

>>> x = 10
>>> x += 5
>>> x
15
>>> x //= 3
>>> x
5
>>> x **= 2
>>> x
25

The += operator also works with strings:

>>> s = "sp"
>>> s += "am"
>>> s
'spam'

Type Conversion #

You may have found it a bit frustrating that you cannot combine a string and integer so that by writing something like "I was born in " + 2000, you could get the string "I was born in 2000". You can display this using print("I was born in", 2000), but you may want the actual string.

One way to do this is with type conversion. By using the type name you want to convert to and parentheses (()), you can turn a variable into a different type:

>>> str(2000)
'2000'
>>> int("123")
123
>>> float(42)
42.0

This will not work for all types we will see in this course, but can be done for all of the types you have seen so far (int, float, bool, and str).
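Returning to the birth-year example, here is a minimal sketch of how str makes the concatenation work. The variable names are made up for illustration:

```python
birth_year = 2000

# Convert the integer to a string before concatenating.
sentence = "I was born in " + str(birth_year)
print(sentence)  # I was born in 2000

# Conversion also works in the other direction.
years_since = 2024 - int("2000")
print(years_since)  # 24
```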

String Formatting #

While you can use type conversion to make printing easier, there is a much cleaner way to print strings like the above. You can use string interpolation, which allows you to use and format the values of variables in a string. The newest and recommended way to do string interpolation is called the f-string, for a reason that is clear in the example below:

>>> birth_year = 2000
>>> print(f"I was born in {birth_year}")
I was born in 2000

As you can see, by writing an f before the starting quote in the string, you can use curly braces ({}) to surround a Python expression and use its value in the string. This does not have to be a variable name - you could write something like {birth_year // 2} if you wanted to.

As a historical note, older ways to do string interpolation include the following:

"I was born in {}".format(birth_year)
"I was born in %d" % birth_year

You may see these older formats on Q&A sites, but we recommend avoiding using them, as they are more verbose and generally have slightly worse performance than f-strings.

Break & Continue #

You may have wondered if there are ways to end a for or while loop early. For example, let’s say you have a string of text and want to print only the first sentence. You aren’t able to use a plain for loop, since you don’t know the index where you have to stop. Instead, you could use a while loop like this:

>>> text = "I am Sam. Sam I am. That Sam-I-am!"
>>> i = 0
>>> first_sentence = ""
>>> while text[i] != ".":
...     first_sentence += text[i]
...     i += 1
>>> print(first_sentence + text[i])
I am Sam.

However, with the use of break, we can use a for loop. The break keyword is used to end the current loop and go on to the next section of code. Rewriting the above code:

>>> text = "I am Sam. Sam I am. That Sam-I-am!"
>>> first_sentence = ""
>>> for character in text:
...     first_sentence += character
...     if character == ".":
...         break
>>> print(first_sentence)
I am Sam.

Similarly, you may sometimes want to skip a single iteration of a loop rather than end the loop entirely. The continue keyword allows you to do just that. For example, here is how you could remove all spaces from a string:

>>> text = "I am Sam. Sam I am. That Sam-I-am!"
>>> no_spaces = ""
>>> for character in text:
...     if character == " ":
...         continue
...     no_spaces += character
>>> print(no_spaces)
IamSam.SamIam.ThatSam-I-am!

Checking For Existence #

Up until now, you’ve probably been checking to see if a string is empty like this:

empty_string = ""
if empty_string == "":
    print("String is empty.")

While this is a completely valid way to write this check, Python officially recommends simply using if (or if not) and the name of the variable, like this:

empty_string = ""
if not empty_string:
    print("String is empty.")

In practice, you will most likely use this to run certain code only when a string is not empty:

if example_string:
    # Do stuff with example_string, and you know it isn't empty

A Type for Nothing #

Sometimes, you may want to have a value that represents nothing at all. For example, suppose you have a service where users can send password-protected messages to each other, and blank passwords (simply hitting Enter, essentially) are allowed. To differentiate the case of a message having a blank password ("") from one having no password at all, you can use a special value called None.

Another common example is if you are looking at a sequence of integer values and you need to keep the highest one. The integer values you see may all be negative, so setting a default value of 0 may not always work. In this case, you should set the initial maximum value to None.

With this approach, you will need to check whether a variable is None. The syntax for this check is slightly different - rather than using == or !=, you use is or is not:

if max_value is None:
    # Set the maximum value
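As a sketch of the maximum-value example described above, the loop below starts with None and replaces it with the first value it sees. The list name readings and its contents are hypothetical:

```python
readings = [-7, -3, -12]  # Hypothetical data; every value is negative

max_value = None  # No maximum has been seen yet
for value in readings:
    # The first value always becomes the maximum; after that, compare.
    if max_value is None or value > max_value:
        max_value = value

print(max_value)  # -3
```

Starting from 0 instead of None would leave max_value at 0 here, which is not in the list at all.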

Data Structures #

You can think of a data structure as a way of organizing data for a particular purpose. In the real world, you can organize the exact same data in different ways or for different purposes. For example, how you organize the names Alice, Bob, Charlie, and David depends on whether you are creating a directory (alphabetically), ranking them by the number of points scored in a game (where you also need to keep track of those points), or creating a guest list (where you only want to check whether they are on the list or not).

In Python, data structures are also types, which means that they define a set of things you can do with the data they contain. As you might guess, the type of data structure you use will depend on how you intend to use the data.

Below, we will describe a few common data structures, what you can do with them, and how they are commonly used in programs.

Lists #

A list represents a sequence of items. You can define a list with square brackets ([]) and items separated by commas (,):

sample_list = [3, 1, 4, 1, 5, 9]  # Digits of pi
empty_list = []  # This has no items in it

It is easier to reason about lists in which each item has the same type, but you can mix and match items of different types within a list.

Like with strings, you can get the length of a list with len and get items within the list using indexing or slicing:

>>> len(sample_list)
6
>>> sample_list[2]
4
>>> sample_list[:3]
[3, 1, 4]

Also as with strings, you can concatenate and multiply lists:

>>> group_1 = ["Alice", "Bob"]
>>> group_2 = ["Charlie", "David"]
>>> group_1 + group_2
['Alice', 'Bob', 'Charlie', 'David']
>>> group_1 * 2
['Alice', 'Bob', 'Alice', 'Bob']

You can also iterate through each item of a list with a for loop:

for digit in sample_list:
    print(digit)

A unique feature of lists is that you can make modifications to them. For example, you can assign to an individual element in the list:

>>> passengers = ["Alice", "Bob", "Charlie", "David"]
>>> passengers[2] = "Eleanor"
>>> passengers
['Alice', 'Bob', 'Eleanor', 'David']

You can also append to an existing list, which adds a single item to the end of the list.

>>> sample_list = [3, 1, 4, 1, 5, 9]
>>> sample_list.append(2)
>>> sample_list
[3, 1, 4, 1, 5, 9, 2]

This is different from adding [2] to sample_list, which does not change sample_list:

>>> sample_list + [2]
[3, 1, 4, 1, 5, 9, 2]
>>> sample_list
[3, 1, 4, 1, 5, 9]

Here is an example program that takes a list of numbers and sorts them into two new lists of positive numbers and negative numbers:

>>> number_list = [1, 3.2, -4, -0.5, 1, -10, 42, -1, -7]
>>> positives = []
>>> negatives = []
>>> for num in number_list:
...     if num > 0:
...         positives.append(num)
...     else:
...         negatives.append(num)
>>> print(positives)
[1, 3.2, 1, 42]
>>> print(negatives)
[-4, -0.5, -10, -1, -7]

Ranges #

It is quite common in Python programs to do something for each number from 0 to n. Rather than defining a long list of integers, you can do this using ranges. The typical way of using ranges is like this:

# This prints all numbers from 0 to 9, each on its own line.
for i in range(10):
    print(i)

You can use range with one integer as shown above, with two integers (such as range(1, 11)), or with three integers (such as range(1, 12, 2)). The results are very similar to how string slicing works:

range(5)  # Essentially [0, 1, 2, 3, 4]
range(1, 6)  # Essentially [1, 2, 3, 4, 5]
range(1, 6, 2)  # Essentially [1, 3, 5]
range(5, 0, -1)  # Essentially [5, 4, 3, 2, 1]

We say that these ranges are essentially equivalent to lists because if you iterate through them with a for loop, the effect will be the same. But for reasons that we will not go into here, ranges are not lists, and you cannot use most of the list operators (such as + or append) with ranges. (You can use len to get the length of a range, though.)
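One way to see the values a range produces is to convert it to a list with list, in the same way that str and int convert other types. The following is a small sketch, and the variable names are only for illustration:

```python
evens = range(0, 10, 2)

# Converting to a list shows the values the range produces.
print(list(evens))  # [0, 2, 4, 6, 8]
print(len(evens))   # 5

# A range has no append, but the converted list does.
even_list = list(evens)
even_list.append(10)
print(even_list)  # [0, 2, 4, 6, 8, 10]
```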

You can also use ranges to do something a certain number of times. For example, the following code prints Hello! ten times in a row:

for _ in range(10):
    print("Hello!")

You may find it strange that in this for loop, we use _ as the variable. Remember that _ is a valid variable name, and in most Python programs, it is used for a variable whose value is ignored. Specifically, in this for loop, we are simply printing Hello! and do not need the value of any of the integers from the range. Using _ makes this intent clear.

Dictionaries #

In its most basic form, an English dictionary allows you to look for a specific word and find its definition. Similarly, a dictionary in Python allows you to associate pairs of data. You can then “look up” one member of the pair to get the other member. As an example, here is a dictionary that maps integers to their English word:

number_words = {1: "one", 2: "two", 3: "three"}

As you can see, you use curly braces ({ and }) to surround the pairs and use the colon (:) to connect a pair of items (like the integer 1 and the string "one").

We would say that this dictionary maps 1 to "one". In this dictionary, 1, 2, and 3 are called keys, while "one", "two", and "three" are called values.

You can find the value corresponding to the key 1 like this:

>>> number_words[1]
'one'

This lookup only goes in one direction: you cannot get a key like 1 by running number_words["one"]. As with lists, you can mix and match the types of both keys and values, but it is easier to reason about a dictionary where all the keys and values are of the same type.

If you look up a key that is not in a dictionary, you will get an error (called KeyError):

>>> number_words[0]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
KeyError: 0

You can avoid this error by first checking if the key is mapped to a value using in, like this:

>>> 0 in number_words
False
>>> 1 in number_words
True
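In a program, this check usually sits in an if statement that guards the lookup. Here is a minimal sketch, where the key 5 and the fallback word "unknown" are chosen only for this example:

```python
number_words = {1: "one", 2: "two", 3: "three"}

key = 5  # Hypothetical key that may or may not be in the dictionary
if key in number_words:
    word = number_words[key]
else:
    word = "unknown"  # Fallback chosen for this example

print(word)  # unknown
```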

It is simple to add entries to a dictionary:

>>> number_words[0] = "zero"
>>> number_words
{1: 'one', 2: 'two', 3: 'three', 0: 'zero'}

You can only have one of any key in a dictionary. If you assign a value to a key already in a dictionary, you will overwrite its previous value:

>>> number_words[1] = "ONE"
>>> number_words
{1: 'ONE', 2: 'two', 3: 'three', 0: 'zero'}

As a quick example for how we can use dictionaries, here is a short program to count the occurrences of each letter that appears in a string:

sample_string = "hello world"
letter_counts = {}  # Empty dictionary
for character in sample_string:
    if character in letter_counts:
        letter_counts[character] += 1
    else:
        letter_counts[character] = 1
# letter_counts is as follows:
# {'h': 1, 'e': 1, 'l': 3, 'o': 2, ' ': 1, 'w': 1, 'r': 1, 'd': 1}

You can get the number of entries in a dictionary with len. You can also use a for loop to go through all of the keys of a dictionary:

# This loop prints out "one", "two", "three" on separate lines.
number_words = {1: "one", 2: "two", 3: "three"}
for number in number_words:
    print(number_words[number])

You will loop through the keys in the order in which they were added to the dictionary.

If for some reason you need to delete a key from a dictionary, you can do that with del:

>>> number_words = {1: "one", 2: "two", 3: "three"}
>>> del number_words[2]
>>> number_words
{1: 'one', 3: 'three'}

Finally, note that there are some restrictions on what types you can use as keys in dictionaries. Of the types you have seen so far, lists and dictionaries are the only two types that you cannot use as dictionary keys:

>>> {[1, 2]: "one, two"}
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'list'

That being said, there is an alternative if you want to use something like a list as a dictionary key, which we will describe next.

Tuples #

Tuples are essentially lists that you cannot change (i.e., you cannot assign to individual items or append to them). They are written slightly differently from lists, using parentheses (()) around comma-separated items:

sample_tuple = (1, 2, 3, 4, 5)

Their behavior is nearly identical to that of lists - you can add tuples, multiply them by integers, loop through them, find their length, etc.

One hiccup with this syntax is that declaring a tuple with just one item in it requires a comma:

>>> (1)  # This is an int
1
>>> (1,)  # This is a tuple
(1,)

A big advantage of tuples is that they can be used as dictionary keys:

>>> tuple_dict = {(1, 2): "one, two"}
>>> tuple_dict[(1, 2)]
'one, two'

If you know that you are dealing with a sequence of a specific size - for example, if you know that your data represents a pair of x and y coordinates - it may be advantageous to use a tuple. Doing so prevents you from accidentally appending something to the coordinates.
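The two advantages of tuples can be combined. As a sketch, the dictionary below uses coordinate pairs as keys; the grid and its contents are hypothetical:

```python
# Hypothetical grid: map (x, y) coordinates to what is at that position.
grid = {(0, 0): "start", (2, 3): "treasure"}

position = (2, 3)
if position in grid:
    print(f"Found {grid[position]} at {position}")  # Found treasure at (2, 3)

# Adding a new position works just like adding any other dictionary entry.
grid[(1, 1)] = "trap"
print(len(grid))  # 3
```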

Sets #

A set is a data structure designed to test for membership (whether something is in the set or not). Creating a set is similar to creating a dictionary, except that you use single items instead of colon-separated pairs:

hubs = {"Chicago", "Denver", "Houston", "Los Angeles", "Newark",
        "San Francisco", "Washington"}

Note that an empty set cannot be defined with {}, since that represents an empty dictionary. To define an empty set, use set() instead.

In some ways, sets behave similarly to lists. You can find the size of a set with len and you can check whether something is in a set with in.

However, sets do not keep track of the order of the items they contain. You cannot index or slice a set, and duplicate items are not allowed. You can loop through the items of a set, but you might be surprised by the order:

>>> shuffled_nums = {3, 2, 4, 5, 1}
>>> for num in shuffled_nums:
...     print(num)
1
2
3
4
5

On the other hand, you can do some operations on sets that you cannot on lists. You can combine sets to get all of the items that appear in both sets (“and”), in either set (“or”), or in just one of the sets (“xor”, pronounced “ex or”):

>>> first_nums = {1, 2, 3, 4, 5}
>>> first_primes = {2, 3, 5, 7, 11}
>>> first_nums & first_primes  # Numbers in both sets (and)
{2, 3, 5}
>>> first_nums | first_primes  # Numbers in at least one set (or)
{1, 2, 3, 4, 5, 7, 11}
>>> first_nums ^ first_primes  # Numbers in only one set (xor)
{1, 4, 7, 11}

You can add and remove individual items from sets using add, remove, or discard:

>>> sample_set = {1, 2, 3}
>>> sample_set.add(4)  # sample_set is now {1, 2, 3, 4}
>>> sample_set.remove(2)  # sample_set is now {1, 3, 4}
>>> sample_set.remove(5)  # This will result in a KeyError
>>> sample_set.discard(3)  # sample_set is now {1, 4}
>>> sample_set.discard(5)  # No error, but doesn't remove anything

You can also “subtract” two sets, which will delete any items in the first set that are in the second (but ignore anything that is only in the second set):

>>> first_nums = {1, 2, 3, 4, 5}
>>> first_primes = {2, 3, 5, 7, 11}
>>> first_nums - first_primes
{1, 4}

As you can see, 2, 3, and 5 were removed, but 7 and 11 were not in first_nums to begin with and are thus ignored.
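Set subtraction is handy for the guest list scenario mentioned at the start of this section. In the sketch below, the names and the two sets are made up, and sorted is used only to print the results in a predictable order:

```python
invited = {"Alice", "Bob", "Charlie", "David"}
arrived = {"Bob", "David", "Eve"}

missing = invited - arrived    # Invited, but not here yet
uninvited = arrived - invited  # Here, but not on the list

print(sorted(missing))    # ['Alice', 'Charlie']
print(sorted(uninvited))  # ['Eve']
```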

Finally, here is an example of a short program that collects the unique characters in a string and prints the result:

hello = "Hello world!"
unique_chars = set()
for character in hello:
    unique_chars.add(character)
print(f"Characters used in string: {unique_chars}")

This prints the following:

Characters used in string: {'!', 'd', 'w', 'o', 'e', 'r', 'H', 'l', ' '}

Concise Data Structures with Comprehensions #

Comprehensions are not data structures of their own, but can make it easier to define certain kinds of data structures.

Suppose you wanted to create a list consisting of the numbers 0 to 99. You could type out the list [0, 1, ..., 99], but this would be quite tedious. You could instead use a for loop and repeatedly add to a list:

counting_up = []
for i in range(100):
    counting_up.append(i)

But there is a much more efficient and clean way to write this:

counting_up = [i for i in range(100)]

This is called a comprehension (and in this case specifically, a list comprehension). You can also use them for dictionaries. Here is a dictionary comprehension that maps each integer from 0 to 99 to its square:

squares = {i: i ** 2 for i in range(100)}

As you can see, the syntax is similar - the left part of the comprehension defines how to use i, while the right part of the comprehension describes the specific values of i to use.

The right part of the comprehension does not need to be a range - any data type that supports for looping, such as a list or dictionary, will also work.

You can combine a comprehension with an if statement to filter out certain values. For example, suppose you wanted to define a list of all non-negative integers under 100 that are not divisible by 5. You can write the following to get that list:

[i for i in range(100) if i % 5 != 0]

Comprehensions can be quite powerful, but it is also important to be careful when using them. They can make your code difficult for others to read and understand, so we recommend only using them for fairly simple conditions, like what you see above.

For example, suppose you had a long list of words called word_list and you wanted to collect the first three characters of all of the words in this list starting with J, Q, or X. You could write this as a comprehension like this:

[word[:3] for word in word_list if word[0] == "J" or word[0] == "Q"
                                   or word[0] == "X"]

But this is a bit convoluted to read, and you would probably be better off writing this as a for loop, which is longer but easier to understand.
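For comparison, here is one way the same task might look as a for loop. The contents of word_list are hypothetical, and the sketch assumes that no word in the list is empty:

```python
word_list = ["Jupiter", "apple", "Quartz", "Xylophone", "jam"]  # Hypothetical

prefixes = []
for word in word_list:
    # Keep only words whose first letter is J, Q, or X.
    if word[0] in ("J", "Q", "X"):
        prefixes.append(word[:3])

print(prefixes)  # ['Jup', 'Qua', 'Xyl']
```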

Functions #

You can think of functions as organizing code for a particular purpose. Using functions makes it easier to reuse code in different contexts and with different inputs. Functions are the first example you will see of the principle we call DRY: Don’t Repeat Yourself. If you can package and reuse code rather than writing it again, your code will be easier to understand, debug, and maintain.

Defining and Calling a Function #

Suppose that you wanted to check whether two lists of positive integers (list_1 and list_2) have the same maximum value. You could do this:

list_1_max = 0
for i in list_1:
    if i > list_1_max:
        list_1_max = i

list_2_max = 0
for i in list_2:
    if i > list_2_max:
        list_2_max = i
if list_1_max == list_2_max:
    print("The two lists have the same maximum value.")

However, you will notice that the structure of the code to find the maximum value of both lists is almost exactly the same (except for the variable names).

We can instead write this code much more simply using a function. You can define a function like this:

def list_max(int_list):
    max_value = 0
    for i in int_list:
        if i > max_value:
            max_value = i
    return max_value

The def keyword (short for “define”) says that the block of code that follows is what the function does. The name of the function is list_max, and int_list is its parameter: a name for the data that is given to the function when it runs. This line is usually called the declaration, whereas the indented lines of the function are called the body.

You can then use the function by calling it. You call a function by providing its name, followed by parentheses (()) surrounding the value to assign to its parameter(s). Here is an example of calling list_max:

>>> list_max([3, 1, 4, 1, 5, 9])
9

When you run list_max([3, 1, 4, 1, 5, 9]), what is actually happening? Effectively, the function first assigns the list [3, 1, 4, 1, 5, 9] to the name int_list, and then runs the code in the body of the function. The return keyword in the body of list_max means that max_value is the value that results from running list_max.

The expression that comes after return (in this case, max_value) is called the return value, because if you tell Python to run the function, it will compute that expression and return it to you. The list [3, 1, 4, 1, 5, 9] is called an argument to list_max - while a parameter refers to the name of the input, an argument refers to the input’s actual value when the function runs.
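Returning to the original problem, here is a sketch of how list_max removes the duplicated loops. The contents of list_1 and list_2 are made up for illustration:

```python
def list_max(int_list):
    max_value = 0
    for i in int_list:
        if i > max_value:
            max_value = i
    return max_value


list_1 = [3, 1, 4]  # Hypothetical inputs
list_2 = [2, 4, 0]

# Each call runs the same body with a different argument.
if list_max(list_1) == list_max(list_2):
    print("The two lists have the same maximum value.")
```

Both lists have a maximum of 4, so this prints the message.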

It is worth noting that the return keyword will immediately stop running the function after that line, no matter what else it might have left to do. Consider this function:

def find_item(item_list, target):
    for index in range(len(item_list)):
        if item_list[index] == target:
            return index
        print(f"It's not at index {index}...")

Let’s run this function with a list and a string:

>>> find_item(["foo", "bar", "baz"], "bar")
It's not at index 0...
1

Notice that the function did not print anything for index 2 (where "baz" is), since it returned when it hit index 1.

Return Types #

Every function has a return type, which is simply the type of the value that the function returns. For example, the return type of the list_max function above is an integer because it returns a variable that is an integer. Just as it is important to be able to work out what type a variable is, it is important to be able to work out what type a function returns.

As an example, if x and y are integers, what is the return type of the function below?

def midpoint(x, y):
    return (x + y) / 2

The return type of midpoint is a float, because x + y is an integer, and dividing two integers with / always returns a float.

Some functions don’t return anything at all. For example, consider this function, which just appends a few items onto the end of a list and has no return statement:

def last_laugh(word_list):
    for _ in range(3):
        word_list.append("ha")

If you recall, appending something to a list produces no output:

>>> word_list = ["Hi", "there"]
>>> word_list.append("ha")

There is a special type that represents “nothing”, and it is called None (or the none type). The append function for lists, as well as most functions that modify an existing variable (adding to or deleting from a set or dictionary, for example), returns None. Perhaps most surprisingly, the commonly-used print function returns None.

One more time, for emphasis: the print function’s return type is None.

This is because the output you see from the print function is the string that it is printing to the screen, not the return value of the function. Since they look nearly identical, it can be confusing. A good way to remember this is to look at the output of a string versus the output of print:

>>> "Hello world!"
'Hello world!'
>>> print("Hello world!")
Hello world!

The quotes around the first 'Hello world!' indicate that this is a string being returned, and the message shown when running print does not have them.

You can check the type of a variable, expression, or piece of data by using the type function. This simply returns the type of whatever you give it:

>>> type("Hello world!")
<class 'str'>
>>> type(2 + 2 == 5)
<class 'bool'>
>>> type(print("Hello world!"))
Hello world!
<class 'NoneType'>

In general, we recommend that you avoid writing functions that both return a value and modify an existing value. This is because it is more difficult to reason about what such a function does in a larger program. For example, the following function takes a list L and an item x, adding x to the end of L and returning the new length of L.

def append_and_length(L, x):
    L.append(x)
    return len(L)

If you call this function repeatedly (for example, in a loop), then you have to keep track of its return value as well as the fact that it is adding something to a list each time it runs.

Scope #

Here is a rather pointless function - it takes a single parameter, and sets its value to 42, doing nothing else:

def set_to_42(x):
    x = 42

The question is, what is the value of x after this code runs?

x = 0
set_to_42(x)

You might be tempted to say 42, but it turns out that this is not the case:

print(x)  # This will print 0

Why is x not set to 42 despite calling this function? The answer has to do with a concept called scope.

When a function runs, any variables defined in its body are only valid within that function. So the statement x = 42 within set_to_42 sets the value of x to 42, but only until the function finishes executing. Here is another example:

def define_y():
    y = 42

define_y()
print(y)  # This will result in an error

Because y = 42 is only valid within the scope of define_y, trying to access the value of y outside of the function results in an error (unless we have defined y outside of the function as well).

Note that this only affects the definition of entire variables, so you can redefine part of a variable and see that change outside of the function’s scope:

def change_middle(int_list):
    int_list[1] = 5
sample_list = [1, 2, 3]
change_middle(sample_list)
print(sample_list)  # This prints [1, 5, 3]

Operations like appending also make changes that last outside of the function’s scope:

def append_42(int_list):
    int_list.append(42)

numbers = [1, 2, 3]
append_42(numbers)
print(numbers)  # This prints [1, 2, 3, 42]

Docstrings #

Though we have left them out in previous functions for simplicity, all but the simplest functions you write should have a docstring. As its name might suggest, a docstring is a special string used to document a function’s use and purpose.

Before we get into the details of what information a docstring contains, here are a few principles to keep in mind:

  • A docstring should explain what the function does. Readers of your docstring need to know what your function does so that they can use the function in their code. A docstring saves the reader the effort of figuring out what the function does by reading the code.
  • A docstring should explain what the function’s inputs and outputs are. If a docstring describes what the function does and what its inputs and outputs are, the reader should have enough information to properly call the function in a program.
  • A docstring should mention any assumptions that the function makes. For example, some functions that take an integer as input require the integer to be positive. If the function will crash or return an incorrect result if these assumptions are violated, the docstring should make that clear.
  • A docstring should not explain how the function works. For example, if a function loops through every character of a string, it does not matter whether it does so using a for loop or a while loop.

With that being said, we can dive into what a docstring looks like. A docstring starts and ends with a set of three quotation marks ("), like this:

def compare_lists(left_list, right_list, which_items):
    """
    Compare two sorted lists and return items belonging to one or both of them.

    Given two sorted lists of comparable items (i.e., every pair of items can
    be compared), called left_list and right_list, create three sets: one
    consisting of the items that only appear in left_list, one consisting of
    the items that only appear in right_list, and one consisting of items that
    appear in both lists. Duplicate items in lists are allowed. The value of
    which_items, which can be -1, 0, or 1, determines whether to return the
    list of items from only left_list, only right_list, or both lists,
    respectively.

    Args:
        left_list: A list of comparable items.
        right_list: A list of comparable items.
        which_items: An integer equal to -1, 0, or 1, indicating which items
          to return.

    Returns:
        A list of items from only left_list, only right_list, or both lists.
    """
    pass

The starting quote marks must be aligned with the function body - you will get a syntax error if they are not. It is good style to also align the ending quote marks with the function body, but you will not get an error if they are not aligned.

The first sentence of a docstring should summarize what the function does. It should be written in imperative style (i.e., “Compare two” rather than “Compares two” above).

Below the summary, you can write a longer description if necessary. Here, you can explain the function in more detail, including any assumptions made or any special cases. This part is optional if the one-sentence description sufficiently explains how to use the function. If your function makes changes to variables (such as appending to a list) that are not explained in the one-sentence summary above, you should explain the changes here.

If your function takes any parameters, you should explain each of them in a section marked Args. For each parameter, explain the type of the parameter and what it represents. Similarly, if your function returns something, you should include this in a section marked Returns with an explanation of the function’s return type and what the return value represents.

As with other lines of code, no part of a docstring should exceed 80 characters in length, including the indentation.

Finally, note that the body of the function is simply pass. The pass keyword does nothing, but it can be written as a placeholder for code in an indented code block, such as the body of an if statement or for loop. The pass keyword can be useful if you are writing the overall structure of your code blocks but plan on writing the content of the block later.
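As a second, smaller sketch, here is how the list_max function from earlier in this reading might be documented in the same layout. The wording of the docstring is only one possibility:

```python
def list_max(int_list):
    """
    Return the largest value in a list of positive integers.

    If the list is empty, return 0.

    Args:
        int_list: A list of positive integers.

    Returns:
        An integer equal to the largest item in int_list, or 0 if int_list
        is empty.
    """
    max_value = 0
    for i in int_list:
        if i > max_value:
            max_value = i
    return max_value


print(list_max([3, 1, 4, 1, 5, 9]))  # 9
```

Notice that the docstring states the assumption (positive integers) and the special case (an empty list) without describing the loop itself.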

Reasoning about Code #

A useful skill in developing and debugging code is learning how to reason about what a block or line of code is doing. This can include reasoning at a high level (what the code as a whole is accomplishing) or at a low level (what a specific line of code is doing with a variable). Below, we will describe techniques you can use to help you think about code at both levels.

Expected Behavior #

As you write code, it can be helpful to clarify what you expect the code to do, even before you write it. Usually, this involves thinking of a few simple cases and trying to predict what the result will be.

For example, suppose that you are writing a function called max_int(numbers) to return the largest integer in a list. If you call this function with a list containing a single integer (like [1]), you would expect the function to return that single integer.

This seems obvious - in general, you would expect to get the largest integer from a function designed to do so. But to thoroughly make your own expectations of the function clear, you should think adversarially, that is, like someone who is trying to make the function behave incorrectly. This can help you identify places in the code where potential problems might occur.

For example, what happens if you give max_int a list of all negative integers? What happens if all of the integers are the same? Perhaps most interestingly, what should max_int return if you give it an empty list?
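To make these questions concrete, the sketch below runs them against the list_max function defined earlier in this reading, which starts its maximum at 0. The test lists are made up for illustration:

```python
def list_max(int_list):
    max_value = 0
    for i in int_list:
        if i > max_value:
            max_value = i
    return max_value


print(list_max([1]))           # 1, as expected
print(list_max([4, 4, 4]))     # 4, as expected
print(list_max([-3, -1, -2]))  # 0, even though the largest item is -1
print(list_max([]))            # 0, even though the list has no items
```

The last two results show why it is worth deciding in advance what the function should do in these cases.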

In the next reading, we will talk about testing code to ensure that it behaves in the way you expect it to. But we recommend that you practice thinking through what you expect code to do, building that skill before diving into testing.

Stepping through Code #

You can also think about what individual lines of code do. Consider the following function:

def list_max(int_list):
    max_value = 0
    for i in int_list:
        if i > max_value:
            max_value = i
    return max_value

Suppose you call list_max([1, 3, 2]). Let’s think through what this function does at a line-by-line level.

  • First max_value is set to 0.
  • We are looping through the list, so the first value of i we consider is 1. Right now, i is 1 and max_value is 0.
  • We compare this to max_value, and since 1 is greater than 0, we set max_value to 1. Now, both i and max_value are 1.
  • In our next iteration through the loop, i is 3. max_value is still 1.
  • Since 3 is greater than 1, we set max_value to 3. Now, both i and max_value are 3.
  • In our next iteration through the loop, i is 2. max_value is 3.
  • Since 2 is not greater than 3, we do not change the value of max_value. So i is still 2 and max_value is still 3.
  • Finally, we have gone through the entire list. We can return the value of max_value, which is 3.

Keeping track of the values of variables in this way can be extremely helpful in debugging code that is not behaving correctly. Unfortunately, it can also be a bit tedious to keep this information entirely in your head.

One way of keeping track of this is through careful use of the print function. We could write the function like this:

def list_max(int_list):
    max_value = 0
    for i in int_list:
        print(f"Now considering {i} (current max: {max_value})")
        if i > max_value:
            print(f"Changing max_value to {i}")
            max_value = i
    return max_value

Notice the calls to the print function that allow us to see information about the function as it runs. If we call list_max([1, 3, 2]), we get the following output:

Now considering 1 (current max: 0)
Changing max_value to 1
Now considering 3 (current max: 1)
Changing max_value to 3
Now considering 2 (current max: 3)

This is easy to follow, and we do not have to keep track of the values manually. Debugging code in this way is called print-based debugging, and is often used by software developers as a first step in diagnosing buggy code.

You can also step through code online, using a tool called Python Tutor. This allows you to enter code and go through its execution line by line, tracking every variable name (including the names of functions) and value. For the program above, the tool shows something like this:

Screenshot of Python Tutor in action

We highly recommend using print-based debugging or Python Tutor as you think through what your code is doing.