Reading 0: Markdown, Jupyter, Bash, and Git #
This is the reading for Assignment 0: Introduction to Assignments. You should read it before completing Assignment 0.
Markdown is a simple language for structuring text that is designed to be easy to read and write. Technically, Markdown is a markup language with a very specific application (structuring text): you write it as source code (plain text) that is then displayed as text with special formatting for structural elements such as section headers, links, images, and code.
The picture below shows an example of Markdown source code (left) and processed text (right) in a note-taking application (Joplin).
It is important to note that Markdown is primarily aimed at structuring text rather than formatting text. On its own, Markdown does not provide a way to do things like change the font or font size, resize images, or use different colors for links. Part of the reason for this is that Markdown was originally designed to be converted to Hypertext Markup Language (HTML) and used on the Web, where it is typical to use Cascading Style Sheets (CSS) to format text.
Files that are purely Markdown usually end in `.md`, but the language is used in a wide variety of applications. Jupyter notebooks (of which Assignment 0 is one) use Markdown to format non-code portions, so the assignment is written almost entirely in Markdown. If you are viewing the assignment in the Jupyter notebook viewer, you can probably see the Markdown source by double-clicking a section of text. To switch from source back to formatted text, press Shift-Enter.
As your first foray into learning Markdown syntax, you should read GitHub’s Mastering Markdown page. For more details and examples, you can read GitHub’s Basic writing and formatting syntax page. The latter page is more similar in format to software documentation, and it may help you to get a feel for this format, as you will be reading documentation later in this course.
Additional Tips #
Below, we list a few things about Markdown that are not sufficiently explained (or not explained at all) in the guides above.
In this course and in the linked pages, headings are formatted using the ATX style: one to six `#` characters at the start of a line denote the heading level.
There is also the setext style which formats headings like this:
```
Heading level 1 (equivalent to `# Heading level 1`)
===================================================

Heading level 2 (equivalent to `## Heading level 2`)
----------------------------------------------------
```
However, the setext style supports only these two heading levels; there is no setext syntax for levels 3 to 6.
In general, you should think of these headings as providing structure rather than formatting. If you want a smaller font size for a heading, you should not just switch to a deeper heading level; there are other ways of controlling the font size of headings. For example, if you are currently writing under a level 3 heading (as this text is) and want to include a subheading, you should use a level 4 heading, and if you find that the font size for level 4 headings is not to your liking, you can adjust it through styling (for example, with CSS) rather than by changing the heading level.
Some Markdown style guides will encourage you to use a level 1 heading only once, for the title of the document. While this is not a strict rule, it does make it easier to translate Markdown to other formats, so we encourage you to follow this rule where possible.
Line Breaks #
The smallest unit of text that Markdown groups together is the paragraph. Thus even if you start a sentence on a new line, the formatted text will come out as part of the same paragraph:
```
The smallest unit of text that Markdown groups together is the paragraph.
Thus even if you start a sentence on a new line, the formatted text will come out as part of the same paragraph:
```

When rendered, this source produces the single paragraph you just read above.
To start a new paragraph, leave one or more blank lines after the previous paragraph. Markdown treats any number of blank lines the same way, but by convention you should use only one.
Occasionally, you may need to break a line at a certain place. For example, if you are writing a haiku, you need to make sure that each line has the appropriate number of syllables. You can force a line break by ending a line with two spaces:
```
Three things are certain:  
Death, taxes, and lost data.  
Guess which has occurred.
```
Which yields the following:
Three things are certain:
Death, taxes, and lost data.
Guess which has occurred.
In the Optional Exercise below, you have the chance to practice this and other types of formatting.
Alt Text #
Because Markdown was originally intended as a convenient shorthand for HTML, the language has some features that were influenced by HTML. Alt text (sometimes also called alternative text) is one of those features. In Markdown, this refers to the text that is shown if an image cannot be displayed for some reason (for example, if the URL pointing to the image is broken). Modern Web browsers generally do not show alt text when you hover over an image (a hover tooltip usually comes from a separate title attribute), so sighted visitors rarely see it unless the image fails to load.
If the image displays successfully, the alt text is not shown at all, but you should still write alt text for every image you include. Beyond being a failsafe for broken images, alt text is heavily used in Web accessibility, such as in screen readers used by visually impaired people.
Unfortunately, a great many sites on the Web do not include sufficiently helpful alt text for their visitors, creating a poor experience for users who rely on this feature. As new programmers, you should make a habit of including descriptive alt text with images in Markdown.
If you are interested in learning more about how to write good alt text, the nonprofit WebAIM (Web Accessibility in Mind) has put together an excellent guide for doing so. (The guide is focused on HTML, but the principles are largely the same.)
Optional Exercise: Markdown Tutorial #
This interactive Markdown tutorial guides you through a few more Markdown concepts not mentioned above and has some helpful practice exercises. While these additional concepts are useful in the right context, we will probably not use them for the assignment.
Jupyter #
Jupyter is an interactive computing platform for notebooks: structured documents that can display and run both formatted text and code. Jupyter is often used with Python and is popular in the data science community, although its use is not limited to either of these.
A notebook consists of a series of cells, which are (usually short) blocks of text or code that can be run. Each cell can be a Markdown cell, code cell, or raw cell, but in this course we will only use Markdown and code cells. As its name suggests, a Markdown cell contains Markdown source that, when run, displays the formatted text corresponding to the source. A code cell contains code that can be run to display its output. All of a notebook’s code cells are written in a single language (in our case, Python). A code cell looks like this:
Because each cell’s code can be run independently of other code cells, the number to the left of the cell shows the order in which the code cells were run.
Assignment 0 is a Jupyter notebook containing only Markdown cells. As we mentioned above, double-clicking a Markdown cell shows the source for the formatted text and allows you to edit the source. Pressing Shift-Enter (or Ctrl-Enter, which keeps the same cell selected) runs the cell, rendering the source to display formatted text.
Jupyter Notebook (note the capitalization) is a web application used to run and edit Jupyter notebooks, and is likely the application you are using for the assignment. JupyterLab is a newer application for working with Jupyter notebooks that works in mostly the same way, but with a significantly redesigned interface. In Jupyter Notebook, a selected cell will appear with a blue or green bar on the left. Blue means that the cell is selected and that you are in command mode, where you can do operations such as copying a cell or merging two cells. Green means that you are in edit mode, and you should see a text cursor somewhere in the cell. A cell in edit mode looks like this:
Note that in other notebook editors such as JupyterLab, the colors for these modes may be different. Since this is a Markdown cell, it does not have a number.
From command mode, you can press Enter to enter edit mode for the selected cell. From edit mode, press Esc to return to command mode, keeping the cell selected. In command mode, you can use Up/Down to select the cell above or below the current one. You can also use j/k (down/up), which some find easier because the keys sit directly under your right index and middle fingers on the home row of most standard QWERTY-like keyboards. You can type h from command mode to see a list of all possible command shortcuts (and press h again to dismiss the list). We encourage you to take a look at the commands and try to make a habit of using them as you work with Jupyter notebooks.
In the assignment exercises, you will have the chance to practice editing in Jupyter notebooks, which also gives you the opportunity to try out some of the keyboard shortcuts for operations like adding and editing cells.
The Command Line #
Most people interact with a computer through a graphical user interface (GUI), but many tasks in computing can be done more efficiently through a command-line interface (CLI). Learning to use a CLI effectively is an important skill in computing, particularly if you plan on using computing beyond this course.
The CLI we will use in this course is that of Ubuntu Linux 20.04 running on the Windows Subsystem for Linux. It looks like this:
As you can see, the command line is a primarily text-based interface with little to no support for displaying graphics, and, as its name suggests, users interact with a CLI by entering lines of text that are interpreted as commands. In this section, we will walk you through the command-line basics you will need to submit your completed Assignment 0.
Command Line Terminology #
In this course, we will be using the Bash shell for our command-line needs. A shell is a program that provides a way of interacting with your operating system’s services, such as managing files, running programs, and accessing networks. Bash is a commonly used shell that uses a command-line interface.
You may hear the terms command line, shell, and Bash, along with another term, the terminal, used interchangeably, even though they are slightly different. There are many programs that are shells, of which Bash is just one. The command line is the interface used to interact with Bash, and the terminal is the program that provides the window in which the shell runs. Specifically, starting a terminal opens a window and immediately starts the shell (Bash in this case) in that window. The program is more specifically a terminal emulator, because it emulates the older terminals that were once used to manage computer input and output.
Bash’s main feature is a read-evaluate-print loop (REPL), which repeatedly does the following steps in order:
- Read: as you type, the terminal emulator translates your keystrokes into input for the shell, usually a sequence of alphanumeric characters. You will see this text appear in the terminal window, and you can send your typed text to be read by the shell by pressing Enter.
- Evaluate: the shell interprets the sequence of characters you sent as a command and runs the appropriate computation.
- Print: if the computation produced any output for the user, the shell displays this output.
If you read the above paragraphs and are still fuzzy on the exact differences among all of the terms, that’s okay. We have tried to be precise with our terminology, but these terms are often confused even by people who have worked in computing for many years. If you search for help on the Web, it may be useful to know these differences, but you will not need this knowledge to complete any of the exercises in, or to submit, Assignment 0.
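To make the read-evaluate-print cycle more concrete, here is a deliberately tiny loop written in Bash itself. It is only a sketch under simplifying assumptions (one command per line, no error handling); the function name `tiny_repl` is made up for this example, and the real Bash REPL does far more. Instead of reading from the keyboard, the last line feeds it two “typed” commands through a pipe.

```bash
#!/usr/bin/env bash
# A toy read-evaluate-print loop, for illustration only.
# `tiny_repl` is a hypothetical name, not a real command.
tiny_repl() {
  local line
  # Read: show a prompt (on stderr) and read one line of input.
  # The loop ends when the input runs out (Ctrl-D at a keyboard).
  while printf 'toy$ ' >&2 && IFS= read -r line; do
    # Evaluate: ask Bash to interpret the text as a command.
    # Print: whatever the command outputs appears in the terminal.
    eval "$line"
  done
  printf '\n' >&2
}

# Feed two "typed" commands to the loop through a pipe.
printf 'echo hello\necho $((2 + 3))\n' | tiny_repl
```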
The Bash Prompt and Command Usage #
In the picture above, the line `(base) steve@DESKTOP-PSCV8TC:~$` is called the prompt because it prompts the user for input (shown by the cursor to its right). Most prompts also show some information about the shell and/or system, as seen in the picture. When you type text into the terminal, the characters will appear to the right of the prompt.
On many sites, you will see Bash commands written with a `$` in front, like `$ ls`. The `$` indicates that this is a command in Bash and should not be typed (in other words, this is telling you to type `ls` and press Enter). Sometimes, you may instead see a command written with a `#` in front. The `#` indicates that you should run the command with administrative privileges. This is sometimes called running a command “as root” or “as the superuser”. Because commands run in this way can make catastrophic, irreversible changes to your system, you should use caution when doing tasks this way. For now, you should run commands in the normal way (in fact, we will not even mention how to run a command as root for now).
A more complicated command might look like this:
$ unshare -r --fork --pid unshare -r --fork --pid --mount-proc bash
(Don’t worry about what this command actually does.) The first word in the command (`unshare`) is the name of a program that the shell will run, and the remaining words are called arguments to `unshare`. The arguments that start with one or more hyphens, such as `--fork`, are called flags or options.
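A quick way to see how Bash splits a command line into a program name and arguments is to define a small function that reports what it receives. In this sketch `show_args` is a hypothetical helper, not a standard command. Note that flags such as `--fork` arrive as ordinary arguments; it is up to the program to decide what they mean.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print how many arguments it got, then each one.
show_args() {
  printf 'number of arguments: %d\n' "$#"
  local i=1 arg
  for arg in "$@"; do
    printf 'argument %d: %s\n' "$i" "$arg"
    i=$((i + 1))
  done
}

# "show_args" is the command name; the five words after it are its arguments.
show_args -r --fork --pid --mount-proc bash
# Quotes keep words together, so this passes a single argument.
show_args 'two words'
```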
If you mistype a command, you will see a message like this (the exact wording depends on your system):

```
$ unshre -r --fork --pid unshare -r --fork --pid --mount-proc bash

Command 'unshre' not found, did you mean:

  command 'unshare' from deb util-linux (2.34-0.1ubuntu9)

Try: sudo apt install <deb name>
```
With this in mind, we can move on to learn about a few basic commands that you will need for the assignment.
As you read through the command descriptions and try them out for yourself, it may help to know that you can use the Up/Down arrows or Ctrl-P/Ctrl-N to access and switch between previously typed commands, and you can use Ctrl-A/Ctrl-E to go to the beginning or end of the command you are currently typing. We will mention a few other tips for working more efficiently in Bash as we describe the commands.
Common Bash commands are often abbreviations for words that describe the task being done. For example, `ls` is short for “list”, and lists the files in a folder. By default, `ls` lists the contents of the current folder (leaving out hidden files, whose names start with `.`). On our sample machine, this is what we get:
```
$ ls
Anaconda3-2020.02-Linux-x86_64.sh  anaconda3
```
On some machines, `anaconda3` is highlighted, indicating that it is a directory, that is, a folder. By default, when you start a shell on most operating systems, you start in your user’s home directory. On Linux-like systems, the home directory is written as `~` (as you can see in the prompt), but can also be written as `/home/user`, replacing `user` with your username.
You can also use `ls` to view the contents of other directories. For example, `ls anaconda3` would show the contents of the `anaconda3` directory. Running `ls` with multiple arguments will perform `ls` for each one, so if you had two directories called `foo` and `bar`, you could run `ls foo bar` to see the contents of both directories at once.
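If you want to experiment with `ls` without touching your real files, a script along these lines builds a throwaway sandbox with `mktemp -d` and deletes it when the script exits. The file and directory names (`installer.sh`, `foo`, `bar`, and so on) are placeholders chosen to mirror the examples above; you could save it as, say, `ls-demo.sh` and run it with `bash ls-demo.sh`.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Make a temporary directory to play in, and remove it when the script exits.
sandbox=$(mktemp -d)
trap 'rm -rf "$sandbox"' EXIT
cd "$sandbox"

# Placeholder contents: one regular file and three directories.
touch installer.sh
mkdir anaconda3 foo bar
touch foo/a.txt bar/b.txt

ls            # contents of the current directory
ls foo        # contents of another directory
ls foo bar    # several directories at once, each under a "name:" heading
```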
When typing `ls anaconda3`, you can hit Tab partway through typing `anaconda3`, and you will see that Bash completes the name of the directory for you. Generally, if you hit Tab when typing an argument that is the name of a file or directory, Bash will try to complete the name for you. If there are multiple possibilities (for example, if you just typed `a` and there were multiple directories starting with `a`), Bash will complete as much of it as it can, up to the first differing character. If you hit Tab additional times, Bash will print out the remaining possibilities, allowing you to type further characters to narrow down the choice.
You can also use `ls` to view files and information about them. For example, `ls Anaconda3-2020.02-Linux-x86_64.sh` will just print `Anaconda3-2020.02-Linux-x86_64.sh`, which is not very useful, but if you type `ls -l Anaconda3-2020.02-Linux-x86_64.sh`, you will see this:
```
$ ls -l Anaconda3-2020.02-Linux-x86_64.sh
-rwxrwxrwx 1 user user 546910666 Mar 11 12:04 Anaconda3-2020.02-Linux-x86_64.sh
```
The `-l` flag prints extra information about the file, including the size of the file in bytes (546910666) and when the file was last modified (March 11, 2020 at 12:04 PM). The other information will probably not be useful in this course, but it can be helpful to know that if the first character of the output is `d` (as in `drwxrwxrwx`) rather than `-` (as in `-rwxrwxrwx`), the name in question refers to a directory.
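The sketch below, again in a throwaway directory with placeholder names, shows the long listing for a small file and checks the first character of each line to tell files from directories. It uses one flag not covered above, `-d`, which makes `ls` describe a directory itself rather than list its contents.

```bash
#!/usr/bin/env bash
set -euo pipefail
sandbox=$(mktemp -d)
trap 'rm -rf "$sandbox"' EXIT
cd "$sandbox"

mkdir anaconda3
printf 'hello\n' > installer.sh   # a 6-byte placeholder file

ls -l installer.sh     # the size column shows 6
ls -ld anaconda3       # -d: describe the directory, not its contents

# The first character is "-" for a regular file and "d" for a directory.
file_type=$(ls -ld installer.sh | cut -c1)
dir_type=$(ls -ld anaconda3 | cut -c1)
echo "installer.sh starts with '$file_type', anaconda3 starts with '$dir_type'"
```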
The `cd` command is short for “change directory” and is used to move to another directory. Unless there is an error, `cd` does not produce any output, but on some systems you will still see the prompt change when you move into a new directory:
```
(base) user@DESKTOP-PSCV8TC:~ $ cd anaconda3
(base) user@DESKTOP-PSCV8TC:~/anaconda3 $
```
To move to a directory like `~/anaconda3`, you do not have to be in the parent directory of `anaconda3` (in this case, the parent directory is `~`). You can move directly to a new directory from anywhere, for example by running `cd ~/anaconda3`.
Each directory contains two special directories that `ls` does not show by default: `.` refers to the current directory, and `..` refers to the parent of the current directory. So if you are in `~/anaconda3` and want to move to `~`, you can run `cd ..` rather than `cd ~`.
Running `cd` on its own will change to `~` by default. If your username is `user`, the directory `~` is short for `/home/user` on Linux (or `/Users/user` on macOS).
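Here is a small script that walks through these moves inside a temporary directory (the `anaconda3/bin` path is just a placeholder). It prints `$PWD`, the variable in which Bash keeps the current directory, after each step. The last step assumes your `HOME` variable is set, as it normally is.

```bash
#!/usr/bin/env bash
set -euo pipefail
sandbox=$(mktemp -d)
trap 'rm -rf "$sandbox"' EXIT
mkdir -p "$sandbox/anaconda3/bin"   # -p also creates the parent directories

cd "$sandbox/anaconda3/bin"   # a full path works from any starting directory
echo "start:       $PWD"
cd ..                         # ".." is the parent directory
after_up=$PWD
echo "after cd ..: $after_up"
cd .                          # "." is the current directory, so nothing changes
echo "after cd .:  $PWD"
cd                            # no argument: go to your home directory (~)
echo "after cd:    $PWD"
```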
The `mv` command is short for “move” and is used to move or rename a file or directory. For example, if you want to change the name of a file, you can do something like this:
```
$ mv Anaconda3-2020.02-Linux-x86_64.sh anaconda-installer.sh
$ ls
anaconda-installer.sh  anaconda3
```
Running `mv old-name new-name` will rename the file or directory `old-name` to `new-name`. But if a file or directory called `new-name` already exists, things get a little complicated:

- If `old-name` and `new-name` are both files, then `new-name` will be overwritten with `old-name`. In other words, the existing `new-name` will be deleted and `old-name` will be renamed to `new-name`.
- If `old-name` is a directory and `new-name` is a file, then `mv` will fail with an error.
- If `new-name` is a directory, then regardless of whether `old-name` is a file or directory, it will be moved into `new-name`.
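The following sketch reproduces each of these cases in a temporary directory, using the placeholder names `old-name` and `new-name` from the text plus two extra placeholder directories. The failing case is wrapped in an `if` so the script keeps going and reports the failure.

```bash
#!/usr/bin/env bash
set -euo pipefail
sandbox=$(mktemp -d)
trap 'rm -rf "$sandbox"' EXIT
cd "$sandbox"

# new-name does not exist yet, so mv simply renames.
echo "first" > old-name
mv old-name new-name

# Both are files: the existing new-name is overwritten.
echo "second" > old-name
mv old-name new-name
echo "new-name now contains: $(cat new-name)"

# A directory cannot replace an existing file, so mv fails.
mkdir old-dir
if ! mv old-dir new-name 2>/dev/null; then
  echo "mv refused to replace the file new-name with a directory"
fi

# The destination is a directory: old-name is moved *into* it.
mkdir target-dir
echo "third" > old-name
mv old-name target-dir
ls target-dir
```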
You should be very careful when using `mv`, because there is no undo operation for most Bash commands. If you overwrite a file using `mv`, you will probably not be able to recover it. (However, if you rename a file and want to undo the change, you can just rename it back with `mv`.)
Finally, you can pass many arguments to `mv` if the last argument is a directory. In this case, all of the other files will be moved into that directory. For example, if you have files `foo`, `bar`, and `baz` and want to move all of them into the folder `qux`, you can run `mv foo bar baz qux`.
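A minimal sketch of moving several files at once, run in a temporary directory with the placeholder names from the text:

```bash
#!/usr/bin/env bash
set -euo pipefail
sandbox=$(mktemp -d)
trap 'rm -rf "$sandbox"' EXIT
cd "$sandbox"

touch foo bar baz
mkdir qux
mv foo bar baz qux   # the last argument must be an existing directory
ls qux               # bar, baz, and foo are now inside qux
ls                   # only qux is left here
```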
The `cp` command is short for “copy” and is similar to `mv` except that it does not remove the old file. Because `cp` makes a copy of a file, it takes longer to run for larger files. For example, running `cp` on the renamed `anaconda-installer.sh` file from earlier can take a few seconds, whereas the same command with `mv` instead of `cp` is nearly instantaneous (when moving within the same file system, `mv` only updates where the file is listed rather than copying its data).
There are a few differences between `cp` and `mv`. Most notably, copying a directory on its own will not work:
```
$ cp anaconda3 anaconda4
cp: -r not specified; omitting directory 'anaconda3'
```
As the error message suggests, the `-r` flag is necessary when copying directories. The correct syntax would thus be `cp -r anaconda3 anaconda4`, though we recommend you don’t run that exact command, as the `anaconda3` directory is several gigabytes in size.
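In a throwaway directory (placeholder names again, and a tiny directory rather than the multi-gigabyte `anaconda3`), the difference looks like this:

```bash
#!/usr/bin/env bash
set -euo pipefail
sandbox=$(mktemp -d)
trap 'rm -rf "$sandbox"' EXIT
cd "$sandbox"

echo "installer" > anaconda-installer.sh
mkdir -p small-dir/sub
echo "data" > small-dir/sub/file.txt

cp anaconda-installer.sh installer-copy.sh   # the original stays in place
if ! cp small-dir copied-dir 2>/dev/null; then
  echo "cp without -r refused to copy the directory"
fi
cp -r small-dir copied-dir   # -r copies the directory and everything inside it
cat copied-dir/sub/file.txt
```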
The `rm` command is short for “remove”. As its name suggests, `rm` removes files or directories, and should be used with caution. If used on a file, `rm` will permanently remove that file without any confirmation:
```
$ ls
anaconda-installer.sh  anaconda3  foo
$ rm foo
$ ls
anaconda-installer.sh  anaconda3
```
If `rm` is used on a directory, the directory will not be removed by default:
```
$ ls
anaconda-installer.sh  anaconda3  bar
$ rm bar
rm: cannot remove 'bar': Is a directory
```
To remove a directory such as `bar`, run `rm -r bar` instead. Make sure to exercise caution, as this will irretrievably remove all files in `bar`, as well as all of its subdirectories and `bar` itself.
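Because `rm` is permanent, it is worth trying it in a scratch directory first. This sketch creates placeholder files and directories, removes them, and confirms nothing is left. It uses `ls -A`, which also lists hidden entries (except `.` and `..`).

```bash
#!/usr/bin/env bash
set -euo pipefail
sandbox=$(mktemp -d)
trap 'rm -rf "$sandbox"' EXIT
cd "$sandbox"

touch foo
mkdir -p bar/inner
touch bar/inner/file.txt

rm foo                         # removed immediately, no confirmation
if ! rm bar 2>/dev/null; then
  echo "rm without -r would not remove the directory bar"
fi
rm -r bar                      # removes bar, its subdirectories, and their files
ls -A                          # prints nothing: the sandbox is empty
```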
Pattern Matching #
Pattern matching in Bash can be used to make your commands more succinct by specifying a pattern of files to operate on rather than each file individually. In this reading, we will examine a simple kind of pattern matching.
In any Bash command, you can use `*` to perform a wildcard match. For example, if you have a directory containing files `foo` and `bar`, then the command `cp * ..` will copy both `foo` and `bar` to the parent directory. The wildcard can be used in conjunction with other text, so if a directory contains files `bar` and `baz` and a directory `foo`, then `mv b* foo` will move both `bar` and `baz` into `foo`.
Wildcards only match names within a single directory, so `*` does not refer to files in parent or child directories. It also skips names that start with `.`, which means it never matches the special directories `.` and `..`. To refer to all files in a child directory `foo`, you can use `foo/*`.
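`echo` is a convenient way to preview what a wildcard matches, because Bash expands the pattern before `echo` ever runs. The sketch below (placeholder names matching the example above, plus a hidden placeholder file `.hidden`) shows the `b*` move and the `foo/*` pattern in a temporary directory.

```bash
#!/usr/bin/env bash
set -euo pipefail
sandbox=$(mktemp -d)
trap 'rm -rf "$sandbox"' EXIT
cd "$sandbox"

mkdir foo
touch bar baz .hidden

echo b*        # prints: bar baz
mv b* foo      # Bash runs: mv bar baz foo
echo *         # prints: foo  (names starting with "." are not matched)
echo foo/*     # prints: foo/bar foo/baz
```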
Note that when you run a command with `*`, Bash expands the wildcard into the matching names before running the command. Thus in a directory containing only a directory `foo`, `cd *` will turn into `cd foo` and move you into `foo`. (If the directory contains multiple subdirectories, the command will likely result in an error instead.)
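Putting `echo` in front of a command shows the command Bash will actually run after expansion, which makes the `cd *` example easy to check in a scratch directory (`foo` is a placeholder name):

```bash
#!/usr/bin/env bash
set -euo pipefail
sandbox=$(mktemp -d)
trap 'rm -rf "$sandbox"' EXIT
cd "$sandbox"

mkdir foo      # the only entry in the sandbox
echo cd *      # prints: cd foo
cd *           # expanded to "cd foo" before cd runs
echo "$PWD"    # now inside foo
```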
Git #
Note: parts of this section have been adapted from Chapters 1 and 2 of Pro Git. This section mainly covers material from Sections 1.1-1.3 and 2.1-2.2 of the book.
Software design is a collaborative, iterative process. As work on a software project goes on, many people can be working with the same code, updating it with different features, documentation, or bug fixes. Sometimes, a change to code can introduce bugs that cause software to run in undesired ways.
Without the proper safeguards, software design can turn into a chaotic and confusing process. People may work on bugs in a now-obsolete version of code, not realizing that someone else has already fixed the bug. A developer may accidentally release a flawed version of the code, not realizing that someone else’s latest changes have introduced a new bug. Design discussions may be difficult because everyone has different versions of what they think is the “latest” version of the code.
To minimize the risk of these hurdles in the software design process, we will learn and use Git in this course. Git is a version control system, a type of software designed to keep track of changes to a set of files over time. Git has been designed to be fast, easy to use for large-scale collaboration, and full-featured enough to handle a variety of workflows and problems. Here, we will present the basic model of how Git works, along with the commands that you can use for a very basic workflow: making and publicizing changes to a directory of files.
Repository Overview #
Git operates on a repository, which consists of a directory of files and subdirectories. Changes to the files within a repository are tracked as a series of commits. Each commit is a version of the code at some point in time, essentially a snapshot of the repository’s files and directories, like this:
Within a repository folder, Git will only track the files that you tell it to and only incorporate changes that you specify should be included in a commit. The process of marking modifications to a file to be included in the next commit is called staging. Files that Git watches for changes are called tracked, while files in a repository but not monitored by Git for changes are called untracked. Below, you can see how these types of files and folders relate to each other:
In the sections that follow, we will describe a basic workflow for working with Git repositories.
Cloning a Repository #
In most cases, you begin by cloning a repository (usually from a different machine) to your machine. This creates a copy of the repository on your machine, complete with its version history and previous versions of files. You can then make changes to your copy of the repository as we describe in the next sections.
To clone a repository, you can use the command `git clone https://github.com/user/repo.git`, where the URL is replaced with the URL of the repository you want to copy (called the remote repository). Note that the repository URL ends with `.git`. Running this command will produce output that looks like this:
```
$ git clone https://github.com/user/repo.git
Cloning into 'repo'...
remote: Enumerating objects: 275, done.
remote: Total 275 (delta 0), reused 0 (delta 0), pack-reused 275
Receiving objects: 100% (275/275), 45.77 KiB | 278.00 KiB/s, done.
Resolving deltas: 100% (118/118), done.
```
The repository files will be placed into a folder called `repo` in the directory from which you ran the `git clone` command. If you want your local folder to be called something else, such as `myrepo`, you can instead run a command like this:
$ git clone https://github.com/user/repo.git myrepo
We generally recommend that you clone repositories with their default names (i.e., with just the URL) unless there is a good reason not to.
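`git clone` also accepts a path to a repository on your own machine, which lets you practice cloning without a network connection. This sketch builds a stand-in “remote” repository in a temporary directory and clones it twice, once with the default folder name and once into `myrepo`. The repository contents, the identity passed with `-c user.name=... -c user.email=...`, and the `-m` flag (which supplies the commit message without opening an editor) are choices made for this example, not part of the course setup.

```bash
#!/usr/bin/env bash
set -euo pipefail
sandbox=$(mktemp -d)
trap 'rm -rf "$sandbox"' EXIT
cd "$sandbox"
mkdir remote work

# A stand-in "remote" repository with one commit (placeholder content and identity).
git init -q remote/repo
echo "# Demo" > remote/repo/README.md
git -C remote/repo add README.md
git -C remote/repo -c user.name="Demo User" -c user.email="demo@example.com" \
  commit -q -m "Add README"

cd work
git clone -q ../remote/repo          # default folder name: repo
git clone -q ../remote/repo myrepo   # folder name of your choice
ls
```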
Checking the Repository Status #
The command `git status` can be used to track the state of the repository. The result of running this command in a repository looks something like this:
```
$ git status
On branch master
Your branch is up-to-date with 'origin/master'.

Untracked files:
  (use "git add <file>..." to include in what will be committed)
        README.md
        bar.py
        foo.py

nothing added to commit but untracked files present (use "git add" to track)
```
This status message tells us a few important things:
- `On branch master` tells us that we are in a branch of the repository called `master` (the traditional default branch name; many newer repositories, including new ones on GitHub, use `main` instead). We will cover branching later in the course.
- `Your branch is up-to-date with 'origin/master'.` tells us that this repository has a remote copy (stored at a different location, which may be on a different machine), and that our history up to the most recent commit matches that of the remote copy, as of the last time Git communicated with it.
- `Untracked files` tells us which files in this repository are not being tracked by Git. Any new files created in, or copied/moved into, the repository directory will be listed as untracked files until they are staged.
- `nothing added to commit but untracked files present` summarizes the situation: while there are files in the repository that are not being tracked, nothing has been staged yet.
The `git status` command is one of the most useful ways of tracking the state of a repository and diagnosing problems in the current version of the repository. If you run into any problems while using Git, we strongly recommend starting with the output of `git status` to determine what may have gone wrong.
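To see `git status` on a repository of your own, the sketch below creates a new, empty repository in a temporary directory and adds three untracked placeholder files. Because this repository has no commits and no remote, the long output says “No commits yet” instead of mentioning `origin/master`. The `--short` flag prints a compact one-line-per-file summary in which `??` marks untracked files.

```bash
#!/usr/bin/env bash
set -euo pipefail
sandbox=$(mktemp -d)
trap 'rm -rf "$sandbox"' EXIT
git init -q "$sandbox/repo"
cd "$sandbox/repo"

touch README.md foo.py bar.py   # new placeholder files start out untracked
git status                      # the long form
git status --short              # the short form: one "?? name" line per file
```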
Adding and Removing Files #
As Git helpfully tells us in the status message above, we can stage a file or set of files by using `git add`, followed by the names of the files to stage:
```
$ git add README.md
$ git status
On branch master
Your branch is up-to-date with 'origin/master'.

Changes to be committed:
  (use "git restore --staged <file>..." to unstage)
        new file:   README.md

Untracked files:
  (use "git add <file>..." to include in what will be committed)
        bar.py
        foo.py
```
`git add` does not produce any output, so you need to run `git status` to see that the file has been staged. Also, you cannot stage an empty directory; it will not even show up as an untracked file.
You can remove a file using `git rm`, which will stop tracking the file and delete it from the directory (earlier committed versions remain in the repository’s history). If you want to stop tracking a file such as `README.md` but do not want to delete it from your machine, you can run `git rm --cached README.md` instead. The file will then show up as an untracked file in your repository.
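The next sketch (temporary directory, placeholder file names and identity) stages a file, shows that an empty directory never appears, commits, and then uses `git rm --cached` to stop tracking the file while leaving it on disk. In the short status, `A` marks a staged new file, `D` a staged removal, and `??` an untracked file.

```bash
#!/usr/bin/env bash
set -euo pipefail
sandbox=$(mktemp -d)
trap 'rm -rf "$sandbox"' EXIT
git init -q "$sandbox/repo"
cd "$sandbox/repo"

touch README.md foo.py
mkdir empty-dir                  # empty directories never show up in Git
git add README.md                # prints nothing on success
git status --short               # "A  README.md" and "?? foo.py"

# Commit (placeholder identity), then stop tracking README.md but keep the file.
git -c user.name="Demo User" -c user.email="demo@example.com" \
  commit -q -m "Add README"
git rm -q --cached README.md
git status --short               # "D  README.md", plus README.md listed as untracked
ls README.md                     # the file is still on disk
```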
Committing Your Changes #
Once you have staged all of the changes you want to include, you can use the command `git commit` to package your changes into a new version. If you have configured Git according to the course computational setup, then this command will launch VS Code with a file that contains something like the following:
```
# Please enter the commit message for your changes. Lines starting
# with '#' will be ignored, and an empty message aborts the commit.
#
# On branch master
# Your branch is up-to-date with 'origin/master'.
#
# Changes to be committed:
#       new file:   README.md
#
# Untracked files:
#       bar.py
#       foo.py
#
```
This is a commit message, which describes the changes that make up this commit. In theory, the commit message can be anything you like as long as it is not empty. In practice, however, you should make your commit message descriptive of the change you made and why.
Later in the course, we will take a more in-depth look at commit message style. For now, we recommend that you stick to the following guidelines:
- The first line of your commit message should be at most 50 characters long.
- The first line is mandatory, and should describe the change you make as an imperative (e.g., `add first draft of README` rather than `this commit adds...`).
- Your commit message should describe your changes in reasonable detail. For example, `add first draft of README` is fine, while `add files` is not.
- All other lines of your commit message (other than the lines that start with `#`, which Git ignores) should be at most 72 characters long.
Once you are done writing your commit message, you can save and close the file and editor. This should commit the changes, outputting a message that looks something like this:
```
[master 83e38c7] Add new benchmarks
 1 file changed, 5 insertions(+), 0 deletions(-)
```
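When you are scripting or just experimenting, `git commit -m` lets you give the message on the command line instead of in an editor; each `-m` becomes its own paragraph, so a second one can hold the longer description. The sketch below (temporary repository, placeholder identity, file, and message text) makes one commit that follows the guidelines above and then prints it back with `git log`.

```bash
#!/usr/bin/env bash
set -euo pipefail
sandbox=$(mktemp -d)
trap 'rm -rf "$sandbox"' EXIT
git init -q "$sandbox/repo"
cd "$sandbox/repo"

echo "# Demo" > README.md
git add README.md

subject="Add first draft of README"   # imperative, under 50 characters
body="Describe what the project is for and how to run it."
git -c user.name="Demo User" -c user.email="demo@example.com" \
  commit -q -m "$subject" -m "$body"

git log --oneline          # one line per commit: short hash and first line
git log -1 --format=%B     # the full message of the latest commit
echo "subject length: ${#subject}"
```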
Pushing Your Changes #
At this point, you have created a new commit in your repository. However, if your repository has a remote copy such as `origin/master`, then this copy does not yet have your new commit: the changes exist only in your local copy. Git has no concept of a “master copy” that other copies are synchronized with automatically, so you will need to push your changes to the remote copy.
Pushing to a remote repository essentially copies the newest parts of your version history to that of the remote. If running `git status` mentions `origin/master`, then your repository already has a remote repository configured (cloning a repository sets this up for you automatically). The remote copy in this case can be referred to as `origin` (`origin/master` is the branch). You can push your commits to this copy by running `git push origin`, which when successful produces output like this:
```
Enumerating objects: 19, done.
Counting objects: 100% (19/19), done.
Delta compression using up to 12 threads
Compressing objects: 100% (10/10), done.
Writing objects: 100% (10/10), 984 bytes | 492.00 KiB/s, done.
Total 10 (delta 9), reused 0 (delta 0), pack-reused 0
remote: Resolving deltas: 100% (9/9), completed with 9 local objects.
To github.com:user/repo.git
   507c2a3..f81bcb2  master -> master
```
In some cases, you may be asked to input a username and password when pushing your changes; this is particularly common if you cloned the repository from an HTTPS URL. (GitHub no longer accepts your account password here; it expects a personal access token instead.) Since Git repositories do not have a master copy, the owner of the remote repository needs to allow you to make changes to it, and authenticating you with a username and password is one way to do this.
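You can rehearse the whole commit-and-push cycle offline by letting a local bare repository (one with no working files, which is typically how servers store repositories) play the part of the remote. In this sketch `remote.git`, `work`, and the identity are placeholders, and `git push origin HEAD` pushes whichever branch you are on, so it works whether your Git names the default branch `master` or `main`.

```bash
#!/usr/bin/env bash
set -euo pipefail
sandbox=$(mktemp -d)
trap 'rm -rf "$sandbox"' EXIT
cd "$sandbox"

git init -q --bare remote.git   # stand-in for the repository on GitHub
git clone -q remote.git work    # Git may warn that the repository is empty
cd work

echo "# Demo" > README.md
git add README.md
git -c user.name="Demo User" -c user.email="demo@example.com" \
  commit -q -m "Add README"

git push -q origin HEAD                    # copy the new commit to origin
git -C ../remote.git log --oneline --all   # the remote now has the commit
```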
GitHub and Forking #
In this course, we use GitHub to host copies of Git repositories. You can use GitHub to make repositories publicly accessible, manage your work on repository code (on your own or with others), or track issues. One convenient feature GitHub offers is called forking, which is essentially cloning a repository to your GitHub account. The advantage of forking is that it allows you to make a copy of someone else’s repository in your account. You can then clone this copy to your machine, make changes, and push commits back to your copy without affecting the original repository on GitHub.
To fork a repository, visit the repository’s page on GitHub. Near the upper-right corner of the page, you should see a button that says “Fork”. Click on this button to copy the repository to your account.
One caveat of forking a repository is that since it is a copy of another repository, you will need a way of copying changes made in the original repository to your forked copy. To do this, you need to configure your copy to treat the original repository as another remote, which is typically referred to as `upstream`. Because this is a relatively common operation on GitHub, the GitHub documentation provides a guide for doing so.
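The same offline trick can illustrate the fork workflow. In this sketch all names are placeholders: `course` stands in for the original (upstream) repository, a bare copy `fork.git` stands in for your fork on GitHub, and `my-fork` is your clone of the fork. Adding `upstream` with `git remote add`, then fetching and merging from it, is one common way to bring in new commits; GitHub’s guide may recommend slightly different commands.

```bash
#!/usr/bin/env bash
set -euo pipefail
sandbox=$(mktemp -d)
trap 'rm -rf "$sandbox"' EXIT
cd "$sandbox"
ident=(-c user.name="Demo User" -c user.email="demo@example.com")  # placeholder identity

git init -q course                                   # the original repository
git -C course "${ident[@]}" commit -q --allow-empty -m "Add Assignment 0"
git clone -q --bare course fork.git                  # stand-in for forking on GitHub
git clone -q fork.git my-fork                        # your clone: origin is fork.git
cd my-fork
git remote add upstream ../course                    # also track the original
git remote -v                                        # lists origin and upstream

# Later, a new commit lands in the original repository...
git -C ../course "${ident[@]}" commit -q --allow-empty -m "Add Assignment 1"

# ...which you fetch from upstream and fast-forward your branch to.
branch=$(git rev-parse --abbrev-ref HEAD)
git fetch -q upstream
git merge -q --ff-only "upstream/$branch"
git log --oneline
```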
In this course, we will use forks of an upstream course repository to distribute assignments, so forking and configuring the `upstream` remote are part of the preliminary Assignment 0. We encourage you to read the above guide to complete the relevant part of the assignment.