blog icon indicating copy to clipboard operation
blog copied to clipboard

Breadth-First Search (Python)

Open qingquan-li opened this issue 2 years ago • 0 comments

References:

  • Book: Grokking Algorithms
  • https://github.com/egonschiele/grokking_algorithms
  • https://pythontutor.com

How many steps are there to find the shortest route to a place? This type of problem is called a shortest-path problem.

The algorithm to solve a shortest-path problem is called breadth-irst search.

You can use breadth-irst search to:

  • Write a checkers AI that calculates the fewest moves to victory
  • Fine the smallest number of moves to checkmate in a game of chess
  • Write a spell checker (fewest edits from your misspelling to a real word—for example, READED -> READER is one edit)
  • Find the doctor closest to you in your network

1. Shortest route between two places

To figure out how to get from a place (Ex: Twin Peaks) to another place (Ex: the Golden Gate Bridge), there are two steps:

  1. Model the problem as a graph.

    Graphs are a way to model how diferent things are connected to one another (they don’t involve an X or Y axis).

  2. Solve the problem using breadth-irst search.

    Breadth-first search is a kind of search algorithm: one that runs on graphs. Breadth-first search can help answer two types of questions:

    • Question type 1: Is there a path from node A to node B?
    • Question type 2: What is the shortest path from node A to node B?
Twin Peaks to the Golden Gate Bridge

The shortest route from Twin Peaks to the Golden Gate Bridge:

The shortest route from Twin Peaks to the Golden Gate Bridge

2. Graph

2.1 What is a graph?

Graphs are a way to model how diferent things are connected to one another (they don’t involve an X or Y axis).

A graph models a set of connections. For example, suppose you and your friends are playing poker, and you want to model who owes whom money. Here’s how you could say, “Alex owes Rama money.”

Alex owes Rama money

Full graph of people who owe other people poker money:

Graph of people who owe other people poker money

Graphs are made up of nodes and edges.

nodes and edges

A node can be directly connected to many other nodes. Those nodes are called its neighbors. In this graph, Rama is Alex’s neighbor. Adit isn’t Alex’s neighbor, because they aren’t directly connected. But Adit is Rama’s and Tom’s neighbor.


2.2 Implementing the graph

Suppose you’re the proud owner of a mango farm. You’re looking for a mango seller who can sell your mangoes. You not only search your friends, but you search their friends, too. Remember, the goal is to ind one mango seller in your network. So if one of your friend isn’t a mango seller, you add his/her friends to the list, too. That means you’ll eventually search his/her friends—and then their friends, and so on. With this algorithm, you’ll search your entire network until you come across a mango seller. This algorithm is breadth-irst search.

network with degree
graph = {}
graph["You"] = ["Alice", "Bob", "Claire"]
graph["Bob"] = ["Anuj", "Peggy"]
graph["Alice"] = ["Peggy"]
graph["Claire"] = ["Thom", "Jonny"]
graph["Anuj"] = []
graph["Peggy"] = []
graph["Thom"] = []
graph["Jonny"] = []

2.3 Directed Graph and Undirected Graph

Anuj, Peggy, Thom, and Jonny don’t have any neighbors. They have arrows pointing to them, but no arrows from them to someone else. This is called a directed graph—the relationship is only one way. So Anuj is Bob’s neighbor, but Bob isn’t Anuj’s neighbor. An undirected graph doesn’t have any arrows, and both nodes are each other’s neighbors. For example, both of these graphs are equal.

Directed Graph and Undirected Graph
  • A directed graph has arrows, and the relationship follows the direction of the arrow (Rama -> Adit means “Rama owes Adit money”).
  • Undirected graphs don’t have arrows, and the relationship goes both ways (Ross - Rachel means “Ross dated Rachel and Rachel dated Ross”).

2.4 Degree

Another way to see this is, irst-degree connections are added to the search list before second-degree connections:

Degree

What happens if you search Anuj before Claire, and they’re both mango sellers? Well, Anuj is a second-degree contact, and Claire is a first-degree contact. You end up with a mango seller who isn’t the closest to you in your network. So you need to search people in the order that they’re added. There’s a data structure for this: it’s called a queue.


3. Queues

A queue works exactly like it does in real life. Suppose you and your friend are queueing up at the bus stop. If you’re before him in the queue, you get on the bus first. A queue works the same way. Queues are similar to stacks. You can’t access random elements in the queue. Instead, there are two only operations, enqueue and dequeue.

  • Enqueue: add an item to the queue
  • Dequeue: tak an item off the queue
enqueue and dequeue

The queue is called a FIFO data structure: First In, First Out. In contrast, a stack is a LIFO data structure: Last In, First Out.

queue vs stack

4. Implementing the algorithm

4.1 How will the implementation work?

How will the implementation work

push is almost always the same thing as enqueue, and pop is almost always the same thing as dequeue.

Make a queue to start:

# Use the Python double-ended queue (deque) function
from collections import deque

# Creates a new queue
search_queue = deque()
# Adds all of your neighbors to the search queue
# graph[“You”] will give you a list of all your neighbors, like [“Alice”, “Bob”, “Claire”]
search_queue += graph["You"]

Loop to find the mango seller:

# While the queue isn’t empty
while search_queue:
    # Grabs the first person off the queue
    person = search_queue.popleft()
    # Checks whether the person is a mango seller
    if person_is_seller(person):
        # Yes, they’re a mango seller.
        print(person + " is a mango seller!")
        return True
    else:
        # No, they aren’t. Add all of this
        search_queue += graph[person]
# If you reached here, no one in the queue was a mango seller.
return False

Create a function to tell you when someone is a mango seller:

def person_is_seller(name):
    """ Tell you when someone is a mango seller.
    This function checks whether the person’s name ends with
    the letter m. If it does, they’re a mango seller.
    Kind of a silly way to do it, but it’ll do for this example. """
    return name[-1] == 'm'

The breadth-first search in action: breadth-first search in action

And so on. The algorithm will keep going until either

  • A mango seller is found, or
  • The queue becomes empty, in which case there is no mango seller.

4.2 Avoid ininite loop

Alice and Bob share a friend: Peggy. So Peggy will be added to the queue twice.

If you check her twice, you’re doing unnecessary, extra work. So once you search a person, you should mark that person as searched and not search them again.

If you don’t do this, you could also end up in an ininite loop. Suppose the mango seller graph looked like this:

You and Peggy

Before checking a person, it’s important to make sure they haven’t been checked already. To do that, you’ll keep a list of people you’ve already checked.

def search(name):
    search_queue = deque()
    search_queue += graph[name]
    # This array (list) is how you keep track of which people you’ve searched before.
    # searched = []
    # Sets are significantly faster than lists if you want to check if an item is contained within it. 
    searched = set()
    while search_queue:
        person = search_queue.popleft()
        # Only search this person if you haven’t already searched them.
        if person not in searched:
            if person_is_seller(person):
                print(person + " is a mango seller!")
                return True
            else:
                search_queue += graph[person]
                # Marks this person as searched
                # searched.append(person)  # list
                searched.add(person)  # set
	return False

rearch("you")

4.3 All the code

# Use the Python double-ended queue (deque) function
from collections import deque

graph = {}
graph["You"] = ["Alice", "Bob", "Claire"]
graph["Bob"] = ["Anuj", "Peggy"]
graph["Alice"] = ["Peggy"]
graph["Claire"] = ["Thom", "Jonny"]
graph["Anuj"] = []
graph["Peggy"] = []
graph["Thom"] = []
graph["Jonny"] = []

def person_is_seller(name):
    """ Tell you when someone is a mango seller.
    This function checks whether the person’s name ends with
    the letter m. If it does, they’re a mango seller.
    Kind of a silly way to do it, but it’ll do for this example. """
    return name[-1] == 'm'

def search(name):
    # Creates a new queue
    search_queue = deque()
    # Adds all of a person's neighbors to the search queue
    # Ex: graph[“You”] will give you a list of all your neighbors, like [“Alice”, “Bob”, “Claire”]
    search_queue += graph[name]
    # This set is how you keep track of which people you’ve searched before.
    searched = set()
    # While the queue isn’t empty
    while search_queue:
        # Grabs the first person off the queue
        person = search_queue.popleft()
        # Only search this person if you haven’t already searched them.
        if person not in searched:
            # Checks whether the person is a mango seller
            if person_is_seller(person):
                # Yes, they’re a mango seller.
                print(person + " is a mango seller!")
                return True
            else:
                # No, they aren’t. Add all of this
                search_queue += graph[person]
                # Marks this person as searched
                searched.add(person)
    # If you reached here, no one in the queue was a mango seller.
    return False

search("You")

5. Running time

If you search your entire network for a mango seller, that means you’ll follow each edge (remember, an edge is the arrow or connection from one person to another). So the running time is at least O (number of edges).

You also keep a queue of every person to search. Adding one person to the queue takes constant time: O(1). Doing this for every person will take O (number of people) total. Breadth-irst search takes O (number of people + number of edges), and it’s more commonly written as O(V+E) (V for number of vertices, E for number of edges).

qingquan-li avatar May 15 '22 02:05 qingquan-li