Graphs and Trees using C++ (STL)

Graphs and Trees using C++ (STL)

Recommended before attempting, these: STL Tutorial Problems

Level O: Representation

Tutorial (Hackerearth)

Let?s try to represent Un-directed Graphs, some commonly used methods are given for help:

Adjacency Matrix Representation using 2-D Array

Adjacency List Representation using STL Vector

Each representation has its own benefits:

Using the matrix representation we can tell that an edge exists between nodes u and v in O(1) time by just checking the value in adj_matrix[u][v]. It will be O(n) time using the adjacency list representation as we will be searching the entire list adj_list[u] for if it contains v or not.

But what if the graph has very few edges? The adjacency list representation will take lesser memory as it stores just the edges (in a sense). In easy words, v is pushed to list of u if and only if there is an edge between u and v. This isn?t the case with matrix representation as it will always take O(n*n) space as we always need an array of size at least n * n (here n is the number of nodes).

Similarly, we can represent directed and/or weighted graphs. Some more resources here and here.

Okay, so can we solve some problems now?Yes, but only a few. The representations have no relevance unless we learn to use them.

Having problem visualising graphs? Look-up to this

Bonus Level 0.1: Modelling Basic Problems as Graphs

What is the use of graphs? Why should we take some random nodes and edges as input and bother build an adjacency list or matrix at all!?

The above questions certainly make sense and therefore for some motivation to continue over journey, we must learn to think in the form of graphs.

Eg. 1: Friends or FoeLet?s consider the case of a few friends. Let?s assume that if John is friends with Kyle and Kyle is a friend of Vienn, then John is also a friend of Kyle. Reasonable assumption!

Now, we know all the immediate friendships in our locality. We can build out of these:

Science of Dependencies (the hidden struggle)

An edge from A to B denotes that B depends on A for its installation.

  1. How many packages are required to install Matplotlib?The answer is 3. How? We see the incoming edges to Matplotlib. :/
  2. But is that it? Consider the case of Keras? What all is required for its installation? Clearly we need Tensorflow, which again requires Six to be available. So, here the answer is 2.

How to approach this problem? Let?s reverse the edges and see where we can reach from given package to be installed. Where-ever we reach is a dependency and needs to be installed before the package itself.

DAG: Make yourself familiar with this acronym. It expands to Directed Acyclic Graph i.e. a directed graph which has no cycles like above case. A cycle in a graph is defined by a path which starts at a vertex, follows edges and reaches the same vertex. Since we can?t do this in our software dependency graph, we say that it has no cycles.

Exercise for Level 0 and Bonus Level 0.1: What more can our dependency graph do? Suggest a method to find the order in which the dependencies for a given package need to be installed. For example, in case of Keras, we will first install Six, then Tensorflow and finally Keras. (Reverse the edges and do what?). What should be the order of installation when we want to install every package in the graph? (spoiler)

Level 1: Graph Traversal (BFS and DFS)

It is clear that we need some ways to travel through the graph, and find out what connects to what. There are two ways to do it.

Let?s say you have a precious object which fell into a pool! Now you are searching for it in the pool. How will you do it?

  1. Depth First Search:You start with some point on the pool?s surface, go as deep as you can (to the floor of the pool) and come back. If you don?t find your object, you take some neighbouring point on the surface and repeat the process.
  2. Breadth First Search:You first search the entire surface of pool. Then go deep slighty and search entire area. Then a bit more deep and so on?

Before you start, recommended pre-requisites: stack and queue, how to use the C++ STL stack and queue (try skills here)

This is the rough difference between the two. Time to learn:DFS-Tutorial, BFS-Tutorial

The implementations are done using stack for DFS and queue for BFS. It is highly recommended that you don?t go further until you make yourself comfortable with why these specific data structures.

Exercise 1.1: Implement BFS and DFS using queue and stack respectively.Exercise 1.2: Given an unweighted and un-directed graph. How to find shortest distance of a node v from a node u? (Try it out here)Exercise 1.3: Let?s say the graph is a weighted one (only positive weights) with edge weights denoting distance between nodes they connect. Will the previous method still work? Why or why not? (spoiler for later)

Recursive implementation of DFSThe best and most useful implementation of DFS is by implementing it as a recursive function and not using stack! Stack and Recursion are similar in a sense. Consider recursion, whenever you enter a function, it is similar to pushing onto the stack something, and whenever you return, you are essentially popping from the top of stack!

What?s the time complexity? Given we have n nodes and m edges. We are going on each node once (thanks to visited array) and we also go on each of the edges once (consider the for loop in the function). So the complexity is O(n+m) for this implementation.

Try out your recursive DFS implementation here.

FAQ: What?s next? I seem to have learnt a lot?Indeed, you have come far! There?s a lot that you can do now. Only a handful of tools like the ones we just learnt about will enable you to tackle a lot of problems.

Quick points:

  1. The problems may not directly specify a graph. So, the first part is to get the graph out of a graph problem, and even before that figuring out the fact that the problem is a graph problem. 😛
  2. Once you have the graph on paper, it is not necessary that you build a representation for it. Remember the core stuff! You need to go to vertices, iterate through edges (order and way as per the algorithm you choose). So you can do it in many ways, instead of actually building a list or matrix.

More problems to try:[Somebody add problems]Rectangle Path

Level 2: Trees

What is a Tree? Tree is a Graph with a special property that it has N nodes and N-1 edges. We won?t be having any multiple edges or self loops, so a tree looks something like this:

Another Tree
  1. Here is an example of a tree, rooted at 0, with levels mentioned in brackets.
  2. We set the level of root as 0. For any other node, we set the level to be:level(i) = level(parent(i)) + 1
  3. Level just denotes the distance of a node from the root.
  4. Distance between a node and any of its ancestors can be found by subtracting the level values.

Implementation of tree DFS to find parents and level.

Diameter and Center of a TreeOne more definition which we should cover here is of the diameter which is nothing but the longest path in the tree. Center of a tree is the node which lies in the middle of the diameter. If the diameter happens to have an even number of nodes, both the middle nodes are called as centers.

Exercise 2.2: Implement algorithm for finding diameter (it requires DFS only). Here is the algorithm:Root the tree at an arbitrary node and run DFS to find a node u farthest from this root node. Now, root the tree at u and run DFS again to find the farthest node v from u. That?s it, the path from u to v is the required diameter. Why does this happen? (another exercise)

Exercise 2.3: We know the center, find out what?s meant by centroid of a tree.

Bonus Level 2.1: Modelling Problems as Trees

All of the intuition that works for graphs, works for trees too. But trees can do a little more.

  1. Trees can model much more abstract things that we may imagine right now. For example, a node in a tree can represent a problem and its child nodes may represent constituent sub-problems.
  2. Being able to effectively compute values at nodes (problem solution) using values in the child nodes (sub-problem solutions) gives rise to Dynamic Programming on Trees.
  3. Trees are the base of a huge number of data structures e.g. Segment Trees, Binary Search Trees, Fenwick Trees etc. All of which work by either assigning some property to child nodes at each node, or by computing property of current node using child nodes.
  4. Trees being a special case of graphs have many unique properties, which allow development of many fast algorithms to perform operations.

Party InvitationsLet?s assume some club (not the Programming Club!) has a huge hierarchical structure of organisation, where every member has one manager except for the President who is managed by none, of course. Whenever its the birthday of any member, he or she will just invite his/her subordinates. A subordinate of member A is defined as a member B who is either managed by A directly or is subordinate to some other member C who is managed directly by A.You being a reputed member of the Programming Club are asked by the President of the other club a particular type of queries. He/she will give you the name of member of his/her club and ask you how many people will be there in the birthday party of given member. Assume everyone who receives an invite will go the party and the one throwing the party will also be present.

We can easily solve this by manually counting each time. But, what if the queries are too many? First of all let?s model the entire organisation as a tree.

  1. Let the president be the root: 1.
  2. We add edges between a member and his/her manager.
  3. Clearly, parent node is the manager.
  4. Just to make the problem clear, if the query is 7, the answer is 5. Since 7 will invite its subordinates {4, 3, 8, 6} and would itself be present.
  5. This reduces to finding the size of sub-tree of a node.

The brute force complexity to find sub-tree size is O(N) where N is total nodes. This is by the use of DFS starting from given node downward towards children and count the number of nodes it visits. Therefore, if we have Q queries, our worst case complexity goes to O(N*Q). Even, if we memorise answers for each node, we will have O(N*min(N, Q)), not fast enough!

Let?s tweak the tree DFS!The idea is to compute all answers at once using a single DFS. Let?s say we are at a node u, and we know the number of nodes (read people) attending party for each of its child nodes, we can simply add them and add 1 to the final answer for the node u will itself be in the party.

Let?s jump to the code:

Make sure that you understand how and why this works.

Handling new joineesLet?s make the problem a bit more complex. What if more and more members keep joining the club? Every time a new member joins, he/she will be assigned a manager (i.e. will be added as a leaf node). A leaf node is a node with no children. Will we keep on rebuilding the tree and rerunning this DFS again and again?The key observation is that whenever a leaf node is added only its ancestors are affected. So we just need to add one to all nodes lying on its path to root.

It?s now O(height_of_tree) worst case. If there are no constraints on tree height, the updates will, in worst case, cost O(N). But what if the tree height has constraints and the club has some rule that the maximum hierarchy won?t exceed some value M (M < N), then we are done in O(M).

This is the essence of a lot of tree based data structures, where updates are made possible through operations on ancestors and the tree is designed in such a way that its height can be ensured to be of a low value (very-very low in comparison to number of nodes, log(N) a lot of times).

Final Word

The learning shouldn?t end here! This is not complete, I will try to continue with it some day later. This text was primarily written keeping in mind the needs of beginners at IIT Indore. We are open for any queries/suggestions through the forums! Happy Coding!


No Responses

Write a response