Today we look at some common Data Structures and ways to use them efficiently.

Array/List

/data/1/2022/2/array.png

List contains a sequence of values in an ordered fashion which is placed adjacently in memory. The list is identified by the address of its first element. Since the order of placement of the elements of a list in the memory follows the same order in which they are defined (in the example below, since 2 comes after 12, the placement of 12 in the memory is right after 2), each element in the list can be accessed by incrementing the address of the first element by the right index amount. Hence accessing any element in the list takes constant time irrespective of their position in the list.

Time complexity:

  • Access — O(1): Accessing an element in the list requires addressing it with the index.
  • Search — O(n): Searching if an element exists in a list requires (in the worst case) to traverse each index on by one
  • Insert/Delete — O(n): Inserting(Deleting) an element from a list first requires to be found which is O(n)

Strengths:

  • Fast lookups. Retrieving the element at a given index takes O(1)O(1) time, regardless of the length of the array.
  • Fast appends. Adding a new element at the end of the array takes O(1)O(1) time, if the array has space.

Weaknesses:

  • Fixed size. You need to specify how many elements you're going to store in your array ahead of time. (Unless you're using a fancy dynamic array.)
  • Costly inserts and deletes. You have to "scoot over" the other elements to fill in or close gaps, which takes worst-case O(n)O(n) time.

Linked List

/data/1/2022/2/linked_list.png

Linked List as opposed to Lists/Arrays does not have their order defined by their physical placement in the memory. Contiguous elements of the linked list are not placed adjacent to each other in the memory. Instead, each linked list element contains both the values and the address (pointer) to the next linked list element. Hence the linked list can only be traversed sequentially going through each element at a time. This also means that the length of the linked list can only be known after completely traversing it element by element. The last element of the linked list has None/Null as its pointer. The linked list is defined by a pointer pointing to the first element.

Time complexity:

  • Access — O(n): Accessing an element in the linked list requires (in the worst case) to traverse each element one by one.
  • Search — O(n): Searching if an element exists in a linked list requires (in the worst case) to traverse each element on by one
  • Insert/Delete — O(1): Inserting(Deleting) an element given the pointer where it needs to be inserted(deleted) only requires to re-arrange the pointers. Inserting(Deleting) an element at the end however would require traversing the entire linked list and hence would be O(n)

Strengths:

  • Fast operations on the ends. Adding elements at either end of a linked list is O(1)O(1). Removing the first element is also O(1)O(1).
  • Flexible size. There's no need to specify how many elements you're going to store ahead of time. You can keep adding elements as long as there's enough space on the machine.

Weaknesses:

  • Costly lookups. To access or edit an item in a linked list, you have to take O(i)O(i) time to walk from the head of the list to the iith item.

Uses:

  • Stacks and queues only need fast operations on the ends, so linked lists are ideal.

Not cache-friendly

Most computers have caching systems that make reading from sequential addresses in memory faster than reading from scattered addresses.

Array items are always located right next to each other in computer memory, but linked list nodes can be scattered all over.

So iterating through a linked list is usually quite a bit slower than iterating through the items in an array, even though they're both theoretically O(n)O(n) time.


Hash Tables

/data/1/2022/2/hash.png

Hash tables can be considered as a general form of lists. In lists, we map indices to values that can then be accessed in constant time. Hash tables try to map a data type (integer, float, string, etc.) to another data type creating paired assignments (key mapped to values) so the pairs can be accessed in constant time.

For each (key, value) pair, the key is passed through a hash function in an attempt to create a unique physical address for the value to be stored in the memory. Most of the time, the hash function can create unique physical addresses across key values. Sometimes, the hash function can end up generating the same physical address for different keys (say key_1, key_2). This is called a collision. Hash tables deal with collisions by creating a linked list strong both the keys and values. The linked list is then traversed for matching <key> returning the <value> pair. Hash tables can be useful when to have to carry out multiple search operations within your code algorithm.

Time complexity:

  • Search — O(1): Searching if a key exists in a hash table requires (on average) a constant amount of time.
  • Insert/Delete — O(1): Inserting(Deleting) a <key, value> pair from the hash table requires a constant amount of time irrespective of the size of the dictionary

Strengths:

  • Fast lookups. Lookups take O(1)O(1) time on average.
  • Flexible keys. Most data types can be used for keys, as long as they're hashable.

Weaknesses:

  • Slow worst-case lookups. Lookups take O(n)O(n) time in the worst case.
  • Unordered. Keys aren't stored in a special order. If you're looking for the smallest key, the largest key, or all the keys in a range, you'll need to look through every key to find it.
  • Single-directional lookups. While you can look up the value for a given key in O(1)O(1) time, looking up the keys for a given value requires looping through the whole dataset—O(n)O(n) time.
  • Not cache-friendly. Many hash table implementations use linked lists, which don't put data next to each other in memory.

Queue

/data/1/2022/2/queue.png

A queue is a sequential data structure that maintains the order of elements as they were inserted into the Queue. It maintains a First In First Out (FIFO) order, which means that the elements can only be accessed in the same order as they were inserted into the queue. The element to be inserted first, will the first one to get removed from the queue.

The two most common operations performed on a queue are Enqueue() and Dequeue(). Enqueue() adds an element into the queue, while Dequeue() removes an element from it. At a given stage, the element removed by the Dequeue depends on the initial state of the queue and the order of Enqueue() operations.

To understand how a queue works, let us consider two positions in a (horizontally placed, for better understanding) queue — front and end. Whenever an element is added (Enqueue()) it is added to the end of the queue. On the other hand, element removal (Dequeue()) is done from the front of the queue. A real-life example is a check-out line at a grocery store. The customer gets into the queue (Enqueue) and waits for its turn. It can only be processed (removed from the queue) when all the customers ahead of him/her have been processed (removed from the queue).

Time complexity (Average):

  • Access — O(n): Accessing an element in the queue requires (in the worst case) to Dequeue() each element one by one.
  • Search — O(n): Searching if an element exists in a queue requires (in the worst case) to Dequeue() each element and comparing it with the target.
  • Insert/Delete — O(1): Inserting (Deleting) an element from the queue, adds (removes) the element to (from) the end (front) of the queue. This can always be done in a constant amount of time.

Strengths:

  • Fast operations. All queue operations take O(1)O(1) time.

Uses

  • Breadth-first search uses a queue to keep track of the nodes to visit next.
  • Printers use queues to manage jobs—jobs get printed in the order they're submitted.
  • Web servers use queues to manage requests—page requests get fulfilled in the order they're received.
  • Processes wait in the CPU scheduler's queue for their turn to run.

Stack

/data/1/2022/2/stack.png

Stack is also a sequential data structure (like Queue) which maintains the order of elements as they were inserted in. However, unlike Queue, a Stack maintains a Last In First Out (LIFO) order, which means that the elements can only be accessed in the reverse order as they were inserted into the stack. The element to be inserted last, will the first one to get removed from the stack.

The two most common operations performed on a stack are push() and pop(). Push() (similar to Enqueue()) adds an element into the stack, while pop() (just like Dequeue()) removes an element from it. At a given stage, the element removed by pop() depends on either the initial state of the stack or the last push() operation.

To understand how a stack works, let us consider two positions in a (vertically placed for better understanding) stack — head and tail. Whenever an element is added or removed (push() or pop()), it is always done at the head position. A real-life example of a stack is a stack of kitchen plates. The plate that was the last one to be added to the stack of plates will be the first one to be used (although you can access the plates below the top one, people rarely do that).

Time complexity (Average):

  • Access — O(n): Accessing an element in the stack requires (in the worst case) to pop() each element one by one.
  • Search — O(n): Searching if an element exists in a stack requires (in the worst case) to pop() each element and comparing it with the target.
  • Insert/Delete — O(1): Inserting (Deleting) an element to the stack, adds (removes) the element at the top of the stack. This can always be done in a constant amount of time.

Strengths:

  • Fast operations. All stack operations take O(1)O(1) time.

Uses:

  • The call stack is a stack that tracks function calls in a program. When a function returns, which function do we "pop" back to? The last one that "pushed" a function call.
  • Depth - first search uses a stack (sometimes the call stack) to keep track of which nodes to visit next.
  • String parsing — stacks turn out to be useful for several types of string parsing.

Trees (Binary)

/data/1/2022/2/tree.png

A tree is a data structure that maintains a hierarchical relation between its elements. Each element has a predecessor and multiple successors, called parent and children respectively. In this section, we will consider a binary tree. In a binary tree, each node can have at most two children (left child, right child).

The Following are some basic definitions associated with a binary tree

  • Root Node − The node at the top of the tree is called root. The tree is defined by the pointer pointing to its root node.
  • Parent Node − Any node that has at least one child is known as the parent node.
  • Child Node − The successor of a parent node is known as a child node. A node can be both a parent and a child node. The root is never a child node.
  • Leaf Node− The node which does not have any child node is called the leaf node.
  • Subtree − A valid subset of the original tree.
  • Path − The sequence of nodes along the edges of the tree is known as a path.
  • Traversing − Passing through the nodes in a certain order e.g. Breadth-first traversal, depth-first traversal, etc.

Based on the conditional hierarchy, we can have multiple types of trees. One of the most commonly used is a binary search tree. A Binary Search Tree (BST) is an ordered or sorted binary tree where the left children have values less than the parent node’s value, and right children have values greater than the parent node.

Uses:

  • Filesystems—files inside folders inside folders
  • Comments—comments, replies to comments, replies to replies
  • Family trees—parents, grandparents, children, and grandchildren
  • And much, much more.

Graph (abstract data type)

/data/1/2022/2/queue.png

A graph consists of nodes or vertices and edges which connect a pair of nodes. Formally speaking, a graph G is a pair of sets (V, E), where V is the set of all the vertices and E is the set of all the edges. A neighbor of a node or vertex is set if all vertices are connected with that node through an edge. As opposed to trees, a graph can be cyclic, which means starting from a node and following the edges, you can end up on the same node.

Strengths:

  • Representing links. Graphs are ideal for cases where you're working with things that connect to other things. Nodes and edges could, for example, respectively represent cities and highways, routers and ethernet cables, or Facebook users and their friendships.

Weaknesses:

  • Scaling challenges. Most graph algorithms are O(n*lg(n))O(n∗lg(n)) or even slower. Depending on the size of your graph, running algorithms across your nodes may not be feasible.

Terminology

  • Directed or undirected - In directed graphs, edges point from the node at one end to the node at the other end. In undirected graphs, the edges simply connect the nodes at each end. In a directed graph, edges point from one node to another. In an undirected graph, the edges simply connect the nodes at each end.
  • Cyclic or acyclic - A graph is cyclic if it has a cycle; an unbroken series of nodes with no repeating nodes or edges that connects back to itself. Graphs without cycles are acyclic. A cyclic graph has at least one cycle: an unbroken series of nodes with no repeating nodes or edges that connects back to itself. Otherwise the graph is acyclic.
  • Weighted or unweighted - If a graph is weighted, each edge has a "weight." The weight could, for example, represent the distance between two locations, or the cost or time it takes to travel between the locations. Each edge in a weighted graph has a weight, which could represent counts, distance, time, or any other quantifiable measurement.
  • Legal coloring - A graph coloring is when you assign colors to each node in a graph. A legal coloring means no adjacent nodes have the same color. For a graph to be colored legally, no adjacent nodes can share the same color.

Thats all for Part 1, the next article will cover less common but equally important (Bloom Filter, LRU Cache, Priority Queue, etc.).