6 minute read

Chapter 3. Data Structures

Data types (Encapsulation)

A programming language usually provides certain basic data types.

Some provide additional data types.

Abstract data type (ADT)

We can extend the concept of encapsulation to more complicated data types.

It specifies the information we want to maintain together with a collection of operations that manage the information.

To realize an ADT in a program:

  • Build data structures to organize and to represent the information.
  • Devise algorithms to implement the operations.

Aggregation

3 common aggregate mechanisms: array, record structure, and pointer

With the basic data types and the aggregate mechanisms, one can construct more complex data structures to represent more complex ADTs.

Choosing a data structure

Devise data structures to organise the data in such a way that:

  • Associated operations can be supported efficiently.
  • Structures are space efficient.

Space Complexity

Another aspect to consider in selecting a data structure is the storage requirement.

Sets

Collection of elements which may have a “key” identifying the element.

Operations include:

  • INIT(S) - initialise a set S
  • SEARCH(S, k) - retrieve the element with key = k
  • INSERT(S, x) - insert element x into set S
  • DELETE(S, x) - delete element x from S
  • DELETE_Key(S, k) - delete element with key k from set S

The List ADT

An ordered set of elements.

Operations include:

  • INIT(L) - requires $\Theta (1)$ time complexity

      L.length = 0;
    
  • SEARCH(L, k) - requires $\Theta(n)$ time complexity

      SEARCH(L, k){
      	i=0;
      	while (i <= L.length-1){
      		if (L.list_array[i].key == k)
      			break;
      		else
      			i++;
      	}
      	if (i > L.length-1) return ("not found");
      	else return (i);
      }
    
  • INSERT(L, I, x) - insert element x at position i
    • Worst-case complexity is $\Theta(n)$ because inserting at the head of the list requires first pushing the entire array down one spot to make room.
  • DELETE(L, x)
    • Worst-case complexity is $\Theta(n)$ because deleting the first element requires shifting all the elemtents in the list up one.
  • FIND_i-th(L) - find the I-th element of list L, requires $\Theta(1)$ time complexity

      return L.list_array[i-1];
    

Linked List

A linked list consists of a series of structures (called nodes).

Each node contains an element + a link to successor structure (the “next” pointer).

The last node’s next pointer equals “NULL”.

Main operations:

  • SEARCH
    • Requires traversing the list through the next pointers.
    • Time complexity: $O(n)$
      SEARCH(L, k){
      	node_ptr p;
      	p = L -> next;
      	while ((p!=NULL) && (p->key != k))
      		p = p -> next;
      	return (p);
      }
    
  • DELETE
    • Can be done by one pointer change
    • Time complexity: $O(n)$
    • e.g., Originally: $a_2->a_3->a_4$ After delete: $a_2 -> a_4$
      //Delete successor of p
      DELETE_SUC(p){
      	node_ptr p1;
      	if (p -> next == NULL) return;
      	p1 = p -> next;
      	p -> next = p1 -> next;
      	delete p1;
      }
    
      //Delete an element pointed to by p
      DELETE(L, p){
      	node_ptr p1;
      	p1 = L;
      	while ((p1 -> next != NULL) && (p1 -> next != p))
      		p1 = p1 -> next;
      	DELETE_SUC(p1);
      }
    
      //Delete by adding "prev" pointer
      DELETE(p){
      	p -> prev -> next = p -> next;
      	p -> next -> prev = p -> prev;
      	delete p;
      }
    
  • INSERT
    • Requires obtaining a new node + 2 pointer changes
    • Time complexity:
      • If we know where to insert or don’t care where the new element is inserted: $O(1)$
      • If we want to insert the element at the $i$-th position: $O(i)$
    • e.g., Originally: $a_2->a_3->a_4$ After insertion: $a_2->x->a_3$
      INSERT(p, k){
      	node_ptr p1;
      	p1 = new node;
      	p1 -> key = k;
      	p1 -> next = p -> next;
      	p -> next = p1;
      }
    
  • INIT

      INIT(L){
      	L = new node;
      	L -> next = NULL;
      }
    
  • Check if list L is empty

      IS_EMPTY(L){
      	if (L -> next == NULL)
      		return true;
      	else
      		return false;
      }
    

Comparing array and linked list

  • Arrays are more suited for the “find the n-th element” operation that characterises efficient access through indexes.
  • Linked lists are more appropriate if we need to rearrange the items very often.
  • Insert and delete with linked list requires dynamic allocation and de-allocation of memory.

Sorted List

Although insert takes constant time, searching for a key may require traversing the whole list.

Searching a sorted list:

SEARCH_SL(L, k){
	node_ptr p;
	p = L -> next;
	while ((p != NULL) && (p -> key < k))
		p = p -> next;
	if ((p != NULL) && (p -> key == k))
		return p;
	else
		return NULL;
}

Stack ADT

Stack is a list with the restriction that inserts and deletes are performed at one end of the list only.

Last In First Out

Main operations:

  • push(x): insert at top of stack

      push(S, x){
      	if (S.tos == max_cap - 1)
      		return "Stack overflow!"
      	S.tos++
      	S.stack_array[S.tos] = x;
      }
    
  • pop(): return and delete the top element

      pop(S){
      	if (S.tos == -1)
      		return "Stack overflow!"
      	x = S.stack_array[S.tos];
      	S.tos--
      	return x
      }
    
  • top(): return the top element

Stack can be implemented using a singly linked list, but since modification to the stack are done at only one end of the list, to avoid pointers and dynamic memory allocation, array can be an alternative implementation.

//Array implementation
struct Stack {
	int tos; //top of stack
	element stack_array[max_cap];
}

Queue ADT

Queue is another type of list.

Different from stack, insert is done at one end (tail) and delete is performed at the other end (head).

First In First Out

Main operations:

  • INIT

      struct Queue {
      	int head;
      	int length;
      	element queue_array[max_cap];
      }
        
      INIT(q){
      	q.head = 1;
      	q.tail = 0;
      	q.length = 0;
      }
    
  • enqueue(Q, x): add element x to the tail of the queue

      enqueue(q, x){
      	if (q.length == max_cap)
      		return ("Queue overflow");
      	tail = (tail + 1) mod max_cap;
      	q.queue_array[tail] = x;
      	q.length ++;
      }
    
  • dequeue(Q): remove and return the element from the head of the queue

      dequeue(q){
      	if (q.length == 0)
      		return ("Queue overflow");
      	x = q.queue_array[head];
      	head = ( head + 1 ) mod maxcap;
      	q.length --;
      	return x;
      }
    

Queue can be implemented as a doubly linked list with a pointer to the head and one to tail.

Queue can also be implemented by an array.

Graph ADT

Graph is a set of vertices and a set of connections (edges) among the vertices.

If a vertex represents certain entity or object, then a graph can be used to represent binary relation between objects.

Adjacency matrix

Undirected graph using a 2-dimensional array.

Entry[I, j] == 1 if vertices I and j are connected; otherwise the entry is 0.

Adjacency list

Array of linked list.

For each vertex v, we keep a linked list, which contains one node for each vertex that is connected to v.

Space efficiency

Let V = number of vertices, E = number of edges.

Space requirements:

  • Adjacency matrix: $O(V^2)$
  • Adjacency lists: $O(V+E)$

Comparison

Matrix representation is more efficient for operations that require determining whether two vertices are connected.

Adjacency list representation is more efficient for operations that require processing all the edges in the graph or finding all the neighbours of a vertex, especially if the graph is sparse.

BFS(k, n){    //k = starting vertex, n = number of vertices
	Queue Q;
	bit visited[n];
	INIT(Q);
	visited[0.. n-1] = all 0's;
	enqueue(Q, k);
	while (!empty(Q)){
		i = dequeue(Q)
		if (!visited[i]){
			visited[i] = 1;
			output(i);
			for each neighbor j of i:
				if (!visited[j]) enqueue(Q, j);
		}
	}
}

Other basic algorithms on graphs

  • Dearth-First Search
  • Random walk
  • Shortest Path (Dijkstra’s algorithm)
  • Connected components
  • Network flow

Categories:

Updated: