DSA - Chapter 3
Chapter 3. Data Structures
Data types (Encapsulation)
A programming language usually provides certain basic data types.
Some provide additional data types.
Abstract data type (ADT)
We can extend the concept of encapsulation to more complicated data types.
It specifies the information we want to maintain together with a collection of operations that manage the information.
To realize an ADT in a program:
- Build data structures to organize and to represent the information.
- Devise algorithms to implement the operations.
Aggregation
3 common aggregate mechanisms: array, record structure, and pointer
With the basic data types and the aggregate mechanisms, one can construct more complex data structures to represent more complex ADTs.
Choosing a data structure
Devise data structures to organise the data in such a way that:
- Associated operations can be supported efficiently.
- Structures are space efficient.
Space Complexity
Another aspect to consider in selecting a data structure is the storage requirement.
Sets
Collection of elements which may have a “key” identifying the element.
Operations include:
- INIT(S) - initialise a set S
- SEARCH(S, k) - retrieve the element with key = k
- INSERT(S, x) - insert element x into set S
- DELETE(S, x) - delete element x from S
- DELETE_Key(S, k) - delete element with key k from set S
The List ADT
An ordered set of elements.
Operations include:
-
INIT(L) - requires $\Theta (1)$ time complexity
L.length = 0; -
SEARCH(L, k) - requires $\Theta(n)$ time complexity
SEARCH(L, k){ i=0; while (i <= L.length-1){ if (L.list_array[i].key == k) break; else i++; } if (i > L.length-1) return ("not found"); else return (i); } - INSERT(L, I, x) - insert element x at position i
- Worst-case complexity is $\Theta(n)$ because inserting at the head of the list requires first pushing the entire array down one spot to make room.
- DELETE(L, x)
- Worst-case complexity is $\Theta(n)$ because deleting the first element requires shifting all the elemtents in the list up one.
-
FIND_i-th(L) - find the I-th element of list L, requires $\Theta(1)$ time complexity
return L.list_array[i-1];
Linked List
A linked list consists of a series of structures (called nodes).
Each node contains an element + a link to successor structure (the “next” pointer).
The last node’s next pointer equals “NULL”.
Main operations:
- SEARCH
- Requires traversing the list through the next pointers.
- Time complexity: $O(n)$
SEARCH(L, k){ node_ptr p; p = L -> next; while ((p!=NULL) && (p->key != k)) p = p -> next; return (p); } - DELETE
- Can be done by one pointer change
- Time complexity: $O(n)$
- e.g., Originally: $a_2->a_3->a_4$ After delete: $a_2 -> a_4$
//Delete successor of p DELETE_SUC(p){ node_ptr p1; if (p -> next == NULL) return; p1 = p -> next; p -> next = p1 -> next; delete p1; }//Delete an element pointed to by p DELETE(L, p){ node_ptr p1; p1 = L; while ((p1 -> next != NULL) && (p1 -> next != p)) p1 = p1 -> next; DELETE_SUC(p1); }//Delete by adding "prev" pointer DELETE(p){ p -> prev -> next = p -> next; p -> next -> prev = p -> prev; delete p; } - INSERT
- Requires obtaining a new node + 2 pointer changes
- Time complexity:
- If we know where to insert or don’t care where the new element is inserted: $O(1)$
- If we want to insert the element at the $i$-th position: $O(i)$
- e.g., Originally: $a_2->a_3->a_4$ After insertion: $a_2->x->a_3$
INSERT(p, k){ node_ptr p1; p1 = new node; p1 -> key = k; p1 -> next = p -> next; p -> next = p1; } -
INIT
INIT(L){ L = new node; L -> next = NULL; } -
Check if list L is empty
IS_EMPTY(L){ if (L -> next == NULL) return true; else return false; }
Comparing array and linked list
- Arrays are more suited for the “find the n-th element” operation that characterises efficient access through indexes.
- Linked lists are more appropriate if we need to rearrange the items very often.
- Insert and delete with linked list requires dynamic allocation and de-allocation of memory.
Sorted List
Although insert takes constant time, searching for a key may require traversing the whole list.
Searching a sorted list:
SEARCH_SL(L, k){
node_ptr p;
p = L -> next;
while ((p != NULL) && (p -> key < k))
p = p -> next;
if ((p != NULL) && (p -> key == k))
return p;
else
return NULL;
}
Stack ADT
Stack is a list with the restriction that inserts and deletes are performed at one end of the list only.
Last In First Out
Main operations:
-
push(x): insert at top of stack
push(S, x){ if (S.tos == max_cap - 1) return "Stack overflow!" S.tos++ S.stack_array[S.tos] = x; } -
pop(): return and delete the top element
pop(S){ if (S.tos == -1) return "Stack overflow!" x = S.stack_array[S.tos]; S.tos-- return x } -
top(): return the top element
Stack can be implemented using a singly linked list, but since modification to the stack are done at only one end of the list, to avoid pointers and dynamic memory allocation, array can be an alternative implementation.
//Array implementation
struct Stack {
int tos; //top of stack
element stack_array[max_cap];
}
Queue ADT
Queue is another type of list.
Different from stack, insert is done at one end (tail) and delete is performed at the other end (head).
First In First Out
Main operations:
-
INIT
struct Queue { int head; int length; element queue_array[max_cap]; } INIT(q){ q.head = 1; q.tail = 0; q.length = 0; } -
enqueue(Q, x): add element x to the tail of the queue
enqueue(q, x){ if (q.length == max_cap) return ("Queue overflow"); tail = (tail + 1) mod max_cap; q.queue_array[tail] = x; q.length ++; } -
dequeue(Q): remove and return the element from the head of the queue
dequeue(q){ if (q.length == 0) return ("Queue overflow"); x = q.queue_array[head]; head = ( head + 1 ) mod maxcap; q.length --; return x; }
Queue can be implemented as a doubly linked list with a pointer to the head and one to tail.
Queue can also be implemented by an array.
Graph ADT
Graph is a set of vertices and a set of connections (edges) among the vertices.
If a vertex represents certain entity or object, then a graph can be used to represent binary relation between objects.
Adjacency matrix
Undirected graph using a 2-dimensional array.
Entry[I, j] == 1 if vertices I and j are connected; otherwise the entry is 0.
Adjacency list
Array of linked list.
For each vertex v, we keep a linked list, which contains one node for each vertex that is connected to v.
Space efficiency
Let V = number of vertices, E = number of edges.
Space requirements:
- Adjacency matrix: $O(V^2)$
- Adjacency lists: $O(V+E)$
Comparison
Matrix representation is more efficient for operations that require determining whether two vertices are connected.
Adjacency list representation is more efficient for operations that require processing all the edges in the graph or finding all the neighbours of a vertex, especially if the graph is sparse.
Breadth-First Search
BFS(k, n){ //k = starting vertex, n = number of vertices
Queue Q;
bit visited[n];
INIT(Q);
visited[0.. n-1] = all 0's;
enqueue(Q, k);
while (!empty(Q)){
i = dequeue(Q)
if (!visited[i]){
visited[i] = 1;
output(i);
for each neighbor j of i:
if (!visited[j]) enqueue(Q, j);
}
}
}
Other basic algorithms on graphs
- Dearth-First Search
- Random walk
- Shortest Path (Dijkstra’s algorithm)
- Connected components
- Network flow