A Deterministic Finite-State Automaton is a 5-tuple $(Q,\Sigma, \delta, q_0,F)$ where

Q is a finite set of states

$\Sigma$ is a finite alphabet

$\delta: Q\times \Sigma \rightarrow Q$ is the transition function

$q_0\in Q$ is the start state, and

$F\subseteq Q$ is the set of accept states

Let $M=(Q,\Sigma, \delta, q_0,F)$ to be a DFA and let $w=w_1w_2...w_n$ be a word over $\Sigma$. M accepts w if there is a sequence of states $r_0,r_1,r_2,...,r_n$ satisfying the following conditions

$r_0=q_0$

$\delta(r_i,w_{i+1})=r_{i+1}$ for every i, $0\leqslant i\leqslant n-1$

$r_n\in F$

The DFA M recognises the language L if L = $\{w| M \text{ accepts } w\}$

Regular Language

A language is called a regular language if some DFA recognises it

Boolean (set-theoretic):

Operation | Formula |
---|---|

Union | $A\cup B=\{x\rvert x\in A \text{ or } x\in B\}$ |

Intersection | $A\cap B=\{x\rvert x\in A \text{ and } x\in B\}$ |

Difference | $A\backslash B=\{x\rvert x\in A \text{ or } x\notin B\}$ |

Complement | $\overline{A}=\Sigma^* \backslash A$ |

Language theory specific:

Operation | Formula |
---|---|

Concatenation | $A\circ B = \{xy \rvert x\in A \text{ and } y\in B\}$ |

Star | $A^{*}=\left\{x_{1} x_{2} \ldots x_{k} \rvert k \geq 0 \text { and } x_{i} \in A \text { for every } i, 1 \leq i \leq k\right\}$ |

A Regular Expression (RE) R defines a regular language L(R). We shall eventually prove that $RE\equiv DFA$ (i.e. REs define exactly class of the regular languages)

The definition is inductive (recursive), i.e. there are initial RE, and new REs can be obtained from old ones by means of Regular Operations

**Definition**: R is a Regular expression over the alphabet $\Sigma$ if
R is

a for some $a\in \Sigma$

$\epsilon$

$\varnothing$

$(R_1\cup R_2)$, where $R_1$ and $R_2$ are REs

$(R_1\circ R_2)$, where $R_1$ and $R_2$ are REs, or

$(R_1^*)$, where $R_1$ is an RE

Note that by convention the concatenation symbol may be omitted, i.e. $R_1R_2$ means $R_1\circ R_2$. Parentheses may also be omitted, bearing in mind the precedence order

Given $L_1$ recognised by $M_1=(Q_1,\Sigma,\delta_1,q_1,F_1)$ and $L_2$ recognised by $M_1=(Q_2,\Sigma,\delta_2,q_2,F_2)$; want to combine $M_1$ and $M_2$ into a new automaton M that would recognise $L_1\cup L_2$

**Naive idea**: Simulate first $M_1$ on the input and then simulate
$M_2$ on the same input; accept if either $M_1$ or $M_2$ or both accept.
This does not work as after $M_1$ has run on the input, the input is
exhausted and there is no way to “rewind” it in order to run $M_2$

The solution is to run $M_1$ and $M_2$ on the input in parallel

$w\in L_1 \circ L_2$ only if there are words $x$ and $y$ such that $w=xy, x\in L_1$ and $y\in L_2$

We need to run $M_1$ on a prefix of $w$ and then to run $M_2$ on the rest. To find the break point we guess it (non deterministically)

The result is not a DFA because of the red transitions

A NFA is a 5-tuple $(Q,\Sigma, \delta, q_0, F)$, where

Q is a finite set of states

$\Sigma$ is a finite alphabet

$\delta: Q\times (\Sigma\cup \{\epsilon\})\rightarrow \mathscr{P}(Q)$ is the transition function

$q_0\in Q$ is the start state, and

$F\subseteq Q$ is the set of accept states

Let $M=(Q,\Sigma, \delta, q_0, F)$ be a NFA and $w=w_1w_2...w_n$ be a word over $\Sigma\cup \{\epsilon\}$

Important

$\epsilon$’s can be freely added inside the actual word $\in\Sigma^*$

M accepts w if there exists a sequence of states $r_0,r_1,r_2,...,r_n$ satisfying the following conditions:

$r_0=q_0$

$r_{i+1}\in \delta(r_i,w_{i+1})$ for every $0\leqslant i\leqslant n-1$ and

$r_n\in F$

**DFA** - There is a unique computation (path). It either accepts or
rejects

**NFA** - There are, in general, many computational paths. The NFA
accepts if at least one computation accepts. The NFA rejects only if all
the computations reject.

At any time there are a number of possible states the NFA computation can be into. As the DFA needs to keep track of all of these, the DFA’s states will correspond to subsets of the NFA’s-state set. The transition function of the DFA can then be defined as shown below:

Let $N=(Q,\Sigma, \delta, q_0, F)$ be a NFA recognising some language L (assume that L has no $\epsilon$ transitions). We shall construct a DFA $M=(Q',\Sigma, \delta', q_0', F')$ that recognises the same language L as follows

$Q'=\mathscr{P}(Q)$

For $R\in Q'$ and $a\in \Sigma$ let

$$\delta^{\prime}(R, a)=\{q \in Q | q \in \delta(r, a) \text { for some } r \in R\}$$Equivalently

$$\delta^{\prime}(R, a)=\bigcup_{r \in R} \delta(r, a)$$$q_0'=\{q_0\}$

$F^{\prime}=\left\{R \in Q^{\prime} | \exists r \in R r \in F\right\}$

Let us take care of $\epsilon$-transitions. For $R\subseteq Q$ (same as $R\in Q'$) let

$$E(R)=\{q | q \text { is reachable from } R \text { through a number of } \varepsilon \text { -transitions. }\}$$

Modify the construction as follows

$Q'=\mathscr{P}(Q)$

For $R\in Q'$ and $a\in \Sigma$ let

$$\delta^{\prime}(R, a)=\{q \in Q | q \in E( \delta(r, a)) \text { for some } r \in R\}$$Equivalently

$$\delta^{\prime}(R, a)=\bigcup_{r \in R} E(\delta(r, a))$$$q_0'=E(\{q_0\})$

$F^{\prime}=\left\{R \in Q^{\prime} | \exists r \in R r \in F\right\}$

Union

$$A \cup B=\{x | x \in A \text { or } x \in B\}$$

Concatenation

$$A \circ B=\{x y | x \in A \text { and } y \in B\}$$

Star

$$A^{*}=\left\{x_{1} x_{2} \ldots x_{k} | k \geq 0 \text { and } x_{i} \in A \text { for every } i, 1 \leq i \leq k\right\}$$

Theorem

A language is regular iff some regular expression describes it

**Proof**

$\Leftarrow$ Given a RE R, construct a NFA that recognises L(R) - this can be done by structural induction

*Base case* - All initial REs can be implemented by NFAs

*Inductive step* - We have already proven that the class of regular
languages is closed under regular operations

$\Rightarrow$ Given a DFA M, construct a RE that recognises L(M)

A single start state with no incoming edges

A single accept state with no outgoing edges

Every transition is labelled with a regular expression

Convert the given n-state DFA(NFA) into an equivalent n+2 state GNFA

Eliminate all internal states of the GNFA one by one, while preserving the language recognised by the automaton

We are done, as $L(R)=L(DFA)$

Pick an internal state r for elimination

For every two states u and v $u\neq r\neq v$, do the following

How do we prove that a given language N is not regular?

- Can’t be recognised by DFA: assume that there is a DFA that recognises N and then use an adversary argument against that DFA in order to get a contradiction

For every regular language L, there is a number p (called “pumping length” of L) such that every word $w\in L$ of length at least p can be divided into three parts $w=xyz$, satisfying the following conditions:

$xy_iz\in L$ for every $i\in \mathbb{N}$ (in particular $xz\in L$ as one can take $i=0$)

$y$ is a non empty string

The length of the string $xy$ is not greater than p

**Proof**
Let $M=(Q,\Sigma, \delta, q_0, F)$ be a DFA that recognises L,
and let us set the pumping length p to be the number of states in M:
$p=|Q|$

Consider the computation of M on the word $w=w_1w_2...2_n$ (where $n\leqslant p$)

$$q_{0}=r_{0} \rightarrow^{w_{1}} \rightarrow r_{1} \rightarrow r_{2} \rightarrow r_{2} \rightarrow \cdots \cdots \cdots r_{n-1} \rightarrow r_{n} \in F$$

Let i be the first moment in time where we see a repetition in the states $r_0,r_1,r_2,...,r_n$ (such an i exists by the pigeon hole principle as $n+1>p$) Then we can set $x=w_1w_2...w_j$, $y=w_{j+1}..w_i$ and $z=w_{i+1}...w_n$

It is now straightforward to check that all three conditions are met

An algorithm that “minimises” any given DFA, i.e. finds an equivalent DFA with the minimal number of states. It answers at once the following important questions

What is the smallest (in terms of number of states) DFA that recognises the same language?

Given two DFAs, do they recognise the same language

Remove all the states that are not reachable from the start state

Contract a set of states that are equivalent into a single state

Two states, s and t, are equivalent if any word w, which is accepted when starting from s, is accepted when starting from t and vice versa.

**Equivalent states**: Two states s and t are equivalent if

$$\left\{w \in \Sigma^{*} | \hat{\boldsymbol{\delta}}(s, w) \in F\right\}=\left\{w \in \Sigma^{*} | \hat{\boldsymbol{\delta}}(t, w) \in F\right\}$$

Otherwise, s and t are distinguishable, i.e. there is a witness word u such that either $\hat{\delta}(s, u) \in F, \hat{\delta}(t, u) \notin F$ or $\hat{\delta}(s, u) \notin F, \hat{\delta}(t, u) \in F$

Best explained in terms of teams of players (sets of states) that pass balls of different colours (input symbols via transmissions)

**Start**: Disqualify useless players (states that are unreachable from
the start state)

**Round 0**: Start with two teams, one consisting of all the accept
states and the other consisting of all the non-accept states. Set
$i:=1$

**Round i**: For every team $S\in \mathscr{P}(Q)$ and every colour
$a\in \Sigma$ check if all players from S agree on passing an a-ball. If
not, split the team S into the maximal sub-teams that agreed, set
$i:=i+1$ and go to round i

**End**: Return the teams as sets of equivalent states

**Proposition**: The Team-Splitting algorithm terminates

**Proof**: There cannot be more than $n-1$ splits (where n is the number
of players)

**Proposition**: Every two players that are in the same team in the end
are equivalent

**Proof**: Pick any sequences of passes (a word) and prove, by induction
of its length, that it can’t tell the two players apart

**Proposition**: Every two players that have been split at some round
can’t be equivalent

**Proof**: Induction on the round number. The inductive step - a proof
by contradiction.

Two see if two DFAs recognise the same language put them together, apply the minimisation algorithm as if they were as single automaton, and see whether the two start states are equivalent

Corollary

If $M_1$ and $M_2$ are two minimal DFAs that recognise the same language, the minimisation algorithm always produces a bijection between the (equivalent) states of $M_1$ and $M_2$. We say that $M_1$ and $M_2$ are isomorphic

**Proof**: Build the bijection starting from the two start states

Theorem

The team splitting algorithm finds the minimal DFA equivalent ot the original one

**Proof**: By first arguing that the DFA produced by the algorithm is
equivalent to the minimal one, and then observing that the former can’t
have unreachable or equivalent states, so it must be isomorphic to the
latter