Regex Visualizer

Help menu

Regular expressions

A regular expression is a sequence of characters that defines a search pattern.
The regular expressions supported by this tool may contain:

Letters (a-z, A-Z)
The quantifiers *, meaning 0 or more of the previous token, and +, meaning 1 or more of the previous token
Alternation |, meaning either the expression to the left or to the right of the | has to match.
Parentheses (), to group things together
The number 0, meaning the empty string
Concatenation

An example regular expression using these symbols is (ab|cd)+(e|0). The strings that match this regular expression are exactly the strings that consist of 1 or more instances of the strings ab or cd, followed by either the character e or nothing. Some examples of strings that match this regular expression are ab, cde, abcde, cdcdabcd.
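To try the example above outside of this tool, here is a minimal sketch in Python. The helper names to_python_regex and matches are made up for this illustration and are not part of the tool. It assumes the pattern only uses the symbols listed above, so every 0 can safely be replaced by an empty group.

```python
import re

def to_python_regex(pattern: str) -> str:
    """Translate this tool's notation to Python's: 0 (the empty string) becomes an empty group."""
    # Assumption: the pattern contains only letters, *, +, |, ( ) and 0.
    return pattern.replace("0", "(?:)")

def matches(pattern: str, text: str) -> bool:
    """Return True if the whole text matches the pattern, not just a part of it."""
    return re.fullmatch(to_python_regex(pattern), text) is not None

for text in ["ab", "cde", "abcde", "cdcdabcd", "e", "abee"]:
    print(text, matches("(ab|cd)+(e|0)", text))
```

The first four strings are the examples from the text, the last two are strings that should not match.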

For more information about regular expressions, click here (some different notation and way more possibilities, but the same concept)

Finite state machines

A finite state machine (also called a finite automaton) is a finite set of states N, together with a transition function. This transition function takes a state and a symbol as input, and returns a set of states, a subset of N. These returned states are the states that can be reached by starting from the given input state and reading the given input symbol. The set of symbols an automaton contains is called the alphabet.

A finite state machine also has one starting state, the state we are in after reading no input symbols. Finally, any of the states in N can be accepting/final. If we end up in an accepting state after reading a string, we say that string is accepted by the machine. This tool will create finite state machines, such that the accepted strings are those that exactly match the given regular expression.
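As a rough illustration of these definitions, the sketch below stores the transition function as a Python dictionary and checks whether a string is accepted. The representation and the name accepts are assumptions made for this example, not how the tool stores its machines, and λ-transitions (introduced in the next section) are left out.

```python
def accepts(transitions, start, accepting, text):
    """Return True if some path for `text` ends in an accepting state.

    transitions maps (state, symbol) to the set of states that can be reached;
    a missing key means no state can be reached.
    """
    current = {start}  # states we can be in after reading no input symbols
    for symbol in text:
        current = {
            target
            for state in current
            for target in transitions.get((state, symbol), set())
        }
    return bool(current & accepting)

# Example machine over the alphabet {a, b}: accepts strings that end in "ab".
transitions = {
    (0, "a"): {0, 1},
    (0, "b"): {0},
    (1, "b"): {2},
}
print(accepts(transitions, 0, {2}, "aab"))
print(accepts(transitions, 0, {2}, "abb"))
```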

Here states will be represented as circles, accepting states as double circles. The transition function will be represented as directed edges between these states, meaning you can go from one state to another using the symbol in the edge's label. The starting state will have an incoming edge from nowhere.

Types of finite state machines

We will define 4 different types of finite state machines, each with a stricter definition than the previous one.

Nondeterministic finite automaton with λ-transitions (NFA-λ):
This is the most general, least strict type of state machine. Here, the transition function may return multiple states for any input state and symbol. This means that in some cases we have a choice what state to go to next, given an input state and symbol (nondeterminism). In the visual representation used here, this means there are multiple outgoing edges from the input state with the same symbol. The transition function is also allowed to return no states, in which case reading the input symbol from the input state is not possible. In the visual representation used here, this means there are no outgoing edges from the input state with the input symbol. The state machine may also contain λ-transitions. These transitions (edges) are labeled λ and allow you to freely move from one state to another without reading any input symbols.
Nondeterministic finite automaton (NFA):
This type is the same as the NFA-λ, except that λ-transitions are not allowed.
Deterministic finite automaton (DFA):
In deterministic finite automata, the transition function has to return exactly one state for each input state and symbol in the alphabet. This means that for every string of characters in the alphabet, there is exactly one path you can follow. In the visual representation used here, this means that every state has exactly one outgoing edge for each symbol in the alphabet.
Minimal deterministic finite automaton (DFAm):
We call a DFA minimal if there is no other DFA with fewer states that accepts the exact same strings as the original DFA. It can be proven that for each DFA, there exists a unique DFAm, up to the naming of the states.
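The difference between the first three types can be checked mechanically. The sketch below returns the strictest of those three types that a machine satisfies. It assumes a dictionary from (state, symbol) to a set of states, with the empty string standing in for λ; both choices, and the name classify, are made up for this illustration. Checking minimality is not included, since that needs the minimization algorithm described further down.

```python
LAMBDA = ""  # assumed label for λ-transitions in this sketch

def classify(states, alphabet, transitions):
    """Return "NFA-λ", "NFA" or "DFA", whichever is the strictest type that fits."""
    # Any λ-transition with at least one target makes the machine an NFA-λ.
    if any(symbol == LAMBDA and targets for (_, symbol), targets in transitions.items()):
        return "NFA-λ"
    # A DFA needs exactly one target for every state and every symbol.
    for state in states:
        for symbol in alphabet:
            if len(transitions.get((state, symbol), set())) != 1:
                return "NFA"
    return "DFA"

print(classify({0, 1}, {"a"}, {(0, "a"): {1}, (1, "a"): {1}}))
print(classify({0, 1}, {"a"}, {(0, "a"): {0, 1}}))
```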

Algorithms

To convert a regular expression to a DFAm, we use 4 different algorithms, each producing the next type of machine from the previous section. Here follows a brief overview of all the algorithms. For more in-depth explanations, see the book "Introduction to Languages and the Theory of Computation" by John C. Martin, which I used as a reference.

The first algorithm converts a regular expression to an NFA-λ. This is done using a bottom-up construction. We start with finite state machines for a single character, and 3 template state machines for the 3 main operations (alternation, concatenation and the quantifiers). We then substitute the simple state machines into the template machine we need, until we have a finite state machine for the entire regular expression. Finally, some basic simplification is done to make the state machine more readable.
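A bottom-up construction of this kind could look roughly like the sketch below. It is not the tool's actual code: the class Builder, the edge list of (source, label, target) triples, the empty string as the λ label and the exact shape of the templates are all assumptions for this illustration, and the final simplification step is left out. The small accepts function is only there to try out the result.

```python
import itertools

LAMBDA = ""  # assumed label for λ-transitions

class Builder:
    """Recursive-descent parser that builds (start, end) fragments bottom-up."""

    def __init__(self, pattern):
        self.pattern, self.pos = pattern, 0
        self.edges = []                # list of (source, label, target)
        self.ids = itertools.count()   # fresh state numbers

    def peek(self):
        return self.pattern[self.pos] if self.pos < len(self.pattern) else None

    def fragment(self, label):
        # Simplest machine: two states joined by one edge.
        s, e = next(self.ids), next(self.ids)
        self.edges.append((s, label, e))
        return s, e

    def expr(self):  # alternation template
        start, end = self.term()
        while self.peek() == "|":
            self.pos += 1
            s2, e2 = self.term()
            s, e = next(self.ids), next(self.ids)
            self.edges += [(s, LAMBDA, start), (s, LAMBDA, s2),
                           (end, LAMBDA, e), (e2, LAMBDA, e)]
            start, end = s, e
        return start, end

    def term(self):  # concatenation template
        start, end = self.factor()
        while self.peek() not in (None, "|", ")"):
            s2, e2 = self.factor()
            self.edges.append((end, LAMBDA, s2))
            end = e2
        return start, end

    def factor(self):  # quantifier template for * and +
        start, end = self.atom()
        while self.peek() in ("*", "+"):
            op = self.pattern[self.pos]
            self.pos += 1
            s, e = next(self.ids), next(self.ids)
            self.edges += [(s, LAMBDA, start), (end, LAMBDA, e), (end, LAMBDA, start)]
            if op == "*":  # only * may skip the inner machine entirely
                self.edges.append((s, LAMBDA, e))
            start, end = s, e
        return start, end

    def atom(self):
        ch = self.peek()
        if ch is None:
            raise ValueError("unexpected end of pattern")
        self.pos += 1
        if ch == "(":
            frag = self.expr()
            if self.peek() != ")":
                raise ValueError("missing )")
            self.pos += 1
            return frag
        if ch == "0":
            return self.fragment(LAMBDA)  # 0 is the empty string
        if ch.isalpha():
            return self.fragment(ch)
        raise ValueError(f"unexpected character {ch!r}")

def regex_to_nfa_lambda(pattern):
    builder = Builder(pattern)
    start, end = builder.expr()
    if builder.pos != len(pattern):
        raise ValueError("unexpected character")
    return builder.edges, start, end  # end is the single accepting state

def accepts(nfa, text):
    edges, start, end = nfa
    def closure(states):
        seen, stack = set(states), list(states)
        while stack:
            state = stack.pop()
            for src, label, dst in edges:
                if src == state and label == LAMBDA and dst not in seen:
                    seen.add(dst)
                    stack.append(dst)
        return seen
    current = closure({start})
    for ch in text:
        current = closure({dst for src, label, dst in edges
                           if src in current and label == ch})
    return end in current

print(accepts(regex_to_nfa_lambda("(ab|cd)+(e|0)"), "abcde"))
```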

The second algorithm converts an NFA-λ to an NFA by replacing the λ-transitions with normal transitions. We start off by determining the λ-closure of each state: the set of states that can be reached from that state using only λ-transitions (including the state itself). Next, we make any state whose λ-closure contains an accepting state also accepting. Then, for every state p and for every state r in the λ-closure of p, we duplicate all the normal outgoing edges from r, and make their starting point p. Finally, we remove all the λ-transitions, and if during this process any of the states have become unreachable from the starting state, we remove them.
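The four steps of this algorithm can be sketched as follows. The edge set of (source, label, target) triples, the empty string as λ label and the function name remove_lambda are assumptions made for this example, not the tool's internals.

```python
LAMBDA = ""  # assumed label for λ-transitions

def remove_lambda(states, edges, start, accepting):
    """Turn an NFA-λ into an NFA; edges is a set of (source, label, target)."""
    # Step 1: λ-closure of every state (the state itself is always included).
    closure = {}
    for p in states:
        seen, stack = {p}, [p]
        while stack:
            state = stack.pop()
            for src, label, dst in edges:
                if src == state and label == LAMBDA and dst not in seen:
                    seen.add(dst)
                    stack.append(dst)
        closure[p] = seen

    # Step 2: a state is accepting if its closure contains an accepting state.
    new_accepting = {p for p in states if closure[p] & accepting}

    # Step 3: copy every normal edge leaving r in closure(p) so it leaves p.
    # Leaving out the λ edges here is what removes them.
    new_edges = {(p, label, dst)
                 for p in states
                 for src, label, dst in edges
                 if label != LAMBDA and src in closure[p]}

    # Step 4: drop states that are no longer reachable from the start state.
    reachable, stack = {start}, [start]
    while stack:
        state = stack.pop()
        for src, _, dst in new_edges:
            if src == state and dst not in reachable:
                reachable.add(dst)
                stack.append(dst)
    kept_edges = {edge for edge in new_edges if edge[0] in reachable}
    return reachable, kept_edges, start, new_accepting & reachable

# Example: 0 -λ-> 1, 1 -a-> 2, 2 -λ-> 0, with 2 accepting (1 or more a's).
result = remove_lambda({0, 1, 2}, {(0, LAMBDA, 1), (1, "a", 2), (2, LAMBDA, 0)}, 0, {2})
print(result)
```

In this example state 1 ends up unreachable and is removed, leaving two states with an a-edge from 0 to 2 and a loop on 2.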

The third algorithm converts an NFA to a DFA using an algorithm called "subset construction". We start with a new starting state that corresponds to the old starting state. We then determine, for each of the symbols in the alphabet, the set of states that can be reached from the starting state using that symbol. If there is not yet a new state corresponding to this old set of states, we create this new state and add an edge from the starting state to this new state with the given symbol. If there is already a new state corresponding to this old set of states, we simply add an edge from the starting state to this state with the given symbol. We continue this process for each of the newly created states, until all of the new states have an outgoing edge for every symbol in the alphabet. A new state in this DFA is accepting if any of the corresponding old states were accepting in the NFA.
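A compact sketch of subset construction is shown below. Each new DFA state is represented by a frozenset of old NFA states; that representation, the dictionary format and the function name are assumptions for this illustration. The empty frozenset, if it shows up, plays the role of a state from which no string can be accepted.

```python
def subset_construction(alphabet, transitions, start, accepting):
    """Convert an NFA (no λ-transitions) to a DFA.

    transitions maps (state, symbol) to a set of NFA states.
    """
    start_set = frozenset({start})
    dfa_transitions = {}
    seen, todo = {start_set}, [start_set]
    while todo:
        current = todo.pop()
        for symbol in alphabet:
            # All old states reachable from any old state in `current`.
            target = frozenset(
                t for s in current for t in transitions.get((s, symbol), ())
            )
            dfa_transitions[(current, symbol)] = target
            if target not in seen:  # no new state for this set yet: create it
                seen.add(target)
                todo.append(target)
    # A new state is accepting if any of its old states was accepting.
    dfa_accepting = {s for s in seen if s & accepting}
    return seen, dfa_transitions, start_set, dfa_accepting

# NFA over {a, b} that accepts strings ending in "ab".
nfa = {(0, "a"): {0, 1}, (0, "b"): {0}, (1, "b"): {2}}
dfa_states, dfa_delta, dfa_start, dfa_accepting = subset_construction(
    ["a", "b"], nfa, 0, {2}
)
print(len(dfa_states), dfa_accepting)
```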

The last algorithm converts a DFA to a DFAm by minimizing it. This is done by determining which of the states of the DFA are equivalent, and merging them. Two states are called equivalent if the strings that lead to an accepting state are exactly the same for both states. To determine which states are equivalent, we start off by marking each pair of states as equivalent, and step by step mark pairs as non-equivalent if we find proof that they are not. In the first step, we mark every pair of states where one of the states is accepting and the other one is non-accepting as non-equivalent, since for example the empty string causes one of them to end up in an accepting state, but not the other one. In each of the next steps, we take a pair of states that is marked equivalent, and for each symbol in the alphabet, we check if the new pair of states that we get from following the edge with that symbol from both states of the original pair is marked as equivalent. If this new pair is marked as non-equivalent, we have found a symbol that leads the original pair of states to two non-equivalent states. This implies that the original pair of states is also non-equivalent, and so we mark it as such. We do this until we cannot mark any more pairs as non-equivalent. Finally, we merge the states that have been found to be equivalent, and we are done.
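The marking procedure can be sketched like this. It assumes a DFA given as a dictionary from (state, symbol) to a single state, with states that can be sorted (for example integers) and that are all reachable from the starting state; the function name minimize and the choice to keep the smallest state of each merged group are made up for this example.

```python
from itertools import combinations

def minimize(states, alphabet, delta, start, accepting):
    """Merge equivalent states of a DFA; delta maps (state, symbol) to a state."""
    states = sorted(states)
    # First step: accepting vs non-accepting pairs are non-equivalent.
    distinct = {frozenset(pair) for pair in combinations(states, 2)
                if (pair[0] in accepting) != (pair[1] in accepting)}
    # Next steps: keep marking pairs until nothing changes any more.
    changed = True
    while changed:
        changed = False
        for p, q in combinations(states, 2):
            if frozenset((p, q)) in distinct:
                continue
            for symbol in alphabet:
                nxt = frozenset((delta[(p, symbol)], delta[(q, symbol)]))
                if len(nxt) == 2 and nxt in distinct:
                    distinct.add(frozenset((p, q)))
                    changed = True
                    break
    # Merge: every state is replaced by the smallest state equivalent to it.
    rep = {s: min(t for t in states
                  if t == s or frozenset((s, t)) not in distinct)
           for s in states}
    new_delta = {(rep[s], symbol): rep[delta[(s, symbol)]]
                 for s in states for symbol in alphabet}
    return set(rep.values()), new_delta, rep[start], {rep[s] for s in accepting}

# DFA over {a} for "1 or more a's", with two equivalent accepting states 1 and 2.
minimal = minimize({0, 1, 2}, ["a"], {(0, "a"): 1, (1, "a"): 2, (2, "a"): 1}, 0, {1, 2})
print(minimal)
```

Here states 1 and 2 are never marked as non-equivalent, so they are merged into a single accepting state with a loop.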

Features

Convert a regular expression into a DFAm, using custom animations
This gives a very visual way to see what strings match the regular expression and what strings don't
Smooth automatic animations, or animations in steps, with information about the current step
Hover over any of the states, to see an infobox with some example strings that can end up in that state
Pan and zoom the state machine around by dragging the screen and using your mousewheel, to get a better look at large and complicated state machines
The semi-in-depth help menu you are looking at right now!