A regular expression is a sequence of characters that define a search pattern.
The regular expressions supported by this tool may contain:
An example regular expression using these symbols is (ab|cd)+(e|0). The strings that match this regular expression are exactly the strings that start with 1 or more instances of the strings ab or cd, and after that either the character e, or nothing. Some examples of strings that match this regular expression are ab, cde, abcde, cdcdabcd.
For more information about regular expressions, click here (some different notation and way more possibilities, but the same concept)
A finite state machine (also called a finite automaton) is a finite set of states N, together with a transition function. This transition function takes a state and a symbol as input, and returns a set of states, a subset of N. These returned states are the states that can be reached by starting from the given input state and reading the given input symbol. The set of symbols an automaton contains is called the alphabet.
A finite state machine also has one starting state, the state we are in after reading no input symbols. Finally, any of the states in N can be accepting/final. If we end up in an accepting state after reading a string, we say that string is accepted by the machine. This tool will create finite state machines, such that the accepted strings are those that exactly match the given regular expression.
Here states will be represented as circles, accepting states as double circles. The transition function will be represented as directed edges between these states, meaning you can go from one state to another using the symbol in the edge's label. The starting state will have an incoming edge from nowhere.We will define 4 different types of finite state machines, each with a more strict definition than the previous.
To convert a regular expression to a DFAm, we use 4 different algorithms, each resulting in the next type from the previous section. Here follows a brief overview of all the algorithms, but for more in depth explanations, I used the book "Introduction to Languages and the Theory of Computation" by John C. Martin.
The first algorithm converts a regular expression to a NFA-λ. This is done using a bottom-up construction. We start with finite state machines for a single character, and 3 template state machines for the 3 main operations (alternation, concatenation and the quantifiers). We then substitute the simple state machines into the template machine we need, until we have a finite state machine for the entire regular expression. Finally, some basic simplification is done to make the state machine more readable.
The second algorithm converts a NFA-λ to a NFA by replacing the λ-transitions with normal transitions. We start off by determining the λ-closure of each state: The set of states that can be reached from that state using only λ-transitions. Next, we make any state whose λ-closure contains an accepting state also accepting. Then, for every state p and for every state r in the λ-closure of p, we duplicate all the normal outgoing edges from r, and make their starting point p. Finally, we remove all the λ-transitions, and if during this process any of the states have become unreachable from the starting state, we remove them.
The third algorithm converts a NFA to a DFA using an algorithm called "subset construction". We start with a new starting state that corresponds to the old starting state. We then determine for each of the symbols in the alphabet, the set of states can be reached from the starting state using that symbol. If there is not yet a new state corresponding to this old set of states, we create this new state and add an edge from the starting state to this new state with the given symbol. If there is already a new state corresponding to this old set of states, we simply add an edge from the starting state to this state with the given symbol. We continue this process for each of the newly created states, until all of the new states have an outgoing edge for every symbol in the alphabet. A new state in this DFA is accepting, if any of the corresponding old states were accepting in the NFA.
The last algorithm converts a DFA to a DFAm by minimizing it. This is done by determining which of the states of the DFA are equivalent, and merging them. Two states are called equivalent, if the strings that cause you to end up in an accepting state starting from that state, are exactly the same for both states. To determine which states are equivalent, we start off by marking each pair of states as equivalent, and step by step mark pairs as not equivalent if we find proof they are not. In the first step, we mark every pair of states, where one of the states is accepting and the other one is non-accepting, as non-equivalent, since for example the empty string causes one of them to end up in an accepting state, but not the other one. In each of the next steps, we take a pair of states that is marked equivalent, and for each symbol in the alphabet, we check if the new pair of states that we get from following the edge with that symbol from the starting pair, is marked as equivalent. If this new pair is marked as not-equivalent, we have found a symbol that leads the original pair of states to two non-equivalent states. This implies that the original pair of states is also not equivalent, and so we mark it as such. We do this until we cannot mark any more pairs as non-equivalent. Finally, we merge the states that have been found to be equivalent, and we are done.
Welcome!
This tool is used to visualize which strings exactly match
a regular expression and which don't, using finite state machines
You can start playing around now by pressing "Example expression" or by typing in your own regular expression
Press the "help" button at any time for more information about the application