Regular Expressions

Learning Objectives

Students Will Be Able To:
Not be mystified by Regular Expressions
Write basic Regular Expression patterns
Use RegEx patterns to validate HTML <input> tags

Road Map

  1. Setup
  2. What Are Regular Expressions?
  3. Validating Text in an <input>
  4. Defining Regular Expression Patterns
  5. Regular Expressions in JavaScript
  6. Regular Expression Playgrounds
  7. Going Forward
  8. Further Study
  9. Additional Practice

1. Setup

Today we're going to use the code playground, codepen.io, to experiment with regular expressions in an HTML <input> element.

  1. Create a new pen, and hide the JS pane.

  2. In the HTML pane, let's add a simple form:

    <form>
    <input type="text" required pattern="" />
    </form>
  3. Put the following in the pen's CSS pane:

    body {
    font: 6vmin Helvetica;
    height: 100vh;
    display: flex;
    justify-content: center;
    align-items: center;
    background-color: oldlace;
    margin: 0;
    }

    form {
    padding: 10vmin;
    border: 2vmin solid green;
    border-radius: 4vmin;
    background-color: lightgreen;
    }

    form:invalid {
    border-color: red;
    background-color: tomato;
    }

    input {
    font: bold 10vmin Helvetica;
    margin: 10vmin;
    padding: 4vmin;
    border: 1.8vmin solid green;
    border-radius: 3vmin;
    background-color: oldlace;
    outline: none;
    }

    input:invalid {
    border-color: red;
    }

    The CSS will change the <form> and <input> elements' border color to red whenever the text in the <input> does not match the regular expression pattern assigned to its pattern attribute.

  4. We won't need to change our CSS going forward, so go ahead and hide the CSS pane.

2. What Are Regular Expressions?

A Regular Expression (AKA regex/regexp) is a sequence of characters that defines a pattern used to match text in strings and files.

They are grounded in Computer Science and used throughout computing. For example, regular expressions are used by:

  • Software applications to find and manipulate text.
  • Word processors & text editors to perform search functionality.
  • System utilities like Unix/Linux/macOS's grep.

Most programming languages include an implementation of regular expressions.

They are very useful for things like web scraping and validation!

3. Validating Text in an <input>

To learn about regular expression patterns we're going to use them in a very practical way - to validate the text entered into an <input>.

The <input> element's pattern attribute is an underutilized feature and one that you can start using right away to validate your users' input.

If the text in the <input> in a <form> does not match the regular expression exactly:

  • The <form> will not be submitted to the server when the submit button is clicked.
  • Both the <input> and the <form> will have the :invalid CSS pseudo-class applied.

Let's assign Mary to the pattern attribute.

Now, type "Mary" in the <input> and you will see the form's border turn green indicating a match!

tip

👀 In the context of an HTML <input> element, the regex pattern must match the text exactly. Ordinarily, regular expressions perform partial matches anywhere within text. For example, our pattern, Mary, would match the occurrence of the letters Mary anywhere within a string/file, for example
I live in Maryland.
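
The difference between a partial match and the exact match required by the pattern attribute can be sketched in JavaScript. This is only a sketch: the helper name matchesWholeValue is hypothetical, and wrapping the pattern in ^(?: and )$ is an approximation of how a browser applies the pattern attribute to the entire value (the ^ and $ anchors are covered in the Further Study section).

```javascript
// Ordinary regex matching finds the pattern anywhere in the text.
const partial = /Mary/;
console.log(partial.test("I live in Maryland.")); // true

// Hypothetical helper that approximates the pattern attribute:
// the whole value must match, not just a part of it.
function matchesWholeValue(pattern, value) {
  return new RegExp("^(?:" + pattern + ")$").test(value);
}

console.log(matchesWholeValue("Mary", "Mary"));                // true
console.log(matchesWholeValue("Mary", "I live in Maryland.")); // false
```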

4. Defining Regular Expression Patterns

Literal Characters

The most fundamental type of characters within a regex pattern are literal characters.

The Mary pattern we just defined matches the literal characters of M, a, r and y.

Character Class

Next up is a character class which will match a single character from characters within square brackets.

Let's change our pattern to gr[ae]y.

Check it out - both gray & grey will match!

We can use a hyphen inside of a character class to specify a range of characters. For example, [5-9] will match a single digit of 5 to 9.

More than one range can be specified. For example, [0-9a-fA-F] will match a single hexadecimal digit regardless of casing.

Character classes are great for matching frequently misspelled words like li[cs]en[cs]e.
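
If you'd like to check the character class examples above outside of the <input>, here is a small sketch for the console. The variable names are just for illustration.

```javascript
// A character class matches exactly one character from the brackets.
const gray = /gr[ae]y/;
console.log(gray.test("gray")); // true
console.log(gray.test("grey")); // true
console.log(gray.test("groy")); // false

// Several ranges can be combined in one class.
const hexDigit = /[0-9a-fA-F]/;
console.log(hexDigit.test("c")); // true
console.log(hexDigit.test("G")); // false

// Handy for commonly misspelled words.
const license = /li[cs]en[cs]e/;
console.log(license.test("license")); // true
console.log(license.test("lisence")); // true
```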

❓ What regular expression could be used to match your name whether it is capitalized or not?


If your name is "Sylvester", the pattern of [Ss]ylvester would do the trick


Negated Character Class

Adding a ^ (caret) symbol immediately after the opening [ changes the character class to match any character except the character(s) in the brackets.

For example, p[^ua]t matches the letter p followed by any single character except a u or a. So pit will match, but not put or pat.
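
A quick console sketch of the negated class above. Note that a negated class still has to match one character, so it is not the same as matching nothing.

```javascript
const notUorA = /p[^ua]t/;

console.log(notUorA.test("pit")); // true
console.log(notUorA.test("put")); // false
console.log(notUorA.test("pat")); // false

// [^ua] must still consume exactly one character.
console.log(notUorA.test("pt"));  // false
console.log(notUorA.test("p t")); // true, a space is "not u or a"
```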

Shorthand Character Classes

Because character classes are used so often, there are several shorthand character classes available.

For example, instead of using [0-9] to match a digit, you can use \d instead.

Here are some other common shorthand character classes:

  • \w will match any "word" character: a letter, a digit or the underscore character.
  • \s will match any "whitespace" character, including a space, tab, newline and carriage return.
  • . (period) will match any character except line breaks.

Google will be your friend when working with regular expressions. Unless you work with them frequently, there's no way to remember all this stuff!

More Negativity

Interestingly, the uppercase versions of the previous shorthands match just the opposite of the lowercase versions:

  • \D will match any character except a digit.
  • \W will match anything but an alphanumeric character (and underscore).
  • \S will match anything except a space, tab, newline or return.
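
The shorthand classes and their uppercase opposites can be tried in the console. This is a small sketch using single-character test strings chosen for illustration.

```javascript
// Lowercase shorthands
console.log(/\d/.test("7"));  // true, a digit
console.log(/\d/.test("x"));  // false
console.log(/\w/.test("_"));  // true, underscore counts as a word character
console.log(/\w/.test("-"));  // false
console.log(/\s/.test("\t")); // true, tab is whitespace
console.log(/./.test("\n"));  // false, the period skips line breaks

// Uppercase shorthands match the opposite
console.log(/\D/.test("7"));  // false
console.log(/\W/.test("-"));  // true
console.log(/\S/.test(" "));  // false
```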

👉 You Do - Define a RegEx Pattern (4 mins)

Based upon what you've learned so far...

  1. Write a regex pattern that will match:
  • The text of "File"
  • Followed by a space and two uppercase letters from the alphabet
  • Followed by a hyphen
  • Followed by three digits, except that the first of the three digits cannot be a zero
  2. Test that the following text would be a match:
    File XY-123

Possible Solution


File [A-Z][A-Z]-[1-9][0-9][0-9]

Note that there is no shortcut character class to match just a letter from the alphabet, so we must use:
[a-z] (lowercase),
[A-Z] (uppercase)
or [a-zA-Z] (upper and lowercase)
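
Here is one way to check the possible solution in the console. The variable name fileCode and the two non-matching strings are made up for illustration.

```javascript
const fileCode = /File [A-Z][A-Z]-[1-9][0-9][0-9]/;

console.log(fileCode.test("File XY-123")); // true
console.log(fileCode.test("File XY-023")); // false, first digit is a zero
console.log(fileCode.test("File xy-123")); // false, letters must be uppercase
```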


Quantifiers

Note how the solution for the previous exercise repeated the same character class in order to match it more than once.

Well, there's a better way using Quantifiers.

There are four different quantifiers:

  • {}
  • *
  • +
  • ?

Let's check them out...

The {num} or {min,max} Quantifier

Curly braces are used to specify a specific quantity, or range of quantities, to repeat the immediately preceding literal character, character class, etc.

For example, \d{3} would match three digits.

👉 You Do - Use the {} Quantifier (1 min)

  • Write the regex pattern that could be used to match a social security number (###-##-####)

Solution


\d{3}-\d\d-\d{4}

Without quantifiers, we would have to use this pattern:
\d\d\d-\d\d-\d\d\d\d



We can also specify a range like [A-Z]{1,5}, which would match between 1 and 5 capital letters.

A range from a number to infinity can be created by leaving off the second number such as this {5,}.

Note that quantifiers are "greedy" by default, that is, they will match as many characters as possible (the longest possible match).
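
The curly-brace quantifier, including ranges and greedy matching, can be sketched like this. It uses the string match() method, which returns an array whose first element is the matched text, or null when nothing matches.

```javascript
// Exact count
console.log(/\d{3}/.test("42"));  // false, only two digits
console.log(/\d{3}/.test("427")); // true

// Range: greedy, so it takes as many as it is allowed (5 here)
console.log("ABCDEFG".match(/[A-Z]{1,5}/)[0]); // "ABCDE"

// Open-ended range: 5 or more
console.log("ABCDEFG".match(/[A-Z]{5,}/)[0]);  // "ABCDEFG"
console.log("AB".match(/[A-Z]{5,}/));          // null
```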

Quantifier Shortcut - *

The star/asterisk will match the preceding character zero or more times.

Quantifier Shortcut - +

The plus symbol will match the preceding character one or more times.

Quantifier Shortcut - ?

The question mark will match the preceding character zero or one time.

👉 You Do - Quantifiers (2 mins)

Identify the curly brace equivalents for each of the above Quantifier Shortcuts:

  1. *
  2. +
  3. ?

Solution


* -> {0,}
+ -> {1,}
? -> {0,1}
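
To convince yourself that each shortcut behaves like its curly-brace equivalent, you could compare them in the console. The test strings below are arbitrary.

```javascript
// * and {0,} : zero or more
console.log("caaat".match(/ca*/)[0]);    // "caaa"
console.log("caaat".match(/ca{0,}/)[0]); // "caaa"
console.log("ct".match(/ca*/)[0]);       // "c", zero a's is fine

// + and {1,} : one or more
console.log("caaat".match(/ca+/)[0]);    // "caaa"
console.log("ct".match(/ca+/));          // null, at least one a is required
console.log("ct".match(/ca{1,}/));       // null

// ? and {0,1} : zero or one
console.log("caaat".match(/ca?/)[0]);    // "ca"
console.log("caaat".match(/ca{0,1}/)[0]); // "ca"
```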


Escaping Special Characters

We've seen how certain characters such as /*+?.[]{} have special meaning in regular expressions.

That being the case, how do we literally match these special characters? For example, what if you wanted a pattern to match a number that includes a decimal point?

To accomplish this, you have to escape the special character by preceding it with a \ (backslash), for example, \+, would match the plus symbol.

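Note that most special characters do not have to be escaped within a character class (square brackets). So, if you wanted to match a plus or minus sign, you could use the pattern of [+-]. The main exceptions are \ and ], as well as ^ and - when their position would give them a special meaning.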
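
A short sketch of what escaping changes, using the plus symbol. The strings are chosen for illustration.

```javascript
// Escaped: \+ matches a literal plus symbol.
console.log(/1\+1/.test("1+1")); // true

// Unescaped: + is a quantifier, so this means "one or more 1s, then a 1".
console.log(/1+1/.test("1+1"));  // false
console.log(/1+1/.test("11"));   // true

// Inside a character class the plus needs no escaping.
const signed = /[+-]\d/;
console.log(signed.test("+5")); // true
console.log(signed.test("-5")); // true
console.log(signed.test("5"));  // false
```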

👉 You Do - Escaping Special Characters (2 mins)

  1. Write the regular expression that would match a floating-point number with one or more digits on both sides of the decimal.

Solution


\d+\.\d+


  2. Write the regular expression that would match this text:
    What?

Solution


What\?


5. Regular Expressions in JavaScript

In JavaScript, Regular expressions are a built-in object subtype.

They can be created using a regular expression literal, or the RegExp class.

Let's use the console in Chrome's DevTools to check them out using JavaScript.

Regular Expression Literal Syntax

The literal syntax uses forward slashes to delimit the regex:

let re = /cats?/;

The literal syntax is the best option if you know the pattern you want to use in advance. However, using the class/constructor approach allows you to pass in a string variable to create a regex dynamically:

let pattern = "cats?";
let re = new RegExp(pattern);

A regex object has a test() method that returns true if there is at least one match:

let re = /cats?/;
re.test("fatcat"); // returns true
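
One thing to watch for with the constructor approach: the pattern is an ordinary string, so a backslash has to be doubled to survive as a backslash in the regex. The sketch below illustrates this; the variable names are only for illustration.

```javascript
// These two create the same pattern.
const fromLiteral = /\d{3}/;
const fromString = new RegExp("\\d{3}"); // note the doubled backslash

console.log(fromLiteral.source === fromString.source); // true
console.log(fromString.test("abc123"));                // true

// With a single backslash, the string "\d{3}" is really just "d{3}".
const lostBackslash = new RegExp("\d{3}");
console.log(lostBackslash.source);         // "d{3}"
console.log(lostBackslash.test("abc123")); // false
console.log(lostBackslash.test("ddd"));    // true
```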

👉 You Do - JS RegExp (3 mins)

  1. In the console, create a JS regex object using literal syntax and assign it to a variable named re.

  2. The pattern should match a phone number with the following format:
    (###) ###-####

Hint: the parentheses are special characters, so be sure to _____ them.

  3. Use re.test() to test a phone number.

Solution


let re = /\(\d{3}\) \d{3}-\d{4}/;
re.test("(123) 555-1212");

These docs discuss working with regular expressions in JavaScript, including the methods available on strings and regular expression objects...

JavaScript Methods Using Regular Expressions

| Method | Description |
| --- | --- |
| exec | A RegExp method that executes a search for a match in a string. It returns an array of information. |
| test | A RegExp method that tests for a match in a string. It returns true or false. |
| match | A String method that executes a search for a match in a string. It returns an array of information or null on a mismatch. |
| search | A String method that tests for a match in a string. It returns the index of the match, or -1 if the search fails. |
| replace, replaceAll | A String method that executes a search for a match in a string, and replaces the matched substring with a replacement substring. |
| split | A String method that uses a regular expression or a fixed string to break a string into an array of substrings. |
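
Here is a brief sketch that tries each of these methods in the console. The sample text and variable names are made up for illustration; note that replaceAll() requires the g (global) flag when it is given a regex.

```javascript
const text = "The catsup was eaten by the cat";
const cats = /cats?/;

console.log(cats.test(text));             // true
console.log(cats.exec(text).index);       // 4, where the first match starts
console.log(text.match(cats)[0]);         // "cats"
console.log(text.search(cats));           // 4
console.log(text.replace(cats, "dog"));   // "The dogup was eaten by the cat"
console.log(text.replaceAll(/cats?/g, "dog")); // "The dogup was eaten by the dog"

// split() using a regex as the separator
console.log("a1b22c333d".split(/\d+/));   // ["a", "b", "c", "d"]
```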

6. Regular Expression Playgrounds

The best way to continue experimenting with and learning about more complex regular expressions is to use one of the several regex code playgrounds available, such as Regexr.

7. Going Forward

We've had fun learning about the core features of regular expressions.

Check out the Further Study and the Additional Practice Exercises sections if you wish to learn more.

You will surely cross paths with regular expressions during your career as a developer. And when you do, as usual, Google and documentation will be your friends.

8. Further Study

Alternation

  • Alternation allows us to easily search for one of several characters or words.

  • Let's say you want a single regex that will match any of these sentences:
    I have a dog.
    I have a cat.
    I have a bird.
    I have a fish.

  • This would do the trick
    /I have a (dog|cat|bird|fish)\./.
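
The alternation pattern above can be checked in the console like so. The non-matching sentences are made up for illustration.

```javascript
const pet = /I have a (dog|cat|bird|fish)\./;

console.log(pet.test("I have a cat."));    // true
console.log(pet.test("I have a fish."));   // true
console.log(pet.test("I have a lizard.")); // false, not one of the alternatives
console.log(pet.test("I have a cat"));     // false, the escaped period is required
```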

💪 Practice Exercise (5 mins)

  • Write a regex that would match a CSS color hexadecimal (3 or 6 characters), such as
    #f355Ac or #D39
Solution

/#([a-fA-F0-9]{6}|[a-fA-F0-9]{3})/

Grouping

  • Parentheses are used inside regular expressions to create groups that can then have a quantifier applied to the group as a whole.

  • Whereas, the square brackets character class, [], represents a single character to match, the parentheses, (), represent a group of characters to match.

  • Let's say we wanted to match a computer's IP Address. Ignoring the fact that we should limit the numbers to between 0 and 255, we could write something like this:
    /\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}/

  • But using grouping we can shorten this to:
    /(\d{1,3}\.){3}\d{1,3}/
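
To see that the grouped version behaves the same as the longer one, you could run both against a few strings. The sample addresses are arbitrary, and as noted above neither pattern limits the numbers to 255.

```javascript
const longForm = /\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}/;
const grouped = /(\d{1,3}\.){3}\d{1,3}/;

const samples = ["192.168.0.1", "192.168.0", "999.999.999.999"];

for (const s of samples) {
  // Both patterns should agree on every sample.
  console.log(s, longForm.test(s), grouped.test(s));
}
// 192.168.0.1     true  true
// 192.168.0       false false
// 999.999.999.999 true  true
```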

💪 Practice Exercise (2 mins)

  • Write a regular expression that would match this string:
    hey!hey!hey!
Solution

/(hey!){3}/

Anchors and Boundaries

  • Anchors and boundaries are unique in that they don't match a character, instead they match a position.

  • They allow us to write patterns that match strings that contain only the characters we are interested in and only if they are isolated the way we want them to be.

  • The ^ symbol is used to match the start of the line. This is very useful for processing a file containing multiple lines.

  • The $ symbol matches the end of the line.

  • For example, without anchors, the regex /dog/ will return true when tested against any of these strings: "dog", "dogs" and "My dog is named Spot". However, the regex /^dog$/ will match only when the line contains "dog" and no other text.

  • Let's test the pattern, cat, with anchors (/^cat$/), and without (/cat/), against the strings "cat" and "catsup".

  • There is also \b, which matches a position called a word boundary. The \b will match any of the following:

    • Before the first character in the string, if that character is a word character.

    • After the last character in the string, if that character is a word character.

    • Between two characters in the string where one character is a word character and the other is a non-word character such as a space, tab, or newline.

  • The \b easily allows us to search for whole words only.

  • This is how we could use the string match() method to return the matches by passing in a regex:

// try with no word boundary
let re = /cat/g;
let matches = "The catsup was eaten by the cat".match(re);
// ["cat", "cat"]

// try using word boundary
// (reassign instead of using let again, since re and matches already exist)
re = /\bcat\b/g;
matches = "The catsup was eaten by the cat".match(re);
// ["cat"]

The g at the end of the regex is the global flag and it tells the regex to search for all matches, instead of just the first.
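
The anchor test suggested earlier (cat with and without anchors) might look like this in the console. One JavaScript detail worth knowing: by default ^ and $ match only at the very start and end of the whole string, and the m (multiline) flag makes them match at the start and end of each line.

```javascript
// With anchors: the whole string must be "cat"
console.log(/^cat$/.test("cat"));    // true
console.log(/^cat$/.test("catsup")); // false

// Without anchors: "cat" anywhere is enough
console.log(/cat/.test("cat"));      // true
console.log(/cat/.test("catsup"));   // true

// Multi-line text: the m flag lets ^ and $ work per line
console.log(/^cat$/.test("dog\ncat"));  // false
console.log(/^cat$/m.test("dog\ncat")); // true
```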

Capturing

  • Parentheses can also be used to define capture groups.

  • Capturing is when matched text is "captured" into numbered groups.

  • These groups can then be reused with a process called back-referencing.

  • Capturing is beyond the scope of this lesson. Here's one of several articles out there should the mood strike you.
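
For the curious, here is a tiny taste of capturing and back-referencing in JavaScript. It is only a sketch with made-up sample strings, not a full treatment.

```javascript
// Each pair of parentheses captures part of the match into a numbered group.
const date = /(\d{4})-(\d{2})-(\d{2})/;
const found = "Released on 2024-03-15".match(date);

console.log(found[0]); // "2024-03-15", the whole match
console.log(found[1]); // "2024", group 1
console.log(found[2]); // "03", group 2
console.log(found[3]); // "15", group 3

// Groups can be reused in a replacement string with $1, $2, ...
console.log("2024-03-15".replace(date, "$2/$3/$1")); // "03/15/2024"

// A back-reference (\1) inside the pattern matches the same text again.
const doubledLetter = /(\w)\1/;
console.log(doubledLetter.test("letter")); // true, "tt"
console.log(doubledLetter.test("later"));  // false
```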

9. Additional Practice

  • Now you can have some fun practicing writing four more regular expressions.

  • A possible solution follows each of the four exercises.

Additional Practice - 1 of 4

Match an American Express Credit Card Number, which always begins with 34 or 37 and totals 15 digits.

Solution

/3[47]\d{13}/

Additional Practice - 2 of 4

Match a full U.S. Phone Number:
+1-(555)-555-5555

Solution

/\+1-\(\d{3}\)-\d{3}-\d{4}/

Additional Practice - 3 of 4

A date in the format:
YYYY-MM-DD.
YYYY can start with either 19 or 20 only.
DD can be anything from 01 to 31, regardless of the month.

Solution

/(19|20)\d\d-(0[1-9]|1[012])-(0[1-9]|[12][0-9]|3[01])/

Additional Practice - 4 of 4

An integer between 0 and 255
This is difficult; remember to use the "alternation" (|) operator.

Solution

/(2[0-4][0-9]|25[0-5]|[01]?[0-9]?[0-9])/
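
When testing this solution in JavaScript, keep in mind that test() accepts partial matches, so the anchors from the Further Study section are needed to check a whole string. A sketch, with the variable names chosen for illustration:

```javascript
// Anchored version: the entire string must be a number from 0 to 255.
const byte = /^(2[0-4][0-9]|25[0-5]|[01]?[0-9]?[0-9])$/;

// Check every integer from 0 to 255.
let allMatch = true;
for (let n = 0; n <= 255; n++) {
  if (!byte.test(String(n))) allMatch = false;
}
console.log(allMatch);         // true
console.log(byte.test("256")); // false
console.log(byte.test("-1"));  // false

// Without anchors, "256" passes because the "25" inside it matches.
const loose = /(2[0-4][0-9]|25[0-5]|[01]?[0-9]?[0-9])/;
console.log(loose.test("256")); // true
```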

Helpful tool

Regexr is a helpful web-based regex tool that can be used to write and test complex regex as well as troubleshoot regex that isn't working as expected.