Home > Uncategorized > Using Regex to Sanitize and Standardize Input Fields

Using Regex to Sanitize and Standardize Input Fields

We’ve all seen forms in web applications which require a phone number. The challenge posed by this field is there are many different ways for a user to type a valid phone number. Some examples are:

  • 602-236-2619
  • (602)2368888
  • 602.236.8888
  • 602 236 8888
  • 6022368888

The problem is we do not want the application storing the phone number in multiple formats. We want the system to use a single format. Here are some of the strategies I’ve seen to address the problem:

  • Use 3 input fields.
    Instead of a single text field, use three. Each field is meant for a different part of the phone number. Some implementations of this idea automatically tab to the next field after you enter the correct number of digits in the current field. Others force the user to tab between fields.
  • Adjust the phone number as the user types.
    This strategy uses a single text field. To match the desired format, the application modifies the user’s input as they type in order to insert the dash or parenthesis in the correct place.
  • Hope for the best.
    This strategy using a single text field and hopes the user enters the phone number matching the format show next to the text field, such as (XXX) XXX-XXXX. The validation on the server side checks for this format and displays an error to the user if it does not match, even if it is a valid phone number.

All of these techniques can lead to a sub-optimal user experience and unhappy users.

So what is a better solution to the problem? A better approach is to accept the format the user gives and change it to your internal format on the server side if it is a valid phone number.

The following code exemplifies the idea above. It is for a 10 digit US phone number. First we use a regular expression to remove all the ‘filler’ characters often found in phone numbers. Then we check to see if we have 10 digits. Next, we use another regular express to extract the logical groups of numbers found in a 10 digit phone number, for instance the area code is the first 3 digits. Finally, we take those logical groups and drop them into the corresponding slots of our format string (the value of the desiredFormat variable shown below) to transform the phone number into our system’s format.

package com.hrycan.blog;

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PhoneFormatter {
	private String parsePhoneNumberRegex = "(\\d{3})(\\d{3})(\\d{4})";
	private String desiredFormat = "($1) $2-$3";

	private Pattern pattern = Pattern.compile(parsePhoneNumberRegex);

	/**
	 * Formats the supplied phone number into the desired format if
	 * it is a recognized 10-digit phone number, otherwise returns null.
	 * @param phoneNumber	phone number to be transformed
	 * @return				correctly formatted phone number, 
	 * 						otherwise null
	 */
	public String format(String phoneNumber) {
		String formattedNumber = null;

		if (phoneNumber != null) {
			//remove any non-digits
			String bareNumber = phoneNumber.replaceAll("[\\s.()-]", "");
			if (bareNumber.matches("\\d{10}")) {
				Matcher matcher = pattern.matcher(bareNumber);
				if (matcher.matches()) {
					//place the groups matched into the slots
					formattedNumber = matcher.replaceAll(desiredFormat);
				}
			}
		}

		return formattedNumber;
	}
}

Below is a JUnit 4.4 test showing the code in action.

package com.hrycan.blog;

import org.junit.Test;
import static org.junit.Assert.assertNotNull;
import static org.junit.Assert.assertNull;

public class PhoneFormatterTest {

	@Test
	public void testFormat() {
		
		PhoneFormatter formatter = new PhoneFormatter();
		String[] good = {"602-437-1616", 
				"(602)4371616",
				"602.437.1616",
				"602 437 1616",
				"6024371616"};
		
		String[] bad = {"4371616",
				"1.602.437.1616",
				"437-1616",
				"",
				null};
		
		for (String input : good) {
			String output = formatter.format(input);
			assertNotNull(input, output);
		}
		
		for (String input : bad) {
			String output = formatter.format(input);
			assertNull(input, output);
		}
	}
}

Of course, you may need to accept more than 10 digits in a phone number, you may want to accept an alphanumeric phone number, or you may want to use a different phone number format in your system. The same concepts demonstrated can fit those situations. For example, if you wanted the phone number format to become XXX-XXX-XXXX, you would change the value of the desiredFormat variable to:

private String desiredFormat = "$1-$2-$3";

If you want to make this into a generic utility, one of the things you would want to do is pass your ‘desiredFormat’ string to an overloaded format method.

This concept of sanitizing user supplied data and standardizing it to a common format is not limited to phone numbers. It can apply to any input field where there are multiple valid input formats, such as credit card numbers and social security numbers.

Advertisements
  1. No comments yet.
  1. No trackbacks yet.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: