<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>laneolson.ca &#187; regular expressions</title>
	<atom:link href="http://www.laneolson.ca/tag/regular-expressions/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.laneolson.ca</link>
	<description></description>
	<lastBuildDate>Fri, 07 Jan 2011 23:06:53 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.2</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Advanced Regular Expressions: Some Tools and an Example</title>
		<link>http://www.laneolson.ca/2010/02/09/advanced-regular-expressions-some-tools-and-an-example/</link>
		<comments>http://www.laneolson.ca/2010/02/09/advanced-regular-expressions-some-tools-and-an-example/#comments</comments>
		<pubDate>Tue, 09 Feb 2010 20:49:47 +0000</pubDate>
		<dc:creator>Lane</dc:creator>
				<category><![CDATA[Programming]]></category>
		<category><![CDATA[regular expressions]]></category>

		<guid isPermaLink="false">http://www.laneolson.ca/?p=151</guid>
		<description><![CDATA[<p>I’ve found myself having to build a few somewhat complex (in my opinion) regular expressions over the last few days in order to index certain fields for Splunk.  A good friend of mine pointed me to a regular expression testing tool a while ago, and it has proved extremely useful.  The tool, <a target="_blank" href="http://gskinner.com/RegExr/">RegExr</a>, gives a good overview of examples and special characters, and even offers community-submitted regular expressions for you to use.  Most importantly, it lets you test your regular expression against a sample of text that you paste in.</p>
<p>This is a great tool for <a target="_blank" href="http://www.splunk.com/">Splunk</a>.  All you have to do is copy an event that contains the custom field you want to capture, paste it into the tool, then work on the regular expression until it captures the data you need.  One example of a regular expression that I built is this monster:</p>

<div class="wp_syntax"><div class="code"><pre style="font-family: monospace;" class="bash">(http|https)://(([A-Za-z0-9\.\-]*)?\.)?(?&lt;domain_name&gt;[A-Za-z0-9\-]{3}[A-Za-z0-9\-]*\.([A-Za-z]{2}\.[A-Za-z]{2}|[A-Za-z]{2,3})|[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3})</pre></div></div>

<p>I’ll explain what this one does in a second, but you can probably guess by looking at it.  With Splunk I am indexing all data that goes through the HTTP proxy on the firewall.  Each event that Splunk indexes from the proxy includes the address processed, right down to the file name.  However, I’m more interested in pooling all of the events by domain name to get the total number of requests, time spent, etc. per domain.  So, I needed a regular expression to extract the domain name; enter the mess of characters from above...</p>]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ve found myself having to build a few somewhat complex (in my opinion) regular expressions over the last few days in order to index certain fields for Splunk.  A good friend of mine pointed me to a regular expression testing tool a while ago, and it has proved extremely useful.  The tool, <a href="http://gskinner.com/RegExr/" target="_blank">RegExr</a>, gives a good overview of examples and special characters, and even offers community-submitted regular expressions for you to use.  Most importantly, it lets you test your regular expression against a sample of text that you paste in.</p>
<p>This is a great tool for <a href="http://www.splunk.com/" target="_blank">Splunk</a>.  All you have to do is copy an event that contains the custom field you want to capture, paste it into the tool, then work on the regular expression until it captures the data you need.  One example of a regular expression that I built is this monster:</p>

<div class="wp_syntax"><div class="code"><pre class="bash" style="font-family:monospace;">(http|https)://(([A-Za-z0-9\.\-]*)?\.)?(?&lt;domain_name&gt;[A-Za-z0-9\-]{3}[A-Za-z0-9\-]*\.([A-Za-z]{2}\.[A-Za-z]{2}|[A-Za-z]{2,3})|[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3})</pre></div></div>

<p>I&#8217;ll explain what this one does in a second, but you can probably guess just by looking at it.  With Splunk I am indexing all data that goes through the HTTP proxy on the firewall.  Each event that Splunk indexes from the proxy includes the address processed, right down to the file name.  However, I&#8217;m more interested in pooling all of the events by domain name to get the total number of requests, time spent, etc. per domain.  So, I needed a regular expression to extract the domain name; enter the mess of characters from above.</p>
<p>The regular expression above breaks down a URL and captures its domain name.  For example, the regex will capture example.com from the following URL:</p>

<div class="wp_syntax"><div class="code"><pre class="bash" style="font-family:monospace;">http://www.example.com/some_folder/filename.php?id=34&amp;name=something</pre></div></div>

<p>This example is simple enough to capture, but it becomes more complex when you consider the following:</p>
<ul>
<li>Subdomains (whatever.example.com)</li>
<li>Odd domain extensions (ab.ca, co.uk)</li>
<li>IP Address domains (64.75.34.12)</li>
</ul>
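<p>To check the full expression against the cases above, here is a minimal sketch in Python (chosen only for illustration; the post itself uses the expression in Splunk).  Python&#8217;s re module spells a named group (?P&lt;name&gt;...) instead of the (?&lt;name&gt;...) form used in Splunk, so that is the only change made to the expression.  Apart from the example.com URL above, the sample URLs are made up.</p>

```python
import re

# The full expression from the post. The only change is the named-group
# syntax: Python needs (?P<domain_name>...) where Splunk accepts (?<domain_name>...).
URL_DOMAIN = re.compile(
    r"(http|https)://"
    r"(([A-Za-z0-9\.\-]*)?\.)?"
    r"(?P<domain_name>"
    r"[A-Za-z0-9\-]{3}[A-Za-z0-9\-]*\.([A-Za-z]{2}\.[A-Za-z]{2}|[A-Za-z]{2,3})"
    r"|[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3})"
)

# Illustrative URLs: a plain domain, a subdomain, two-part extensions
# and an IP address used in place of a host name.
urls = [
    "http://www.example.com/some_folder/filename.php?id=34&name=something",
    "http://whatever.example.com/",
    "https://www.example.ab.ca/index.html",
    "http://www.example.co.uk/",
    "http://64.75.34.12/page",
]

for url in urls:
    match = URL_DOMAIN.search(url)
    # Print the named group, or None if the expression did not match at all.
    print(match.group("domain_name") if match else None)
```

<p>This prints example.com, example.com, example.ab.ca, example.co.uk and 64.75.34.12, one per line.</p>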
<p>Here is a breakdown of the regular expression mentioned above.</p>
<h4>Capture Group 1:</h4>

<div class="wp_syntax"><div class="code"><pre class="bash" style="font-family:monospace;">(http|https)://</pre></div></div>

<p>Pretty straightforward: the URL must begin with http or https.  This is an HTTP/S proxy, so I know every URL will begin with one of these two values.  If you were using this for an FTP proxy, you could easily add ftp, as in (http|https|ftp).  The http/s is always followed by ://.</p>
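<p>A small Python sketch of this part on its own, next to a hypothetical variant with ftp added as described above (the URLs are made up):</p>

```python
import re

# The scheme part from the post: "http" or "https" followed by "://".
SCHEME = re.compile(r"(http|https)://")
# Hypothetical variant for an FTP proxy, with ftp added to the alternation.
SCHEME_WITH_FTP = re.compile(r"(http|https|ftp)://")

for url in ["http://example.com/", "https://example.com/", "ftp://files.example.com/"]:
    plain = SCHEME.match(url)
    with_ftp = SCHEME_WITH_FTP.match(url)
    # group(1) is whichever alternative matched, or None when there is no match.
    print(url, plain.group(1) if plain else None, with_ftp.group(1) if with_ftp else None)
```

<p>For an https URL the engine first tries http, fails on the following s, and backtracks to the https alternative.  (https?) would be an equivalent, shorter way to write the original group.</p>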
<h4>Capture Group 2:</h4>

<div class="wp_syntax"><div class="code"><pre class="bash" style="font-family:monospace;">([A-Za-z0-9\.\-]*\.)?</pre></div></div>

<p>This part of the expression captures the subdomain.  It looks for any number of characters from the set A-Z, a-z, 0-9, &#8220;.&#8221; and &#8220;-&#8221;, followed by a . (dot).  The question mark at the end makes this part optional, since not all URLs have subdomains.  Because the dot is in the character set, a subdomain with several levels (such as a.b in a.b.example.com) is captured here as a whole.  Now that we have the subdomain, the next thing to process is the domain name.</p>
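<p>The following Python sketch prints what capture group 2 holds for a couple of made-up URLs.  It uses the full expression from this post, translated to Python&#8217;s (?P&lt;name&gt;...) named-group syntax.</p>

```python
import re

# Full expression from the post, in Python's (?P<name>...) syntax.
URL_DOMAIN = re.compile(
    r"(http|https)://"
    r"(([A-Za-z0-9\.\-]*)?\.)?"
    r"(?P<domain_name>"
    r"[A-Za-z0-9\-]{3}[A-Za-z0-9\-]*\.([A-Za-z]{2}\.[A-Za-z]{2}|[A-Za-z]{2,3})"
    r"|[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3})"
)

for url in ["http://www.example.com/", "http://a.b.example.com/"]:
    match = URL_DOMAIN.search(url)
    # Group 2 is the optional subdomain part, including its trailing dot.
    print(repr(match.group(2)), match.group("domain_name"))

# Without a subdomain the optional part is skipped and the domain still matches.
print(URL_DOMAIN.search("http://example.com/").group("domain_name"))
```

<p>The subdomain part is greedy: the engine first lets it swallow as much as it can (for example www.example.com), then backtracks one character at a time until the rest of the expression, the domain name and extension, can match.  That is how www. ends up in group 2 and example.com in the domain name.</p>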
<h4>Capture Group 3 (this one&#8217;s a doozy):</h4>

<div class="wp_syntax"><div class="code"><pre class="bash" style="font-family:monospace;">([A-Za-z0-9\-]{3}[A-Za-z0-9\-]*\.([A-Za-z]{2}\.[A-Za-z]{2}|[A-Za-z]{2,3})|[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3})</pre></div></div>

<p>This one is long because it matches either a domain name or an IP address.  Here is the part that captures a domain name:</p>

<div class="wp_syntax"><div class="code"><pre class="bash" style="font-family:monospace;">[A-Za-z0-9\-]{3}[A-Za-z0-9\-]*\.([A-Za-z]{2}\.[A-Za-z]{2}|[A-Za-z]{2,3})</pre></div></div>

<p>The first part requires the domain name to be at least 3 characters long: [A-Za-z0-9\-]{3}.  Besides ruling out very short names, this minimum keeps the &#8220;co&#8221; in a name like example.co.uk from being captured as the domain name on its own.</p>
<p>The next part, [A-Za-z0-9\-]*\., captures the remaining characters of the domain name up to the “.” before the domain extension.</p>
<p>The final part captures the domain extension.  This is either a province/state followed by a country code, such as ab.ca or fl.us, which is matched by [A-Za-z]{2}\.[A-Za-z]{2}, or (|) a single extension of two or three letters, such as .ca or .com, which is matched by [A-Za-z]{2,3}.  Longer extensions such as .museum need a wider range, for example [A-Za-z]{2,6}.</p>
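<p>To see what the upper limit on the extension length means in practice, here is a Python sketch.  build_pattern is a hypothetical helper that rebuilds the full expression (in Python&#8217;s (?P&lt;name&gt;...) syntax) with a different quantifier on the single-part extension; the post&#8217;s expression uses {2,3}.  The URLs are made up.</p>

```python
import re


def build_pattern(tld_quantifier):
    """Hypothetical helper: the post's full expression with a configurable
    quantifier on the single-part domain extension (the post uses "{2,3}")."""
    return re.compile(
        r"(http|https)://(([A-Za-z0-9\.\-]*)?\.)?"
        r"(?P<domain_name>[A-Za-z0-9\-]{3}[A-Za-z0-9\-]*\."
        r"([A-Za-z]{2}\.[A-Za-z]{2}|[A-Za-z]" + tld_quantifier + r")"
        r"|[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3})"
    )


for quantifier in ["{2,3}", "{2,6}"]:
    pattern = build_pattern(quantifier)
    for url in ["http://www.example.ab.ca/", "http://www.example.museum/"]:
        print(quantifier, pattern.search(url).group("domain_name"))
```

<p>With {2,3} the .museum host comes back as example.mus, while {2,6} returns example.museum; ab.ca is handled the same way by both.  Because nothing after the extension is anchored (for example to a / or the end of the URL), a range that is too short silently truncates the extension instead of failing to match.</p>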
<p>I also wanted to capture the IP address if the HTTP request used an IP address instead of a host name.  That&#8217;s what the last part captures:</p>

<div class="wp_syntax"><div class="code"><pre class="bash" style="font-family:monospace;">[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}</pre></div></div>

<p>This one is fairly easy as well.  An IPv4 address is four groups of 1 to 3 digits separated by dots, and that is what the expression above matches.  It does not check that each number is 255 or less, though, so something like 999.1.1.1 would match too.</p>
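<p>Here is a Python sketch of the IP address part on its own.  fullmatch requires the whole string to match, and the standard-library ipaddress module is used only as a point of comparison for a stricter check; it is not part of the approach in this post.</p>

```python
import ipaddress
import re

# IPv4 part of the post's expression: four groups of 1 to 3 digits.
IPV4 = re.compile(r"[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}")


def is_strict_ipv4(text):
    """Stricter comparison check: each octet must be in the range 0-255."""
    try:
        ipaddress.IPv4Address(text)
        return True
    except ValueError:
        return False


for candidate in ["64.75.34.12", "999.1.1.1", "64.75.34"]:
    print(candidate, IPV4.fullmatch(candidate) is not None, is_strict_ipv4(candidate))
```

<p>The regex accepts 999.1.1.1 while the ipaddress check rejects it; both reject 64.75.34, which has only three groups.</p>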
<p>To reiterate, the full regular expression is:</p>

<div class="wp_syntax"><div class="code"><pre class="bash" style="font-family:monospace;">(http|https)://(([A-Za-z0-9\.\-]*)?\.)?(?&lt;domain_name&gt;[A-Za-z0-9\-]{3}[A-Za-z0-9\-]*\.([A-Za-z]{2}\.[A-Za-z]{2}|[A-Za-z]{2,3})|[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3})</pre></div></div>

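<p>You may be wondering what the ?&lt;domain_name&gt; is for.  It turns the 4th capture group into a named group, which Splunk extracts as a field called “domain_name”.  If you take out the ?&lt;domain_name&gt;, capture group 4 will still contain the domain name.  (It is group 4 rather than 3 because, in the full expression, the subdomain part wraps its character class in an extra set of parentheses, and those count as a capture group too.)</p>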
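<p>The group numbering can be checked with a short Python sketch.  It uses the full expression translated to Python&#8217;s (?P&lt;name&gt;...) syntax (Splunk&#8217;s (?&lt;name&gt;...) form is not accepted by Python&#8217;s re module), and then strips the name to show that group 4 still holds the domain.</p>

```python
import re

# Full expression from the post, in Python's (?P<name>...) syntax.
URL_DOMAIN = re.compile(
    r"(http|https)://"
    r"(([A-Za-z0-9\.\-]*)?\.)?"
    r"(?P<domain_name>"
    r"[A-Za-z0-9\-]{3}[A-Za-z0-9\-]*\.([A-Za-z]{2}\.[A-Za-z]{2}|[A-Za-z]{2,3})"
    r"|[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3})"
)

url = "http://www.example.com/some_folder/filename.php?id=34&name=something"
match = URL_DOMAIN.search(url)

# A named group is still numbered by its opening parenthesis; here it is group 4.
print(match.group("domain_name"))  # example.com
print(match.group(4))              # example.com
print(match.groups())              # ('http', 'www.', 'www', 'example.com', 'com')

# Removing the name leaves an ordinary group 4 that captures the same text.
unnamed = re.compile(URL_DOMAIN.pattern.replace("?P<domain_name>", ""))
print(unnamed.search(url).group(4))  # example.com
```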
<p>I feel that learning and becoming comfortable with regular expressions is extremely important if you do any kind of programming.  At first they may seem a little daunting, but once you get the patterns down they are actually quite simple to write.  They allow you to parse almost any kind of data, and nearly every programming language has some implementation of them.  Using a tool like <a href="http://gskinner.com/RegExr/" target="_blank">RegExr</a> is a great way to learn to write regular expressions and to test them once you get the hang of it.  You can also find a large library of regular expressions at <a href="http://regexlib.com/" target="_blank">regexlib.com</a>.</p>
<p><strong>Regular Expression Links:</strong></p>
<ul>
<li><a href="http://gskinner.com/RegExr/" target="_blank">RegExr</a> &#8211; Regular Expression Testing Tool</li>
<li><a href="http://regexlib.com/" target="_blank">RegExLib.com</a> &#8211; A collection of regular expressions</li>
<li><a href="http://www.aivosto.com/vbtips/regex.html" target="_blank">An Introduction to Regular Expressions</a></li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.laneolson.ca/2010/02/09/advanced-regular-expressions-some-tools-and-an-example/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
	</channel>
</rss>

