{"id":401,"date":"2009-08-04T15:00:51","date_gmt":"2009-08-04T22:00:51","guid":{"rendered":"http:\/\/www.stuartsheldon.org\/blog\/?p=401"},"modified":"2009-08-04T15:00:51","modified_gmt":"2009-08-04T22:00:51","slug":"using-regular-expressions-on-the-linux-command-line","status":"publish","type":"post","link":"https:\/\/www.stuartsheldon.org\/blog\/2009\/08\/using-regular-expressions-on-the-linux-command-line\/","title":{"rendered":"Using Regular Expressions On The Linux Command Line"},"content":{"rendered":"<h2><em><strong>Using Regular Expressions (RegEx) on the command line.<\/strong><\/em><\/h2>\n<p>Questions about regular expressions come up at the Lug meetings on a regular basis. Here are some examples of regex commands I use all the time. Hope you find them useful.<\/p>\n<p style=\"text-align: right;\"><!--more--><\/p>\n<h2><strong><em>Parse a file skipping commented lines.<\/em><\/strong><\/h2>\n<pre>gateway:~# egrep '^[^#]' \/etc\/manpath.config\r\nMANDATORY_MANPATH\t\t\t\/usr\/man\r\nMANDATORY_MANPATH\t\t\t\/usr\/share\/man\r\nMANDATORY_MANPATH\t\t\t\/usr\/local\/share\/man\r\n[...] more lines\r\n\r\n#### Let's use wc to count the lines returned:\r\n\r\ngateway:~# egrep '^[^#]' \/etc\/manpath.config | wc -l\r\n23\r\n\r\ngateway:~# cat \/etc\/manpath.config | wc -l\r\n114<\/pre>\n<p>So let&#8217;s look at the command. The egrep command is grep with extended regular expression syntax enabled (the same as running grep -E). It searches through a file and, whenever a line matches the regex, it prints the entire line that contains the match. It prints nothing for lines that do not match. 
Our regex is <em><strong>&#8216;^[^#]&#8217;<\/strong><\/em>, which reads like this:<\/p>\n<ul>\n<li>The first <em><strong>&#8220;^&#8221;<\/strong><\/em> means the rest of the regex must begin at the start of the line.<\/li>\n<li>A pair of <em><strong>&#8220;[]&#8221;<\/strong><\/em> matches any single character listed between the brackets.<\/li>\n<li>Unless the list in the brackets starts with <em><strong>&#8220;^&#8221;<\/strong><\/em>, which means the reverse: any single character <strong>but<\/strong> the ones in the brackets will match.<\/li>\n<li>So,<em><strong> &#8220;^[^#]&#8221;<\/strong><\/em> means every line whose first character is anything other than <em><strong>&#8220;#&#8221;<\/strong><\/em> will match. Empty lines have no first character, so they are skipped as well.<\/li>\n<\/ul>\n<p>The wc command is just a utility that counts lines, words, and characters; with the <strong><em>&#8220;-l&#8221;<\/em><\/strong> switch it prints only the line count.<\/p>\n<h2><em><strong>Using egrep to find files that contain my home directory.<\/strong><\/em><\/h2>\n<pre>stu@linus:~$ egrep -l '\\\/h\\\/stu' *\r\nHostingContract.ott\r\nScale6Vlan-Master0.odb\r\n\r\n### Notice the escaped forward slashes<\/pre>\n<p>We are again using egrep, but we are changing what it returns when it finds a match by adding the <strong><em>&#8220;-l&#8221;<\/em><\/strong> switch. This switch causes egrep to print only the names of the files that contain a match for the regex, not the matching lines. My home directory on the system I ran the command on is <strong><em>&#8216;\/h\/stu&#8217;<\/em><\/strong>. 
The <strong><em>&#8220;escape&#8221;<\/em><\/strong> in front of each <em><strong>&#8220;\/&#8221;<\/strong><\/em> is a backslash, which tells the regex engine to treat the next character literally. Strictly speaking it is not required here: <em><strong>&#8220;\/&#8221;<\/strong><\/em> has no special meaning to egrep, so <strong><em>&#8216;\/h\/stu&#8217;<\/em><\/strong> finds the same files. The escape does matter in sed, where <em><strong>&#8220;\/&#8221;<\/strong><\/em> separates the parts of the expression.<\/p>\n<h2><em>Using egrep and sed to find all routes to a class C address.<\/em><\/h2>\n<pre>### The output we are modifying:\r\n209.84.9.248\/29 via 209.84.10.2 dev eth0  proto zebra  metric 20\r\n209.84.9.240\/29 via 209.84.10.2 dev eth0  proto zebra  metric 20\r\n209.84.9.160\/27 dev eth4  proto kernel  scope link  src 209.84.9.161\r\n209.84.9.0\/26 dev eth1  proto kernel  scope link  src 209.84.9.1\r\n209.84.9.64\/26 dev eth3  proto kernel  scope link  src 209.84.9.65 \r\n\r\n### The results\r\n\r\nborder2:~# ip route | egrep '209\\.84\\.9' \\\r\n        | sed 's\/^\\(209\\.84\\.9\\.[0-9]\\{1,3\\}\\\/[0-9]\\{1,2\\}\\).*$\/\\1\/'\r\n209.84.9.248\/29\r\n209.84.9.240\/29\r\n209.84.9.160\/27\r\n209.84.9.0\/26\r\n209.84.9.64\/26\r\n\r\n### Now let's do the same thing, but with just sed\r\n\r\nborder2:~# ip route \\\r\n        | sed -e 's\/^\\(209\\.84\\.9\\.[0-9]\\{1,3\\}\\\/[0-9]\\{1,2\\}\\).*$\/\\1\/' \\\r\n        -e '\/^209\\.84\\.9\/!d'\r\n209.84.9.248\/29\r\n209.84.9.240\/29\r\n209.84.9.160\/27\r\n209.84.9.0\/26\r\n209.84.9.64\/26<\/pre>\n<p>Ok, let&#8217;s start with the egrep command. As you can see, we are using the escape character again. This time, we are escaping the <strong><em>&#8220;.&#8221;<\/em><\/strong>. We need to do this because <em><strong>&#8220;.&#8221;<\/strong><\/em> has a special meaning: it matches any single character. So, if we want to search for an actual period, we need to escape it. Left unescaped, <strong><em>&#8216;209.84.9&#8217;<\/em><\/strong> would still find our routes, but it would also match unrelated text such as <strong><em>&#8216;209x84y9&#8217;<\/em><\/strong>.<\/p>\n<p>Now, the sed command I&#8217;m using is a bit more complicated, but once you understand what I&#8217;m doing, it should become clear. 
Here is an overview of the meta characters and modifiers I&#8217;m using.<\/p>\n<p>I am constructing a substitution, denoted by the <em><strong>&#8220;s&#8221;<\/strong><\/em> at the beginning of the expression: s\/&lt;match data&gt;\/&lt;replace with&gt;\/<\/p>\n<p><em><strong>&#8220;^&#8221;<\/strong><\/em> means start the match at the beginning of the line.<\/p>\n<p>Everything enclosed in the <em><strong>\\( stuff to match \\)<\/strong><\/em> is saved so you can reuse it later with <em><strong>&#8220;\\1&#8221;<\/strong><\/em>.<\/p>\n<p>The meaning of <em><strong>&#8220;209\\.84\\.9\\.&#8221;<\/strong><\/em> is nothing more than the first part of the network address I want to display when I&#8217;m done.<\/p>\n<p>This however: <em><strong>&#8220;[0-9]\\{1,3\\}\\\/[0-9]\\{1,2\\}&#8221;<\/strong><\/em> might require a little thought to understand. Let&#8217;s start with what we already know: <strong><em>&#8220;[0-9]&#8221;<\/em><\/strong> is a single digit from 0 to 9. That&#8217;s simple, but what about <strong><em>&#8220;\\{1,3\\}&#8221;<\/em><\/strong>? Well, that means that I want the digit to occur at least once, but no more than 3 times, before the next character to match occurs, which is <strong><em>&#8220;\\\/&#8221;<\/em><\/strong>. That is actually a <em><strong>&#8220;\/&#8221;<\/strong><\/em>, escaped because sed uses <em><strong>&#8220;\/&#8221;<\/strong><\/em> to separate the parts of the <em><strong>&#8220;s&#8221;<\/strong><\/em> command. After that comes <em><strong>&#8220;[0-9]\\{1,2\\}&#8221;<\/strong><\/em>, which works the same way: one or two digits.<\/p>\n<p>That completes all the stuff we want to keep for later, so the next part of the string is <em><strong>&#8220;\\)&#8221;<\/strong><\/em>. This is followed by a really greedy expression: <em><strong>&#8220;.*&#8221;<\/strong><\/em>. We talked about the <strong>&#8220;.&#8221;<\/strong> earlier, but the <em><strong>&#8220;*&#8221;<\/strong><\/em> is new. The <em><strong>&#8220;*&#8221;<\/strong><\/em> means <strong><em>0 or more of the previous character<\/em><\/strong>. 
Since the previous character is a <em><strong>&#8220;.&#8221;<\/strong><\/em>, that means any run of characters, or nothing at all.<\/p>\n<p>We finish the match portion up with a <em><strong>&#8220;$&#8221;<\/strong><\/em> which reminds you to send me a check. Not really, just seeing if you are awake. It means the end of the line.<\/p>\n<p>So let&#8217;s try writing the match portion of it in simple terms:<\/p>\n<p><em><strong>start of line<\/strong><\/em><strong> &#8211; 209.84.9. &#8211; <em>one to three digits<\/em> &#8211; \/ &#8211; <em>one to two digits<\/em> &#8211; <em>anything at all to end of line<\/em><\/strong><\/p>\n<p>Now that we got through that, the last part is what we are going to substitute it with. All I want is the network part of the line, which is in the <em><strong>&#8220;\\(\\)&#8221;<\/strong><\/em> brackets, and I can access that using the <em><strong>&#8220;\\1&#8221;<\/strong><\/em> back-reference, which is what I have done above.<\/p>\n<p>Last but not least, I used an example above that is completely done in sed. The expression <em><strong>&#8220;\/^209\\.84\\.9\/!d&#8221;<\/strong><\/em> was added to the sed command to keep only the lines we wanted. Usually a <em><strong>&#8220;d&#8221;<\/strong><\/em> would delete every line that matches, but adding the <em><strong>&#8220;!&#8221;<\/strong><\/em> causes it to delete only the lines that <strong>do not<\/strong> match.<\/p>\n<p>I hope this sparks some interest in regular expressions. For a Unix administrator, the regular expression is a life saver. Here is a book that will help you get started: <a href=\"http:\/\/oreilly.com\/catalog\/9780596528126\/\" target=\"_blank\">Mastering Regular Expressions, <span>By Jeffrey E. F. Friedl<\/span><\/a>.<\/p>\n<p>&#8212; Stu<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Using Regular Expressions (RegEx) on the command line. Questions about regular expressions come up at the Lug meetings on a regular basis. 
Here are some examples of regex commands I use all the time. Hope you find them useful.<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[5,4],"tags":[85,82,83,191,79,80,81,84],"class_list":["post-401","post","type-post","status-publish","format-standard","hentry","category-free-bsd","category-linux","tag-bsd","tag-egrep","tag-grep","tag-linux","tag-regex","tag-regular-expressions","tag-sed","tag-unix"],"_links":{"self":[{"href":"https:\/\/www.stuartsheldon.org\/blog\/wp-json\/wp\/v2\/posts\/401","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.stuartsheldon.org\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.stuartsheldon.org\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.stuartsheldon.org\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.stuartsheldon.org\/blog\/wp-json\/wp\/v2\/comments?post=401"}],"version-history":[{"count":34,"href":"https:\/\/www.stuartsheldon.org\/blog\/wp-json\/wp\/v2\/posts\/401\/revisions"}],"predecessor-version":[{"id":435,"href":"https:\/\/www.stuartsheldon.org\/blog\/wp-json\/wp\/v2\/posts\/401\/revisions\/435"}],"wp:attachment":[{"href":"https:\/\/www.stuartsheldon.org\/blog\/wp-json\/wp\/v2\/media?parent=401"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.stuartsheldon.org\/blog\/wp-json\/wp\/v2\/categories?post=401"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.stuartsheldon.org\/blog\/wp-json\/wp\/v2\/tags?post=401"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}