A Practical Unix/Terminal Example From Real Life

I’ve been working in Unix off and on since 1986. The latest installment of Taming the Terminal (part 16 of N) on the NosillaCast podcast, taught by Bart Busschots, has been a lot of fun. The real power of Unix and its cousins comes down to two things Bart mentioned:

  1. Doing one thing and doing it well.
  2. Plumbing.

In honor of this, I wanted to share something my Mac helped me do at work today. I am a C#/.NET programmer, and as such, the best tools for my work run only on Windows. <sigh> But I have a Mac Mini on my desk that can run circles around my new WinPC. But I digress.

We use a markup language called XML to communicate between services.  Part of a file looks like this:

<Event>
<EventID>93406</EventID>
<EventName>SAVANNAH GHOST TOUR</EventName>
<StartDateTime>2014-02-15 19:00:00</StartDateTime>
<EndDateTime>2014-02-15 20:30:00</EndDateTime>
<EventTypeID>12</EventTypeID>
<OnSaleDateTime>2013-12-11 00:00:00</OnSaleDateTime>
<OffSaleDateTime>2014-02-15 19:00:00</OffSaleDateTime>
<ResourceID>58</ResourceID>
<UserEventNumber>136</UserEventNumber>
<Available>26</Available>
<TotalCapacity>26</TotalCapacity>
<Status>0</Status>
<HasRoster>NO</HasRoster>
</Event>
I know, it looks like high-tech chicken scratch, right?

The problem I had to solve was getting a list of ONLY the EventID numbers. The list had to contain unique entries, be sorted, and have nothing on each line but the number. How to do it?

I don’t recall whether Bart and Allison have discussed the grep command. It is a ridiculously powerful command. Oversimplified, it picks out the lines of an input, such as a file, that match a pattern. I have such a file, called events.txt, that is 3006 lines long. How do I know?

cat events.txt | wc -l
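As an aside, wc doesn’t need cat in front of it; it can read the file itself. Here is a small sketch, assuming a bash-style shell like the one on the Mac. The three-line temporary file is a made-up stand-in for events.txt, not my real data. The Mac’s wc pads the count with leading spaces, while GNU wc on Linux does not.

```shell
#!/usr/bin/env bash
# Throwaway three-line stand-in for events.txt (made-up content).
sample=$(mktemp)
trap 'rm -f "$sample"' EXIT
printf '%s\n' '<EventID>93406</EventID>' '<EventName>SAVANNAH GHOST TOUR</EventName>' '<EventID>94017</EventID>' > "$sample"

# Same count as `cat "$sample" | wc -l`, but without the extra cat process.
# Feeding the file with < (instead of naming it) keeps the file name out of the output.
wc -l < "$sample"
```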
Now, the information I need is on the lines that contain <EventID>, so I’ll use grep to see just those lines.
cat events.txt | grep -i <eventid>
The -i tells grep to ignore case, so I can be lazy and type my search term in lowercase. In English, that makes this command:

“Send the contents of the file events.txt to the grep command looking for <eventid>, ignore case when looking and send the results to standard out, which in our case is the screen.”

Instead of the lines I wanted, this produces:

-bash: syntax error near unexpected token `newline'
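Uh-oh. What happened? Look at the string I’m searching for, <eventid>: it contains a ‘<’ and a ‘>’, which have special meaning to the shell (bash, in my case). Remember, < means “read from” and > means “send to”. The shell read <eventid as “read from a file named eventid” and then hit a > with nothing after it to send to, which is why it complains about the newline. Let’s take care of that by escaping each angle bracket with a backslash. Typing this: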
cat events.txt | grep -i \<eventid\>
solves the problem and tells the shell the angle brackets are just characters: nothing special to see here, move along. Now my output is 214 lines like this:
<EventID>94017</EventID>
<EventID>94822</EventID>
<EventID>95200</EventID>
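Backslashes aren’t the only way to keep the shell’s hands off those angle brackets. Here is a small sketch, again using a made-up stand-in file rather than the real events.txt. It shows that single quotes give the same result, and that grep can open the file itself instead of reading from cat.

```shell
#!/usr/bin/env bash
# Throwaway stand-in for events.txt (made-up lines in the same shape).
sample=$(mktemp)
trap 'rm -f "$sample"' EXIT
cat > "$sample" <<'EOF'
<EventID>94017</EventID>
<EventName>SAVANNAH GHOST TOUR</EventName>
<EventID>94822</EventID>
EOF

# Backslashes tell the shell that < and > are plain characters...
grep -i \<eventid\> "$sample"

# ...and so do single quotes, which protect everything between them.
# Here grep opens the file by name, so no cat is needed.
grep -i '<eventid>' "$sample"
```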
So far, so good. What’s next? Well, remember I don’t want the <EventID> or </EventID>, just the number. There is another useful command named cut that can help. You can run man cut to see more about it, but in short, if I say:
cut -c 10-14
it will “cut out” the 10th through 14th characters of each line, which in my case is exactly where the number sits on the lines grep hands it. I’m fortunate that the numbers are all five digits long. If they could be different lengths, I would have to get creative. Oh, let not your heart be troubled: there IS a way to do it, but it’s more complicated than I care to get into here.
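For the curious, here is one possible approach. It is only a sketch and assumes that each <EventID>…</EventID> pair sits on a single line, and that the first > on that line closes the opening tag. Instead of counting characters, have cut split each line on a delimiter. With -d '>' the number begins field 2. A second cut on '<' trims off the closing tag. The sample IDs below are made up, and leading spaces don’t matter with this method.

```shell
#!/usr/bin/env bash
# Made-up sample: IDs of different lengths, one line indented.
sample=$(mktemp)
trap 'rm -f "$sample"' EXIT
cat > "$sample" <<'EOF'
<EventID>94017</EventID>
    <EventID>7</EventID>
<EventID>123456</EventID>
EOF

# Splitting "<EventID>94017</EventID>" on '>' makes field 2 "94017</EventID".
# Splitting that on '<' and keeping field 1 leaves just the number.
grep -i '<eventid>' "$sample" | cut -d '>' -f 2 | cut -d '<' -f 1
```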

Adding that to our pipeline, we get:

cat events.txt | grep -i \<eventId\> | cut -c 10-14
and now we have an output of 214 lines like this:
94867
95363
Almost there! Now we have to sort them and remove duplicates. What do you think the command to sort is? Yes! It’s sort! You can use the plumbing pipe (|) to send all this through the sort command, so now we are up to:
cat events.txt | grep -i \<eventId\> | cut -c 10-14 | sort
This gives us a sorted list, but what about the duplicates? Well, a quick man sort shows that sort can output only unique lines with the -u flag. Adding that, we get:
cat events.txt | grep -i \<eventId\> | cut -c 10-14 | sort -u
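Two things about sort are worth seeing side by side. First, sort -u gives the same result as piping sort into uniq, another command that does one thing. Second, sort compares lines as text by default. That is harmless here because every ID is five digits long. With IDs of different lengths, you would need the -n flag to get numeric order. Here is a sketch with made-up numbers:

```shell
#!/usr/bin/env bash
# Made-up IDs: one duplicate (95363) and one six-digit value.
ids=$(printf '95363\n94867\n95363\n100000')

# These two print the same three lines; sort -u just saves a pipe.
printf '%s\n' "$ids" | sort -u
printf '%s\n' "$ids" | sort | uniq

# Text order puts 100000 first because "1" comes before "9".
# -n compares the values as numbers and puts it last.
printf '%s\n' "$ids" | sort -u -n
```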
There we have it: a sorted list of unique entries containing only the numbers. OK, that’s great, but let’s write all this to a file so we have it for later and don’t have to run this command over and over:
cat events.txt | grep -i \<eventId\> | cut -c 10-14 | sort -u > out.txt
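If you end up running this often, you can wrap the whole pipeline in a shell function. Here is a minimal sketch, assuming bash. The function name event_ids is hypothetical, and temporary files stand in for events.txt and out.txt. It keeps cut -c 10-14, so it inherits the same assumption of unindented lines and five-digit IDs.

```shell
#!/usr/bin/env bash
# Hypothetical helper: unique, sorted EventIDs from the file named in $1.
# Keeps cut -c 10-14, so it assumes unindented lines and five-digit IDs.
event_ids() {
  grep -i '<eventid>' "$1" | cut -c 10-14 | sort -u
}

# Throwaway stand-ins for events.txt and out.txt (made-up content).
events=$(mktemp)
out=$(mktemp)
trap 'rm -f "$events" "$out"' EXIT
cat > "$events" <<'EOF'
<EventID>95363</EventID>
<EventName>SAVANNAH GHOST TOUR</EventName>
<EventID>94867</EventID>
<EventID>95363</EventID>
EOF

# > creates the output file, or overwrites it if it already exists (>> would append).
event_ids "$events" > "$out"
cat "$out"
```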
And that, my friends, is how you can use Unix plumbing to get what you want from what you have.

I hope this was interesting and helpful to someone, if only to get you to try some stuff on your own. Get yourself a test file, give it a name unlike anything else you have so there are no mistakes, and start playing. Be creative and use man to guide you; almost every command you can use is well documented there. Be curious! If there is a cut command, might there be a paste? (Go ahead, go look.)

Bonus tip: There is a variant of man you can use by either typing

man -k (some term)
or
apropos (some term)
and it will list the man pages whose names or short descriptions contain the term. For instance, if you do
apropos file
you will get man pages for tempfile, cat, cut, grep, mv and a ton more commands that relate to files.

Go crazy, have fun. If you are afraid, remember you have a backup! You do have a backup, don’t you? Read the man page, since it is clear about what is going to happen, and it’s hard to hurt your computer with “normal” commands.
