Text processing with Awk

  11 Oct 2015

One day, I was given a file whose contents looked like the following (the file was named useragent-list):

$ cat useragent-list

john.doe@awesomeemail.com -  -
ABCNode/0.1.4 ABCAPI/2014-07-27 Node/2.2.2
ABCNode/0.1.5 ABCAPI/2014-07-27 Node/2.2.1
john.smith@coolnewemail.com -  -
hi@hihisomething.me - www.hihisomething.me - live
ABCNode/0.1.4 ABCAPI/2014-07-27 Node/2.2.2
ABCNode/0.1.5 ABCAPI/2014-07-27 Node/2.2.2
nai.kor@gmail.com -  -
ABCMagento/0.1.0 ABCPHP/2.2.0 ABCAPI/2014-07-27
ABCMagento/ ABCPHP/2.3.1 ABCAPI/2014-07-27
ABCOpenCart/1.0 ABCPHP/2.3.1 ABCAPI/2014-07-27
ABCPHP/2.2.0 ABCAPI/2014-07-27
ABCWooCommerce/1.0.2 WooCommerce/2.3.10 Wordpress/4.2.2
ABCOpenCart/1.0 ABCPHP/2.1.2 ABCAPI/2014-07-27
superawesome@dogerdoger.io -  -
ABCPHP/2.1.2 ABCAPI/2014-07-27
helloworld123@gmail.com - www.facebook.com - live
ABCPHP/2.2.0 ABCAPI/2014-07-27
ABCNode/0.1.5 ABCAPI/2014-07-27 Node/2.2.1

The sample file and output shown here are truncated. The real file uses a line of dashes (---------------------) to separate blocks, or records; each block belongs to one user and lists the user agents that user has made requests with. My task was to search for a given user agent name, collect the email addresses of only the live users who use it (those whose first line ends with live), and then email them about updating to the proper version of the agent. For example, I needed to find everyone using the ABCPHP user agent, get the list of live users' emails, and notify them. The problem was that the original file was too big to filter and turn into a report by hand.

I had always used awk for simple line-oriented data, with one-liners like awk '{print $2}' file.

So I thought to myself: this requirement could push me to learn some more awk. Why not?

Awk is an excellent tool for filtering and manipulating text and for writing reports. You can do amazing things with row-and-column data on *nix systems with awk. I recommend reading http://www.grymoire.com/Unix/Awk.html if you don’t know the basics of awk. In short, awk by default treats each line in a file as a record and operates on one record at a time. A record consists of fields (separated by spaces or tabs by default), which you can access as $1, $2 … $n ($0 is the whole record).
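To make the default record and field behaviour concrete, here is a minimal sketch. The two input lines and the function name show_fields are made up for illustration; NR (the current record number) and NF (the number of fields in the record) are standard awk variables.

```shell
# Hypothetical input: two lines, three whitespace-separated fields each.
show_fields() {
    printf 'alice 30 london\nbob 25 paris\n' |
    awk '{ print "record " NR ": $1=" $1 ", $2=" $2 ", NF=" NF }'
}

show_fields
# record 1: $1=alice, $2=30, NF=3
# record 2: $1=bob, $2=25, NF=3
```

Each input line becomes one record, and the action in braces runs once per record.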

After some trial and error, here is my pattern scanning and processing script for the file. It is only 7 lines of code (it would probably be longer if I had written it in another language). I ran it like this:

$ awk -v search=ABCPHP 'BEGIN { print "Search: \047" search "\047"; RS="\n---------------------\n"; FS="\n" }
# One record per block, one field per line (multi-character RS needs gawk or mawk).
{
     for (i=2; i <= NF; i++)
         if (match($1, /live$/) && match($i, search))
             { split($1, a, " "); users[a[1]]=a[3] }   # email => site
}
END { for (u in users) print u, users[u] }' useragent-list

The result looks like this:

Search: 'ABCPHP'

helloworld123@gmail.com www.facebook.com
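To try the idea without the original file, the sketch below builds a small sample and runs the same kind of search against it. Several things here are assumptions: the sample data and the placement of the dashed separator lines are made up, and the names sep, sample and live_users are hypothetical. It is also a variant, not the script above: it prints each match directly instead of collecting matches in an array, and it uses index() for a literal substring test where the script above uses match(). Like the script above, it relies on a multi-character RS, which gawk and mawk support but which POSIX leaves unspecified.

```shell
# Assumed block separator: a line of dashes between users.
sep='---------------------'

# Made-up sample file: three blocks, two of them "live".
sample=$(mktemp)
printf '%s\n' \
    'john.doe@awesomeemail.com -  -' \
    'ABCNode/0.1.4 ABCAPI/2014-07-27 Node/2.2.2' \
    "$sep" \
    'hi@hihisomething.me - www.hihisomething.me - live' \
    'ABCNode/0.1.5 ABCAPI/2014-07-27 Node/2.2.2' \
    "$sep" \
    'helloworld123@gmail.com - www.facebook.com - live' \
    'ABCPHP/2.2.0 ABCAPI/2014-07-27' \
    'ABCNode/0.1.5 ABCAPI/2014-07-27 Node/2.2.1' > "$sample"

# usage: live_users AGENT FILE
live_users() {
    awk -v search="$1" -v sep="$sep" '
    BEGIN { RS = "\n" sep "\n"; FS = "\n" }   # block = record, line = field
    $1 ~ /live$/ {                            # only blocks whose first line ends in "live"
        for (i = 2; i <= NF; i++)
            if (index($i, search)) {          # literal substring match on the agent line
                split($1, a, " ")             # a[1] = email, a[3] = site
                print a[1], a[3]
                next                          # one line of output per user
            }
    }' "$2"
}

live_users ABCPHP "$sample"
# helloworld123@gmail.com www.facebook.com
```

Searching for ABCNode instead lists both live users in the sample, while ABCMagento lists none.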

The result in this example is only one email, but in reality there could be thousands, so now I can use this simple awk script to filter the file and get the list of emails easily.
