AWK Help

Do not meddle in the affairs of awk, for it is subtle and quick to anger.

An immensely useful C-like pattern-matching language, and a precursor of Perl. It is used a lot in shell scripts (or as shell scripts). Only use awk if you can't think of another way to do it. I use awk mainly to pull information out of files and reformat it.

Please note: AWK is short for awkward and not, as the rumour mill no doubt proclaims, an acronym for Aho, A.V., Weinberger, P.J., and Kernighan, B.W. (Joking aside, the name really does come from its three authors' initials.)

This help file is in the form of examples (RTFM if you want syntactic notes).

The difficulty level of each example is printed in [] brackets (0-10). If you are looking at awk for the first time, concentrate on the easy ones first; the examples are in no particular order (except the order in which they were written).

Dead simple example [0]

ls -l | awk ' {print $1,$9} '
Prints only the 1st and 9th fields of the output of ls -l (i.e. permissions and filename). $1 is the first field, $2 is the second, $3 is the third... and $0 is the entire line. By default, fields are separated by white space (tabs, spaces) and lines are separated by newlines. A filename containing spaces will be cut at the first space, since the rest of it lands in $10 onwards.
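
A small sketch of the field variables, using made-up sample input rather than real ls output (the function name show_fields is only for illustration). NF holds the number of fields on the current line, so $NF is the last field.

```shell
# show_fields: print the field count, first field and last field of each line.
show_fields() {
    awk '{ print NF, $1, $NF }'
}

# Made-up input; by default fields are split on runs of spaces and tabs.
printf 'alpha beta gamma\none   two\n' | show_fields

# -F sets a different field separator, here a colon.
printf 'root:x:0:0\n' | awk -F: '{ print $1, $3 }'
```

The -F option is the command-line way of setting FS before any input is read.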

First Word matching [3]

awk -v mat="$N1" ' BEGIN { found = 0 }
	{ if ($1 == mat) {
		found = 1
		printf("%s\n",$0)
		}
	  else { if (found == 1) {
		printf("%s\n",$0)
		}
	}
	  if (length($0) == 0) {
		if(found == 1) {
			printf("--\n")
			}
		found = 0
	}
} ' /u/compjmd/help/handycommands

Searches for $N1 in handycommands. If the first word on a line is $N1, that line and the rest of its paragraph (up to the next blank line) are printed, followed by a "--" separator. Please note: as far as I am aware, there is no 'case' or 'switch' statement in standard awk (which is a pain sometimes), although GNU awk provides 'switch' as an extension.
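
One common way to live without a case statement is to let awk's own pattern { action } rules do the dispatching. The sketch below uses made-up keywords and a hypothetical function name (classify); 'next' stops the later rules from running for a line that has already been handled.

```shell
# classify: dispatch on the first field, one rule per "case".
classify() {
    awk '
        $1 == "add" { print "adding " $2;   next }
        $1 == "del" { print "deleting " $2; next }
                    { print "unknown: " $0 }      # default case
    '
}

printf 'add foo\ndel bar\nmove baz\n' | classify
```

A rule with no pattern runs for every line that reaches it, which is what makes the last rule behave like a default branch.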

Field and Record separators [5]

awk "BEGIN { FS = \"\\t\" ; RS = \"\"}  
   /$N1/ " $EP

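Searches the file named by $EP (handycommands in my case) for $N1 and prints out any paragraph which contains it. FS is the field separator (default is white space) and RS is the record separator (default is newline). Setting RS to the empty string switches awk to paragraph mode, where records are separated by blank lines. Because the program is in double quotes, the shell expands $N1 before awk sees it, which is why the inner quotes and backslashes have to be escaped. Changing FS and RS can make awk perform rather oddly in my experience.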
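
A sketch of paragraph mode on made-up input. It differs from the command above in one respect, which is my own choice: the search pattern is passed in with -v and matched with the ~ operator, so the program can stay in single quotes. The function name grep_para is hypothetical.

```shell
# grep_para PATTERN: print every blank-line-separated paragraph of stdin
# that matches PATTERN. ORS is set so paragraphs stay separated on output.
grep_para() {
    awk -v pat="$1" 'BEGIN { RS = "" ; ORS = "\n\n" } $0 ~ pat'
}

printf 'ls\nlist files\n\ncp\ncopy files\n\nrm\nremove files\n' | grep_para copy
```

Note that awk processes backslash escapes in a -v value, so a pattern containing backslashes needs extra care.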

Reformat of page [5]

OK, an awk script to reformat an invoice from plain paper to an invoice (dot matrix) printer. (This was hard...)

awk ' BEGIN { currline = 1 }
	{ doneit = 1
	if (currline == 1) {
		printf("\n\n\n")
		doneit = 2
		}
	if (currline == 9) {
		printf("%s%s\n",substr($0,1,50),substr($0,60,length($0)))
		doneit = 2
		}
	if (currline == 11) {
		printf("\n\n")
		doneit = 2
		}
	if (currline == 12) {
		printf("%s%s%s%s\n",substr($0,1,37),substr($0,45,30),substr($0,76,10),substr($0,100,20))
		doneit = 2
		}
	if (currline == 13) {
		printf("%s%s",substr($0,1,37),substr($0,45,30))
		printf("%s%s\n",substr($0,90,15),substr($0,113,25))
		doneit = 2
		}
	if (currline == 14) {
		printf("%s%s%s%s\n",substr($0,1,37),substr($0,45,30),substr($0,76,10),substr($0,100,length($0)))
		doneit = 2
		}
	if (currline == 15) {
		printf("%s%s",substr($0,1,37),substr($0,45,30))
		printf("%s%s\n",substr($0,90,15),substr($0,113,25))
		doneit = 2
		}
	if (currline == 16) {
		printf("%s%s%s%s\n",substr($0,1,37),substr($0,45,30),substr($0,82,10),substr($0,100,length($0)))
		doneit = 2
		}
	if (currline > 17) {
		if(currline < 50) {
			printf("%s%s%s\n",substr($0,1,77),substr($0,100,10),substr($0,120,10))
			doneit = 2
			}
		}
	if (currline >= 50) {
		if (currline < 55) {
			doneit = 2
			}
		}
	if (currline == 56) {
		printf("%s%s%s%s\n",substr($0,1,22),substr($0,24,18),substr($0,44,43),substr($0,120,10))
		doneit = 2
		}
	if (currline == 57) {
		printf("%s%s%s%s\n",substr($0,1,22),substr($0,24,18),substr($0,44,43),substr($0,120,10))
		doneit = 2
		}
	if (currline == 58) {
		printf("%s%s%s%s\n",substr($0,1,22),substr($0,24,18),substr($0,44,43),substr($0,120,10))
		doneit = 2
		}
	if (currline == 59) {
		printf("%s%s%s%s\n",substr($0,1,22),substr($0,24,18),substr($0,44,43),substr($0,120,10))
		doneit = 2
		}
	if (currline == 60) {
		printf("%s%s%s%s\n",substr($0,1,22),substr($0,24,18),substr($0,44,43),substr($0,120,10))
		doneit = 2
		}
	if (doneit == 1) {
		printf("%s\n",$0)
		}
	currline = currline + 1
	} ' $1 >  $1.X

Nasty, eh? The printf statement is very useful; if you've ever done C, it has virtually identical syntax: %s = string, \n = new line, \t = tab, etc. substr is quite handy too. Its syntax is substr(string, start-location, number-of-chars); characters are counted from 1, and if the number of chars is left out you get the rest of the string. For example, the statement
echo "Hello World" | awk '{ printf("%s %s\n",substr($0,1,4),$2 )}'
would produce an output of "Hell World".
Keeping a current line counter can be extremely useful in certain awk scripts.
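
awk also keeps a line counter of its own: the built-in variable NR holds the number of the current record. A sketch using it to treat particular lines differently, on made-up input (the function name and the line numbers chosen are just for illustration):

```shell
# reformat_lines: line 1 is replaced by a blank line, line 3 is cut to its
# first 5 characters, and every other line passes through unchanged.
reformat_lines() {
    awk '
        NR == 1 { printf("\n"); next }
        NR == 3 { printf("%s\n", substr($0, 1, 5)); next }
                { printf("%s\n", $0) }
    '
}

printf 'header\nkeep me\ntruncate me\nlast\n' | reformat_lines
```

The last rule plays the part of the doneit flag: it only runs for lines that no earlier rule has dealt with.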

Extra input files and arrays [8]

awk ' BEGIN { i = 1
	# Read file x one line at a time. getline returns 1 for each line
	# read, 0 at end of file and -1 on error (e.g. file not found).
	while((getline word < "x") > 0) {
		arr[i] = word		    # Put the word in the array
		i++
		}
	close("x")
	}
	{ 
	sentence=$0			# Read $0 into variable sentence
	for(j = 1; j < i ; j++) {
		x = match(toupper(sentence),arr[j])      # If it matches
		if( x != 0) {
			# Change that word in the sentence (yes this does work)
			sentence = substr(sentence,1,(x-1)) toupper(substr(sentence,x,RLENGTH)) substr(sentence,(x+RLENGTH))
			}
		}
	printf("%s\n",sentence)    # Print it out with/without changes
	}  ' $1

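This awk script reads all the words/sentences from the file "x" into an array. It then reads the specified file ($1) and converts to upper case the first occurrence on each line of any word that matches one in the array. Because each line is upper-cased before matching, the words in "x" must themselves be in upper case (they are used as regular expressions). Useful for capitalising key words in programs, for example. This awk script uses multiple files.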
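
Another common idiom for loading a first file into an array, offered as an alternative sketch and not as what the script above does: name both files on the command line and compare NR (lines read in total) with FNR (lines read from the current file). They are equal only while the first file is being read. The function name and the throwaway files are made up for the demonstration.

```shell
# upcase_keywords WORDFILE TEXTFILE
# Upper-case every whole field of TEXTFILE that appears (ignoring case)
# in WORDFILE, which holds one keyword per line.
upcase_keywords() {
    awk '
        NR == FNR { kw[toupper($0)] = 1; next }   # first file: load keywords
        {
            for (i = 1; i <= NF; i++)
                if (toupper($i) in kw)
                    $i = toupper($i)
            print
        }
    ' "$1" "$2"
}

# Demo with throwaway files.
tmpdir=$(mktemp -d)
printf 'select\nfrom\n' > "$tmpdir/words"
printf 'select name from users\n' > "$tmpdir/text"
upcase_keywords "$tmpdir/words" "$tmpdir/text"
rm -rf "$tmpdir"
```

Two caveats: assigning to a field makes awk rebuild the line with single spaces between fields, and the NR == FNR test misfires if the first file is empty.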

Changing page length [3]

awk -v pl=$1 -v npl=$2 ' BEGIN { currline = 0 }
		{ printf("%s\n",$0)
		  currline = currline + 1
		  if ( currline == pl ) {
			while(currline < npl) {
				currline = currline + 1
				printf("\n") 
				}
			currline = 0
			}
		} ' $3

Imagine having to convert various files from 48 lines per page to 66 lines per page (for printing purposes). This will do the job. $1 = INITIAL page length, $2 = DESIRED page length, $3 is the filename. This only works for increasing (not decreasing) page length.
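
To watch the padding at work without printing anything, here is a sketch using tiny made-up page lengths (2-line pages padded out to 3). It uses NR in place of a hand-kept counter, and the function name repage is hypothetical.

```shell
# repage OLD NEW: pad every OLD-line page on stdin with blank lines up to NEW.
repage() {
    awk -v pl="$1" -v npl="$2" '
        { print
          if (NR % pl == 0)                  # end of an old page
              for (k = pl; k < npl; k++)     # add the missing lines
                  print ""
        }'
}

printf 'a\nb\nc\nd\n' | repage 2 3
```

Four input lines make two old pages, so six lines come out, with a blank line after "b" and another after "d".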

Unload Format to Comma separated file (CSV) converter [4]

# convert unload file to comma separated ( awk.7 )

awk ' 
   { 
	modstring=$0
	if ( substr(modstring,( length(modstring) ), 1 ) == "|" ) {
						# If line ends in pipe symbol..
		modstring = substr(modstring,1 , ( length(modstring) - 1 ) )
						# remove it
		}
	gsub(/\|/,"\",\"",modstring)		# substitute all occurrences of
						# the pipe symbol with quote
						# comma quote ","
	printf("\"%s\"\n",modstring)		# Print out the resulting
						# string, with a quote on the
						# front and end.
	} ' $1

This example uses the function 'gsub'. If you have ever used vi or sed substitutions, this is (nearly) the same. The syntax for gsub or sub is 'sub(regular expression, replacement text, string to perform this operation on)'; sub replaces only the first match, gsub replaces every match. More often than not, it is simpler to write piped sed statements for this type of file conversion. The statement below produces the same result as the above for lines that end in a pipe symbol:
$ cat $1 | sed 's/^/\"/' | sed 's/|$/\"/' | sed 's/|/\"\,\"/g'
1st sed command: places a double-quote at the beginning of the line.
2nd sed command: substitutes a double-quote for a pipe symbol at the end of line.
3rd sed command: substitutes all remaining occurrences of the pipe symbol with double-quote comma double-quote (",").
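
A compact sketch of sub and gsub doing the same conversion on one made-up line. The regular expressions are written between slashes inside the single-quoted program, so the shell never touches them. The function name pipe_to_csv is hypothetical.

```shell
# pipe_to_csv: turn pipe-delimited "unload" lines into quoted CSV.
pipe_to_csv() {
    awk '{
        line = $0
        sub(/\|$/, "", line)         # drop one trailing pipe, if there is one
        gsub(/\|/, "\",\"", line)    # every remaining pipe becomes ","
        printf("\"%s\"\n", line)
    }'
}

printf 'smith|john|42|\n' | pipe_to_csv
```

This does nothing about double quotes that are already in the data, so treat it as a starting point only.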

Simple addition [2]

# add up numbers in a file / stdin (awk.9)

awk ' BEGIN { x=0 } 
	{ x = x + $0 } 
	END { printf("Total: %s\n",x) } '

A script to total up numbers, one per line, from a file or stdin. It might be used to total the sizes of a list of files, e.g.
$ ls -l | awk ' BEGIN { x=0 } { x = x + $5 } END { printf("Total: %s\n",x) } '
Total: 213936
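
A slightly more defensive sketch of the same idea, on made-up data: the column number is passed in with -v, and lines whose chosen field is not a plain whole number (like the "total" line that ls -l prints first) are skipped. The function name total_col is hypothetical.

```shell
# total_col N: sum field N of stdin, skipping lines where it is not a number.
total_col() {
    awk -v col="$1" '
        $col ~ /^[0-9]+$/ { sum += $col; n++ }
        END { printf("Total: %d in %d lines\n", sum, n) }
    '
}

printf 'a 10\nb 20\nc x\nd 12\n' | total_col 2
```

Variables start out empty, which counts as zero in arithmetic, so no BEGIN block is needed to initialise sum.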

More on sub [5]

# To convert makefiles from AIX 3.2.5 to AIX 4.1.0
awk ' { if ( $3 == "-qdebug=useabs" ) {
			if( index($0,"$@") ) {
				modstring=$0
				sub("-bI:\/usr\/lib\/FCM\/lowsys.exp","",modstring)
				sub("-bI:\/usr\/lib\/FCM","",modstring)
				sub("-L \/usr\/lib\/FCM","",modstring)
				sub('/\\$\\{STUBOBJ\\}/',"",modstring)
				sub('/\\$\\{ITOOLS\\}/',"${ILIBS}",modstring)
				sub("-lfesql","-lfesql -lcurses",modstring)
				printf("%s\n",modstring)
				}
			else {
				printf("\t\t${CC} -qarch=com -I${ITOOLS} -I${IESQLC} -L${ILIBDIR} -l4gl -lnforms -lfesql -lbsd -c $*.c\n")
				}
			}
		else {
			printf("%s\n",$0)
			}
		} ' makefile

The purpose of this script is to convert lines that look like this:

${CC} -qarch=com -qdebug=useabs -bI:/usr/lib/pse.exp -bI:/usr/lib/FCM/lowsys.exp -L /usr/lib/FCM -I${ITOOLS} -I${IESQLC} ${STUBOBJ} -L${ILIBDIR} -l4gl -lnforms -lfesql -lbsd ${PROGRAM} ${LIBS} -o $@

to...

${CC} -qarch=com -qdebug=useabs -bI:/usr/lib/pse.exp -I${ILIBS} -I${IESQLC} -L${ILIBDIR} -l4gl -lnforms -lfesql -lcurses -lbsd ${PROGRAM} ${LIBS} -o $@

It uses the sub function to achieve this. However, as you can probably see from the above, the regular expression (first parameter) in some of the sub calls is horrendously complicated. This is because those regular expressions are wrapped in their own single quotes, which end the shell's quoting of the awk program, so the shell processes them as well as awk. The escape sequence for a dollar symbol therefore becomes \\$ (the shell translates this to \$ and awk translates that to a literal $). One of the best ways around this problem is to use an awk script file, which I shall discuss at some other point. Yeeuck.
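
A minimal sketch of the script-file approach, assuming a throwaway file made with mktemp and a cut-down, made-up input line. Since the program lives in a file and is run with awk -f, the shell never sees it, and each $ or { needs only awk's own single backslash.

```shell
# Write the awk program to a file. The quoted 'EOF' stops the shell from
# expanding anything inside the here-document.
prog=$(mktemp)
cat > "$prog" <<'EOF'
{
    sub(/\$\{STUBOBJ\} */, "")              # drop ${STUBOBJ}
    sub(/\$\{ITOOLS\}/, "${ILIBS}")         # ${ITOOLS} becomes ${ILIBS}
    print
}
EOF

# fix_line: run the script file over stdin.
fix_line() {
    awk -f "$prog"
}

printf '%s\n' 'cc -I${ITOOLS} ${STUBOBJ} -o $@' | fix_line
```

Remove the temporary file (rm -f "$prog") once you have finished with it.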

Simple field and pattern matching [4]

awk -v codefile=$ACTUALFILE -v user=${ONLYUSER} '
		{ if( user == "ANY" ) {
			printf("%s got out by %s on %s v.%s\n",codefile,$3,$4,$2) 
			}
		  else {
			if( $3 == user ) {
				printf("%s got out by %s on %s v.%s\n",codefile,$3,$4,$2) 
				}
			}
		} ' ${SOURCECODEPFILE}

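Another fairly simple example. Used to examine source code 'p.' files and extract various information: with user set to "ANY" every line is reported, otherwise only the lines whose third field matches the user.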
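
The same test can be sketched as a single pattern instead of nested if statements. The input layout below (something, version, user, date) is made up to fit the printf, and the function name who_has is hypothetical. Quoting the -v values protects them if they contain spaces.

```shell
# who_has FILE USER: report checkouts for one user, or for everyone if
# USER is "ANY". Assumed input fields: <something> <version> <user> <date>
who_has() {
    awk -v codefile="$1" -v user="$2" '
        user == "ANY" || $3 == user {
            printf("%s got out by %s on %s v.%s\n", codefile, $3, $4, $2)
        }'
}

printf 'x 1.2 fred 01/09/01\nx 1.3 jane 02/09/01\n' | who_has main.c jane
```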

Processing and generating an HTML file [4]

# Read current date into a variable

TODAY=`date +%d/%m/%Y`

awk ' { i = match($0,"<A NAME="); # Find occurrences of bookmarks
	if(i > 0){
		j = match(substr($0,i),">"); # find end of bookmark link
		bookmarkstr = substr($0,i + 9,j - 9 - 2); # extract bookmark str
		if(toupper(bookmarkstr) != bookmarkstr){ # if not all upper case
			printf("%s\n",bookmarkstr);
			}
		}
	} ' $1 | sort -u | 
awk -v today="$TODAY" ' BEGIN { 
	printf("<HTML>\n<HEAD><TITLE>AIX Command Guide Quick Look-Up</TITLE></HEAD>\n<BODY>\n");
	printf("<TABLE WIDTH=\"100%%\" cellpadding=0><TR>\n");
	printf("<TD WIDTH=\"25%%\" align=\"left\"><A HREF=\"../aixhelp.htm\">Up To AIX Help</A></TD>\n");
	printf("<TD WIDTH=\"25%%\" align=\"center\"><A HREF=\"../Home.htm\">Up To Home Page</A></TD>\n");
	printf("<TD WIDTH=\"25%%\" align=\"center\">Email: <A HREF=\"mailto:bigcalm@hotmail.com\">bigcalm@hotmail.com</A></TD>\n");
	printf("<TD WIDTH=\"25%%\" align=\"right\">This page was updated: %s</TD>\n",today);
	printf("</TR></TABLE>");
	printf("<HR>\n");
	printf("<CENTER><H2>AIX Command Guide Quick Look-Up</H2></CENTER>\n");
	printf("<TABLE WIDTH=\"100%%\" cellpadding=0>\n");
	}
	{
	if( (NR - 1) % 4 == 0){
		if((NR - 1) > 0){
			printf("</TR>");
			}
		printf("<TR>\n"); }
	printf("<TD><A HREF=\"handycommands.htm#%s\">%s</A></TD>\n",$0,$0);
	}
	END {
	tmpNR = NR;
	while(tmpNR % 4 != 0){
		printf("<TD></TD>");
		tmpNR++;
		}
	printf("</TR>");
	printf("</TABLE>");
	printf("<HR>\n");
	printf("<TABLE WIDTH=\"100%%\" cellpadding=0><TR>\n");
	printf("<TD WIDTH=\"25%%\" align=\"left\"><A HREF=\"../aixhelp.htm\">Up To AIX Help</A></TD>\n");
	printf("<TD WIDTH=\"25%%\" align=\"center\"><A HREF=\"../Home.htm\">Up To Home Page</A></TD>\n");
	printf("<TD WIDTH=\"25%%\" align=\"center\">Email: <A HREF=\"mailto:bigcalm@hotmail.com\">bigcalm@hotmail.com</A></TD>\n");
	printf("<TD WIDTH=\"25%%\" align=\"right\">This page was updated: %s</TD>\n",today);
	printf("</TR></TABLE>");
	printf("</BODY>\n</HTML>\n");
	} '

This awk script was used to generate the alphabetic listing of the handycommands help file.
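
The row-building trick (start a new row every 4th item, then pad the last row in END) is easier to see without the HTML. A plain-text sketch on made-up input, where the function name in_rows and the "-" filler are my own inventions:

```shell
# in_rows N: print stdin N items per row, padding the last row with "-".
in_rows() {
    awk -v n="$1" '
        { printf("%s%s", $0, (NR % n == 0) ? "\n" : " ") }
        END {
            if (NR % n != 0) {               # last row is incomplete
                for (k = NR % n; k < n - 1; k++) printf("- ")
                printf("-\n")
            }
        }'
}

printf 'a\nb\nc\nd\ne\n' | in_rows 4
```

NR keeps its final value inside END, which is what lets the padding be worked out after the last line has been read.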

----------- End of awkhelp file -----------

This Page Was Last updated 28/09/01