Awk Quick Reference

Awk Quick Reference - by Bruce Barnett @grymoire

AWK can be thought of as a program that reads rows and columns of information and generates data, like a spreadsheet. It can also be thought of as a simple C interpreter, as AWK and C have similar features.

MAWK Usage

From mawk(1):

mawk [-W option] [-F value] [-v var=value] [--] 'program text' [file ...]
mawk [-W option] [-F value] [-v var=value] [-f program-file] [--] [file ...]

GAWK Usage

From gawk --help:

Usage: gawk [POSIX or GNU style options] -f progfile [--] file ...
Usage: gawk [POSIX or GNU style options] [--] 'program' file ...
POSIX options:		GNU long options:
	-f progfile		--file=progfile
	-F fs			--field-separator=fs
	-v var=val		--assign=var=val
	-m[fr] val
	-O			--optimize
	-W compat		--compat
	-W copyleft		--copyleft
	-W copyright		--copyright
	-W dump-variables[=file]	--dump-variables[=file]
	-W exec=file		--exec=file
	-W gen-po		--gen-po
	-W help			--help
	-W lint[=fatal]		--lint[=fatal]
	-W lint-old		--lint-old
	-W non-decimal-data	--non-decimal-data
	-W profile[=file]	--profile[=file]
	-W posix		--posix
	-W re-interval		--re-interval
	-W source=program-text	--source=program-text
	-W traditional		--traditional
	-W usage		--usage
	-W use-lc-numeric	--use-lc-numeric
	-W version		--version
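
As a minimal sketch of the two most common options shown above, the following uses -F to set the field separator and -v to assign a variable before the program runs. The sample data and the variable name "min" are made up for illustration.

```shell
# Hypothetical sample data in a name:password:uid layout
users='root:x:0
alice:x:1000
bob:x:1001'

# -F: splits fields on colons; -v min=1000 assigns the variable min
regular=$(printf '%s\n' "$users" | awk -F: -v min=1000 '$3 >= min { print $1 }')
echo "$regular"
```

This should print the names whose third field is at least 1000.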

Program

There are only a few commands in AWK. The tables below are from my AWK tutorial; see it if you need a better explanation. The basic operation of AWK is that a line is read from the input file, and the AWK script is executed for each line.

Basic Structure

The basic structure of an AWK script consists of one or more of the following types of lines:
pattern { statements }
function name(parameter_list) { statements }
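
The two kinds of lines above might be combined as in this small sketch. The function name "double" and the input are invented for the example.

```shell
doubled=$(printf '1\n2\n3\n' | awk '
# A user-defined function (the name is made up for this sketch)
function double(n) { return n * 2 }

# A pattern { statements } line: runs for each input line whose first field exceeds 1
$1 > 1 { print double($1) }
')
echo "$doubled"
```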

Patterns

If a pattern is not specified, it defaults to "true", and every line read will cause the statement to be executed.

A pattern can have one of the following forms:

    BEGIN
    END
    /regular expression/
    relational expression
    pattern && pattern
    pattern || pattern
    pattern ? pattern : pattern
    (pattern)
    ! pattern
    pattern1, pattern2 - Range pattern
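
Several of the pattern forms listed above can be tried in one short program. This is only a sketch; the input lines are arbitrary.

```shell
input='start
alpha
beta
stop
gamma'

result=$(printf '%s\n' "$input" | awk '
BEGIN              { print "begin" }              # before any input is read
/start/,/stop/     { print "range: " $0 }         # range pattern, both ends included
!/a/               { print "no a: " $0 }          # negated regular expression
NR == 5 && /gamma/ { print "line 5 is gamma" }    # relational expression && regex
END                { print "end, " NR " lines" }  # after all input is read
')
echo "$result"
```

Each input line is tested against every pattern in order, so a single line can trigger more than one action.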

Statements

Statements have the following syntax, separated by a new line or a semicolon.

if ( conditional ) statement [ else statement ]
while ( conditional ) statement
for ( expression ; conditional ; expression ) statement
for ( variable in array ) statement
break
continue
{ [ statement ] ...}
variable=expression
print [ expression-list ] [ > expression ]
printf format [ , expression-list ] [ > expression ]
next
exit
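
The statements above could be exercised with a sketch like the following. The variable names (bigger, sum, halvings, seen) are made up for this example.

```shell
report=$(printf '3 5\n10 2\n' | awk '
{
    # if / else: pick the larger of the two fields
    if ($1 > $2) bigger = $1; else bigger = $2

    # for loop: add up every field on the line
    sum = 0
    for (i = 1; i <= NF; i++) sum += $i

    # while loop: count how often the larger value can be halved
    halvings = 0; n = bigger
    while (n >= 2) { n = int(n / 2); halvings++ }

    printf "max=%d sum=%d halvings=%d\n", bigger, sum, halvings
    seen[bigger] = 1
}
END {
    # for ( variable in array ): the order is unspecified, so only count entries
    count = 0
    for (k in seen) count++
    print "distinct maxima: " count
}')
echo "$report"
```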

Special Variables

AWK Table 14
Special Variables

Variable     AWK  NAWK  GAWK  Purpose
FS           Yes  Yes   Yes   Field separator
NF           Yes  Yes   Yes   Number of fields
RS           Yes  Yes   Yes   Record separator
NR           Yes  Yes   Yes   Number of input records
FILENAME     Yes  Yes   Yes   Current filename
OFS          Yes  Yes   Yes   Output field separator
ORS          Yes  Yes   Yes   Output record separator
ARGC         -    Yes   Yes   Number of arguments
ARGV         -    Yes   Yes   Array of arguments
ARGIND       -    -     Yes   Index in ARGV of current file
FNR          -    Yes   Yes   Input record number
OFMT         -    Yes   Yes   Output format (default "%.6g")
RSTART       -    Yes   Yes   Index of first character after match()
RLENGTH      -    Yes   Yes   Length of string after match()
SUBSEP       -    Yes   Yes   Default separator with multiple subscripts in array (default "\034")
ENVIRON      -    -     Yes   Array of environment variables
IGNORECASE   -    -     Yes   Ignore case of regular expression
CONVFMT      -    -     Yes   Conversion format (default "%.6g")
ERRNO        -    -     Yes   Current error after getline failure
FIELDWIDTHS  -    -     Yes   List of field widths (instead of using FS)
BINMODE      -    -     Yes   Binary mode (Windows)
LINT         -    -     Yes   Turns --lint mode on/off
PROCINFO     -    -     Yes   Array of information about current AWK program
RT           -    -     Yes   Record terminator
TEXTDOMAIN   -    -     Yes   Text domain (i.e. localization) of current AWK program

Variables $1, $2, etc.

The variables $1, $2, etc. are created by splitting each line into fields. $1 is the first field (i.e. the first column), $2 is the second, etc.
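
A few of the special variables and the field variables can be seen working together in this sketch. The colon-separated input is arbitrary sample data.

```shell
summary=$(printf 'a:b:c\nd:e\n' | awk '
BEGIN { FS = ":"; OFS = "-" }   # set the input and output field separators
{ print NR, NF, $1, $2 }        # record number, field count, first two fields
END { print "records: " NR }')
echo "$summary"
```

Because print separates its arguments with OFS, each output line is joined with "-" rather than a space.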

Expressions are built using the following unary, binary, assignment, and relational operators.

Unary operators change the value of a variable.

Unary Operators
Syntax: variable operator, or operator variable

Operator  Meaning
++        Increment by 1
--        Decrement by 1

Binary operators combine values.

AWK Table 1
Binary Operators
Syntax: expression operator expression

Operator  Type        Meaning
+         Arithmetic  Addition
-         Arithmetic  Subtraction
*         Arithmetic  Multiplication
/         Arithmetic  Division
%         Arithmetic  Modulo
<space>   String      Concatenation

Assignment operators change the values of variables.

AWK Table 2
Assignment Operators
Syntax: variable operator expression

Operator  Meaning
+=        Add result to variable
-=        Subtract result from variable
*=        Multiply variable by result
/=        Divide variable by result
%=        Apply modulo to variable

Relational operators compare values.

AWK Table 3
Relational Operators
Syntax: expression operator expression

Operator  Meaning
==        Is equal to
!=        Is not equal to
>         Is greater than
>=        Is greater than or equal to
<         Is less than
<=        Is less than or equal to
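
The operator tables above can be checked with a short BEGIN-only program. This is an illustrative sketch; the variable names are arbitrary.

```shell
ops=$(awk 'BEGIN {
    x = 7
    print x % 3              # arithmetic: modulo
    print x / 2              # arithmetic: division yields a fraction
    s = "a" "b" x            # concatenation: values placed side by side
    print s
    x += 3; x *= 2           # assignment operators
    print x
    y = x++                  # post-increment: y gets the old value
    print y, x
    print (x >= 21), (x != 21)   # relational results are 1 (true) or 0 (false)
}')
echo "$ops"
```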

Certain characters that follow a '\' have a special meaning.

AWK Table 5
Escape Sequences

Sequence                Description
\a                      ASCII bell (NAWK/GAWK only)
\b                      Backspace
\f                      Formfeed
\n                      Newline
\r                      Carriage return
\t                      Horizontal tab
\v                      Vertical tab (NAWK only)
\ddd                    Character (1 to 3 octal digits) (NAWK only)
\xdd                    Character (hexadecimal) (NAWK only)
\<any other character>  That character

The printf statement and the sprintf function generate a string using a format and a list of variables.

printf(format, variable, variable, ...)

Inside the format field, you can define how the variables should be output.

AWK Table 6
Format Specifiers

Specifier  Meaning
%c         ASCII character
%d         Decimal integer
%e         Floating-point number (exponential format)
%f         Floating-point number (fixed-point format)
%g         The shorter of e or f, with trailing zeros removed
%o         Octal
%s         String
%x         Hexadecimal
%%         Literal %

Here are some examples of format conversions.

AWK Table 7
Example of format conversions

Format  Value    Results
%c      100.0    d
%c      "100.0"  1 (NAWK?)
%c      42       "
%d      100.0    100
%e      100.0    1.000000e+02
%f      100.0    100.000000
%g      100.0    100
%o      100.0    144
%s      100.0    100.0
%s      "13f"    13f
%d      "13f"    0 (AWK)
%d      "13f"    13 (NAWK)
%x      100.0    64

Here are more complex format conversion examples

AWK Table 8
Examples of complex formatting

Format   Variable        Results
%c       100             "d"
%10c     100             "         d"
%010c    100             "000000000d"
%d       10              "10"
%10d     10              "        10"
%10.4d   10.123456789    "      0010"
%10.8d   10.123456789    "  00000010"
%.8d     10.123456789    "00000010"
%010d    10.123456789    "0000000010"
%e       987.1234567890  "9.871235e+02"
%10.4e   987.1234567890  "9.8712e+02"
%10.8e   987.1234567890  "9.87123457e+02"
%f       987.1234567890  "987.123457"
%10.4f   987.1234567890  "  987.1235"
%010.4f  987.1234567890  "00987.1235"
%10.8f   987.1234567890  "987.12345679"
%g       987.1234567890  "987.123"
%10g     987.1234567890  "   987.123"
%10.4g   987.1234567890  "     987.1"
%010.4g  987.1234567890  "00000987.1"
%.8g     987.1234567890  "987.12346"
%o       987.1234567890  "1733"
%10o     987.1234567890  "      1733"
%010o    987.1234567890  "0000001733"
%.8o     987.1234567890  "00001733"
%s       987.123         "987.123"
%10s     987.123         "   987.123"
%10.4s   987.123         "      987."
%010.8s  987.123         "000987.123"
%x       987.1234567890  "3db"
%10x     987.1234567890  "       3db"
%010x    987.1234567890  "00000003db"
%.8x     987.1234567890  "000003db"
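
A handful of the conversions from the tables above can be reproduced with the sketch below. The square brackets are added only to make the padding visible, and the variable name "label" is made up.

```shell
formatted=$(awk 'BEGIN {
    printf "[%d]\n", 100.0               # decimal integer
    printf "[%10.4f]\n", 987.1234567890  # width 10, 4 digits after the point
    printf "[%e]\n", 100.0               # exponential format
    printf "[%x]\n", 987                 # hexadecimal
    printf "[%o]\n", 100                 # octal
    printf "[%10s]\n", "987.123"         # string padded on the left to width 10
    label = sprintf("%05d", 42)          # sprintf returns the string instead of printing it
    print label
}')
echo "$formatted"
```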

The AWK variants have built-in functions. There are numeric, string, and miscellaneous functions.

AWK Table 9
Numeric Functions

Name   Function     Variant
cos    Cosine       GAWK, AWK, NAWK
exp    Exponent     GAWK, AWK, NAWK
int    Integer      GAWK, AWK, NAWK
log    Logarithm    GAWK, AWK, NAWK
sin    Sine         GAWK, AWK, NAWK
sqrt   Square root  GAWK, AWK, NAWK
atan2  Arctangent   GAWK, NAWK
rand   Random       GAWK, NAWK
srand  Seed random  GAWK, NAWK
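
As a rough sketch, several of the numeric functions above behave as follows. The seed value passed to srand is arbitrary.

```shell
math=$(awk 'BEGIN {
    print int(3.9)           # truncates toward zero
    print sqrt(16)
    print exp(0), log(1)
    print atan2(0, -1)       # pi, printed with the default output format "%.6g"
    srand(1); r = rand()     # rand() returns a value with 0 <= r < 1
    print (r >= 0 && r < 1)
}')
echo "$math"
```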

AWK Table 10
String Functions

Name                            Variant
index(string,search)            AWK, NAWK, GAWK
length(string)                  AWK, NAWK, GAWK
split(string,array,separator)   AWK, NAWK, GAWK
substr(string,position)         AWK, NAWK, GAWK
substr(string,position,max)     AWK, NAWK, GAWK
sub(regex,replacement)          NAWK, GAWK
sub(regex,replacement,string)   NAWK, GAWK
gsub(regex,replacement)         NAWK, GAWK
gsub(regex,replacement,string)  NAWK, GAWK
match(string,regex)             NAWK, GAWK
tolower(string)                 GAWK
toupper(string)                 GAWK
asort(array,[d])                GAWK
asorti(array,[d])               GAWK
gensub(r,s,h [,t])              GAWK
strtonum(string)                GAWK

AWK Table 11
Miscellaneous Functions

Name                          Variant
getline                       AWK, NAWK, GAWK
getline <file                 NAWK, GAWK
getline variable              NAWK, GAWK
getline variable <file        NAWK, GAWK
"command" | getline           NAWK, GAWK
"command" | getline variable  NAWK, GAWK
system(command)               NAWK, GAWK
close(command)                NAWK, GAWK
systime()                     GAWK
strftime(string)              GAWK
strftime(string, timestamp)   GAWK
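
The following sketch tries some of the string functions and one form of getline from the tables above. The strings and the "echo piped" command are arbitrary examples, and the functions used are limited to those listed for NAWK and GAWK.

```shell
strings=$(awk 'BEGIN {
    s = "hello world"
    print length(s)                   # 11
    print index(s, "world")           # 7
    print substr(s, 1, 5)             # hello
    n = split("a:b:c", parts, ":")
    print n, parts[2]                 # 3 b
    t = s; gsub(/o/, "0", t)          # replace every match in t
    print t                           # hell0 w0rld
    u = s; sub(/l+/, "L", u)          # replace only the first match in u
    print u                           # heLo world
    if (match(s, /wor/)) print RSTART, RLENGTH   # 7 3
    "echo piped" | getline line       # read one line of output from a command
    close("echo piped")
    print line                        # piped
}')
echo "$strings"
```

Note that sub and gsub modify their third argument in place, which is why the sketch copies s into t and u first.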

The strftime function has special formats.

AWK Table 12
GAWK's strftime formats

%a  The locale's abbreviated weekday name
%A  The locale's full weekday name
%b  The locale's abbreviated month name
%B  The locale's full month name
%c  The locale's "appropriate" date and time representation
%d  The day of the month as a decimal number (01-31)
%H  The hour (24-hour clock) as a decimal number (00-23)
%I  The hour (12-hour clock) as a decimal number (01-12)
%j  The day of the year as a decimal number (001-366)
%m  The month as a decimal number (01-12)
%M  The minute as a decimal number (00-59)
%p  The locale's equivalent of AM/PM
%S  The second as a decimal number (00-61)
%U  The week number of the year (Sunday is first day of week)
%w  The weekday as a decimal number (0-6). Sunday is day 0
%W  The week number of the year (Monday is first day of week)
%x  The locale's "appropriate" date representation
%X  The locale's "appropriate" time representation
%y  The year without century as a decimal number (00-99)
%Y  The year with century as a decimal number
%Z  The time zone name or abbreviation
%%  A literal %

Modern versions of GAWK (GNU AWK) have additional strftime formats.

AWK Table 13
Optional GAWK strftime formats

%D  Equivalent to specifying %m/%d/%y
%e  The day of the month, padded with a blank if it is only one digit
%h  Equivalent to %b, above
%n  A newline character (ASCII LF)
%r  Equivalent to specifying %I:%M:%S %p
%R  Equivalent to specifying %H:%M
%T  Equivalent to specifying %H:%M:%S
%t  A TAB character
%k  The hour as a decimal number (0-23)
%l  The hour (12-hour clock) as a decimal number (1-12)
%C  The century, as a number between 00 and 99
%u  The weekday as a decimal number (Monday is 1)
%V  The week number of the year (using ISO 8601)
%v  The date in VMS format (e.g. 20-JUN-1991)
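
A few of the strftime formats can be tried with the sketch below. Since the tables above list strftime as GAWK-only, the example runs only when a gawk command is installed; fixing the time zone to UTC and the locale to C is an assumption made here so that the output is reproducible.

```shell
# Run only when gawk is available, because strftime is listed as GAWK-only
stamp=""
if command -v gawk >/dev/null 2>&1; then
    # Timestamp 0 is the start of the Unix epoch
    stamp=$(LC_ALL=C TZ=UTC gawk 'BEGIN { print strftime("%A %Y-%m-%d %H:%M:%S", 0) }')
    echo "$stamp"
fi
```

With those settings the result should read "Thursday 1970-01-01 00:00:00".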