Last modified: Fri Nov 27 09:44:57 2020
Awk Quick Reference - by Bruce Barnett @grymoire
AWK can be thought of as a program that can read rows and columns of information, and generate data - like a spreadsheet. It can also be thought of as a simple C interpretor, as AWK and C have similar features.
From mawk(1) mawk [-W option] [-F value] [-v var=value] [--] 'program text' [file ...] mawk [-W option] [-F value] [-v var=value] [-f program-file] [--] [file ...]
From gawk --help:
Usage: gawk [POSIX or GNU style options] -f progfile [--] file ... Usage: gawk [POSIX or GNU style options] [--] 'program' file ... POSIX options: GNU long options: -f progfile --file=progfile -F fs --field-separator=fs -v var=val --assign=var=val -m[fr] val -O --optimize -W compat --compat -W copyleft --copyleft -W copyright --copyright -W dump-variables[=file] --dump-variables[=file] -W exec=file --exec=file -W gen-po --gen-po -W help --help -W lint[=fatal] --lint[=fatal] -W lint-old --lint-old -W non-decimal-data --non-decimal-data -W profile[=file] --profile[=file] -W posix --posix -W re-interval --re-interval -W source=program-text --source=program-text -W traditional --traditional -W usage --usage -W use-lc-numeric --use-lc-numeric -W version --version
There are only a few commands in AWK. The Tables below are from my awk tutorial. Check this out if you need a beter explanation. The basic operation of AWK is that a line from the input file is read, and for each line, the AWK script is executed.
pattern { statements } function name(parameter_list) { statements }
A pattern can have the following form.
BEGIN END /regular expression/ relational expression pattern && pattern pattern || pattern pattern ? pattern : pattern (pattern) ! pattern pattern1, pattern2 - Range pattern
Statements have the following syntax, separated by a new line or a semicolon.
AWK Table 14 Special Variables | ||||
---|---|---|---|---|
Variable | Purpose | AWK | NAWK | GAWK |
FS | Field separator | Yes | Yes | Yes |
NF | Number of Fields | Yes | Yes | Yes |
RS | Record separator | Yes | Yes | Yes |
NR | Number of input records | Yes | Yes | Yes |
FILENAME | Current filename | Yes | Yes | Yes |
OFS | Output field separator | Yes | Yes | Yes |
ORS | Output record separator | Yes | Yes | Yes |
ARGC | # of arguments | Yes | Yes | |
ARGV | Array of arguments | Yes | Yes | |
ARGIND | Index of ARGV of current file | Yes | ||
FNR | Input record number | Yes | Yes | |
OFMT | Ouput format (default "%.6g") | Yes | Yes | |
RSTART | Index of first character after match() | Yes | Yes | |
RLENGTH | Length of string after match() | Yes | Yes | |
SUBSEP | Default separator with multiple subscripts in array (default "\034") | Yes | Yes | |
ENVIRON | Array of environment variables | Yes | ||
IGNORECASE | Ignore case of regular expression | Yes | ||
CONVFMT | conversion format (default: "%.6g") | Yes | ||
ERRNO | Current error after getline failure | Yes | ||
FIELDWIDTHS | list of field widths (instead of using FS) | Yes | ||
BINMODE | Binary Mode (Windows) | Yes | ||
LINT | Turns --lint mode on/off | Yes | ||
PROCINFO | Array of informaiton about current AWK program | Yes | ||
RT | Record terminator | Yes | ||
TEXTDOMAIN | Text domain (i.e. localization) of current AWK program | Yes |
Relational expressions are created using unary, binary, relational, the following operators:
Unary variables change the value of a variable.
Unary Operators variable operator operator variable | |
---|---|
Operator | Meaning |
++ | Increment by 1 |
-- | Decrement by 1 |
Binary operators combine values.
AWK Table 1 Binary Operators expression operator expression | ||
---|---|---|
Operator | Type | Meaning |
+ | Arithmetic | Addition |
- | Arithmetic | Subtraction |
* | Arithmetic | Multiplication |
/ | Arithmetic | Division |
% | Arithmetic | Modulo |
<space> | String | Concatenation |
Assignment variables change the values of variables.
AWK Table 2 Assignment Operators variable operator expression | |
---|---|
Operator | Meaning |
+= | Add result to variable |
-= | Subtract result from variable |
*= | Multiply variable by result |
/= | Divide variable by result |
%= | Apply modulo to variable |
Relational operators compare values.
AWK Table 3 Relational Operators expression operator expression | |
---|---|
Operator | Meaning |
== | Is equal |
!= | Is not equal to |
> | Is greater than |
>= | Is greater than or equal to |
< | Is less than |
<= | Is less than or equal to |
Certain characters that follow a '\' have a special meaning.
AWK Table 5 Escape Sequences | |
---|---|
Sequence | Description |
\a | ASCII bell (NAWK/GAWK only) |
\b | Backspace |
\f | Formfeed |
\n | Newline |
\r | Carriage Return |
\t | Horizontal tab |
\v | Vertical tab (NAWK only) |
\ddd | Character (1 to 3 octal digits) (NAWK only) |
\xdd | Character (hexadecimal) (NAWK only) |
\<Any other character> | That character |
The printf or sprintf statement generates a string using a format field and variables.
printf(Format,variable, variable,...) statement,
Inside the format field, you can define how the variables should be output.
AWK Table 6 Format Specifiers | |
---|---|
Specifier | Meaning |
%c | ASCII Character |
%d | Decimal integer |
%e | Floating Point number (engineering format) |
%f | Floating Point number (fixed point format) |
%g | The shorter of e or f, with trailing zeros removed |
%o | Octal |
%s | String |
%x | Hexadecimal |
%% | Literal % |
Here are some examples of format conversions.
AWK Table 7 Example of format conversions | ||
---|---|---|
Format | Value | Results |
%c | 100.0 | d |
%c | "100.0" | 1 (NAWK?) |
%c | 42 | " |
%d | 100.0 | 100 |
%e | 100.0 | 1.000000e+02 |
%f | 100.0 | 100.000000 |
%g | 100.0 | 100 |
%o | 100.0 | 144 |
%s | 100.0 | 100.0 |
%s | "13f" | 13f |
%d | "13f" | 0 (AWK) |
%d | "13f" | 13 (NAWK) |
%x | 100.0 | 64 |
Here are more complex format conversion examples
AWK Table 8 Examples of complex formatting | ||
---|---|---|
Format | Variable | Results |
%c | 100 | "d" |
%10c | 100 | " d" |
%010c | 100 | "000000000d" |
%d | 10 | "10" |
%10d | 10 | " 10" |
%10.4d | 10.123456789 | " 0010" |
%10.8d | 10.123456789 | " 00000010" |
%.8d | 10.123456789 | "00000010" |
%010d | 10.123456789 | "0000000010" |
%e | 987.1234567890 | "9.871235e+02" |
%10.4e | 987.1234567890 | "9.8712e+02" |
%10.8e | 987.1234567890 | "9.87123457e+02" |
%f | 987.1234567890 | "987.123457" |
%10.4f | 987.1234567890 | " 987.1235" |
%010.4f | 987.1234567890 | "00987.1235" |
%10.8f | 987.1234567890 | "987.12345679" |
%g | 987.1234567890 | "987.123" |
%10g | 987.1234567890 | " 987.123" |
%10.4g | 987.1234567890 | " 987.1" |
%010.4g | 987.1234567890 | "00000987.1" |
%.8g | 987.1234567890 | "987.12346" |
%o | 987.1234567890 | "1733" |
%10o | 987.1234567890 | " 1733" |
%010o | 987.1234567890 | "0000001733" |
%.8o | 987.1234567890 | "00001733" |
%s | 987.123 | "987.123" |
%10s | 987.123 | " 987.123" |
%10.4s | 987.123 | " 987." |
%010.8s | 987.123 | "000987.123" |
%x | 987.1234567890 | "3db" |
%10x | 987.1234567890 | " 3db" |
%010x | 987.1234567890 | "00000003db" |
%.8x | 987.1234567890 | "000003db" |
The AWK variants have build-in functions. There are numeric, string, and miscellaneous functions.
AWK Table 9 Numeric Functions | ||
---|---|---|
Name | Function | Variant |
cos | cosine | GAWK,AWK,NAWK |
exp | Exponent | GAWK,AWK,NAWK |
int | Integer | GAWK,AWK,NAWK |
log | Logarithm | GAWK,AWK,NAWK |
sin | Sine | GAWK,AWK,NAWK |
sqrt | Square Root | GAWK,AWK,NAWK |
atan2 | Arctangent | GAWK,NAWK |
rand | Random | GAWK,NAWK |
srand | Seed Random | GAWK,NAWK |
AWK Table 10 String Functions | |
---|---|
Name | Variant |
index(string,search) | AWK, NAWK, GAWK |
length(string) | AWK, NAWK, GAWK |
split(string,array,separator) | AWK, NAWK, GAWK |
substr(string,position) | AWK, NAWK, GAWK |
substr(string,position,max) | AWK, NAWK, GAWK |
sub(regex,replacement) | NAWK, GAWK |
sub(regex,replacement,string) | NAWK, GAWK |
gsub(regex,replacement) | NAWK, GAWK |
gsub(regex,replacement,string) | NAWK, GAWK |
match(string,regex) | NAWK, GAWK |
tolower(string) | GAWK |
toupper(string) | GAWK |
asort(string,[d]) | GAWK |
asorti(string,[d]) | GAWK |
gensub(r,s,h [,t]) | GAWK |
strtonum(string) | GAWK |
AWK Table 11 Miscellaneous Functions | |
---|---|
Name | Variant |
getline | AWK, NAWK, GAWK |
getline <file | NAWK, GAWK |
getline variable | NAWK, GAWK |
getline variable <file | NAWK, GAWK |
"command" | getline | NAWK, GAWK |
"command" | getline variable | NAWK, GAWK |
system(command) | NAWK, GAWK |
close(command) | NAWK, GAWK |
systime() | GAWK |
strftime(string) | GAWK |
strftime(string, timestamp) | GAWK |
The strftimefunction has special formats.
AWK Table 12 GAWK's strftime formats | |
---|---|
%a | The locale's abbreviated weekday name |
%A | The locale's full weekday name |
%b | The locale's abbreviated month name |
%B | The locale's full month name |
%c | The locale's "appropriate" date and time representation |
%d | The day of the month as a decimal number (01--31) |
%H | The hour (24-hour clock) as a decimal number (00--23) |
%I | The hour (12-hour clock) as a decimal number (01--12) |
%j | The day of the year as a decimal number (001--366) |
%m | The month as a decimal number (01--12) |
%M | The minute as a decimal number (00--59) |
%p | The locale's equivalent of the AM/PM |
%S | The second as a decimal number (00--61). |
%U | The week number of the year (Sunday is first day of week) |
%w | The weekday as a decimal number (0--6). Sunday is day 0 |
%W | The week number of the year (Monday is first day of week) |
%x | The locale's "appropriate" date representation |
%X | The locale's "appropriate" time representation |
%y | The year without century as a decimal number (00--99) |
%Y | The year with century as a decimal number |
%Z | The time zone name or abbreviation |
%% | A literal %. |
Modern versions of GAWK (Gnu AWK) have additional functions.
AWK Table 13 Optional GAWK strftime formats | |
---|---|
%D | Equivalent to specifying %m/%d/%y |
%e | The day of the month, padded with a blank if it is only one digit |
%h | Equivalent to %b, above |
%n | A newline character (ASCII LF) |
%r | Equivalent to specifying %I:%M:%S %p |
%R | Equivalent to specifying %H:%M |
%T | Equivalent to specifying %H:%M:%S |
%t | A TAB character |
%k | The hour as a decimal number (0-23) |
%l | The hour (12-hour clock) as a decimal number (1-12) |
%C | The century, as a number between 00 and 99 |
%u | is replaced by the weekday as a decimal number [Monday == 1] |
%V | is replaced by the week number of the year (using ISO 8601) |
%v | The date in VMS format (e.g. 20-JUN-1991) |