The AWK command is a powerful and versatile tool for processing and transforming text data in Linux. Whether extracting information, filtering rows, reformatting output, or performing calculations, AWK can make your life easier with just a few lines of code. In this article, you will learn how to use the AWK command for text manipulation in Linux.
What is the AWK command?
The AWK command is an interpreted programming language that runs in the Linux terminal. Its name comes from the initials of its creators: Alfred Aho, Peter Weinberger and Brian Kernighan. AWK was originally designed to process files structured into fields separated by delimiters, such as CSV files or the /etc/passwd file. AWK can also be applied to less regular text, such as HTML or XML files, although it does not parse their structure. AWK is not an object-oriented programming language, but it allows you to define your own functions and variables. It also has control structures like loops and conditions.
The general syntax of the AWK command is as follows:
awk [options] 'program' [files]
The program is a series of instructions that define patterns to search for in each line of the file and actions to perform when a pattern is found. The options allow you to modify the behavior of the AWK command, such as the choice of field delimiter or the output format.
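As a minimal sketch of how the three parts of the syntax fit together, the following snippet uses one option, one pattern and one action. The sample data and the variable names data and result are assumptions made up for this illustration.

```shell
# Hypothetical sample data written to a temporary file for this demo.
data=$(mktemp)
printf 'alice:30\nbob:25\ncarol:41\n' > "$data"

# -F:         option:  use a colon as the field delimiter
# /^bob/      pattern: select the lines that start with "bob"
# {print $2}  action:  print the second field of each selected line
result=$(awk -F: '/^bob/ {print $2}' "$data")
echo "$result"

rm -f "$data"
```

Only the line starting with bob matches the pattern, so only its second field is printed.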
How to print text with the AWK command?
The AWK command can be used to print a message to the terminal based on a pattern in the text. If you run the AWK command without a pattern or an input file and with just a print action, AWK reads from standard input and prints the message every time you press Enter.
For example, if you type:
awk '{print "Hello"}'
And you press Enter several times, you get:
Hello
Hello
Hello
To stop the AWK command, you can press Ctrl+C.
If you want to print the contents of a file with a header and a footer, you can use the BEGIN pattern, which runs before the file is read, and the END pattern, which runs after the last line has been read. For example, if you have a file named test.txt that contains:
This is a test
AWK is a great tool
Linux is the best operating system
You can print the contents of the file with the following command:
awk 'BEGIN {print "Here is the content of the test.txt file:"} {print} END {print "End of file"}' test.txt
Which gives:
Here is the content of the test.txt file:
This is a test
AWK is a great tool
Linux is the best operating system
End of file
The {print} command with no arguments prints the entire line. You can also print a specific field using the $n notation, where n is the field number. By default, fields are separated by spaces or tabs, but you can change the delimiter with the -F option.
For example, if you want to print the first and third fields of the /etc/passwd file, which are separated by a colon (:), you can use the following command:
awk -F: '{print $1 " " $3}' /etc/passwd
Which gives something like:
root 0
daemon 1
bin 2
sys 3
sync 4
games 5
man 6
lp 7
mail 8
news 9
uucp 10
proxy 13
www-data 33
...
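You can also print arithmetic expressions or character strings with the AWK command. For example, if you want to print the square of the second field of the test.txt file, you can use the following command:
awk '{print $2^2}' test.txt
Which gives:
0
0
0
The second field of every line of test.txt is the word is, which AWK treats as 0 in a numeric context, so each result is 0. A line whose second field is 4 would give 16.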
If you want to print the number of lines in the test.txt file, you can use the special variable NR, which contains the number of the current line. In the END block, NR holds the number of the last line read, which is the total number of lines. For example, you can use the following command:
awk 'END {print NR}' test.txt
Which gives:
3
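To see how NR changes from line to line, and how it relates to NF (the number of fields in the current line), here is a small sketch. It recreates the three-line test.txt content in a temporary file; the variable names sample and result are assumptions for this illustration.

```shell
# Recreate the three-line sample file in a temporary location.
sample=$(mktemp)
printf 'This is a test\nAWK is a great tool\nLinux is the best operating system\n' > "$sample"

# For every line, print its number (NR) and how many fields it has (NF).
result=$(awk '{print NR ": " NF " fields"}' "$sample")
echo "$result"

rm -f "$sample"
```

The last value taken by NR (3 here) is the one that an END block sees.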
How to filter text with the AWK command?
The AWK command can be used to filter text based on patterns or conditions. If you specify a pattern before an action, AWK only performs the action on the lines that match the pattern. The pattern can be a regular expression, a comparison, a logical operation, or a combination of these.
For example, if you want to print the lines of the test.txt file that contain the word Linux, you can use the following command:
awk '/Linux/ {print}' test.txt
Which gives:
Linux is the best operating system
If you want to print the lines of the /etc/passwd file that have a UID greater than 1000, you can use the following command:
awk -F: '$3 > 1000 {print}' /etc/passwd
Which gives something like:
libvirt-qemu:x:64055:139:Libvirt Qemu,,,:/var/lib/libvirt:/usr/sbin/nologin
snapd-range-524288-root:x:524288:524288::/nonexistent:/bin/false
snap_daemon:x:584788:584788::/nonexistent:/bin/false
...
You can also use the logical operators && (and), || (or) and ! (not) to combine patterns. For example, if you want to print the lines of the /etc/passwd file that have a UID greater than 1000 and a shell other than /usr/sbin/nologin, you can use the following command:
awk -F: '$3 > 1000 && $7 != "/usr/sbin/nologin" {print}' /etc/passwd
Which gives something like:
snapd-range-524288-root:x:524288:524288::/nonexistent:/bin/false
snap_daemon:x:584788:584788::/nonexistent:/bin/false
...
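The three logical operators can be tried safely on a small made-up file instead of the real /etc/passwd. In this sketch the account names, the file contents and the variable names users, login_users and special are all hypothetical.

```shell
# Hypothetical passwd-style data for the demo.
users=$(mktemp)
cat > "$users" <<'EOF'
root:x:0:0:root:/root:/bin/bash
alice:x:1001:1001::/home/alice:/bin/bash
svc:x:1002:1002::/nonexistent:/usr/sbin/nologin
EOF

# && and ! : regular accounts (UID >= 1000) whose shell does NOT end in nologin.
login_users=$(awk -F: '$3 >= 1000 && !($7 ~ /nologin$/) {print $1}' "$users")

# || : either the superuser (UID 0) or an account with a nologin shell.
special=$(awk -F: '$3 == 0 || $7 ~ /nologin$/ {print $1}' "$users")

echo "$login_users"
echo "$special"
rm -f "$users"
```

Parentheses around the negated test keep the precedence explicit, which tends to make combined patterns easier to read.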
How to edit text with the AWK command?
The AWK command can be used to modify text using built-in functions or special variables. For example, if you want to replace spaces with hyphens in the test.txt file, you can use the gsub function, which replaces all occurrences of a pattern with a string. You can also use the special OFS variable, which defines the output field separator. For example, you can use the following command:
awk '{gsub(/ /, "-"); print}' test.txt
Which gives:
This-is-a-test
AWK-is-a-great-tool
Linux-is-the-best-operating-system
You can consult the AWK command manual for other available functions and variables.
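The OFS variable mentioned above can produce hyphen-separated lines without any substitution function. The sketch below also contrasts sub, which replaces only the first match on a line, with the replace-everything behavior of gsub. The sample file and the variable names with_ofs and first_only are assumptions for this illustration.

```shell
sample=$(mktemp)
printf 'This is a test\nAWK is a great tool\n' > "$sample"

# Assigning $1 to itself forces AWK to rebuild the line, joining fields with OFS.
with_ofs=$(awk 'BEGIN {OFS="-"} {$1=$1; print}' "$sample")

# sub() replaces only the first match on each line; gsub() would replace them all.
first_only=$(awk '{sub(/ /, "-"); print}' "$sample")

echo "$with_ofs"
echo "$first_only"
rm -f "$sample"
```

Note that rebuilding the line with OFS also collapses runs of spaces or tabs between fields, whereas a substitution on single spaces would replace each space individually.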
How to use for loop with AWK command?
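The AWK command can be used to perform for loops over the fields of a line or over the elements of an array. The syntax of the for loop over an array is as follows:
for (variable in array) action
Where variable is the name of the variable that successively takes each index of the array, and action is the action to perform in each iteration. AWK also supports the counting form of the loop, which is the one used to walk through fields:
for (initialization; condition; increment) action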
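Here is a short sketch that uses both kinds of loop: a counting loop to visit every field, and a for-in loop to visit every index of an array. The array name count and the variables distinct, sample and result are assumptions chosen for this example, and the order in which a for-in loop visits the indices is not guaranteed.

```shell
sample=$(mktemp)
printf 'This is a test\nAWK is a great tool\nLinux is the best operating system\n' > "$sample"

result=$(awk '
    # Counting loop: visit every field and count how often each word occurs.
    { for (i = 1; i <= NF; i++) count[$i]++ }
    END {
        # for-in loop: w takes each index (word) of the array, in no fixed order.
        for (w in count) distinct++
        print distinct " distinct words; \"is\" appears " count["is"] " times"
    }' "$sample")
echo "$result"

rm -f "$sample"
```

Because the example only counts the indices, its output does not depend on the iteration order.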
For example, if you want to print the fields of a file in reverse order, you can use a counting for loop with the special variable NF, which contains the number of fields in the current line. For example, if you have a test.txt file that contains:
This is a test
AWK is a great tool
Linux is the best operating system
You can reverse the order of fields with the following command:
awk '{for (i=NF; i>0; i--) print $i}' test.txt
Which gives one field per line, because each print ends with a newline:
test
a
is
This
tool
great
a
is
AWK
system
operating
best
the
is
Linux
AWK already reads its input line by line, so you do not need a for loop to iterate through the lines of a file: a pattern is enough. The special variable FNR contains the line number relative to the current file. For example, if you want to print the even line numbers of the test.txt file, you can use the following command:
awk 'FNR%2==0 {print FNR}' test.txt
Which gives, for the three-line test.txt file:
2
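The difference between FNR and NR only shows up when AWK reads more than one file. The following sketch assumes two small temporary files (named first and second here purely for illustration) and prints both counters at the first line of each file.

```shell
first=$(mktemp)
second=$(mktemp)
printf 'a\nb\nc\n' > "$first"
printf 'd\ne\n' > "$second"

# FNR restarts at 1 for each file, while NR keeps counting across all files.
result=$(awk 'FNR == 1 {print "FNR=" FNR, "NR=" NR}' "$first" "$second")
echo "$result"

rm -f "$first" "$second"
```

The second file starts at overall line 4 because the first file contains three lines.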
How to run an AWK script?
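To run an AWK script, save the program in a file with the .awk extension, start the file with the #!/usr/bin/awk -f line, and give it execution rights with the chmod +x command. Then you can run the script with the command ./script_name.awk [files].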
For example, if you have a script named hello.awk that contains:
#!/usr/bin/awk -f
BEGIN {print "Hello"}
You can run the script with the following command:
./hello.awk
Which gives:
Hello
How to pass arguments to an AWK script?
To pass arguments to an AWK script, you can use two methods:
- The first method is to use the -v option with the variable=value syntax. For example, if you want to pass two arguments named var1 and var2 to your hello.awk script, you can use the following command:
awk -v var1=hello -v var2=world -f hello.awk
And in your hello.awk script you can access the arguments with the variables var1 and var2. AWK variables are written without a $ sign, because $ is reserved for field references. For example, if your script contains:
#!/usr/bin/awk -f
BEGIN {print var1 " " var2}
You obtain:
hello world
- The second method is to use the special ARGV array, which contains the arguments passed to the script. For example, if you want to pass two unnamed arguments to your hello.awk script, you can use the following command:
awk -f hello.awk hello world
And in your hello.awk script you can access the arguments with the ARGV[1] and ARGV[2] indices. For example, if your script contains:
#!/usr/bin/awk -f
BEGIN {print ARGV[1] " " ARGV[2]}
You obtain:
hello world
Because this script only contains a BEGIN block, AWK never tries to open hello and world as input files.
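Both methods can be tried without a separate script file. In this sketch, the variable name min_age, the sample data and the shell variables people, older and arg_count are assumptions made up for the illustration.

```shell
people=$(mktemp)
printf 'name,first name,age\nAlice,Dupont,25\nBob,Martin,32\nCharles,Durand,28\n' > "$people"

# -v sets min_age before the program starts; inside AWK it is read without a $ sign.
# Adding 0 on both sides forces a numeric comparison.
older=$(awk -F, -v min_age=30 'NR > 1 && $3 + 0 > min_age + 0 {print $1}' "$people")

# ARGC counts the command name plus each argument, so ARGC - 1 is the argument count.
arg_count=$(awk 'BEGIN {print ARGC - 1}' hello world)

echo "$older"
echo "$arg_count"
rm -f "$people"
```

A variable set with -v is available in every block, including BEGIN, which is why it is usually the more convenient of the two methods for named settings.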
FAQs
What is the difference between AWK and GAWK?
GAWK is the GNU implementation of AWK. It adds features to the original language, such as arrays of arrays, additional built-in functions (for example strftime and gensub) and extra command-line options.
How to debug an AWK script?
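To debug an AWK script with GAWK, you can use the --lint option (also written -W lint), which displays warning messages about dubious or non-portable constructs in the script. You can also use the --dump-variables option (also written -W dump-variables), which writes the global variables and their final values to a file named awkvars.out at the end of the script execution.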
How to use the AWK command to sort data?
To sort data, you can use the sort command in combination with AWK. For example, if you want to sort the users in the /etc/passwd file by their UIDs, you can use the following command:
awk -F: '{print $1, $3}' /etc/passwd | sort -n -k2
By combining AWK with other commands (here sort), you can easily go much further in displaying and organizing data.
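The same pipeline can be extended with further commands. The sketch below sorts in descending order and keeps only the first line; the account data and the variable names users, result and top are hypothetical.

```shell
users=$(mktemp)
printf 'www-data:x:33:33\nroot:x:0:0\nalice:x:1001:1001\ndaemon:x:1:1\n' > "$users"

# AWK selects the name and UID; sort orders numerically on column 2, largest first.
result=$(awk -F: '{print $1, $3}' "$users" | sort -k2,2nr)

# head keeps only the account with the highest UID.
top=$(printf '%s\n' "$result" | head -n 1)

echo "$result"
echo "$top"
rm -f "$users"
```

Restricting the key to -k2,2 makes sort compare only the UID column rather than everything from column 2 to the end of the line.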
How to print the word count of a file with the AWK command?
To print the number of words in a file with the AWK command, you can use the special variable NF, which contains the number of fields in the current line. Using a for loop, you can count the words in each line and add them to a total variable. Using the special pattern END, you can print the final result. For example, if you have a file named test.txt that contains:
This is a test
AWK is a great tool
Linux is the best operating system
You can print the word count of the file with the following command:
awk '{for (i=1; i<=NF; i++) total++} END {print total}' test.txt
Which gives:
15
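Since NF already holds the number of words on the line, the inner loop can be replaced by a single addition. This sketch recreates the three-line sample in a temporary file and cross-checks the result with wc -w; the variable names sample, total and check are assumptions for the illustration.

```shell
sample=$(mktemp)
printf 'This is a test\nAWK is a great tool\nLinux is the best operating system\n' > "$sample"

# Add the field count of every line; "+ 0" makes an empty file print 0 instead of nothing.
total=$(awk '{total += NF} END {print total + 0}' "$sample")

# Cross-check with wc, stripping the padding some implementations add.
check=$(wc -w < "$sample" | tr -d ' ')

echo "$total $check"
rm -f "$sample"
```

The three lines contain 4, 5 and 6 words, so both commands report 15.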
How to use the AWK command to extract data from a CSV file?
To use the AWK command to extract data from a comma-separated values (CSV) file, you can use the -F option to set the field separator to a comma. For example, if you have a file named test.csv that contains:
name,first name,age
Alice,Dupont,25
Bob,Martin,32
Charles,Durand,28
You can extract the name and age of people with the following command:
awk -F"," '{print $1 " " $3}' test.csv
Which gives:
name age
Alice 25
Bob 32
Charles 28
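Some CSV files put spaces after the commas, in which case a plain comma separator leaves those spaces inside the fields. One possible approach, sketched here on made-up data, is to give -F a regular expression that absorbs the surrounding spaces. The variable names csv and result are assumptions, and this simple splitting does not handle quoted fields that themselves contain commas.

```shell
csv=$(mktemp)
printf 'name, first name, age\nAlice, Dupont, 25\nBob, Martin, 32\n' > "$csv"

# A separator longer than one character is treated as a regular expression:
# here, a comma together with any spaces around it.
result=$(awk -F' *, *' '{print $1, $3}' "$csv")
echo "$result"

rm -f "$csv"
```

The space inside "first name" is preserved because it is not adjacent to a comma.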
How to filter data with the AWK command?
The AWK command allows you to filter data based on patterns, which are regular expressions or logical conditions. A pattern is placed before its action, and the action is enclosed in curly brackets. For example, if you want to display the lines of the test.csv file that contain the name Alice, you can use the following pattern:
awk -F"," '/Alice/ {print}' test.csv
Which gives:
Alice,Dupont,25
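If you want to display the lines of the test.csv file for people older than 30, you can use the following pattern. The NR>1 condition skips the header line, whose third field (age) is not a number and would otherwise be compared as a string and printed:
awk -F"," 'NR>1 && $3 > 30 {print}' test.csv
Which gives:
Bob,Martin,32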
You can combine multiple patterns with the logical operators && (and), || (or) and ! (not). For example, if you want to display the lines of the test.csv file that have a name starting with C or an age less than 10, you can use the following pattern:
awk -F"," '($1 ~ /^C/) || ($3 < 10) {print}' test.csv
Which gives:
Charles,Durand,28
How to calculate statistics with the AWK command?
The AWK command allows you to calculate statistics on numerical data in a file, such as sum, average, minimum or maximum. To do this, simply use variables to store intermediate values and update them on each line. Using the special END pattern, you can display the final result. For example, if you want to calculate the sum and average of the ages in the test.csv file, you can use the following program:
awk -F"," 'NR>1 {sum+=$3; count++} END {print "Sum: " sum; print "Average: " sum/count}' test.csv
Which gives:
Sum: 85
Average: 28.3333
Explanations:
- We use the -F"," option to define the field separator as a comma.
- We use the condition NR>1 to ignore the first line of the file, which contains the column names.
- We use the sum and count variables to accumulate the sum and number of ages. We use the += operator to add the value of the third field ($3) to sum, and the ++ operator to increment count.
- We use the END pattern to display the final result. We use the / operator to calculate the average by dividing the sum by the count.
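A slightly more defensive variant of the average calculation is sketched below. It assumes the same four-line CSV content, recreated in a temporary file, and uses printf to control the number of decimals; the variable names csv, n and result are assumptions for this illustration.

```shell
csv=$(mktemp)
printf 'name,first name,age\nAlice,Dupont,25\nBob,Martin,32\nCharles,Durand,28\n' > "$csv"

# Guard against a file with no data lines, and round the average to two decimals.
result=$(awk -F, 'NR > 1 {sum += $3; n++}
    END {
        if (n > 0) { printf "%.2f\n", sum / n }
        else       { print "no data" }
    }' "$csv")
echo "$result"

rm -f "$csv"
```

Without the guard, a file that contains only the header line would lead to a division by zero in the END block.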
Likewise, if you want to calculate the minimum and maximum ages of the test.csv file, you can use the following program:
awk -F"," 'NR>1 {if (min=="") min=max=$3; if ($3<min) min=$3; if ($3>max) max=$3} END {print "Min: " min; print "Max: " max}' test.csv
Which gives:
Min: 25
Max: 32
Explanations:
- We use the -F"," option to define the field separator as a comma.
- We use the condition NR>1 to ignore the first line of the file, which contains the column names.
- We use the min and max variables to store the minimum and maximum ages. We initialize these variables with the value of the third field ($3) if they are empty (""). We use the < and > operators to compare values and update the variables if necessary.
- We use the END pattern to display the final result.
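The same technique can keep track of which line produced the extreme value, not just the value itself. This is a sketch on the same sample CSV content, recreated in a temporary file; the variable names who, max, csv and result are assumptions for the illustration.

```shell
csv=$(mktemp)
printf 'name,first name,age\nAlice,Dupont,25\nBob,Martin,32\nCharles,Durand,28\n' > "$csv"

# Update both the maximum and the matching name whenever a larger age is seen.
# max == "" is true only before the first data line has been processed.
result=$(awk -F, 'NR > 1 && (max == "" || $3 + 0 > max) {max = $3 + 0; who = $1}
    END {print who, max}' "$csv")
echo "$result"

rm -f "$csv"
```

Adding 0 to the field stores the age as a number, so later comparisons are numeric rather than alphabetical.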
Conclusion
The AWK command is an essential tool for manipulating text on Linux. It allows you to perform complex tasks in a few lines of code, such as extracting, filtering, modifying or calculating data. It offers great flexibility thanks to its patterns, actions, functions and variables. It can be combined with other Linux commands, such as sort, to expand its possibilities. If you want to learn more about the AWK command, you can consult the manual or the many tutorials available on the Internet.