How to master the AWK command for processing text under Linux?

The AWK command is a powerful and versatile tool for processing and transforming text data in Linux. Whether you need to extract information, filter lines, reformat output, or perform calculations, AWK can simplify your work with just a few lines of code. In this article, you will learn how to use the AWK command for text manipulation in Linux.

What is the AWK command?

AWK is an interpreted programming language that you run from the Linux terminal through the awk command. Its name comes from the initials of its creators: Alfred Aho, Peter Weinberger, and Brian Kernighan. AWK was originally designed to handle files structured with delimiter-separated fields, such as CSV files or the /etc/passwd file. It can also be applied to less regular text, such as HTML or XML files, although it has no built-in parser for those formats. AWK is not an object-oriented programming language, but it lets you define your own functions and variables: variables are global by default, and function parameters are local to the function. It also has control structures such as loops and conditional statements.

The general syntax of the AWK command is as follows:

awk [options] 'program' [files]

The program is a sequence of instructions that define patterns to search for in each line of the file and actions to perform when a pattern is found. Options allow you to modify the behavior of the AWK command, such as choosing the field delimiter or the output format.
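
As a minimal sketch of this structure, the snippet below runs a one-rule program on a small sample file. The file name fruits.txt and its contents are invented for illustration; the comments point out which part is the option, which is the pattern, and which is the action.

```shell
# Work in a temporary directory so nothing else is touched.
dir=$(mktemp -d)

# Hypothetical sample data in the form name:quantity
printf 'apple:5\npear:2\nplum:7\n' > "$dir/fruits.txt"

# option:  -F:          use ":" as the field delimiter
# pattern: $2 > 3       keep lines whose second field exceeds 3
# action:  { print $1 } print the first field of those lines
result=$(awk -F: '$2 > 3 { print $1 }' "$dir/fruits.txt")
echo "$result"

rm -r "$dir"
```

With this sample data, only apple and plum pass the pattern, so only those two names are printed.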

How do I print text using the AWK command?

The AWK command can be used to print a message to the terminal for each line of input. If you run AWK with only a print action and no pattern or file name, it reads from standard input and prints the message each time you enter a line, that is, every time you press Enter.

For example, if you type:

awk '{print "Hello"}'

And if you press Enter several times, you get:

Hello
Hello
Hello

To stop the AWK command, you can press Ctrl+D (end of input) or Ctrl+C.
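
Because no file is named, AWK reads standard input, so the same program can also be fed through a pipe instead of the keyboard. A small sketch, with the three input lines made up for illustration:

```shell
# Three lines arrive on standard input; the action runs once per line,
# and AWK exits by itself when the input ends.
result=$(printf 'one\ntwo\nthree\n' | awk '{print "Hello"}')
echo "$result"
```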

To print the contents of a file, a bare {print} action is enough. You can frame the output with the special pattern `BEGIN`, whose action executes before the file is read, and the special pattern `END`, whose action executes after the last line. For example, if you have a file named `test.txt` that contains:

This is a test.
AWK is a great tool.
Linux is the best operating system

You can print the contents of the file using the following command:

awk 'BEGIN {print "Here is the content of the file test.txt:"} {print} END {print "End of file"}' test.txt

Which gives:

Here is the content of the file test.txt:
This is a test.
AWK is a great tool.
Linux is the best operating system
End of file

The print command without arguments prints the entire line. You can also print a specific field using the variable $n, where n is the field number. By default, fields are separated by spaces or tabs, but you can change the delimiter with the -F option.

For example, if you want to print the first and third fields of the /etc/passwd file, which are separated by colons (:), you can use the following command:

awk -F: '{print $1 " " $3}' /etc/passwd

Which gives something like:

root 0
daemon 1
bin 2
sys 3
sync 4
games 5
man 6
lp 7
mail 8
news 9
uucp 10
proxy 13
www-data 33
...
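
The sketch below repeats the idea on a small sample file, so the result does not depend on the accounts present on your machine. The file passwd.sample and its three entries are invented for illustration; the comma in print and the $NF shortcut are standard AWK.

```shell
dir=$(mktemp -d)

# Hypothetical extract in /etc/passwd format (colon-separated).
cat > "$dir/passwd.sample" <<'EOF'
root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin
alice:x:1001:1001:Alice:/home/alice:/bin/bash
EOF

# A comma in print inserts the output field separator (a space by default).
names_uids=$(awk -F: '{print $1, $3}' "$dir/passwd.sample")
echo "$names_uids"

# $NF is the last field, whatever the number of fields on the line.
shells=$(awk -F: '{print $NF}' "$dir/passwd.sample")
echo "$shells"

rm -r "$dir"
```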

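You can also print arithmetic expressions or strings using the AWK command. For example, to print the square of the second field of each line in the file test.txt, you can use the following command:

awk '{print $2^2}' test.txt

Which gives:

0
0
0

The second field of every line in test.txt is the word is, which is not a number, so AWK treats it as 0 in arithmetic. On a file whose second field holds numbers, the same command prints their squares.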
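
To see arithmetic on actual numbers, here is a short sketch using a hypothetical two-column file, numbers.txt, whose second field is numeric. It also shows what happens when the field is not a number.

```shell
dir=$(mktemp -d)

# Hypothetical data: a label and a numeric value.
printf 'a 3\nb 4\nc 10\n' > "$dir/numbers.txt"

# ^ is the exponentiation operator; $2 is converted to a number.
squares=$(awk '{print $1, $2^2}' "$dir/numbers.txt")
echo "$squares"

# A field that does not look like a number counts as 0 in arithmetic.
zero=$(printf 'x is\n' | awk '{print $2^2}')
echo "$zero"

rm -r "$dir"
```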

If you want to print the number of lines in the file test.txt, you can use the special variable NR, which contains the current line number. Inside the END block, NR holds the number of the last line read, that is, the total number of lines. For example, you can use the following command:

awk 'END {print NR}' test.txt

Which gives:

3

How to filter text using the AWK command?

The AWK command can be used to filter text based on patterns or conditions. If you specify a pattern before an action, AWK will only execute the action if the pattern is found in the line. The pattern can be a regular expression, a comparison, a logical operation, or a combination of these.

For example, if you want to print the lines from the test.txt file that contain the word Linux, you can use the following command:

awk '/Linux/ {print}' test.txt

Which gives:

Linux is the best operating system

If you want to print the lines in the /etc/passwd file that have a UID greater than 1000, you can use the following command:

awk -F: '$3 > 1000 {print}' /etc/passwd

Which gives something like:

libvirt-qemu:x:64055:139:Libvirt Qemu,,,:/var/lib/libvirt:/usr/sbin/nologin
snapd-range-524288-root:x:524288:524288::/nonexistent:/bin/false
snap_daemon:x:584788:584788::/nonexistent:/bin/false
...

You can also use the logical operators && (and), || (or), and ! (not) to combine patterns. For example, if you want to print the lines in the /etc/passwd file that have a UID greater than 1000 and a shell other than /usr/sbin/nologin, you can use the following command:

awk -F: '$3 > 1000 && $7 != "/usr/sbin/nologin" {print}' /etc/passwd

Which gives something like:

snapd-range-524288-root:x:524288:524288::/nonexistent:/bin/false
snap_daemon:x:584788:584788::/nonexistent:/bin/false
...
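
Since /etc/passwd differs from one machine to another, the following sketch applies combined patterns to a small sample file with a known result. The file passwd.sample and its four accounts are invented for illustration; the !~ operator (does not match) is standard AWK but is not used elsewhere in this article.

```shell
dir=$(mktemp -d)

# Hypothetical extract in /etc/passwd format.
cat > "$dir/passwd.sample" <<'EOF'
root:x:0:0:root:/root:/bin/bash
www-data:x:33:33:www-data:/var/www:/usr/sbin/nologin
alice:x:1001:1001:Alice:/home/alice:/bin/bash
backup-svc:x:1002:1002::/nonexistent:/usr/sbin/nologin
EOF

# Comparison AND negated regular-expression match on the 7th field:
# UID above 1000 and a shell that does not end in "nologin".
logins=$(awk -F: '$3 > 1000 && $7 !~ /nologin$/ {print $1}' "$dir/passwd.sample")
echo "$logins"

# Regular expression OR comparison: lines mentioning bash, or UID 33.
mixed=$(awk -F: '/bash/ || $3 == 33 {print $1}' "$dir/passwd.sample")
echo "$mixed"

rm -r "$dir"
```

Here the first command keeps only alice, while the second keeps root, www-data, and alice.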

How to edit text using the AWK command?

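The AWK command can be used to modify text using built-in functions or special variables. For example, if you want to replace spaces with hyphens in the file test.txt, you can use the gsub function, which replaces all occurrences of a pattern with another string. The same result can be obtained with the OFS variable, which defines the output field separator. For example, you can use the following command:

awk '{gsub(/ /, "-"); print}' test.txt

Which gives:

This-is-a-test.
AWK-is-a-great-tool.
Linux-is-the-best-operating-system

GNU AWK also provides time functions such as strftime, which formats a Unix timestamp (a number of seconds since 1 January 1970). Assuming a file, here called times.txt for illustration, whose first field on each line is such a timestamp, the following command prints each one as a date and time:

awk '{print strftime("%d/%m/%Y %H:%M:%S", $1)}' times.txt

Which gives output of the form:

30/10/2021 16:13:49
31/10/2021 17:14:50
01/11/2021 18:15:51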

You can consult the AWK command manual to learn about other available functions and variables.
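
As a short sketch of the two approaches to replacing spaces with hyphens, the snippet below compares gsub with its sibling sub and then rebuilds the line through OFS. The two sample lines are taken from test.txt; the `$1=$1` assignment is a common idiom for forcing AWK to rebuild the line, not something specific to this article.

```shell
dir=$(mktemp -d)
printf 'This is a test.\nAWK is a great tool.\n' > "$dir/test.txt"

# gsub replaces every match on the line; sub replaces only the first one.
all=$(awk '{gsub(/ /, "-"); print}' "$dir/test.txt")
first=$(awk '{sub(/ /, "-"); print}' "$dir/test.txt")
echo "$all"
echo "$first"

# Assigning to a field ($1=$1) makes AWK rebuild the line with OFS
# between the fields, which here has the same effect as gsub.
rebuilt=$(awk -v OFS='-' '{$1=$1; print}' "$dir/test.txt")
echo "$rebuilt"

rm -r "$dir"
```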

How to use the for loop with the AWK command?

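The AWK command can be used to run for loops over the fields of a line or over the elements of an array. AWK has two forms of the for loop. The first iterates over the keys of an array:

for (variable in array) action

Where variable successively takes each key (index) of the array, in an unspecified order, and action is the action to be performed at each iteration. The second is the C-style counting loop, which is the form used in the examples below:

for (initialization; condition; increment) action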
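
As a minimal sketch of the array form, the snippet below counts how often each word appears in its input and then walks the resulting array. The input words and the array name seen are made up for illustration.

```shell
# Count how often each word occurs, then walk the array with for (key in array).
# The traversal order of for-in is unspecified, so the output is piped to sort.
counts=$(printf 'red\nblue\nred\ngreen\nred\nblue\n' |
  awk '{seen[$1]++} END {for (word in seen) print word, seen[word]}' |
  sort)
echo "$counts"
```

The loop variable receives the keys of the array (the words), and the counts are looked up with seen[word].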

For example, if you want to print the fields of each line in reverse order, you can use a for loop with the special variable NF, which contains the number of fields in the current line. For example, if you have a file called test.txt which contains:

This is a test.
AWK is a great tool.
Linux is the best operating system

You can reverse the order of the fields with the following command, which uses printf so that the fields of a line stay on the same line:

awk '{for (i=NF; i>0; i--) printf "%s%s", $i, (i>1 ? " " : "\n")}' test.txt

Which gives:

test. a is This
tool. great a is AWK
system operating best the is Linux

AWK already loops over the lines of its input, so no explicit for loop is needed to select lines. The special variable FNR contains the line number within the current file. For example, if you want to print the numbers of the even-numbered lines of the file test.txt, you can use the following command:

awk 'FNR%2==0 {print FNR}' test.txt

Which gives:

2

Replacing {print FNR} with {print} prints the lines themselves instead of their numbers.
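
The difference between FNR and NR only shows when AWK reads several files. A small sketch, using two throwaway files a.txt and b.txt invented for illustration:

```shell
dir=$(mktemp -d)
printf 'a1\na2\na3\n' > "$dir/a.txt"
printf 'b1\nb2\n' > "$dir/b.txt"

# NR counts lines across all files; FNR restarts at 1 for each new file.
numbers=$(awk '{print $0, NR, FNR}' "$dir/a.txt" "$dir/b.txt")
echo "$numbers"

# A pattern with no action prints the matching line, so this shows the
# even-numbered lines of each file rather than their numbers.
even=$(awk 'FNR % 2 == 0' "$dir/a.txt" "$dir/b.txt")
echo "$even"

rm -r "$dir"
```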

How do I run an AWK script?

To run an AWK script, you can place it in a file with the .awk extension and give it execute permissions with the command chmod +x. Then, you can run the script with the command ./script_name.awk [files].

For example, if you have a script named hello.awk that contains:

#!/usr/bin/awk -f
BEGIN {print "Hello"}

You can run the script with the following command:

./hello.awk

Which gives:

Hello
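
The whole procedure can be sketched in a few shell lines. The path /usr/bin/awk in the first line of the script is an assumption about where awk is installed on your system (command -v awk shows the real location), which is why the direct run is guarded by a check.

```shell
dir=$(mktemp -d)

# Write a two-line script: the interpreter line, then the program itself.
cat > "$dir/hello.awk" <<'EOF'
#!/usr/bin/awk -f
BEGIN { print "Hello" }
EOF

# Running the file through the interpreter works regardless of permissions.
via_f=$(awk -f "$dir/hello.awk")
echo "$via_f"

# Running it directly needs the execute bit and an awk at the path
# named on the first line of the script.
chmod +x "$dir/hello.awk"
if [ -x /usr/bin/awk ]; then
  direct=$("$dir/hello.awk")
  echo "$direct"
fi

rm -r "$dir"
```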

How do I pass arguments to an AWK script?

To pass arguments to an AWK script, you can use two methods:

  • The first method involves using the -v option with the format variable=value. For example, if you want to pass two arguments named var1 and var2 to your hello.awk script, you can use the following command:
awk -v var1=hello -v var2=world -f hello.awk

And in your hello.awk script, you can access the arguments by their names, var1 and var2, without a dollar sign (in AWK, $ refers to a field of the current line). For example, if your script contains:

#!/usr/bin/awk -f
BEGIN {print var1 " " var2}

You get:

hello world
  • The second method involves using the special ARGV array, which contains the arguments passed to the script. For example, if you want to pass two unnamed arguments to your hello.awk script, you can use the following command:
awk -f hello.awk hello world

And in your hello.awk script, you can access the arguments using the indices ARGV[1] and ARGV[2]. For example, if your script contains:

#!/usr/bin/awk -f
BEGIN {print ARGV[1] " " ARGV[2]}

You get:

hello world

This works because the script only has a BEGIN block. If it also contained rules that read input, AWK would try to open hello and world as files.
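
Both methods can be tried side by side with the sketch below. The script names greet.awk and args.awk are invented for illustration; ARGC, the number of entries in ARGV, is a standard AWK variable that this article does not otherwise use.

```shell
dir=$(mktemp -d)

# -v assigns variables before the program starts; they are read by name,
# without a dollar sign.
cat > "$dir/greet.awk" <<'EOF'
BEGIN { print var1 " " var2 }
EOF
named=$(awk -v var1=hello -v var2=world -f "$dir/greet.awk")
echo "$named"

# ARGV holds the command-line operands and ARGC is the number of entries
# (ARGV[0] is the name of the awk program itself).
cat > "$dir/args.awk" <<'EOF'
BEGIN {
    for (i = 1; i < ARGC; i++)
        printf "%s%s", ARGV[i], (i < ARGC - 1 ? " " : "\n")
}
EOF
positional=$(awk -f "$dir/args.awk" hello world)
echo "$positional"

rm -r "$dir"
```

Looping up to ARGC lets the second script handle any number of arguments rather than exactly two.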

FAQ

What is the difference between AWK and GAWK?

GAWK is the GNU implementation of AWK. It adds features to the original language, such as additional regular expression operators, true multidimensional arrays (arrays of arrays), and extra built-in functions, for example strftime for formatting dates.

How do I debug an AWK script?

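With GNU AWK, you can use the -W option with the lint keyword (also written --lint), which displays warning messages about potential errors in the script. You can also use the -W option with the dump-variables keyword (also written --dump-variables), which writes the values of the global variables at the end of the script's execution to a file named awkvars.out by default.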

How do I use the AWK command to sort data?

To use the AWK command to sort data, you can use the `sort` command in combination with AWK. For example, if you want to sort the users in the `/etc/passwd` file by their UID, you can use the following command:

awk -F: '{print $1, $3}' /etc/passwd | sort -n -k2

By combining AWK with other commands (here sort), you can easily go much further in displaying and organizing data.

How do I print the number of words in a file using the AWK command?

To print the number of words in a file using the AWK command, you can use the special variable NF, which contains the number of fields in the current line. Using a for loop, you can count the words in each line and add them to a variable called total. Using the special pattern END, you can print the final result. For example, if you have a file named test.txt that contains:

This is a test.
AWK is a great tool.
Linux is the best operating system

You can print the number of words in the file using the following command:

awk '{for (i=1; i<=NF; i++) total++} END {print total}' test.txt

Which gives:

15

The three lines contain 4, 5, and 6 words. Since NF already holds the number of words on each line, the loop can be replaced by total += NF.
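
A shorter variant adds NF directly instead of looping over the fields. The sketch below recreates the three-line test.txt in a temporary directory and cross-checks the result with wc -w; the variable names are arbitrary.

```shell
dir=$(mktemp -d)
printf 'This is a test.\nAWK is a great tool.\nLinux is the best operating system\n' > "$dir/test.txt"

# NF is already the number of words on the line, so it can be added directly.
# total+0 prints 0 rather than an empty string for an empty file.
total=$(awk '{total += NF} END {print total+0}' "$dir/test.txt")
echo "$total"

# Cross-check with wc -w, which also counts whitespace-separated words.
wc_total=$(wc -w < "$dir/test.txt" | tr -d ' ')
echo "$wc_total"

rm -r "$dir"
```

Both counts come out at 15 for the three sample lines (4 + 5 + 6 words).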

How do I use the AWK command to extract data from a CSV file?

To use the AWK command to extract data from a CSV (comma-separated values) file, you can use the -F option to set the field separator to a comma. For example, if you have a file named test.csv that contains:

Name,First name,Age
Alice,Dupont,25
Bob,Martin,32
Charles,Durand,28

You can extract the name and age of people using the following command:

awk -F"," '{print $1 " " $3}' test.csv

Which gives:

Name Age
Alice 25
Bob 32
Charles 28
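
In practice you often want to skip the header line and format the output. A minimal sketch, assuming test.csv holds one header line followed by simple comma-separated rows; splitting on a bare comma does not cope with quoted fields that themselves contain commas.

```shell
dir=$(mktemp -d)
cat > "$dir/test.csv" <<'EOF'
Name,First name,Age
Alice,Dupont,25
Bob,Martin,32
Charles,Durand,28
EOF

# NR > 1 skips the header line; printf builds a sentence from two columns.
people=$(awk -F, 'NR > 1 {printf "%s is %d\n", $1, $3}' "$dir/test.csv")
echo "$people"

rm -r "$dir"
```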

How to filter data using the AWK command?

The AWK command allows you to filter data based on patterns, which are regular expressions or logical conditions. A pattern is placed before its action, which is enclosed in curly braces. For example, if you want to display the lines in the file test.csv that contain the name Alice, you can use the following pattern:

awk -F"," '/Alice/ {print}' test.csv

Which gives:

Alice,Dupont,25

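If you want to display the people in the test.csv file who are older than 30, you can use the following pattern. The condition NR>1 skips the header line, whose third field (Age) is not a number and would otherwise be compared as a string and match:

awk -F"," 'NR>1 && $3 > 30 {print}' test.csv

Which gives:

Bob,Martin,32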

You can combine multiple patterns using the logical operators && (and), || (or), and ! (not). For example, if you want to display lines in the file test.csv that have a name starting with C or an age less than 10, you can use the following pattern:

awk -F"," '($1 ~ /^C/) || ($3 < 10) {print}' test.csv

Which gives:

Charles,Durand,28

How to calculate statistics using the AWK command?

The AWK command allows you to calculate statistics on the numerical data in a file, such as the sum, average, minimum, or maximum. To do this, simply use variables to store the intermediate values and update them with each line. By using the special END pattern, you can display the final result. For example, if you want to calculate the sum and average of the ages in the file test.csv, you can use the following program:

awk -F"," 'NR>1 {sum+=$3; count++} END {print "Sum: " sum; print "Average: " sum/count}' test.csv

Which gives:

Sum: 85
Average: 28.3333

Explanation:

  • The -F"," option is used to define the field separator as a comma.
  • We use the condition NR>1 to ignore the first line of the file, which contains the column names.
  • We use the variables sum and count to accumulate the sum and number of ages. The operator += adds the value of the third field ($3) to sum, and the operator ++ increments count by one.
  • The END pattern is used to display the final result. The / operator is used to calculate the average by dividing the sum by the number of ages.

Similarly, if you want to calculate the minimum and maximum ages in the test.csv file, you can use the following program:

awk -F"," 'NR>1 {if (min=="") min=max=$3; if ($3<min) min=$3; if ($3>max) max=$3} END {print "Min: " min; print "Max: " max}' test.csv

Which gives:

Min: 25
Max: 32

Explanation:

  • The -F"," option is used to define the field separator as a comma.
  • We use the condition NR>1 to ignore the first line of the file, which contains the column names.
  • We use the variables min and max to store the minimum and maximum ages. We initialize these variables with the value of the third field ($3) if they are empty (""). We use the operators < and > to compare the values and update the variables if necessary.
  • The END pattern is used to display the final result.
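
The two programs can also be merged into a single pass over the file. The following is a sketch under the assumption that test.csv contains the header and the three rows shown earlier; the variable names and the output format are arbitrary choices.

```shell
dir=$(mktemp -d)
cat > "$dir/test.csv" <<'EOF'
Name,First name,Age
Alice,Dupont,25
Bob,Martin,32
Charles,Durand,28
EOF

stats=$(awk -F, '
NR == 1 { next }                      # skip the header line
{
    age = $3 + 0                      # force numeric interpretation
    sum += age
    count++
    if (count == 1 || age < min) min = age
    if (count == 1 || age > max) max = age
}
END {
    if (count > 0)                    # avoid dividing by zero on an empty file
        printf "sum=%d avg=%.2f min=%d max=%d\n", sum, sum / count, min, max
}' "$dir/test.csv")
echo "$stats"

rm -r "$dir"
```

Using printf with %.2f fixes the number of decimals of the average instead of relying on the default number formatting of print.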

Conclusion

The AWK command is an essential tool for manipulating text in Linux. It allows you to perform complex tasks in just a few lines of code, such as extracting, filtering, modifying, or calculating data. It offers great flexibility thanks to its patterns, actions, functions, and variables. It can be combined with other Linux commands to extend its capabilities, for example by piping its output to sort. If you want to learn more about the AWK command, you can consult the manual or the many tutorials available online.

Hey there, it's François :) A writer in my spare time who loves sharing his passion: all things tech! 😍 Whether it's hardware, software, video games, social media, or so many other areas, you'll find it all on this site. I share my analyses, reviews, tutorials, and my favorite finds across various platforms. I'm a knowledgeable and discerning tech enthusiast who doesn't just follow trends, but strives to guide you toward the best solutions. So stay tuned!