- Scan the files, line by line.
- Split each line into fields/columns.
- Specify patterns and compare the lines of the file to those patterns.
- Perform various actions on the lines that match a given pattern.
In this article, we will explain the basic usage of the awk command and how it can be used to split a file of strings. We have performed the examples from this article on a Debian 10 Buster system but they can be easily replicated on most Linux distros.
The sample file we will be using
The sample file of strings that we will be using in order to demonstrate the usage of the awk command is as follows:
This is what each column of the sample file indicates:
- The first column contains the name of employees/teachers in a school
- The second column contains the subject that the employee teaches
- The third column indicates whether the employee is a professor or assistant professor
- The fourth column contains the pay of the employee
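The original listing of the sample file is not reproduced here, so the following is a hypothetical stand-in that matches the four columns described above. The file name sample_file.txt, the names, and the pay figures are assumptions made for illustration. The third column uses the single words professor and assistant so that every line splits into exactly four whitespace-separated fields.

```shell
# Create a hypothetical sample file (all values are made up for illustration).
cat > sample_file.txt <<'EOF'
Alice english professor 65000
Bob maths assistant 42000
Carol physics professor 70000
Barbara english assistant 40000
Dave biology professor 68000
EOF

# Show the file to confirm its contents.
cat sample_file.txt
```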
Example 1: Use awk to print all lines of a file
When no pattern is given, awk applies its action to every line of the input. In a command of the form awk '{print}' filename, no pattern is specified, so the “print” action is applied to all the lines of the file.
In this example, I am telling the awk command to print the contents of my sample file, line by line.
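A minimal sketch of such a command, assuming a hypothetical sample_file.txt with made-up contents (recreated here so that the snippet runs on its own):

```shell
# Recreate the hypothetical sample file (values are made up for illustration).
cat > sample_file.txt <<'EOF'
Alice english professor 65000
Bob maths assistant 42000
Carol physics professor 70000
Barbara english assistant 40000
Dave biology professor 68000
EOF

# No pattern is given, so the print action runs for every line.
awk '{print}' sample_file.txt
```

The output should be identical to the contents of the file.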
Example 2: Use awk to print only the lines that match a given pattern
With awk, you can specify a pattern and the command will print only the lines matching that pattern.
From the sample file, if I want to print only the line(s) that contain the character ‘B’, I can use the following command:
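One way to write this, assuming a hypothetical sample_file.txt with made-up contents (recreated here so that the snippet runs on its own):

```shell
# Recreate the hypothetical sample file (values are made up for illustration).
cat > sample_file.txt <<'EOF'
Alice english professor 65000
Bob maths assistant 42000
Carol physics professor 70000
Barbara english assistant 40000
Dave biology professor 68000
EOF

# /B/ is a regular expression; only lines containing an uppercase B are printed.
awk '/B/ {print}' sample_file.txt
```

With this made-up data, only the lines for Bob and Barbara match, because the match is case-sensitive.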
To make the example more meaningful, let me print only the information about employees who are professors.
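A sketch of this filter, assuming a hypothetical sample_file.txt with made-up contents (recreated here so that the snippet runs on its own). Note that /professor/ is a substring match: if the file spelled out “assistant professor” in full, those lines would match as well. The hypothetical file avoids this by using the single word assistant, and a field comparison is shown as a stricter alternative.

```shell
# Recreate the hypothetical sample file (values are made up for illustration).
cat > sample_file.txt <<'EOF'
Alice english professor 65000
Bob maths assistant 42000
Carol physics professor 70000
Barbara english assistant 40000
Dave biology professor 68000
EOF

# Print every line that contains the string "professor" anywhere.
awk '/professor/ {print}' sample_file.txt

# Stricter alternative: print lines whose third field is exactly "professor".
awk '$3 == "professor" {print}' sample_file.txt
```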
The command prints only the lines that contain the string “professor”, so the output is more useful than the raw data.
Example 3: Use awk to split the file so that only specific fields/columns are printed
Instead of printing the whole file, you can make awk print only specific columns of the file. By default, awk treats each run of non-whitespace characters in a line as a separate field and stores the fields in $N variables: $1 holds the first word, $2 the second, $3 the third, and so on. $0 holds the whole line, which is why the whole line is printed in Example 1.
The following command prints only the first column (name) and the second column (subject) of my sample file:
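A minimal sketch, assuming a hypothetical sample_file.txt with made-up contents (recreated here so that the snippet runs on its own):

```shell
# Recreate the hypothetical sample file (values are made up for illustration).
cat > sample_file.txt <<'EOF'
Alice english professor 65000
Bob maths assistant 42000
Carol physics professor 70000
Barbara english assistant 40000
Dave biology professor 68000
EOF

# Print the first and second fields of every line.
# The comma makes awk separate them with a single space in the output.
awk '{print $1, $2}' sample_file.txt
```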
Example 4: Use awk to count and print the number of lines in which a pattern is matched
You can tell awk to count the number of lines in which a specified pattern is matched and then output that ‘count’.
In this example, I want to count the number of persons teaching the subject “english”. Therefore I will tell the awk command to match the pattern “english” and print the number of lines in which this pattern is matched.
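One possible form of this command, assuming a hypothetical sample_file.txt with made-up contents (recreated here so that the snippet runs on its own). The variable name count is an arbitrary choice.

```shell
# Recreate the hypothetical sample file (values are made up for illustration).
cat > sample_file.txt <<'EOF'
Alice english professor 65000
Bob maths assistant 42000
Carol physics professor 70000
Barbara english assistant 40000
Dave biology professor 68000
EOF

# Increment count for each line matching /english/, then print it at the end.
# Adding 0 forces a numeric result, so 0 is printed when nothing matches.
awk '/english/ {count++} END {print count + 0}' sample_file.txt
```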
The count shows that, according to the sample file, two people teach english.
Example 5: Use awk to print only lines with more than a specific number of characters
For this task, we will use the built-in awk function called “length”, which returns the length of its input string. To print only the lines with more than, or fewer than, a given number of characters, use the length function in the following manner:
For printing lines with more than n characters: awk 'length($0) > n' filename
For printing lines with fewer than n characters: awk 'length($0) < n' filename
Here, n is the number of characters you want to compare each line against.
The following command prints only the lines from my sample file that have more than 30 characters:
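A sketch of this filter, assuming a hypothetical sample_file.txt with made-up contents in which fields are separated by single spaces (recreated here so that the snippet runs on its own):

```shell
# Recreate the hypothetical sample file (values are made up for illustration).
cat > sample_file.txt <<'EOF'
Alice english professor 65000
Bob maths assistant 42000
Carol physics professor 70000
Barbara english assistant 40000
Dave biology professor 68000
EOF

# Print only the lines longer than 30 characters.
# A pattern without an action prints the matching line by default.
awk 'length($0) > 30' sample_file.txt

# The reverse comparison: print only the lines shorter than 26 characters.
awk 'length($0) < 26' sample_file.txt
```

With this made-up data, the first command prints only the line for Barbara (31 characters) and the second prints only the line for Bob (25 characters).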
Example 6: Use awk to save the command output to another file
By using the shell redirection operator ‘>’, you can save the output of an awk command to another file. The general form is awk 'program' inputfile > outputfile.
In this example, I will use the redirection operator with my awk command to write only the names of the employees (column 1) to a new file:
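A minimal sketch, assuming a hypothetical sample_file.txt with made-up contents (recreated here so that the snippet runs on its own). The output file name names.txt is also an assumption.

```shell
# Recreate the hypothetical sample file (values are made up for illustration).
cat > sample_file.txt <<'EOF'
Alice english professor 65000
Bob maths assistant 42000
Carol physics professor 70000
Barbara english assistant 40000
Dave biology professor 68000
EOF

# Write the first field of every line to names.txt.
# The shell's > operator overwrites names.txt if it already exists.
awk '{print $1}' sample_file.txt > names.txt

# Check the contents of the new file.
cat names.txt
```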
I verified with the cat command that the new file contains only the names of the employees.
Example 7: Use awk to print only non-empty lines from a file
Awk has some built-in variables that you can use to filter the output. For example, the NF variable holds the number of fields in the current input record. For an empty line, NF is 0, which awk treats as false. Here, we will use NF to print only the non-empty lines of the file:
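A sketch of this filter, using a hypothetical file sample_with_blanks.txt with made-up contents that include two empty lines:

```shell
# Create a hypothetical file with empty lines (values are made up for illustration).
cat > sample_with_blanks.txt <<'EOF'
Alice english professor 65000

Bob maths assistant 42000

Carol physics professor 70000
EOF

# NF is non-zero (true) only for lines with at least one field,
# so empty and whitespace-only lines are skipped.
awk 'NF' sample_with_blanks.txt
```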
Conversely, you can use the following command to print only the empty lines:
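A sketch of the inverse filter, using a hypothetical file sample_with_blanks.txt with made-up contents. Because printed empty lines are hard to see, a variant that prints their line numbers is included.

```shell
# Create a hypothetical file with empty lines (values are made up for illustration).
cat > sample_with_blanks.txt <<'EOF'
Alice english professor 65000

Bob maths assistant 42000

Carol physics professor 70000
EOF

# !NF is true only when the line has no fields, so only empty lines are printed.
awk '!NF' sample_with_blanks.txt

# Variant: print the line numbers of the empty lines instead.
awk '!NF {print NR}' sample_with_blanks.txt
```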
Example 8: Use awk to count the total lines in a file
Another built-in variable, NR, keeps a count of the input records (usually lines) read so far. You can use this variable as follows to count the number of lines in a file:
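A minimal sketch, assuming a hypothetical sample_file.txt with made-up contents (recreated here so that the snippet runs on its own):

```shell
# Recreate the hypothetical sample file (values are made up for illustration).
cat > sample_file.txt <<'EOF'
Alice english professor 65000
Bob maths assistant 42000
Carol physics professor 70000
Barbara english assistant 40000
Dave biology professor 68000
EOF

# In the END block, NR holds the number of the last record read,
# which equals the total number of lines in the file.
awk 'END {print NR}' sample_file.txt
```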
This is the basic information you need to start splitting files with the awk command. You can combine these examples to extract more meaningful information from your file of strings with awk.