Command Notes for Script
These are command notes for scripting in Linux.
Removing Duplicate Lines
Duplicate lines can be removed with the sort and uniq commands.
$ sort textfile | uniq
The approach above sorts first and then filters out duplicates. The catch is that uniq only compares each line with the one immediately before it and skips the line if the two are identical, so sorting first is required for it to catch all duplicates. What if sorting is not desired? In that case, the awk command can be used. The clever part is '!_[$0]++' in the command below.
$ awk '!_[$0]++' textfile
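A short demonstration of the difference, using a hypothetical sample file named fruits.txt:

```shell
# Sample input with duplicates and a non-sorted order
printf 'pear\napple\npear\nbanana\n' > fruits.txt

# sort | uniq removes duplicates but also reorders the lines
sort fruits.txt | uniq

# awk '!_[$0]++' removes duplicates while keeping the original order.
# _[$0] is an array indexed by the whole line: on the first occurrence it
# is 0, !0 is true, so the line is printed (and the counter incremented);
# on later occurrences the value is nonzero, so the line is skipped.
awk '!_[$0]++' fruits.txt
```

Here sort | uniq prints apple, banana, pear, while the awk version prints pear, apple, banana, matching the input order.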
Using Variable in awk and sed
Using a shell variable in awk is different from other commands. In the command below, -v VAR=apple '$0 ~ VAR' is the key part.
$ awk -v VAR=apple '$0 ~ VAR' textfile
It prints every line containing apple, the value matched against VAR by awk. The -v option assigns a variable, and $0 refers to the whole line. A specific field can be printed instead, such as $1 or $3.
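A small worked example of both forms, using a hypothetical three-field file fruits.txt:

```shell
# Sample input: color, fruit, quantity
printf 'red apple 3\ngreen pear 5\nsour apple 2\n' > fruits.txt

# Print whole lines that match the variable VAR
awk -v VAR=apple '$0 ~ VAR' fruits.txt

# Print only the third field of the matching lines
awk -v VAR=apple '$0 ~ VAR { print $3 }' fruits.txt
```

The first command prints the two apple lines in full; the second prints just 3 and 2.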
But using a variable in sed is different again. The key is to use double quotes and curly braces, as in "s/${VAR}/replace/".
$ VAR=apple
$ sed "s/${VAR}/orange/" textfile
It replaces the first apple on each line with orange. To replace every apple on a line, add g at the end of the substitution, as in "s/${VAR}/orange/g".
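The difference the g flag makes can be seen with a line containing two apples (menu.txt is a hypothetical file name):

```shell
VAR=apple
printf 'apple pie and apple juice\n' > menu.txt

# Without g, only the first apple on the line is replaced
sed "s/${VAR}/orange/" menu.txt     # orange pie and apple juice

# With g, every apple on the line is replaced
sed "s/${VAR}/orange/g" menu.txt    # orange pie and orange juice
```

Note the double quotes: with single quotes the shell would not expand ${VAR}, and sed would look for the literal text ${VAR}.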
Removing Duplicate Blank Lines
Removing duplicate blank lines with sed is a little tricky. The key part is '/^$/N;/^\n$/D'.
$ sed '/^$/N;/^\n$/D' textfile
Explanation: /^$/ is a regular expression that matches an empty line, and N appends the next input line to the pattern space. The /^\n$/ expression then matches a pattern space containing only a newline character, which happens when N has joined two consecutive empty lines; D deletes everything up to and including the first newline in the pattern space. If content remains in the pattern space afterward, sed restarts the script from the top without reading a new input line, and this loop continues until only one blank line is left for each run of empty lines.
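The behavior can be checked with a small input containing runs of blank lines (textfile is written here just for the demonstration):

```shell
# Input: "a", three blank lines, "b", one blank line, "c"
printf 'a\n\n\n\nb\n\nc\n' > textfile

# Each run of consecutive blank lines is squeezed down to a single blank line
sed '/^$/N;/^\n$/D' textfile
```

The output is a, one blank line, b, one blank line, c. The single blank line between b and c is preserved, since D only fires when two empty lines end up in the pattern space together.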