Command Notes for Script
These are command notes for scripting in Linux.
Removing Duplicate Lines
Duplicate lines can be removed with the sort and uniq commands.
$ sort textfile | uniq
The approach above sorts first and then filters out duplicates. The catch is that uniq only compares each line with the one immediately before it and skips the line if the two are identical, so sorting first is required for it to catch all duplicates. What if sorting is not desired? In that case, the awk command can be used. The clever part is '!_[$0]++' in the command below.
$ awk '!_[$0]++' textfile
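A short demonstration of the difference, using a hypothetical sample file named fruits.txt:

```shell
# Sample input with duplicates and a non-sorted order
printf 'pear\napple\npear\nbanana\n' > fruits.txt

# sort | uniq removes duplicates but also reorders the lines
sort fruits.txt | uniq

# awk '!_[$0]++' removes duplicates while keeping the original order.
# _[$0] is an array indexed by the whole line: on the first occurrence it
# is 0, !0 is true, so the line is printed (and the counter incremented);
# on later occurrences the value is nonzero, so the line is skipped.
awk '!_[$0]++' fruits.txt
```

Here sort | uniq prints apple, banana, pear, while the awk version prints pear, apple, banana, matching the input order.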
Using Variable in awk and sed
Using a shell variable in awk is different from other commands. In the command below, -v VAR=apple '$0 ~ VAR' is the key part.
$ awk -v VAR=apple '$0 ~ VAR' textfile
It prints every line containing apple, the value matched against VAR by awk. The -v option assigns a variable, and $0 refers to the whole line. A specific field can be printed instead, such as $1 or $3.
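A small worked example of both forms, using a hypothetical three-field file fruits.txt:

```shell
# Sample input: color, fruit, quantity
printf 'red apple 3\ngreen pear 5\nsour apple 2\n' > fruits.txt

# Print whole lines that match the variable VAR
awk -v VAR=apple '$0 ~ VAR' fruits.txt

# Print only the third field of the matching lines
awk -v VAR=apple '$0 ~ VAR { print $3 }' fruits.txt
```

The first command prints the two apple lines in full; the second prints just 3 and 2.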
But using a variable in sed is different again. The key is to use double quotes and curly braces, as in "s/${VAR}/replace/".
$ VAR=apple
$ sed "s/${VAR}/orange/" textfile
It replaces the first apple on each line with orange. To replace every apple on a line, add g at the end of the substitution, as in "s/${VAR}/orange/g".
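The difference the g flag makes can be seen with a line containing two apples (menu.txt is a hypothetical file name):

```shell
VAR=apple
printf 'apple pie and apple juice\n' > menu.txt

# Without g, only the first apple on the line is replaced
sed "s/${VAR}/orange/" menu.txt     # orange pie and apple juice

# With g, every apple on the line is replaced
sed "s/${VAR}/orange/g" menu.txt    # orange pie and orange juice
```

Note the double quotes: with single quotes the shell would not expand ${VAR}, and sed would look for the literal text ${VAR}.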
Removing Duplicate Blank Lines
Removing duplicate blank lines with sed is a little tricky. The key part is '/^$/N;/^\n$/D'.
$ sed '/^$/N;/^\n$/D' textfile
Explanation: /^$/ is a regular expression that matches an empty line, and N appends the next input line to the pattern space. The /^\n$/ expression then matches a pattern space containing only a newline character, which happens when N has joined two consecutive empty lines; D deletes everything up to and including the first newline in the pattern space. If content remains in the pattern space afterward, sed restarts the script from the top without reading a new input line, and this loop continues until only one blank line is left for each run of empty lines.
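The behavior can be checked with a small input containing runs of blank lines (textfile is written here just for the demonstration):

```shell
# Input: "a", three blank lines, "b", one blank line, "c"
printf 'a\n\n\n\nb\n\nc\n' > textfile

# Each run of consecutive blank lines is squeezed down to a single blank line
sed '/^$/N;/^\n$/D' textfile
```

The output is a, one blank line, b, one blank line, c. The single blank line between b and c is preserved, since D only fires when two empty lines end up in the pattern space together.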