Regex & Text Processing
Process text with grep, sed, awk, and regular expressions.
grep - Pattern Matching
grep finds lines matching a pattern. Learn a few flags and you can debug logs, configs, and code fast.
$ grep "error" /var/log/syslog
$ grep -i "error" /var/log/syslog # Case insensitive
$ grep -r "TODO" ./src # Recursive search
$ grep -E "error|warning" /var/log/syslog # Extended regex (alternation)
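Two more flags pay for themselves quickly: -v inverts the match and -C adds surrounding context. A quick sketch (the file paths here are just examples):
$ grep -v "DEBUG" app.log # Everything except debug lines
$ grep -C 2 "error" /var/log/syslog # 2 lines of context around each match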
Regex essentials (80/20)
- ^ and $: start/end of line
- .: any character
- * + ?: repetition (0+, 1+, optional)
- [abc]: character class; [^abc]: negation
- (...) and |: groups and alternation (use -E)
# Lines that start with "sshd"
$ grep -E '^sshd' /var/log/auth.log | head
# Extract IPv4-looking patterns (basic)
$ grep -Eo '([0-9]{1,3}\.){3}[0-9]{1,3}' access.log | head
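You can test a pattern without touching real files by piping a sample line through grep (the sample strings below are invented):
$ echo "sshd[123]: Failed password" | grep -E '^sshd'
$ printf 'a\nab\nabc\n' | grep -E '^ab$' # Anchors match only the exact line "ab"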
ripgrep (rg) for codebases
If you have rg installed, it’s faster than grep and has great defaults for searching repositories (it skips .gitignore'd files, for example).
$ rg "TODO" .
$ rg -n "PermitRootLogin" /etc/ssh
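rg can also scope a search by file type or glob, which keeps repo searches focused (node_modules below is just an example of a directory you might exclude):
$ rg -t py "TODO" # Only Python files
$ rg -g '!node_modules' "fetch" # Skip a directory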
sed - Stream Editor
sed transforms streams. It’s perfect for quick rewrites in pipelines and for in-place edits (use -i with caution).
$ sed 's/old/new/' file.txt # Replace first match on each line
$ sed 's/old/new/g' file.txt # Replace all matches
$ sed -i 's/old/new/g' file.txt # In-place edit
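If you do edit in place, GNU sed accepts a backup suffix so the original survives (BSD/macOS sed wants the suffix as a separate argument, e.g. -i '.bak'):
$ sed -i.bak 's/old/new/g' file.txt # Edits file.txt, keeps file.txt.bak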
# Print lines 10-20
$ sed -n '10,20p' file.txt
# Delete blank lines
$ sed '/^$/d' file.txt
# Strip comments (#...) and blanks from a config
$ sed -e 's/#.*$//' -e '/^[[:space:]]*$/d' app.conf
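With -E, sed also supports groups and backreferences, handy for reordering fields (the input below is invented):
$ echo "Doe John" | sed -E 's/([A-Za-z]+) ([A-Za-z]+)/\2 \1/' # Prints "John Doe"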
awk - Text Processing
awk is a mini language for structured text. Think “columns + conditions + aggregation”.
$ awk '{print $1}' file.txt # First column
$ awk -F: '{print $1}' /etc/passwd # Custom delimiter
$ awk '$3 > 100' data.txt # Conditional
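Fields can be recombined freely in the action; for example, /etc/passwd stores the login shell in field 7:
$ awk -F: '{print $1, $7}' /etc/passwd # Username and login shell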
# Count requests per IP (common log format example)
$ awk '{print $1}' access.log | sort | uniq -c | sort -nr | head
# Sum column 3
$ awk '{sum+=$3} END {print sum}' data.txt
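awk's associative arrays turn per-key aggregation into a one-liner. A sketch, assuming the common log format (where awk sees the response size as field 10):
$ awk '{bytes[$1]+=$10} END {for (ip in bytes) print ip, bytes[ip]}' access.log | sort -k2 -nr | head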
Pipelines: combine tools
Most real-world tasks are pipelines: filter → transform → aggregate.
# Top 10 URLs by count (nginx access log example)
$ awk '{print $7}' access.log | sort | uniq -c | sort -nr | head
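The same shape works for any column; in the common log format the status code is field 9:
$ awk '{print $9}' access.log | sort | uniq -c | sort -nr # Requests per status code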
✅ Practice (20 minutes)
- Pick a log file and extract unique IPs with awk + sort + uniq.
- Use grep -E with anchors (^, $) to match exact config lines.
- Use sed to remove blank lines and strip comments from a config sample.