Regex & Text Processing
Process text with grep, sed, awk, and regular expressions.
grep - Pattern Matching
grep finds lines matching a pattern. Learn a few flags and you can debug logs, configs, and code fast.
$ grep "error" /var/log/syslog
$ grep -i "error" /var/log/syslog # Case insensitive
$ grep -r "TODO" ./src # Recursive search
$ grep -E "error|warning" /var/log/syslog # Extended regex (alternation)
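Two more flags pay for themselves quickly: -v inverts the match and -C adds surrounding context. A quick sketch (the file paths here are just examples):
$ grep -v "DEBUG" app.log # Everything except debug lines
$ grep -C 2 "error" /var/log/syslog # 2 lines of context around each match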
Regex essentials (80/20)
- ^ and $: start/end of line
- .: any character
- * + ?: repetition (0+, 1+, optional)
- [abc]: character class; [^abc]: negation
- (...) and |: groups and alternation (use -E)
# Lines that start with "sshd"
$ grep -E '^sshd' /var/log/auth.log | head
# Extract IPv4-looking patterns (basic)
$ grep -Eo '([0-9]{1,3}\.){3}[0-9]{1,3}' access.log | head
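You can test a pattern without touching real files by piping a sample line through grep (the sample strings below are invented):
$ echo "sshd[123]: Failed password" | grep -E '^sshd'
$ printf 'a\nab\nabc\n' | grep -E '^ab$' # Anchors match only the exact line "ab"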
ripgrep (rg) for codebases
If you have rg installed, it’s faster than grep and has great defaults for searching repositories (it skips .gitignore'd files, for example).
$ rg "TODO" .
$ rg -n "PermitRootLogin" /etc/ssh
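rg can also scope a search by file type or glob, which keeps repo searches focused (node_modules below is just an example of a directory you might exclude):
$ rg -t py "TODO" # Only Python files
$ rg -g '!node_modules' "fetch" # Skip a directory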
sed - Stream Editor
sed transforms streams. It’s perfect for quick rewrites in pipelines and for in-place edits (use -i with caution).
$ sed 's/old/new/' file.txt # Replace first match on each line
$ sed 's/old/new/g' file.txt # Replace all matches
$ sed -i 's/old/new/g' file.txt # In-place edit
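If you do edit in place, GNU sed accepts a backup suffix so the original survives (BSD/macOS sed wants the suffix as a separate argument, e.g. -i '.bak'):
$ sed -i.bak 's/old/new/g' file.txt # Edits file.txt, keeps file.txt.bak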
# Print lines 10-20
$ sed -n '10,20p' file.txt
# Delete blank lines
$ sed '/^$/d' file.txt
# Strip comments (#...) and blanks from a config
$ sed -e 's/#.*$//' -e '/^[[:space:]]*$/d' app.conf
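With -E, sed also supports groups and backreferences, handy for reordering fields (the input below is invented):
$ echo "Doe John" | sed -E 's/([A-Za-z]+) ([A-Za-z]+)/\2 \1/' # Prints "John Doe"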
awk - Text Processing
awk is a mini language for structured text. Think “columns + conditions + aggregation”.
$ awk '{print $1}' file.txt # First column
$ awk -F: '{print $1}' /etc/passwd # Custom delimiter
$ awk '$3 > 100' data.txt # Conditional
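Fields can be recombined freely in the action; for example, /etc/passwd stores the login shell in field 7:
$ awk -F: '{print $1, $7}' /etc/passwd # Username and login shell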
# Count requests per IP (common log format example)
$ awk '{print $1}' access.log | sort | uniq -c | sort -nr | head
# Sum column 3
$ awk '{sum+=$3} END {print sum}' data.txt
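awk's associative arrays turn per-key aggregation into a one-liner. A sketch, assuming the common log format (where awk sees the response size as field 10):
$ awk '{bytes[$1]+=$10} END {for (ip in bytes) print ip, bytes[ip]}' access.log | sort -k2 -nr | head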
Pipelines: combine tools
Most real-world tasks are pipelines: filter → transform → aggregate.
# Top 10 URLs by count (nginx access log example)
$ awk '{print $7}' access.log | sort | uniq -c | sort -nr | head
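The same shape works for any column; in the common log format the status code is field 9:
$ awk '{print $9}' access.log | sort | uniq -c | sort -nr # Requests per status code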
✅ Practice (20 minutes)
- Pick a log file and extract unique IPs with awk + sort + uniq.
- Use grep -E with anchors (^, $) to match exact config lines.
- Use sed to remove blank lines and strip comments from a config sample.