Give me 10 minutes and I'll make you a REGEX expert.
Regex has been around for a long time (seriously, over 70 years!), and maybe because of that, it's considered the black magic of software development.
An unbreakable spell, or at least, that's how I felt for the longest time.
Give me 10 minutes (I don't guarantee the time it takes to embed it into the brain 😉) and a bit of focus, and I’ll show you how to read, write, and actually enjoy regex.
The Big Problem: Messy Text Everywhere
Everything we work with is text: logs, configs, datasets, form inputs, random strings from APIs that look like somebody spilled coffee on a keyboard.
Trying to clean, match, or validate all of that manually is a nightmare.
Take an email address template that catches *all* valid cases, or a password policy that enforces a certain length or complexity.
These can all be described logically, kind of like a math expression, but for strings.
For search and replace, this gets even more useful: think of a 10,000-line document where we need to change a naming convention, say from camelCase to snake_case. Instead of repeating the same edit 7,000 times, we can do it with one simple command.
The Old Way: Click, Find, Replace, Repeat
Most people tackle this with brute force: CTRL+F, wild guessing, or a messy chain of “replace this, then that” commands.
Maybe a quick script, maybe a few lines of if statements to catch special cases.
It works once… until you realize you need to do it again, but slightly differently.
Regex sounds scary, so most people avoid it, assuming it’s some sacred black-magic syntax only gray-bearded sysadmins understand (sorry, gray-bearded sysadmins).
Why It Fails: “.*” Isn't a Strategy
Here’s the issue: without regex, you’re stuck reinventing the wheel every time text changes shape. And when people do try regex, they usually copy a mysterious pattern from Stack Overflow that either eats everything or matches nothing.
It’s not that regex doesn’t work, it’s that we never learn why those symbols mean what they mean.
Without that, you don't have the skill to READ, DEBUG and UNDERSTAND this seemingly black magic, and like many other things, that cripples us.
What usually happens is that the task is offloaded to Stack Overflow or AI, and ends up as a future memory leak no one saw coming (true story).
Once you break regex into bite-sized ideas, it stops looking magical.
Anchors like ^ and $ just say “start here” and “end there.”
Dots mean “any one thing.” Brackets [] create little lists of allowed characters.
The plus + means “give me one or more,” while the star * means “any number, even zero.”
From there, you’re just mixing building blocks.
Think in Patterns, Not Words
Start small.
Don’t try to tame the beast.
Use regex for tiny problems that bug you every day:
- Find all phone numbers in a messy text log.
- Fix names written as “Doe, John” to “John Doe.”
- Validate emails without yelling at the user prematurely.
- Mask digits in logs before sharing them.
You’ll quickly realize regex isn’t a hack, it’s a language for describing patterns that text follows.
Try this one step today: open your terminal and run something fun with ripgrep.
For example, to find lines that start with “error”:
rg '^error' myfile.txt
Then use sed or its modern cousin sd to make quick replacements:
sd 'cat' 'dog' myfile.txt
Boom. You just wielded regex in real life: no scripts, no loops, no magic.
Now let’s take those next few minutes and actually build up your regex muscle memory, all the way from baby steps to a real-world use case.
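If you'd rather try the same two ideas from a script, here is a minimal Python sketch using the standard `re` module. The sample text is made up for illustration and simply stands in for myfile.txt; it is not the output of any real tool.

```python
import re

# Hypothetical log text standing in for myfile.txt.
log_text = "error: disk full\nwarning: error rate rising\nerror: cat not found\n"

# Same idea as searching for ^error: keep only lines that START with "error".
error_lines = [line for line in log_text.splitlines() if re.search(r"^error", line)]

# Same idea as sd 'cat' 'dog': replace every "cat" with "dog".
replaced = re.sub(r"cat", "dog", log_text)

print(error_lines)
print(replaced)
```

Note that the second line of the sample contains “error” but does not start with it, so the anchored search skips it.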
Anchors: Telling Regex Where to Look
Anchors are your way of saying “start here” or “end there.”
^ (called the caret) marks the beginning of a line, while $ marks the end.
If you search for:
^he
# you’ll only match text that starts with “he”, like “hello”, but not “oh hey”.
If you flip it and use:
he$
# you’ll only get lines ending in “he”, like “ache”.
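A quick way to check the two anchors yourself is a short Python sketch like the one below. The test strings are the ones from the examples above, plus “hello” reused to show a miss for the end anchor.

```python
import re

starts_with_he = re.compile(r"^he")  # "he" at the very start
ends_with_he = re.compile(r"he$")    # "he" at the very end

anchor_results = {
    "hello starts with he": bool(starts_with_he.search("hello")),
    "oh hey starts with he": bool(starts_with_he.search("oh hey")),
    "ache ends with he": bool(ends_with_he.search("ache")),
    "hello ends with he": bool(ends_with_he.search("hello")),
}

for label, matched in anchor_results.items():
    print(f"{label}: {matched}")
```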
The Dot
A single dot . stands for “any one character.” So this:
c.t
# matches “cat”, “cut”, or even “c9t” — anything with one character between c and t.
# But ct won’t match, because dot means exactly one character (and we have 0).
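Here is a small Python sketch of the dot in action. It uses `re.fullmatch`, which is my choice for the demo: it forces the pattern to cover the whole string, so each word is judged on its own.

```python
import re

candidates = ["cat", "cut", "c9t", "ct", "cart"]

# The dot must consume exactly one character between "c" and "t".
dot_results = {word: bool(re.fullmatch(r"c.t", word)) for word in candidates}

print(dot_results)
```

“ct” fails because there is nothing for the dot to consume, and “cart” fails because there are two characters between c and t.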
Brackets: Pick and Choose
When you wrap characters in square brackets [ ], you’re saying “accept any of these.”
So, [ABC] means A, B, or C.
You can define ranges too: [A-C] is the same as above.
To exclude certain characters, put a caret ^ inside the brackets.
[^c]at
# finds “at” preceded by any single character other than c: so “bat”, “hat”, or “mat”, but not “cat”.
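The bracket rules are easy to verify with a few lines of Python. This is only a sketch; the extra test value “D” is added to show a miss for the range.

```python
import re

# Negated set: any single character except "c", followed by "at".
negated_results = {w: bool(re.fullmatch(r"[^c]at", w)) for w in ["bat", "hat", "mat", "cat"]}

# Range: [A-C] accepts exactly the same characters as [ABC].
range_results = {ch: bool(re.fullmatch(r"[A-C]", ch)) for ch in ["A", "B", "C", "D"]}

print(negated_results)
print(range_results)
```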
Repetitions: Choosing How Many
Regex shines when text repeats in unpredictable ways.
Let’s talk about ?, +, and *: the holy trinity of repetition.
u? → match “u” zero or one time. Great for words like colour/color.
u+ → match one or more “u”s.
u* → match zero, one, or any number of “u”s. It’s the wildcard of wildcards.
Fun fact: that star you've called a “wildcard” your whole life? Yep, it comes from regex.
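To see the three quantifiers side by side, here is a minimal Python sketch. The made-up words “ym”, “yum”, and “yuuum” are just a convenient way to vary how many “u”s appear.

```python
import re

words = ["ym", "yum", "yuuum"]

# ? allows zero or one "u"; + demands at least one; * accepts any count.
optional = {w: bool(re.fullmatch(r"yu?m", w)) for w in words}
one_or_more = {w: bool(re.fullmatch(r"yu+m", w)) for w in words}
zero_or_more = {w: bool(re.fullmatch(r"yu*m", w)) for w in words}

# The colour/color case from above.
spellings = {w: bool(re.fullmatch(r"colou?r", w)) for w in ["color", "colour", "colouur"]}

print(optional, one_or_more, zero_or_more, spellings, sep="\n")
```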
Character Shortcuts
Regex has shortcuts for common text categories like digits, whitespace, and word characters.
\d = any digit (0–9)
\w = any “word” character (letters, digits, underscore)
\s = any whitespace character (spaces, tabs, and line breaks)
Each has a “negated” twin — the same letter but uppercase.
\D = anything that’s not a digit
\W = anything that’s not a word character
\S = anything that’s not whitespace
So, to find multiple digits in a row:
\d+
To remove all whitespace, match every run of it and replace each match with nothing:
\s+
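A short Python sketch of these shortcuts, using sample strings invented for the demo:

```python
import re

# \d+ grabs each run of consecutive digits.
digit_runs = re.findall(r"\d+", "order 66 shipped 2024-01-15")

# \s+ matches each run of whitespace; replacing with "" removes it.
no_whitespace = re.sub(r"\s+", "", "  a b\tc\n")

# \w+ grabs runs of letters, digits, and underscores.
word_runs = re.findall(r"\w+", "snake_case, ok!")

print(digit_runs)
print(no_whitespace)
print(word_runs)
```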
"OR":
Sometimes you want either one thing or another.
That’s what | (the pipe) is for.
cat|dog
# matches both “cat” and “dog”, no matter where they appear.
You can combine this with other tricks.
To make sure you also catch both versions of “color” and “colour”:
colou?r
Want both “grey” and “gray”?
gr(a|e)y
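Here is a small Python sketch of alternation. The pattern `gr(a|e)y` is one common way to write the grey/gray case and is my assumption for the demo, as are the sample strings.

```python
import re

# The pipe matches either side, wherever it appears in the text.
pets = re.findall(r"cat|dog", "hotdog and catalog")

# Parentheses limit the alternation to the one letter that differs.
grey_results = {w: bool(re.fullmatch(r"gr(a|e)y", w)) for w in ["gray", "grey", "groy"]}

print(pets)
print(grey_results)
```

The first result is a good reminder that cat|dog has no idea what a word is: it happily finds “dog” inside “hotdog” and “cat” inside “catalog”.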
Email Validation: The Classic Showcase
Now, let’s take this into practical use.
Email validation is everyone’s first regex challenge and it’s surprisingly logical once you break it down.
Consider this expression:
^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[a-z]{2,}$
Here’s how it works:
- ^[A-Za-z0-9._%+-]+ : The username part before @: only letters, digits, and select special characters, one or more times.
- @ : The literal “at” sign, exactly one.
- [A-Za-z0-9.-]+ : The domain name: letters, digits, dots, and dashes.
- \. : A literal dot (escaped so it’s not a wildcard).
- [a-z]{2,}$ : The domain ending: at least two lowercase letters, right before the end of the line.
If any of these conditions fail (like missing a dot in the domain, or putting an invalid sign in the username), the whole match fails.
Try it with different cases - and try to find the reason for failure:
- “devops.toolbox@co” → ❌ fails, there is no dot and domain ending after “co”
- “devops.toolbox@co.uk” → ✅ works perfectly (funny enough - the editor I'm using rn marks this as a valid clickable address too)
- “wrong@@sign.net” → ❌ invalid structure
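The three cases above can be checked with a few lines of Python. This sketch uses the pattern with the dot escaped as \. (as the breakdown describes), and it is a teaching example rather than a complete validator for every address the email standards allow.

```python
import re

# Pattern from the walkthrough, with the literal dot escaped.
EMAIL = re.compile(r"^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[a-z]{2,}$")

addresses = ["devops.toolbox@co", "devops.toolbox@co.uk", "wrong@@sign.net"]
email_results = {addr: bool(EMAIL.search(addr)) for addr in addresses}

for addr, ok in email_results.items():
    print(f"{addr}: {'valid' if ok else 'rejected'}")
```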
Password Strength Validation: Regex final boss
Now you’re ready for the real deal: password strength validation.
Here’s what a strong-password regex might look like (simplified for readability, it was way worse):
^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*[@$!%*?&])[A-Za-z\d@$!%*?&]{8,}$
Breakdown:
- ^ : Start of the line
- (?=.*[a-z]) : Must contain at least one lowercase
- (?=.*[A-Z]) : Must contain at least one uppercase
- (?=.*\d) : Must include at least one number
- (?=.*[@$!%*?&]) : Must include at least one special character from that list
- [A-Za-z\d@$!%*?&]{8,} : Only these characters allowed, eight or more total
- $ : End of line
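Also note: some variants of this pattern add \S* to block whitespace. Here the final character class already does that job, because a space isn’t on the allowed list.
So spaces in passwords? Instantly rejected.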
This style of regex is known as “PCRE” (Perl-Compatible Regular Expressions), the richest and most flexible family of regex syntax.
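Lookaheads are easier to trust once you watch them reject things. Below is a minimal Python sketch using the pattern as described in the breakdown (with .* inside each lookahead). Python's `re` module supports this lookahead syntax; the sample passwords are invented for the demo.

```python
import re

PASSWORD = re.compile(
    r"^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*[@$!%*?&])[A-Za-z\d@$!%*?&]{8,}$"
)

samples = {
    "Passw0rd!": "all four classes, 9 chars",
    "password": "no uppercase, digit, or special",
    "NoSpecial123": "missing a special character",
    "Sh0rt!a": "only 7 characters",
    "Pass w0rd!": "contains a space",
}

password_results = {pw: bool(PASSWORD.search(pw)) for pw in samples}

for pw, note in samples.items():
    print(f"{pw!r:16} {password_results[pw]!s:6} ({note})")
```

Each lookahead checks a condition without consuming any characters, which is why four of them can stack up at the start and still leave the whole password for the final character class to validate.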
Replacements in Action: sed & sd
Now that you can find patterns, let’s change them.
sed is your old-school Unix friend for stream editing, while sd is its younger Rust-powered cousin that’s faster and more forgiving - and my personal choice.
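Replace “hello” with “hi”, but only when “hello” starts a line: search for ^hello and replace it with hi.
Mask all digits in a line: search for \d and replace each match with a placeholder character of your choice.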
And flip a “Lastname, Firstname” into “Firstname Lastname”:
sd '([^,]+), (.+)' '$2 $1'
Remember: parentheses ( ) capture parts of what you matched.
You can reuse them later as $1, $2, etc., based on their order.
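The same three replacements can be sketched in Python with `re.sub`. Two assumptions to flag: the “#” mask character is my pick, and Python writes captured groups as \1 and \2 in the replacement rather than $1 and $2.

```python
import re

# 1. Replace "hello" with "hi" only at the start of a line.
#    re.MULTILINE makes ^ match at the start of every line, not just the first.
greeting = re.sub(r"^hello", "hi", "hello world\nsay hello\nhello again", flags=re.MULTILINE)

# 2. Mask every digit with a placeholder character.
masked = re.sub(r"\d", "#", "card 4111 1111")

# 3. Flip "Lastname, Firstname" using the two captured groups.
flipped = re.sub(r"([^,]+), (.+)", r"\2 \1", "Doe, John")

print(greeting)
print(masked)
print(flipped)
```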
Quick Reality Check
Regex can become a bottleneck when it gets too clever.
Overly complex expressions might slow programs down or even lock them up if they backtrack badly (nested repetition is the usual culprit), sometimes spiraling into full-blown CPU or memory exhaustion.
So, keep things tidy, readable, and tested on real inputs.
Ninety percent of the time, simple patterns are all you need.
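To get a feel for how a “too clever” pattern misbehaves, here is a small Python sketch. It assumes a backtracking engine such as Python's `re`; the nested pattern (a+)+$ is a textbook troublemaker chosen for the demo, and the input is kept short on purpose so the script finishes quickly. Timings will vary by machine, so none are quoted here.

```python
import re
import time

# 18 "a"s followed by a "b" that guarantees the match can never succeed.
text = "a" * 18 + "b"

def timed_match(pattern, subject):
    """Return (match result, elapsed seconds) for one match attempt."""
    start = time.perf_counter()
    result = re.match(pattern, subject)
    return result, time.perf_counter() - start

# Nested repetition: the engine tries every way of splitting the "a"s into groups.
slow_result, slow_seconds = timed_match(r"(a+)+$", text)

# Same intent without the nesting: fails after a handful of steps.
fast_result, fast_seconds = timed_match(r"a+$", text)

print(f"nested: {slow_result}, {slow_seconds:.6f}s")
print(f"simple: {fast_result}, {fast_seconds:.6f}s")
```

Both patterns correctly report no match; the difference is how much work it takes to get there. With a nested pattern like this, each extra “a” in the input tends to roughly double that work, which is how a harmless-looking regex ends up hanging a program.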
Regex isn’t about memorizing the syntax; it’s about learning to see patterns inside chaos.
Once that clicks, using regex feels less like coding and more like cleaning up the world’s text one match at a time.
One last thing: regex has been around since 1951.
It powered early compilers, made its way into UNIX editors like ed, birthed grep, and spread everywhere from Perl to PostgreSQL.
So, yeah, it’s kind of the OG. A lot of water has gone under the bridge since then, and new flavors keep appearing.
If reading this today made you even a little more regex-fluent, congratulations. You’re now part of a 70-year-old tradition 😉.
Thank you for reading.
Feel free to reply directly with any question or feedback.
Have a great weekend!