I hate censorship as much as anyone, but as a web application developer, there are times when a banned-word list is necessary, especially if the application or site is geared towards younger users or corporate environments.
What This Script Does
The script takes a word or phrase and replaces any of the words you put in the $badwords array with asterisks.
- Case insensitive
- Looks for "leetspeak"-style combinations of foreign characters, numbers and symbols
- Uses regex, so your badword list stays short
- Uses asterisks as the replacement, but you can specify your own character
- Add as many bad words as you like
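To illustrate the leetspeak-matching idea, here's a minimal sketch of one way it can be done. These patterns and function names are my own illustration, not the actual contents of censor.function.php: each letter of a banned word expands into a character class of common substitutions, and the /i modifier handles case insensitivity.

```php
<?php
// Hypothetical substitution table -- a few common leetspeak stand-ins.
$leet = array(
    'a' => '[a@4]',
    'e' => '[e3]',
    'i' => '[i1!]',
    'o' => '[o0]',
    's' => '[s$5]',
);

// Build a case-insensitive regex for one bad word, expanding each
// letter into its substitution class.
function leetPattern($word, $leet) {
    $pattern = '';
    foreach (str_split($word) as $char) {
        $pattern .= isset($leet[$char]) ? $leet[$char] : preg_quote($char, '/');
    }
    return '/' . $pattern . '/i';
}

// Replace every match with the same number of masking characters.
function maskMatches($input, $word, $leet, $char = '*') {
    return preg_replace_callback(leetPattern($word, $leet), function ($m) use ($char) {
        return str_repeat($char, strlen($m[0]));
    }, $input);
}

echo maskMatches('You are such an A$5hat', 'asshat', $leet); // masks "A$5hat" with asterisks
```

The upside of this approach is exactly what the feature list describes: one entry in the bad-word list covers dozens of creative spellings.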
I haven't finished all of the leetspeak filters yet, but I have a good start so far. If you wish to twiddle those to add or remove patterns, they can be found in the censor.function.php file in the master branch.
How to Get It?
Just click on the tar.gz or .zip icons above to download the files. Or if you prefer, you can clone the repo:
$ git clone email@example.com:snipe/banbuilder.git
Check out the README in the repo to get started. It's as simple as including two files and invoking a function.
include('wordlist-regex.php');
include('censor.function.php');
$censored = censorString($input, $badwords);
This filter does not protect you against XSS or
SQL injection attacks, and never will, as that is not its purpose, and attempting to do so
could cause unpredictable results depending on how and where it's implemented. Read up on PDO,
mysql_real_escape_string(), the built-in
PHP sanitizing filters, and OWASP's guidelines
for data validation for more on this.
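Since PDO comes up above, here's a quick sketch of the prepared-statement pattern it enables. I'm using an in-memory SQLite database so the example is self-contained; the table and column names are placeholders, and in practice you'd swap in your own MySQL DSN and credentials.

```php
<?php
// Self-contained PDO example using in-memory SQLite.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE comments (body TEXT)');

// A hostile-looking input -- bound parameters keep it out of the SQL
// string entirely, so it's stored as plain data.
$userComment = "Robert'); DROP TABLE comments;--";
$stmt = $pdo->prepare('INSERT INTO comments (body) VALUES (:body)');
$stmt->execute(array(':body' => $userComment));

echo $pdo->query('SELECT COUNT(*) FROM comments')->fetchColumn(); // one row, table intact
```

The point is that injection defense belongs at the database layer, not in a profanity filter.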
The Profanity Filter Conundrum
No banned-word list is going to be flawless. A G-rated list will block out the word "screw", but there are certainly legitimate uses for the word "screw".
The word "Dick" can be a crude reference to male genitalia, or to the nickname of a fellow named Richard. Context is the only way to tell the difference, and it's been argued that one cannot censor a language without actually comprehending it, since context is so critical.
If you put "ass" in your bad word array, legitimate words like "class" will be turned into "cl***", so choose your words wisely. This, and a lack of context-understanding, is a limitation of profanity filters in general and isn't unique to this one. It is possible to create a whitelist of words on top of your blacklist, to specify legitimate words that happen to contain an exactly matching swear word (like "assign", "classy", etc), but the creation and maintenance of that list would be impractical, and running every string through it could increase processing time considerably.
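One partial alternative to a whitelist, shown here as a sketch that is not part of banbuilder itself, is to anchor each pattern with \b word boundaries so only standalone occurrences get masked, at the cost of missing profanity embedded in longer strings:

```php
<?php
// Hypothetical helper: censor only whole-word matches, so "class"
// survives an "ass" entry while a standalone "ass" is still masked.
function censorWholeWords($input, array $badwords, $char = '*') {
    foreach ($badwords as $word) {
        $pattern = '/\b' . preg_quote($word, '/') . '\b/i';
        $input = preg_replace_callback($pattern, function ($m) use ($char) {
            return str_repeat($char, strlen($m[0]));
        }, $input);
    }
    return $input;
}

echo censorWholeWords('My class is first-class, you ass.', array('ass'));
// "class" and "first-class" survive; the standalone "ass" becomes "***"
```

It's a trade-off, not a fix: word boundaries eliminate the "cl***" problem but let "a$$hat"-style embeddings straight through, which is why no single strategy wins.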
"I want to stick my long-necked Giraffe up your fluffy white bunny"
In general, profanity filters just don't work. At least not the way we want them to.
"Obscenity filtering is an enduring, maybe even timeless problem. I'm doubtful it will ever be possible to solve this particular problem through code alone. But it seems some companies and developers can't stop tilting at that windmill. Which means you might want to think twice before you move to Scunthorpe."
- Jeff Atwood
And let's not forget that there are LOTS of ways to say horribly offensive, degrading and disrespectful things without ever using a single profane word. Check out this fantastic article on Habitat Chronicles for more.
But of course, there are times when we need to give a good best-effort to keep the obviously offensive stuff off of forums, leaderboards, and so on. And that's what this script does.
It perhaps goes without saying that someone who is really determined will find a way to post something awful, regardless of what profanity filter you use. You should know that walking in.
Your application and community management should be prepared with ways to address those issues quickly (for example, the ability to ban a repeat offender, or audience moderation such as content flags that remove an entry once it's been marked as offensive more than x times, etc.).