BOO Censorship!
I hate censorship as much as anyone, but as a web application developer, there are times when a banned-word list is necessary, especially if the application or site is geared towards younger users or corporate environments.
What This Script Does
The script takes a word or phrase and replaces any words found in a predefined list of forbidden words with asterisks (or whatever other character you might fancy).
Features
- Case insensitive
- Looks for "leetspeak"-style combinations of foreign characters, numbers and symbols
- Uses regex, so your badword list stays short
- Uses asterisks as the replacement, but you can specify your own character
- Add as many bad words as you like
I haven't finished all of the leetspeak filters yet, but I have a good start so far.
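To give a rough idea of how regex-based leetspeak matching can work, here is a minimal sketch. It is not BanBuilder's actual implementation: the function names `buildLeetPattern` and `censorWord` and the small look-alike table are assumptions made up for illustration.

```php
<?php
// Hypothetical sketch, NOT BanBuilder's real code.
// Each letter is expanded into a character class of look-alike characters.
function buildLeetPattern(string $word): string
{
    // Illustrative substitutions only; a real table would be much longer.
    $lookalikes = [
        'a' => '[a4@]',
        'e' => '[e3]',
        'i' => '[i1!]',
        'o' => '[o0]',
        's' => '[s5$]',
        't' => '[t7+]',
    ];

    $pattern = '';
    foreach (str_split(strtolower($word)) as $char) {
        // Unmapped characters fall back to an escaped literal.
        $pattern .= $lookalikes[$char] ?? preg_quote($char, '/');
    }

    // The "i" flag makes the match case insensitive.
    return '/' . $pattern . '/i';
}

function censorWord(string $text, string $word, string $replaceChar = '*'): string
{
    return preg_replace_callback(
        buildLeetPattern($word),
        function (array $m) use ($replaceChar): string {
            // Replace the match with the same number of replacement characters.
            return str_repeat($replaceChar, strlen($m[0]));
        },
        $text
    );
}

echo censorWord('What a load of Sh1t', 'shit'), "\n";      // What a load of ****
echo censorWord('What a load of $H!T', 'shit', 'X'), "\n"; // What a load of XXXX
```

Because one pattern covers many spellings, the word list itself only needs the plain form of each word.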
How to Get It?
To install BanBuilder, simply include it in your project's composer.json.
"snipe/banbuilder": "dev-master",
And then run composer update.
There are no additional dependencies required for this package to work.
Usage
use Snipe\BanBuilder\CensorWords;
$censor = new CensorWords;
$string = $censor->censorString($yourstring);
This returns $string as an array, where you can access $string['clean'] for the cleaned version of $yourstring, or $string['orig'], which will give you the original $yourstring.
Options
By default, this package uses an asterisk (*) as the replacement character, so that shit becomes ****.
You can change that by using the setReplaceChar method. (Note that the symbol or letter you use must be a single character.)
use Snipe\BanBuilder\CensorWords;
$censor = new CensorWords;
$censor->setReplaceChar("X");
$string = $censor->censorString($yourstring);
Languages
The available language libraries are in src/dict. We currently have:
- English - US (en-us)
- English - UK (en-uk)
- Spanish - Spain (es)
- Korean - South (kr)
- French - France (fr)
- Dutch - Netherlands (nl)
- Norwegian - Bokmål & various dialects - (no)
- German (de) - rudimentary
- Finnish (fi)
- Italian (it)
- Japanese (jp)
To choose a non-English language file (or several dictionaries at once), pass the dictionary's filename without the .php extension as a parameter to the setDictionary method. For example, to use the French dictionary of profanity, you would use:
use Snipe\BanBuilder\CensorWords;
$censor = new CensorWords;
$badwords = $censor->setDictionary('fr');
$string = $censor->censorString($yourstring);
To use multiple language dictionaries in one instance, pass the languages as an array:
use Snipe\BanBuilder\CensorWords;
$censor = new CensorWords;
$langs = array('fr','it');
$badwords = $censor->setDictionary($langs);
$string = $censor->censorString($yourstring);
Creating and Using Your Own Dictionaries
There are many reasons why you may want to create your own dictionaries instead of using ours. Ours may be too strict, or not strict enough, etc. If you'd like to create your own, create a new file for your dictionary, and make sure the words are listed in an array like this:
array_push($badwords,
'word1',
'word2',
);
Note: You should NOT put your custom dictionary files within the BanBuilder dict directory. Composer vendor files are typically not checked into a project's source code, so any changes to your custom file will be ignored if you put it there. You can put your dictionaries anywhere outside of the vendor directory.
To use your own version of a language dictionary, pass the path and filename in its entirety:
use Snipe\BanBuilder\CensorWords;
$censor = new CensorWords;
$badwords = $censor->setDictionary('/path/to/my/dictionary.php');
$string = $censor->censorString($yourstring);
Important!
This filter does not protect you against XSS or SQL injection attacks, and never will, as that is not its purpose, and attempting to do so could cause unpredictable results depending on how/where it's implemented. Read up on PDO prepared statements, mysqli_real_escape_string() (the older mysql_real_escape_string() was removed in PHP 7), the built-in PHP sanitizing filters, and OWASP's guidelines for data validation for more on this.
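As a small illustration of why the two concerns are separate, the sketch below escapes text for HTML output using PHP's built-in `htmlspecialchars()`. The wrapper name `escapeForHtml` and the sample string are assumptions for this example. The string stands in for text that has already been through a word filter and still carries markup.

```php
<?php
// Hypothetical wrapper around PHP's built-in htmlspecialchars().
function escapeForHtml(string $text): string
{
    // ENT_QUOTES escapes both single and double quotes.
    return htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
}

// Stand-in for already-censored text: the profanity is gone,
// but the markup is untouched.
$clean = '<script>alert("hi")</script> you ****';

echo escapeForHtml($clean), "\n";
// &lt;script&gt;alert(&quot;hi&quot;)&lt;/script&gt; you ****
```

Escaping belongs at output time, whether or not the text was censored first.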
The Profanity Filter Conundrum
No banned-word list is going to be flawless. A G-rated list will block out the word "screw", but there are certainly legitimate uses for the word "screw".
The word "Dick" can be a crude reference to male genitalia, or to the nickname of a fellow named Richard. Context is the only way to tell the difference, and it's been argued that one cannot censor a language without actually comprehending it, since context is so critical.
If you put "ass" in your bad word array, legitimate words like "class" will be turned into "cl***", so choose your words wisely. This, and a lack of context-understanding, is a limitation of profanity filters in general and it isn't unique to this one. It is possible to create a whitelist of words on top of your blacklist, to specify legitimate words that contain an exactly matching swear word (like "assign", "classy", etc), but the creation and maintenance of that list would be impractical, and running every string through it could increase processing time considerably.
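The substring problem is easy to reproduce. The sketch below uses a hypothetical `censor` helper (not part of BanBuilder) to compare plain substring matching with a whole-word match built on regex `\b` word boundaries.

```php
<?php
// Hypothetical helper, not part of BanBuilder.
// With $wholeWord = true, \b anchors stop the pattern from matching
// inside longer words.
function censor(string $text, string $word, bool $wholeWord): string
{
    $core = preg_quote($word, '/');
    $pattern = $wholeWord ? '/\b' . $core . '\b/i' : '/' . $core . '/i';

    return preg_replace_callback(
        $pattern,
        function (array $m): string {
            return str_repeat('*', strlen($m[0]));
        },
        $text
    );
}

echo censor('The class was a pain in the ass', 'ass', false), "\n";
// The cl*** was a pain in the ***
echo censor('The class was a pain in the ass', 'ass', true), "\n";
// The class was a pain in the ***
```

Word boundaries are a trade-off rather than a fix: they spare "class", but they also let through compound words that start or end with a banned word.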
"I want to stick my long-necked Giraffe up your fluffy white bunny"
In general, profanity filters just don't work. At least not the way we want them to.
"Obscenity filtering is an enduring, maybe even timeless problem. I'm doubtful it will ever be possible to solve this particular problem through code alone. But it seems some companies and developers can't stop tilting at that windmill. Which means you might want to think twice before you move to Scunthorpe."
- Jeff Atwood
And let's not forget that there are LOTS of ways to say horribly offensive, degrading and disrespectful things without ever using a single profane word. Check out this fantastic article on Habitat Chronicles for more.
But of course, there are times when we need to give a good best-effort to keep the obviously offensive stuff off of forums, leaderboards, and so on. And that's what this script does.
It perhaps goes without saying that someone who is really determined will find a way to post something awful, regardless of what profanity filter you use. You should know that walking in.
Your application and community management should have a way to address those issues quickly (for example, the ability to ban repeat offenders, or community moderation such as content flags that remove an entry once it has been flagged as offensive more than x times).
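A flag threshold like the one described above could be sketched roughly as follows. The `FlagTracker` class, its method names, and the in-memory storage are all assumptions for illustration. A real application would persist flags in its own database.

```php
<?php
// Hypothetical moderation helper; names and storage are assumptions.
final class FlagTracker
{
    /** @var array<int, array<int, bool>> entry ID => set of user IDs who flagged it */
    private $flags = [];

    /** @var int */
    private $threshold;

    public function __construct(int $threshold)
    {
        $this->threshold = $threshold;
    }

    // Records a flag; each user counts only once per entry.
    public function flag(int $entryId, int $userId): void
    {
        $this->flags[$entryId][$userId] = true;
    }

    // An entry is hidden once it has MORE than $threshold distinct flags.
    public function isHidden(int $entryId): bool
    {
        return count($this->flags[$entryId] ?? []) > $this->threshold;
    }
}

$tracker = new FlagTracker(2);
$tracker->flag(42, 1);
$tracker->flag(42, 2);
$tracker->flag(42, 2); // duplicate flag from the same user is ignored
var_dump($tracker->isHidden(42)); // bool(false)
$tracker->flag(42, 3);
var_dump($tracker->isHidden(42)); // bool(true)
```

Counting distinct users rather than raw flags keeps one person from removing an entry single-handedly.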
Getting Help
If you're stuck getting something to work, or need to report a bug, please post an issue in the GitHub Issues for this project.
Contributing
If you're interested in contributing code to this project, clone it by running:
$ git clone git@github.com:snipe/banbuilder.git
Pull requests are welcome, but please make sure you provide unit tests to cover your changes.
Saying Thanks
If you're using this library, I'd love to know about it. Ping me on Twitter @snipeyhead. If it's made your life easier and you want to, you can buy me a beer with ChangeTip or Flattr!