Spam is one challenge of enabling comments on a WordPress site. This is one reason, albeit not the only reason, that I do not enable comments on The New Leaf Journal. However, I am using native WordPress comments for our newly revived Guestbook (see about page). Even though we are not a high-traffic site in the grand scheme of internet things, I received a solid number of spam comments almost as soon as the Guestbook went live. I had decided in advance not to use a plugin to deal with spam comments after reading several resources, including this very good article on random string comment spam by Mr. Jeff Starr.
My initial idea was to take advantage of WordPress’s built-in functionality to create a list of stopwords based on the spam comments I was receiving (I am referring to “Disallowed Comment Keys” under Discussion in WordPress settings). While this idea was directionally correct, the implementation was all wrong. Why reinvent the wheel? It occurred to me that there must be a good list of stopwords already available. I ran a search on GitHub and came across a promising repository titled Comment Blacklist for WordPress. Sure enough, it has a huge, consistently updated list of stopwords. Below, I explain how to use this stopword list and keep it up to date without installing a plugin.
Mr. Hutchinson, who wrote the list, describes it as follows: “Since 2011, I have painstakingly identified, compiled, and optimized over 51,000 phrases, patterns, and keywords commonly used by spammers and comment bots in usernames, email addresses, link text, and URIs. As with all compilations, this blocklist is a work in progress and there will always be room for improvement and optimization.”
The list of stopwords is contained in a blacklist.txt file in the repository. Mr. Hutchinson invites people to use the repository to submit suggestions, bug reports, and other feedback.
(Mr. Hutchinson discusses specific scenarios and limitations in the Readme. I recommend reading it before using the blacklist in your own project.)
The list of disallowed comment keys is regularly updated. I am drafting this section of the article on June 11, 2023. I counted 13 updates to the list since May 11, 2023. That the list is consistently updated raises the question of how to keep an individual WordPress site in sync. There are two methods to update the list. Firstly, you can copy the entire blacklist.txt file every time it is updated and paste it into the Disallowed Comment Keys field on your WordPress install. Secondly, and in the alternative, you can install a plugin which stays in sync with the GitHub repository. Mr. Hutchinson listed seven plugins (one is for a specific form builder, six are general-purpose) which handle this process automatically.
Below, I explain how to take the plugin-free approach, which has the additional benefit of notifying you whenever Mr. Hutchinson makes a commit to blacklist.txt.
Using an ATOM feed to keep Disallowed Comment Keys up to date with blacklist.txt
Below, I demonstrate how to use the ATOM feed for blacklist.txt commits to stay on top of updates without relying on a WordPress plugin or a GitHub account.
Copy blacklist.txt from Comment Blocklist for WordPress
To use the blacklist on your own WordPress install without a plugin, the first step is to navigate to the blacklist.txt file and copy the entire list.
In the alternative, you can use a command line solution if you prefer that approach.
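For readers who prefer that route, here is a minimal Python sketch of one way to do it. It is only an illustration: `RAW_URL` is a placeholder that you would replace with the raw address of blacklist.txt, and the function names `fetch_text` and `clean_entries` are my own inventions rather than anything provided by the repository.

```python
import urllib.request

# Placeholder (assumption): replace with the raw URL of blacklist.txt.
RAW_URL = "https://example.com/path/to/blacklist.txt"


def fetch_text(url: str, timeout: float = 30.0) -> str:
    """Download a text file and decode it as UTF-8."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


def clean_entries(text: str) -> list[str]:
    """Strip surrounding whitespace and drop blank lines, keeping file order."""
    return [line.strip() for line in text.splitlines() if line.strip()]
```

Calling `print("\n".join(clean_entries(fetch_text(RAW_URL))))` and redirecting the output to a file would give you one stopword per line, ready to paste into WordPress.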
Paste the contents of blacklist.txt into Disallowed Comment Keys in WordPress
In your WordPress admin menu, navigate to Discussion, which is found under Settings. Scroll down until you see a text field next to the phrase Disallowed Comment Keys. Paste the entire contents of blacklist.txt into Disallowed Comment Keys.
In the alternative, you may want to exclude some of the Disallowed Comment Keys. For example, Mr. Hutchinson wrote the list for English-language sites, so there are many stopwords written in Cyrillic script or Chinese characters. If your site receives legitimate non-English comments in some of the languages on the list, you would at least want to review those stopwords. In this case, I would recommend using a plugin with configuration options (I have not tested any of the available plugins, so I do not know to what extent this feature may be supported) or forking the repository on GitHub.
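If you would rather do this review by hand than rely on a plugin, a short script can set aside the entries written in a given script. The sketch below is an assumption-laden illustration: it classifies characters by the prefix of their Unicode name, which is a coarse heuristic, and the labels `cyrillic` and `cjk` and both function names are hypothetical.

```python
import unicodedata


def scripts_in(entry: str) -> set[str]:
    """Return coarse script labels based on Unicode character names."""
    labels = set()
    for char in entry:
        name = unicodedata.name(char, "")
        if name.startswith("CYRILLIC"):
            labels.add("cyrillic")
        elif name.startswith("CJK"):
            labels.add("cjk")
    return labels


def split_by_script(entries: list[str], excluded: set[str]):
    """Split entries into (kept, held_for_review) by the scripts they contain."""
    kept, held = [], []
    for entry in entries:
        if scripts_in(entry) & excluded:
            held.append(entry)
        else:
            kept.append(entry)
    return kept, held
```

For example, `split_by_script(entries, {"cyrillic"})` would leave Cyrillic stopwords out of the list you paste into WordPress while giving you a separate list to read through.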
Add ATOM feed for commits to blacklist.txt to your feed reader
To begin, if you are not familiar with feeds and feed readers, see my general introduction article. I will proceed with the assumption that you understand the concept of feeds and feed readers.
We return to the GitHub repository. We want to obtain the specific feed for commits to the blacklist.txt file. In this case, the feed URL is:
Now add this feed to your feed reader. I personally use the mPage extension for Firefox to stay on top of feeds for software updates and the like. This is a configurable single-page feed reader which only lists headlines with their associated links. See how it looks in my feed reader below:
There are many feed readers available, so use whichever you prefer to stay on top of updates to blacklist.txt. Every time you see an update, you can copy the blacklist.txt file and replace the previous version in your Disallowed Comment Keys. However, I recommend reviewing the changes to blacklist.txt before updating in order to make sure that none of the updates would flag what may be legitimate phrases in comments that you receive.
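One way to carry out that review is to compare your saved copy of the list against the new one and test only the added entries against comments you know to be legitimate. The following is a rough sketch, not WordPress's own code. It assumes matching is a case-insensitive substring test, which approximates the behavior the WordPress settings screen describes ("press" will match "WordPress"), and all function names are hypothetical.

```python
def load_entries(text: str) -> list[str]:
    """One stopword per non-blank line, with surrounding whitespace removed."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def added_entries(old_text: str, new_text: str) -> list[str]:
    """Entries present in the new list but not the old one, in file order."""
    old = set(load_entries(old_text))
    return [entry for entry in load_entries(new_text) if entry not in old]


def would_flag(entries: list[str], comment_fields: list[str]) -> list[tuple[str, str]]:
    """Return (entry, field) pairs where the entry occurs inside the field.

    Assumption: a case-insensitive substring match stands in for
    WordPress's real check.
    """
    hits = []
    for entry in entries:
        needle = entry.casefold()
        for field in comment_fields:
            if needle in field.casefold():
                hits.append((entry, field))
    return hits
```

If `would_flag(added_entries(old, new), known_good_comments)` comes back empty, the update is unlikely to catch the kinds of comments you already receive. Any pairs it does return are worth a second look before you paste the new list.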
I have decided to keep my Disallowed Comment Keys in sync with blacklist.txt by subscribing to updates via ATOM, and this is a universal way of staying up-to-date that does not rely on a plugin or GitHub account. (I do have a GitHub account, however.) If you prefer to watch the repository on GitHub or install a WordPress plugin for a completely hands-off experience, those solutions would work as well. As an additional matter, I will note that many sites may not need to stay up to date with every update in order to catch most spam comments.
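If you would like to check the feed from a script rather than a feed reader, the sketch below shows one possible approach using only Python's standard library. It assumes you have already downloaded the feed XML as a string, that entries are listed newest first, and that each entry carries the standard Atom `id`, `title`, `updated`, and `link` elements. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Standard Atom namespace (RFC 4287).
ATOM_NS = {"atom": "http://www.w3.org/2005/Atom"}


def parse_commits(atom_xml: str) -> list[dict]:
    """Extract id, title, updated, and link from each Atom entry."""
    root = ET.fromstring(atom_xml)
    commits = []
    for entry in root.findall("atom:entry", ATOM_NS):
        link = entry.find("atom:link", ATOM_NS)
        commits.append({
            "id": entry.findtext("atom:id", default="", namespaces=ATOM_NS),
            "title": entry.findtext("atom:title", default="", namespaces=ATOM_NS).strip(),
            "updated": entry.findtext("atom:updated", default="", namespaces=ATOM_NS),
            "link": link.get("href", "") if link is not None else "",
        })
    return commits


def new_since(commits: list[dict], last_seen_id: str) -> list[dict]:
    """Entries listed before last_seen_id, assuming newest-first order."""
    fresh = []
    for commit in commits:
        if commit["id"] == last_seen_id:
            break
        fresh.append(commit)
    return fresh
```

Storing the `id` of the most recent entry after each update and passing it to `new_since` next time would tell you whether there is anything new to review.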
I look forward to monitoring my spam comments and seeing how well the list works and whether it produces any false positives. Do note, if you are leaving Guestbook comments or comments anywhere else where I may enable them in the future, that I will make an effort to review all comments in my trash bin before deleting them (I rescued our first comment in the original Guestbook from the spam bin – although that was marked by a different plugin). If you send legitimate content and do not see it published, feel free to contact me to make me aware of the issue.
Update for June 12, 2023.
The Comment Blacklist has sent nearly all spam Guestbook entries straight to the trash bin. However, a couple got through (about 10% thus far). I decided to fork the repository and add stopwords from spam comments that are not on the main blacklist. I will contribute some of those stopwords back upstream as suggestions. If you want to follow my fork, you can find it here.
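For anyone maintaining a similar fork, here is a small sketch of how local additions might be merged with the upstream list. It is an illustration under assumptions rather than a description of my fork: it treats a candidate as redundant when an existing entry already occurs inside it (mirroring the case-insensitive substring matching assumed for WordPress), and the function names are hypothetical.

```python
def uncovered(candidates: list[str], upstream: list[str]) -> list[str]:
    """Candidates that no upstream entry already matches as a substring."""
    folded = [u.strip().casefold() for u in upstream if u.strip()]
    result = []
    for candidate in candidates:
        key = candidate.strip().casefold()
        if key and not any(u in key for u in folded):
            result.append(candidate.strip())
    return result


def merge_lists(upstream: list[str], local: list[str]) -> list[str]:
    """Upstream entries first, then local entries not yet covered.

    Each local entry is checked against everything merged so far, so
    duplicates within the local list are also dropped. This rescans the
    list for every candidate, which is fine for a handful of additions.
    """
    merged = [u.strip() for u in upstream if u.strip()]
    for candidate in local:
        merged.extend(uncovered([candidate], merged))
    return merged
```

The entries returned by `uncovered` are also the natural candidates to suggest upstream, since they are the ones the main list does not yet catch.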