Anti-spam : how to get rid of spam bots
They are evil, stupid, and love to crawl websites in order to steal and swallow every single email address they can find : let's please them and feed them !
Introduction :
Spammers use spam bots to harvest email addresses. These bots come in all possible forms, from simple command-line scripts to programs with a UI. Most of them use search engines to find HTML pages, extract the email addresses they contain, then follow every link to other websites, and so on. This lets them quickly build large databases of valid email addresses to send their spam to.
The Perl script presented on this page, eatme.pl, traps such bots and feeds them fake addresses in about 30 lines of code. It can be used on any server with the apache mod_rewrite module enabled (even simple web hosting plans). It can be set up by adding a few lines to the apache configuration file (vhost) or simply with a .htaccess file created in your HTTP root directory. It could also easily be adapted to work with mod_security.
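Before the trap itself, it helps to see what a harvester does with a page. The sketch below is not part of eatme.pl : it is a deliberately naive, hypothetical link extractor (extract_links is an illustrative name) that reads raw HTML with a regular expression. Because it never evaluates CSS, a link that is invisible to human visitors is picked up like any other, which is the behaviour the trap relies on.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of eatme.pl): return every href value
# found in a chunk of raw HTML, the way a naive crawler might.
sub extract_links {
    my ($html) = @_;
    my @links;
    # Match href="..." or href='...'. Styles such as display:none are
    # never evaluated, so hidden links are collected too.
    while ($html =~ /<a\s[^>]*?href\s*=\s*(["'])(.*?)\1/gis) {
        push @links, $2;
    }
    return @links;
}

# A bait link hidden from human visitors by CSS.
my $page = <<'HTML';
<div style="display:none;">
<a href="any_name_you_want.htm">click here</a>
</div>
HTML

print "$_\n" for extract_links($page);
```

Real bots vary widely, so treat this only as an illustration of why hidden bait links get followed.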
Overview:
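First of all, here's the result : [normal test (emails/links are hidden)] - [debugging test (emails/links are visible)]
The script works like this : any request for a .htm page is rewritten to eatme.pl, which shows human visitors a "page not found" message and, inside a hidden block, serves the bot a batch of random fake email addresses plus links to further fake pages. A counter in each fake file name stops the generation of new pages once the configured limit is reached.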
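The heart of the trap is the generation of random but plausible-looking addresses. As a rough, self-contained sketch of that step, assuming the same letter and extension lists as eatme.pl : the names random_name and fake_address are illustrative, and lengths are drawn uniformly between 2 and 12 characters, which is a simplification of what the script does.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Same 20 letters and 6 extensions as the eatme.pl configuration.
my @alphabet = ('a'..'j', 'l'..'p', 'r'..'v');
my @tld      = qw(.net .com .org .fr .de .eu);

# Hypothetical helper: random string of $min to $max letters.
sub random_name {
    my ($min, $max) = @_;
    my $len = $min + int rand($max - $min + 1);
    return join '', map { $alphabet[ int rand @alphabet ] } 1 .. $len;
}

# Hypothetical helper: one fake address such as user@domain.tld.
sub fake_address {
    return random_name(2, 12) . '@' . random_name(2, 12) . $tld[ rand @tld ];
}

print fake_address(), "\n" for 1 .. 5;
```

Every address is random, so each run prints a different list.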
Setup :
With mod_rewrite you have 2 options : include the directives in your apache/vhost configuration file (shared or dedicated servers), or add them to a .htaccess file (simple web hosting with CGI support).
Example #1 apache configuration file (the 2 lines to be added follow the "spam bot redirection" comment) :
<VirtualHost xxx.xxx.xxx.xxx:80>
DocumentRoot /home/www/domain.com/httpd
ServerAlias domain.com www.domain.com
ServerName domain.com
<Directory /home/www/domain.com/httpd>
Options -Indexes +Includes
...
# spam bot redirection to eatme.pl trap :
RewriteEngine on
RewriteRule \.htm$ /cgi-bin/eatme.pl
</Directory>
...
</VirtualHost>
You must restart apache after the modification.
Example #2 .htaccess file :
Create a .htaccess file at the root of your website directory so that requests to subdirectories, if any, also trigger the script :
home
|--www
|--.htaccess
|--httpd
|--cgi-bin
Add the following lines to the .htaccess file :
RewriteEngine on
RewriteRule \.htm$ /cgi-bin/eatme.pl
Adding fake links :
On your main index page, create at least one link, invisible to users, pointing to a fake .htm page.
Example:
<div style="display:none;">
<a href="any_name_you_want.htm">click here</a>
</div>
That's it. You just need to copy the file eatme.pl to your /cgi-bin directory and chmod it 0755.
The script :
Although some bots are more sophisticated than others, they all have one thing in common : they are just programs and thus can be trapped and fooled without too much difficulty.
#!/usr/bin/perl
######################################################################
#
# [Script]: eatme.pl
#
# [Version]: v0.10 - 05-07-2007
#
# [Author]: Jerome Bruandet
#
# [Description]: spam bot trap
#
# [Requires]: apache + perl + mod_rewrite (or mod_security)
#
# [Installation]: see http://spamcleaner.org/en/misc/eatme.html
#
######################################################################
#======================= USER CONFIGURATION =========================#
# alphabet letters used to create fake emails :
@alphabet=('a','b','c','d','e','f','g','h','i','j','l','m',
'n','o','p','r','s','t','u','v');
# domain name extensions used for fake emails :
@tld=('.net','.com','.org','.fr','.de','.eu');
# fake page extension. By default, '.htm'.
# if you want to change it, change also the mod_rewrite rule
# in your apache conf or .htaccess file :
$extension= ".htm";
# Maximum number of characters to generate for the email name
# and domain name :
$max_car=13;
# Maximum number of fake emails to create and display on each
# fake page. Do not put a value greater than 200 as some bots
# could become suspicious it is a trap :
$max_emails=150;
# number of links to other fake pages to display per page :
$max_links=8;
# number of fake pages to generate : after having sent $max_pages
# to the bot, the script will stop creating new ones. This helps
# to avoid endless loops.
# value should be less than 100 :
$max_pages=10;
# 404 error message to display to your visitors if, by mistake,
# one of them triggers the script :
$message_404="Error: page not found";
#===================== END USER CONFIGURATION =======================#
print "Content-type: text/html\n\n";
print qq~<html><head>
<meta name="robots" content="noindex,nofollow">
</head><body>
<h1>$message_404</h1>~;
print "<div style='display:none;'>";
$nb_letters=@alphabet;
$nb_tld=@tld;
# $max_pages already displayed ?
$URL = $ENV{'REDIRECT_URL'};
if ($URL=~/(\d{1,2})\Q$extension\E$/i){
if ($1>=$max_pages){goto NO_PAGES}
$count=$1;
}
$count++;
# Create fake links/pages :
$HOST= 'http://'.$ENV{'HTTP_HOST'};
for (1..$max_links){
$HTML_page=&genere_chaine."$count$extension";
print "<a href=\"$HOST\/$HTML_page\">$HTML_page</a><br>\n";
}
print "<p>";
NO_PAGES:
# Fake emails :
for (1..$max_emails){
$user_name = &genere_chaine;
$domain_name=&genere_chaine;
$ext=$tld[int(rand($nb_tld))];
print "<a href='mailto:$user_name\@$domain_name$ext'>$user_name\@$domain_name$ext</a> \n";
}
print "</div>";
print "</body></html>";
exit;
######################################################################
sub genere_chaine{
$chaine="";
$nb_car= rand($max_car);
$nb_car+=2 if $nb_car<4;
for (1..$nb_car){
$letter = int(rand($nb_letters));
$chaine.=$alphabet[$letter];
}
return $chaine;
}
######################################################################
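The depth limit relies on the counter embedded in each fake file name : generated names contain only letters, so any digits just before the extension are the page count. Here is a minimal sketch of that check, isolated into a function and written under strict mode. The name next_page_count is illustrative and not part of eatme.pl; the function returns the counter to use in the next batch of links, or nothing once the limit is reached.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: given the requested URL, return the counter to
# embed in the next batch of fake links, or undef when $max_pages has
# been reached and no more links should be generated.
sub next_page_count {
    my ($url, $extension, $max_pages) = @_;
    my $count = 0;
    # Same pattern as eatme.pl: one or two digits right before the extension.
    if (defined $url && $url =~ /(\d{1,2})\Q$extension\E$/i) {
        return if $1 >= $max_pages;
        $count = $1;
    }
    return $count + 1;
}

for my $url ('/bait.htm', '/abcde1.htm', '/abcde9.htm', '/abcde10.htm') {
    my $next = next_page_count($url, '.htm', 10);
    printf "%-14s -> %s\n", $url,
        defined $next ? "links end in $next.htm" : 'no more links';
}
```

One consequence worth keeping in mind : if the bait link on your index page ends in digits (for example page2.htm), the count starts from that number, and a name ending in a number at or above the limit produces emails only, with no further links.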