
Anti-spam: how to get rid of spam bots

They are evil and stupid, and they love to crawl websites to steal and swallow every single email address they can find: let's please them and feed them!

Introduction:

Spammers use spam bots to steal your email addresses. These bots come in every possible form, from simple command-line scripts to programs with a graphical user interface. Most of them use search engines to find HTML pages, extract the email addresses they contain, follow every link to other websites, and so on. In this way they can quickly build large databases of valid email addresses to send their spam to.
Although some bots are more sophisticated than others, they all have one thing in common: they are just programs, and thus they can be trapped and fooled without too much difficulty.
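
Bots see the raw HTML, not what a browser renders. As an illustration only, here is a hypothetical and deliberately naive harvester sketched in Perl. It grabs everything shaped like an address and every non-mailto href, and it ignores CSS entirely. That is why bait hidden with 'display: none' still gets collected. The regular expressions and the sample page are assumptions, and real bots differ.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical naive harvester: it scans the raw HTML as plain text and
# never applies CSS, so anything inside display:none is harvested too.
sub harvest {
    my ($html) = @_;
    my (%emails, %links);
    # Anything shaped like user@domain.tld
    $emails{$1} = 1 while $html =~ /([\w.+-]+\@[\w-]+(?:\.[\w-]+)*\.[a-z]{2,})/gi;
    # Every href except mailto:, so the crawl can go on
    $links{$1} = 1 while $html =~ /href\s*=\s*["'](?!mailto:)([^"']+)["']/gi;
    return ([sort keys %emails], [sort keys %links]);
}

# Sample page in the style of the trap output (addresses are made up)
my $page = <<'HTML';
<h1>Error: page not found</h1>
<div style='display:none;'>
<a href="http://domain.com/abcd1.htm">abcd1.htm</a><br>
<a href='mailto:lop@trs.net'>lop@trs.net</a>
</div>
HTML

my ($emails, $links) = harvest($page);
print "emails: @$emails\n";    # emails: lop@trs.net
print "links: @$links\n";      # links: http://domain.com/abcd1.htm
```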

The following Perl script does just that in about 30 lines of code and can be used on any server with Apache's mod_rewrite module enabled (even on simple web hosting plans). It can be set up by adding a few lines to the Apache configuration file (vhost), or simply with a .htaccess file created in your HTTP root directory. It could also easily be adapted to work with mod_security.

Overview:

The script works like this:

  • the aim is to force the spam bot to request HTML pages that do not exist. These pages use an extension that no other page on your website uses: we will choose the *.htm extension. It is important that none of your pages use it. If some do, you can choose another extension by changing the $extension variable in the script (and the mod_rewrite rule accordingly).

  • the names of these pages are randomly generated by the script, so they are always different.

  • each time one of these pages is requested, it triggers the script and traps the bot.

  • there is no need to rename the Perl script, since the spam bot will never know its name (in fact, it will not even know that there is a script behind the page!). That's a very interesting feature of such an internal redirection: your server always returns an HTTP 200 status code even though the page does not exist.

  • each page requested generates a number of fake email addresses, plus links to other fake pages that the bot will be pleased to steal and follow.

  • to avoid an endless loop, the script can be configured to limit the number of fake page links it creates (links per page and maximum depth). This also keeps suspicious bots from detecting the trap.

  • as a precaution, if a visitor calls the script by mistake (e.g. by typing index.htm instead of index.html), they will only see a simple error page, because the fake emails and links are hidden in a <DIV> tag made invisible to any browser by the 'display: none' CSS property.

  • to attract spam bots, we include one or more links to the fake pages in the main index page. They should also be hidden from your visitors with the 'display: none' CSS property. You can see such a link by looking at the HTML code of this page (at the bottom of the page).

  • to keep legitimate crawlers (Googlebot, Yahoo! Slurp...) out of the trap, the fake pages include a robots meta tag telling them not to index the page or follow its links.

    First of all, here's the result: [normal test (emails/links are hidden)] - [debugging test (emails/links are visible)]
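
    The trap keeps no state on the server. Each fake page carries its depth in its own name, as a number placed just before the extension (this is what the $count variable does in eatme.pl). Here is a minimal Perl sketch of that scheme, assuming the default .htm extension. The helper names fake_page_name, depth_from_url and next_depth are hypothetical and do not appear in eatme.pl.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @alphabet  = ('a' .. 'j', 'l' .. 'p', 'r' .. 'v');  # same 20 letters as eatme.pl
my $extension = '.htm';                                 # assumed default extension

# Hypothetical helper: random name of 2 to 12 letters, then the depth, then the extension.
sub fake_page_name {
    my ($depth) = @_;
    my $len  = 2 + int(rand(11));
    my $name = '';
    $name .= $alphabet[ int(rand(scalar @alphabet)) ] for 1 .. $len;
    return "$name$depth$extension";
}

# Hypothetical helper: read the depth back from the requested URL.
# A name without a number (such as the bait link) counts as depth 0.
sub depth_from_url {
    my ($url) = @_;
    return ($url =~ /(\d{1,2})\Q$extension\E$/i) ? $1 : 0;
}

# Hypothetical helper: depth of the links to print on the page at $url,
# or undef once the limit is reached (the page then only gets fake emails).
sub next_depth {
    my ($url, $max_pages) = @_;
    my $depth = depth_from_url($url);
    return $depth >= $max_pages ? undef : $depth + 1;
}
```

    For example, with $max_pages set to 10, the bait link any_name_you_want.htm has no number, so its page links to depth-1 pages. A page named abcd10.htm gets no links at all.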

    Setup:

    With mod_rewrite you have two options: include the instructions in your Apache/vhost configuration file (shared or dedicated servers), or add them to a .htaccess file (simple web hosting with CGI support).

    Example #1, Apache configuration file (the two lines to add are the ones after the '# spam bot redirection' comment):

      <VirtualHost xxx.xxx.xxx.xxx:80>
        DocumentRoot /home/www/domain.com/httpd
        ServerAlias  domain.com www.domain.com
        ServerName domain.com
        <Directory /home/www/domain.com/httpd>
          Options -Indexes +Includes
          ...
    
          # spam bot redirection to eatme.pl trap :
          RewriteEngine on
          RewriteRule \.htm$ /cgi-bin/eatme.pl
    
        </Directory>
        ...
      </VirtualHost>
    
    You must restart Apache after the modification.

    Example #2, .htaccess file:

    Create a .htaccess file at the root of your website directory, so that requests for pages in subdirectories, if any, will also trigger the script:

     home
     |--www
       |--.htaccess
       |--httpd
       |--cgi-bin
    
    Add the following lines to the .htaccess file:

      RewriteEngine on
      RewriteRule \.htm$ /cgi-bin/eatme.pl
    

    Adding fake links:

    On your main index page, create at least one link, invisible to users, pointing to a fake .htm page.
    Example:

      <div style="display:none;">
       <a href="any_name_you_want.htm" rel="nofollow">click here</a>
      </div>
    
    That's it: you just need to copy eatme.pl to your /cgi-bin directory and chmod it 0755.
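
    To check the installation, request any .htm name on your site and look at the answer. The hypothetical check script below uses Perl's core HTTP::Tiny module to print the HTTP status and count the fake addresses and links in the page. Its file name and URL are placeholders. The status should be 200 since the rewrite is internal. The counting regexes assume the exact markup printed by eatme.pl.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTTP::Tiny;    # core module since Perl 5.14

# Count the bait in a trap page, assuming the markup printed by eatme.pl:
# mailto: links in single quotes, page links in double quotes.
sub count_bait {
    my ($html) = @_;
    my $emails = () = $html =~ /href='mailto:/g;
    my $links  = () = $html =~ /href="[^"]*\.htm"/g;
    return ($emails, $links);
}

# Usage (hypothetical script name and URL):
#   perl check_trap.pl http://domain.com/anything.htm
if (@ARGV) {
    my $res = HTTP::Tiny->new->get($ARGV[0]);
    print "HTTP status: $res->{status}\n";    # 200 expected (internal rewrite)
    my ($emails, $links) = count_bait($res->{content} // '');
    print "fake emails: $emails, fake links: $links\n";
}
```

    With the default settings and a name without digits, such as anything.htm, you should see 150 fake emails and 8 fake links.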

    The script:

    
    #!/usr/bin/perl
    
    ######################################################################
    #
    # [Script]: eatme.pl
    #
    # [Version]: v0.10  - 05-07-2007
    #
    # [Author]: Jerome Bruandet
    #
    # [Description]: spam bot trap
    #
    # [Requires]: apache + perl + mod_rewrite (or mod_security)
    #
    # [Installation]: see http://spamcleaner.org/en/misc/eatme.html
    #
    ######################################################################
    
    #======================= USER CONFIGURATION =========================#
    
    # alphabet letters used to create fake emails :
    @alphabet=('a','b','c','d','e','f','g','h','i','j','l','m',
               'n','o','p','r','s','t','u','v');
    
    # domain name extensions used for fake emails :
    @tld=('.net','.com','.org','.fr','.de','.eu');
    
    # fake page extension. By default, '.htm'.
    # if you change it, also change the mod_rewrite rule
    # in your Apache conf or .htaccess file:
    $extension= ".htm";
    
    # Maximum number of characters to generate for the email name
    # and domain name :
    $max_car=13;
    
    # Maximum number of fake emails to create and display on each
    # fake page. Do not use a value greater than 200, as some bots
    # could become suspicious that it is a trap:
    $max_emails=150;
    
    # number of links to other fake pages to display per page :
    $max_links=8;
    
    # maximum depth of fake pages: each fake page name ends with its
    # depth; once a page at depth $max_pages is requested, the script
    # stops printing links to new pages. This helps to avoid an
    # endless loop.
    # value must be less than 100 (the depth is read as 1 or 2 digits):
    $max_pages=10;
    
    # 404 error message to display to your visitors if, by mistake,
    # one of them triggers the script:
    $message_404="Error: page not found";
    
    #===================== END USER CONFIGURATION =======================#
    print "Content-type: text/html\n\n";
    
    print qq~<html><head>
    <meta name="robots" content="noindex,nofollow">
    </head><body>
    <h1>$message_404</h1>~;
    
    print "<div style='display:none;'>";
    
    $nb_letters=@alphabet;
    $nb_tld=@tld;
    
    # $max_pages already displayed ?
    $URL = $ENV{'REDIRECT_URL'};
    if ($URL=~/(\d{1,2})\Q$extension\E$/i){
      if ($1>=$max_pages){goto NO_PAGES}
      $count=$1;
    }
    $count++;
    # Create fake links/pages :
    $HOST= 'http://'.$ENV{'HTTP_HOST'};
    for (1..$max_links){
      $HTML_page=&genere_chaine."$count$extension";
      print "<a href=\"$HOST\/$HTML_page\">$HTML_page</a><br>\n";
    }
    print "<p>";
    NO_PAGES:
    # Fake emails :
    for (1..$max_emails){
      $user_name = &genere_chaine;
      $domain_name=&genere_chaine;
      $ext=$tld[int(rand($nb_tld))];
      print "<a href='mailto:$user_name\@$domain_name$ext'>$user_name\@$domain_name$ext</a> \n";
    }
    print "</div>";
    print "</body></html>";
    exit;
    ######################################################################
    sub genere_chaine{
      $chaine="";
      $nb_car= rand($max_car);
      $nb_car+=2 if $nb_car<4;
      for (1..$nb_car){
        $letter = int(rand($nb_letters));
        $chaine.=$alphabet[$letter];
      }
      return $chaine;
    }
    ######################################################################
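
    Note that $max_pages bounds the depth of the fake link tree rather than the number of pages served. The short Perl calculation below assumes the default settings and a bot that requests every link it receives exactly once. It shows how large that tree still is.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Default settings of eatme.pl
my ($max_links, $max_pages, $max_emails) = (8, 10, 150);

# Depth 0 is the page reached from the bait link; every page at depth
# 0 .. $max_pages - 1 prints $max_links new links, pages at depth
# $max_pages print none. Sum the pages of each level.
my $pages = 0;
$pages += $max_links ** $_ for 0 .. $max_pages;

printf "pages: %.0f, fake addresses: %.0f\n", $pages, $pages * $max_emails;
# pages: 1227133513, fake addresses: 184070026950
```

    Since the names are random, a bot that requests the same page twice gets new links each time, so this figure is an order of magnitude rather than a hard limit.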
    
    



                           





    <a href="miam_miam.htm" rel="nofollow">email addresses here, click me, you dumb bot!</a>