Deny Spambots and Prevent Email Harvesting


Submitted by Allthewebsites.org Administrator | Category: Programming | Published on Dec 03, 2003
 
Abstract:
Spambots are a nightmare for every webmaster. They waste more and more of a website's bandwidth and bring non-stop spam mail. Learn how to combat the spambot menace and junk traffic using PHP, JavaScript and other techniques.

Spambot Menace

Spambots are small spider programs let loose on the Internet by spammers to harvest email addresses from web pages such as newsgroup postings, discussion boards and guestbooks. They do not obey robots.txt and request page after page as fast as they can, which can exhaust megabytes of your web server's bandwidth within minutes. Their only aim is to collect every email address they find on those pages. Because they are written by humans (spammers), spambots come in many flavors and can disguise themselves in many ways, so it is very hard to keep track of all of them. We can, however, stop many of them from harvesting emails by installing a script on the server.

How to prevent spambots from accessing and harvesting your webpages?

If you are a web programmer and know a server-side language such as PHP or ASP, here is the code in PHP.

  
<?php
//filename: websiteguard.php
//--------------------------------------------------------------//
// Purpose: To deny access to spambots, spybots and other bad agents.
// When the user agent is a good one the page loads normally;
// otherwise the script prints a short message and exits, so the
// rest of your PHP page never reaches the bad bot.
// Inputs: User-Agent string
// Author: Allthewebsites Webmaster [ webmaster AT allthewebsites DOT org ]
// Version: 1.0.0
//---------------------------------------------------------------//
// $_SERVER replaces $HTTP_SERVER_VARS, which newer PHP versions no longer provide.
// A missing User-Agent header is treated as an empty string.
$thisAgent = isset($_SERVER["HTTP_USER_AGENT"]) ? $_SERVER["HTTP_USER_AGENT"] : "";
//--- Call the function
WebsiteGuard();
//---------------------------------------------------------------//
function WebsiteGuard()
{
  global $thisAgent;
  // The list is joined from separate strings so that no line break
  // or indentation ends up inside the regular expression.
  $badAgents = "webzip|httrack|wget|FlickBot|downloader|production bot|"
             . "superbot|PersonaPilot|NPBot|WebCopier|vayala|imagefetch|"
             . "Microsoft URL Control|mac finder|"
             . "emailreaper|emailsiphon|emailwolf|emailmagnet|emailsweeper|"
             . "Indy Library|FrontPage|cherry picker|netzip|"
             . "Share Program|TurnitinBot|full web bot|zeus";
  if (preg_match("/" . $badAgents . "/i", $thisAgent))
  {
    // Customize this message :-)
    print("Do not disturb...Zzz...\n");
    exit();
  }
}
//--------------------------------------------------------------//

How to use this script?
       Just "include" the script at the top of every PHP page.

Advantages

  • It allows good bots such as googlebot, webcrawler and ia_archiver to crawl your website and list it in search engines.
  • It stops human users from downloading your entire website with offline browsers.
  • It does not rely on .htaccess, so the code is portable to almost any platform that can run PHP scripts (Windows, Unix, Linux etc.).
  • It can be used with Apache (Linux, *nix) as well as IIS (Windows).
  • It is simple and lightweight.

A few limitations and disadvantages

  • If the user agent is unknown, empty or not provided, the script lets the request through. So protect your emails with the JavaScript method as well.
  • The list of bad bots is not complete. It is only a partial list.
  • If every known bad bot were added, the longer pattern would slightly degrade the performance of your script.
  • It does not protect plain HTML (.html, .htm) pages, because the PHP script never runs for them. Use .htaccess (Apache) or some other script for those.
  • IP addresses can be blocked as well, but that part of the code is not shown, since the addresses are specific to each website.
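
The first limitation is easy to see when the check is written out as a plain function. The sketch below is only an illustration in JavaScript, not part of websiteguard.php: it uses a shortened pattern list, and the names BAD_AGENTS and isDeniedAgent are hypothetical. The denyEmpty switch shows one possible stricter policy (refusing empty user agents), which the PHP script above does not apply.

```javascript
// Shortened list taken from the PHP script; extend as needed.
var BAD_AGENTS = /webzip|httrack|wget|emailsiphon|emailwolf|cherry picker/i;

// Hypothetical helper. With denyEmpty = false it behaves like the PHP
// script: an empty or missing user agent is let through.
// With denyEmpty = true, empty user agents are refused (stricter policy).
function isDeniedAgent(userAgent, denyEmpty) {
  if (!userAgent) {
    return Boolean(denyEmpty);
  }
  return BAD_AGENTS.test(userAgent);
}

console.log(isDeniedAgent("Wget/1.9", false)); // true
console.log(isDeniedAgent("", false));         // false
console.log(isDeniedAgent("", true));          // true
```

Refusing empty user agents may also turn away some legitimate visitors, so treat that switch as a trade-off rather than a recommendation.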

Code in ASP or JSP?
The author found ASP code for this at "PlanetSourceCode.com". Try searching there, or use the JavaScript method listed below.

If you cannot run PHP pages and all your webpages are plain HTML

Do not post plain email addresses such as me@someXYZdomain.com.
They are an easy target for spambots.

1. Post email addresses in a form like "your_email AT some DOT com". Some people use other variations such as me@some.REMOVETHIS.com or MyMail@-NOSPAM-some other .com

Reason: Human readers understand that they have to replace AT with "@" and DOT with a period, so the address is understood by humans but not by spambots that look for a literal address.
NOTE: This applies only to plain webpages, NOT to web forms such as contact or registration forms.

2. Make an image (gif / png / jpg) of your email address and post it on the webpage.
     Reason: Spambots cannot read text inside images.

3. Protect your guestbooks with a password. Spambots primarily target guestbooks, formmail scripts and similar pages to harvest emails. Use the JavaScript method to keep bad bots out.

4. Obfuscate / protect your email address with JavaScript code. Most spambots cannot interpret JavaScript. (A clever and diligent spambot programmer could still decipher your JavaScript, so this is not the safest possible method. It is safer than plain text, though: only JavaScript-enabled browsers see the mailto link, and it deters almost all common spambots from harvesting.)
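
Tips 1 and 4 can be sketched in a few lines of JavaScript. This is a minimal illustration, not the author's script: the function names mungeEmail and buildMailtoLink are hypothetical, and the address is the placeholder used earlier in this article. The point of the second function is that the full address never appears as one string in the page source.

```javascript
// Tip 1 (hypothetical helper): rewrite an address in the
// "name AT domain DOT com" style before pasting it into a page.
function mungeEmail(address) {
  return address.replace(/@/g, " AT ").replace(/\./g, " DOT ");
}

// Tip 4 (hypothetical helper): build the mailto link from separate
// pieces, so the source contains the user name and the domain
// but never the complete address.
function buildMailtoLink(user, domain) {
  var address = user + "@" + domain;
  return '<a href="mailto:' + address + '">' + address + "</a>";
}

console.log(mungeEmail("your_email@some.com")); // your_email AT some DOT com

// In a browser, write the link into the page; elsewhere do nothing.
if (typeof document !== "undefined") {
  document.write(buildMailtoLink("me", "someXYZdomain" + "." + "com"));
}
```

Visitors with JavaScript disabled will not see the link at all, so a munged plain-text address next to it is a reasonable fallback.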

An email protection script is available at http://www.allthewebsites.org/emailprotector.php. It converts each ASCII character into its percent-encoded hex code. It is far superior to the simple JavaScript method. On your webpage the address is visible as webmaster@someXYZdomain.com, but when you look at the HTML source you see only something like this...


%64%6f%63%75%6d%65%6e%74%2e%77%72%69%74%65%28
%27%3c%61%20%68%72%65%66%3d%22%6d%61%69%6c%74
%6f%3a%77%65%62%6d%61%73%74%65%72%40%73%6f%6d
%65%58%59%5a%64%6f%6d%61%69%6e%2e%63%6f%6d%22
%20%3e%77%65%62%6d%61%73%74%65%72%40%73%6f%6d
%65%58%59%5a%64%6f%6d%61%69%6e%2e%63%6f%6d%3c
%2f%61%3e%27%29%3b
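
The listing above is percent-encoded JavaScript: each character is written as "%" followed by its two-digit hex code, and the first line decodes to "document.write(". How the emailprotector page produces and decodes its output is not shown in this article, so the sketch below is only an assumption about the general technique. The helper name toPercentHex is hypothetical; decodeURIComponent is the standard JavaScript function for reversing this encoding.

```javascript
// Hypothetical helper: encode every character as %xx (two hex digits).
// Assumes plain ASCII input, as in the listing above.
function toPercentHex(text) {
  var out = "";
  for (var i = 0; i < text.length; i++) {
    var hex = text.charCodeAt(i).toString(16);
    out += "%" + (hex.length < 2 ? "0" + hex : hex);
  }
  return out;
}

var encoded = toPercentHex("document.write(");
console.log(encoded);                     // %64%6f%63%75%6d%65%6e%74%2e%77%72%69%74%65%28
console.log(decodeURIComponent(encoded)); // document.write(
```

A page using this approach would decode the string in the browser and then execute or write the result, which is why the address is readable on screen but not in the source.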

Choose any or all of the listed solutions for your website to prevent bandwidth loss and protect your emails. Your comments and suggestions are most welcome.

 

 » Related Links
http://www.allthewebsites.org/emailprotector.php , http://www.spamhaus.org  
