Ades Design - High Quality Website Templates, Free Website Templates, Tutorials: Photoshop, Dreamweaver, Flash, CSS and PHP


Blocking Bad Agents from downloading your site


If you found this page through Google or another search engine, you are probably looking for a way to stop bad agents from downloading your whole site. I know exactly how it feels to be ripped off, so I did some research in this area and found a few solutions that really work. I am sharing them here so that you can save your bandwidth from illegal downloaders.

Robots.txt
Robots.txt is the first choice and the easiest way to block some agents. It is only a request, though: some agents are smart enough to ignore it totally. Nevertheless, it works for some of them. Making a robots.txt file is easy. Open Notepad and type the following:

User-agent: HTTrack
Disallow: /icons

Save the file as robots.txt and upload it to the root (main folder) of your website. It should be in the same place as your index.html. Let's go through the code and explain what each line means and does.

1. User-agent: HTTrack
This line names the agent the rules apply to. If the visiting agent identifies itself as HTTrack (a website downloading tool), it is expected to follow the rules listed underneath.

2. Disallow: /icons
This line tells the agent not to access the icons folder, which means HTTrack will not download anything whose path starts with /icons.
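The way a well-behaved agent reads a Disallow line can be sketched in PHP. The helper name isDisallowed is made up for illustration, and the sketch covers only plain path prefixes, not wildcards or Allow lines:

```php
<?php
// Hypothetical helper: true when $path falls under one of the Disallow prefixes.
// Robots.txt rules are compared against the start of the requested path.
function isDisallowed($path, array $disallowPrefixes)
{
    foreach ($disallowPrefixes as $prefix) {
        // An empty Disallow value blocks nothing, so skip it.
        if ($prefix !== '' && strncmp($path, $prefix, strlen($prefix)) === 0) {
            return true;
        }
    }
    return false;
}

var_dump(isDisallowed('/icons/logo.png', array('/icons'))); // bool(true)
var_dump(isDisallowed('/index.html', array('/icons')));     // bool(false)
```

Keep in mind that this check runs inside the agent, not on your server, which is why an agent that ignores robots.txt is not stopped by it.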

When I implemented robots.txt for HTTrack, I first wanted to block it from accessing all folders, so I did this:

User-agent: *
Disallow: /

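But HTTrack was smart enough to report something like "Robots.txt is too restrictive, proceeding with download" and carried on. So I had to list each folder separately, as I did for the icons folder. When I checked again, it could not download any of those folders. Note also that User-agent: * addresses every robot, search engines included, so a blanket rule like this is not one to leave on a site you want indexed.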
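If your site has many folders, typing one Disallow line per folder gets tedious. Below is a minimal sketch that prints the lines for you. The function name buildRobotsSection and the folder names are placeholders, not part of the original setup:

```php
<?php
// Hypothetical helper: builds a robots.txt section that disallows
// each listed folder for a single user agent.
function buildRobotsSection($agent, array $folders)
{
    $lines = array('User-agent: ' . $agent);
    foreach ($folders as $folder) {
        // Normalise to one leading slash, e.g. "images" becomes "/images".
        $lines[] = 'Disallow: /' . ltrim($folder, '/');
    }
    return implode("\n", $lines) . "\n";
}

// Placeholder folder names; replace them with your own.
echo buildRobotsSection('HTTrack', array('icons', 'images', '/css'));
```

Copy the printed text into your robots.txt file.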

For further reading on how to work with robots.txt, go to http://www.robotstxt.org/wc/robots.html

Include a script at the top of the page
The idea is the same: detect the agents that download your whole site and let the good ones through. Be very careful when implementing any of these methods. If you mistakenly put an agent such as Google's crawler in the script, Google may stop indexing your website and it will not come up in the search results. The code below blocks the listed agents. Put it at the very top of your PHP page, before the opening <html> tag and before any other output, because header() can only send the redirect if nothing has been sent to the browser yet.

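<?php
// Must run before any output is sent, or header() cannot redirect.
$agent = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';
if (($agent == "WebCopier v.2.2") ||
    ($agent == "WebCopier v2.5") ||
    ($agent == "WebZIP/5.0 PR1 (http://www.spidersoft.com)")) {
    // Send the blocked agent to the notice page and stop the script.
    header("Location: http://www.yoursite.com/no_download.html");
    exit();
}
?>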

What you need to change here is the address in the header() line: it should be your own website address. Create a page called no_download.html (or any other name); a blocked agent will be redirected to this page. You could put text there saying that downloading your site without your permission is illegal, and so on, so that the person downloading your site gets some feedback on his bad intention.

To add more agents, check your website logs and identify which agents you should include in your script. Once you have identified them, copy and paste one line, such as ($agent == "WebCopier v2.5")||, and change the name of the agent to match exactly what is shown in your logs, since == compares the whole string.
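To see which agents show up in your logs most often, something like the following sketch may help. It assumes each log line ends with the user agent in double quotes, as in Apache's "combined" log format; the function name countUserAgents and the sample lines are made up for illustration:

```php
<?php
// Hypothetical helper: counts how often each user agent appears.
// Assumes the user agent is the last double-quoted field on each line.
function countUserAgents(array $logLines)
{
    $counts = array();
    foreach ($logLines as $line) {
        if (preg_match('/"([^"]*)"\s*$/', $line, $m)) {
            $agent = $m[1];
            $counts[$agent] = isset($counts[$agent]) ? $counts[$agent] + 1 : 1;
        }
    }
    arsort($counts); // most frequent agents first
    return $counts;
}

// Made-up sample lines, for illustration only.
$sample = array(
    '10.0.0.1 - - [01/Jan/2007:10:00:00 +0000] "GET / HTTP/1.1" 200 512 "-" "WebCopier v2.5"',
    '10.0.0.1 - - [01/Jan/2007:10:00:01 +0000] "GET /a.html HTTP/1.1" 200 734 "-" "WebCopier v2.5"',
    '10.0.0.2 - - [01/Jan/2007:10:00:02 +0000] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0"',
);
print_r(countUserAgents($sample));
```

With a real log you could pass in file('access.log', FILE_IGNORE_NEW_LINES), where access.log stands for wherever your host keeps the log. An agent that requests hundreds of pages in a short time is a likely candidate for your block list.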

 

Block through .htaccess
If you have access to your .htaccess file, you have one more option to block those bad agents. Use the following code together with any additional agent names that you would like to block. Basically, .htaccess is a text file that holds per-directory settings for the Apache web server, such as sending the user to a 404.html page when a page is not found. If you don't have access to it or don't know what it is, contact your administrator.

RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} ^WebZip [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^WebCopier [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Zeus [NC]
RewriteRule ^.*$ no_download.html [L]

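To add more agents, copy one of the RewriteCond lines that ends in [OR], change the name of the agent, and place it above the last RewriteCond. Every condition except the last one needs the OR flag; without it, all the conditions would have to match at once.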

no_download.html is the page that has the legal notice or any other text you would like to display.
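If you have no access to .htaccess, a similar prefix match can be approximated in PHP. This is only a sketch: the function name startsWithBlockedPrefix is made up, and the comparison ignores case, so it behaves like a case-insensitive version of the ^Name patterns above:

```php
<?php
// Hypothetical helper: true when the user agent starts with one of the
// given prefixes, ignoring upper and lower case.
function startsWithBlockedPrefix($agent, array $prefixes)
{
    foreach ($prefixes as $prefix) {
        if ($prefix !== '' && strncasecmp($agent, $prefix, strlen($prefix)) === 0) {
            return true;
        }
    }
    return false;
}

$blocked = array('WebZip', 'WebCopier', 'Zeus');
var_dump(startsWithBlockedPrefix('WebZIP/5.0 PR1 (http://www.spidersoft.com)', $blocked)); // bool(true)
var_dump(startsWithBlockedPrefix('Mozilla/5.0', $blocked)); // bool(false)
```

Matching on a prefix means one entry covers every version of a tool, so you do not need a separate line for each version string.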

Lastly: Good Luck!




2002 - 2007 Ades Design. All Rights Reserved. http://www.adesdesign.net

 








