If you found this page through Google or another search engine, you are probably looking for a way to stop those bad guys from downloading your whole site. I know exactly how it feels to be ripped off, so I did some research in this area and found a few solutions that really work. I am sharing them here so that you can save your bandwidth from illegal downloaders.
Robots.txt
Robots.txt is the first choice and the easiest way to block some of these agents. Some of them are smart enough to work around this text file or ignore it completely, but it still works against others. Making a robots.txt file is easy: open Notepad and type the following:
User-agent: HTTrack
Disallow: /icons
Below is the screenshot.

Save the file as robots.txt and upload it to the root (main folder) of your website. It should sit in the same place as your index.html. Let's go through the code and see what each line means and does.
1. User-agent: HTTrack. This line names the agent the rule is meant for. If the visitor's user agent is HTTrack (a website downloading tool), the lines that follow apply to it.
2. Disallow: /icons. This line tells that agent not to access the icons folder, which means HTTrack will not download the icons folder as long as it obeys the file.
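If you want to check how a well-behaved agent would read these two lines before you upload them, here is a minimal sketch using Python's standard urllib.robotparser module. It only shows how a compliant parser interprets the rule; it does not prove that HTTrack itself will obey it. The file names logo.png and index.html and the Googlebot agent are just assumptions for the example.

```python
from urllib.robotparser import RobotFileParser

# The same two lines from the robots.txt above.
RULES = """\
User-agent: HTTrack
Disallow: /icons
"""


def build_parser(rules_text):
    """Parse robots.txt text the way a compliant agent would."""
    parser = RobotFileParser()
    parser.parse(rules_text.splitlines())
    return parser


parser = build_parser(RULES)

# HTTrack asking for something inside the icons folder (hypothetical file).
print(parser.can_fetch("HTTrack", "/icons/logo.png"))    # False

# HTTrack asking for a page outside the icons folder.
print(parser.can_fetch("HTTrack", "/index.html"))        # True

# A different agent is not covered by the HTTrack rule.
print(parser.can_fetch("Googlebot", "/icons/logo.png"))  # True
```

The rule only covers paths that start with /icons and only the agent named HTTrack, so everything else stays open.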
When I first set up robots.txt against HTTrack, I wanted to block it from accessing every folder, so I tried this:
User-agent: *
Disallow: /
But HTTrack was smart enough to answer with something like "Robots.txt is too restrictive, proceeding with download" and carried on anyway. So I had to list every folder individually, the way I did for the icons folder. When I checked again, it could not download any of those folders.
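Typing one Disallow line per folder gets tedious on a bigger site, so here is a rough sketch of a small Python helper that writes those lines for you. The function name build_robots_txt and the folder names images and downloads are made up for the example, and I am assuming the rules go under the HTTrack agent as in the icons example; swap in your own folders.

```python
def build_robots_txt(user_agent, folders):
    """Return robots.txt text with one Disallow line per folder."""
    lines = [f"User-agent: {user_agent}"]
    for folder in folders:
        # Normalise "icons", "/icons" and "icons/" to "/icons".
        path = "/" + folder.strip("/")
        lines.append(f"Disallow: {path}")
    return "\n".join(lines) + "\n"


# Hypothetical folder list; replace with the folders on your own site.
robots_text = build_robots_txt("HTTrack", ["icons", "/images", "downloads/"])
print(robots_text)

# To save it next to your index.html, you could write it out like this:
# with open("robots.txt", "w") as f:
#     f.write(robots_text)
```

Upload the resulting robots.txt to your root folder just like before.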
For further reading on how to work with robots.txt, go to http://www.robotstxt.org/wc/robots.html