| If you notice entries like Teleport Pro and WebStripper in
your traffic reports, someone has been busy trying to download
your entire web site. You don't have to just sit back and let
this happen. If you are commercially hosted, you'll be able to
add a couple of lines to your robots.txt file to prevent repeat
offenders from stripping your site.
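
If your host gives you raw access logs rather than a traffic report, a short script can list the agents for you. The following is only a sketch: it assumes the common "combined" log format, in which the user agent is the last quoted field on each line, and the sample lines are invented for illustration.

```python
import re
from collections import Counter

# Assumption: "combined" log format, where the user agent is the
# last double-quoted field on each line.
LAST_QUOTED_FIELD = re.compile(r'"([^"]*)"\s*$')


def count_user_agents(log_lines):
    """Return a Counter mapping each user-agent string to its request count."""
    counts = Counter()
    for line in log_lines:
        match = LAST_QUOTED_FIELD.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Hypothetical sample lines, made up for this example only.
sample_log = [
    '192.0.2.10 - - [01/Jan/2024:10:00:00 +0000] "GET / HTTP/1.0" 200 5120 "-" "Teleport Pro/1.29"',
    '192.0.2.10 - - [01/Jan/2024:10:00:01 +0000] "GET /about.html HTTP/1.0" 200 2048 "-" "Teleport Pro/1.29"',
    '198.51.100.7 - - [01/Jan/2024:10:05:00 +0000] "GET / HTTP/1.1" 200 5120 "-" "Mozilla/5.0"',
]

# Print the busiest agents first.
for agent, hits in count_user_agents(sample_log).most_common():
    print(f"{hits:5d}  {agent}")
```

An agent that requests page after page within seconds, as in the first two sample lines, is a likely candidate for a robots.txt entry.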
The robots.txt file gives search engine spiders and agents
direction by telling them which directories and files they are
allowed to examine and retrieve. These rules are known as the
Robots Exclusion Standard.
To prevent certain agents and spiders from accessing any part
of your web site, simply enter the following lines into the
robots.txt file:
User-agent: Name of Agent
Disallow: /
Ensure that you enter the name of the agent exactly as it
appeared in your reports/logs, e.g. Teleport Pro/1.29, and that
there is a separate entry for each agent. Skip a line between
entries. You could do the same to exclude search engine spiders,
but somehow I don't think you'll really want to do this :0). The
"/" in the above example disallows access to every path on the
site. You can also disallow access by spiders and agents to
certain directories only, e.g.
User-agent: *
Disallow: /cgi-bin/
In this example the asterisk (wildcard) in the User-agent line
indicates "all" agents. Don't use the asterisk in the Disallow
statement to indicate "all"; use the forward slash instead.
If you don't have a robots.txt file, create one in Notepad
and upload it to the docs directory (or the root of whichever
directory your web pages are stored in). The standard treats an
empty robots.txt file as imposing no restrictions, but avoid a
blank file all the same, as some search engines may see it as an
indication that you don't want your site spidered at all!
Have at least one entry in the file.
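
If you would rather generate the file than type it, the layout described above can be sketched in a few lines of Python. The function name build_robots_txt and its parameters are made up for this example; the agent names come from your own logs. The file always ends with a catch-all entry, so it is never blank, and an empty "Disallow:" value means nothing is disallowed.

```python
def build_robots_txt(blocked_agents, blocked_dirs=("/cgi-bin/",)):
    """Return robots.txt text: one entry per blocked agent, then a catch-all.

    blocked_agents: agent names to shut out of the whole site.
    blocked_dirs:   directories that no agent should fetch.
    (Hypothetical helper, not part of any library.)
    """
    entries = []
    for agent in blocked_agents:
        # "Disallow: /" keeps this agent out of every path on the site.
        entries.append(f"User-agent: {agent}\nDisallow: /")

    # Catch-all entry for everyone else; guarantees the file is not empty.
    rules = "\n".join(f"Disallow: {d}" for d in blocked_dirs) or "Disallow:"
    entries.append(f"User-agent: *\n{rules}")

    # A blank line separates the entries.
    return "\n\n".join(entries) + "\n"


print(build_robots_txt(["Teleport Pro/1.29", "WebStripper"]))
```

Save the printed text as robots.txt and upload it as described above.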
Unfortunately, defining web stripper agents and spiders in
your robots.txt file won't work in all cases, as obeying the
file is voluntary and some mirroring software applications can
mimic web browser identifiers; but at least it's some protection
that may save you some valuable bandwidth.
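
To see what a well-behaved agent makes of your rules before you upload them, you can run them through a parser. The sketch below uses urllib.robotparser from Python's standard library; the agent names, the sample rules and www.example.com are placeholders. Different robots match agent names in different ways, so treat the results as indicative only.

```python
from urllib.robotparser import RobotFileParser

# Sample rules: one agent shut out completely, everyone else
# kept out of /cgi-bin/ only.
ROBOTS_TXT = """\
User-agent: WebStripper
Disallow: /

User-agent: *
Disallow: /cgi-bin/
"""

SITE = "http://www.example.com"  # placeholder host


def is_allowed(robots_txt, agent, url):
    """Return True if this parser would let `agent` fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


checks = [
    ("WebStripper", SITE + "/index.html"),
    ("SomeSpider", SITE + "/index.html"),
    ("SomeSpider", SITE + "/cgi-bin/form.pl"),
]
for agent, url in checks:
    verdict = "allowed" if is_allowed(ROBOTS_TXT, agent, url) else "blocked"
    print(f"{agent:12s} {url:40s} {verdict}")
```

An agent that never requests robots.txt, or that pretends to be a browser, is not affected by any of this, which is exactly the limitation described above.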
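If you're not able to create a robots.txt file, which is
usually the case if you are hosted by a free hosting service,
use the robots exclusion meta tag on your pages instead, for
example <meta name="robots" content="noindex,nofollow"> in the
head section of each page.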
|