The meta robots tag was an open standard created over a decade ago
and designed initially to allow page authors to prevent page indexing. Over the
years, various search engines have added further support to the tag.
Let me start off by saying that if you DO want
your pages in search engines, then DO NOT use the tag. By default, the major
search engines will index any page they find. Yes, there is a form of the meta
robots tag you can use to explicitly tell search engines to index your pages.
It looks like this:
<meta name="robots" content="index">
There’s also a form you can use that adds the
command “follow,” which tells the search engines to index your page and also
follow any links they find on that page to other pages, which they can then
index. It looks like this:
<meta name="robots" content="index,follow">
You do NOT need to use either form if you DO
want your pages in the search engines. Without either form, they’ll naturally
index your pages and follow your links. That’s what they do.
I always joke that putting these forms of the
meta robots tag on your web pages is like putting a Post-It note on your chest
that says “breathe.” Hey, if you forget to look at that note, you’ll still
breathe. That’s what you do, by default. And that’s what the major search
engines do. By default, they inhale web pages without you putting up a meta tag
telling them to do so.
Now if you DO NOT want your pages in a search
engine, then it’s time to perhaps break out the meta robots tag, if for some
reason the robots.txt alternative isn’t suitable. Want to keep a
particular page out? Then put this on that page:
<meta name="robots" content="noindex">
See the “noindex” value? That tells the search
engines that see this page not to include it in their listings. Remember, as
I explained before, this will not prevent the page from being spidered. That’s because
search engines have to keep revisiting the page in order to see if the tag is
removed. The tag only keeps the page out. Here’s my earlier chart on that
topic.
| System | | | |
|---|---|---|---|
| Stops Crawling | Yes | No | No |
| Stops Index Inclusion | Yes | Yes | Yes |
| Stops Link Only Listing | No | No (Yes, for Google) | Yes |
| Why Use? | Easy to block many pages at once | Can’t access root domain | Don’t even want URL to appear or need page out fast |
What if you don’t want links followed? Sure, you
can do this:
<meta name="robots" content="noindex,nofollow">
That extra command, “nofollow,” tells the search
engines not to follow any links on that page. Google recently covered this option in more detail. But as Google
also explained, pages linked from a page with this tag might still get crawled. That’s
because if anyone else links to one of those pages WITHOUT a nofollow value,
then the search engine will follow that link.
So far, I’ve covered all the commands that were
originally created with the tag back in May 1996. Since then, more commands (also called values or
attributes) have been added. For example, Google’s post today summarizes
several options you can use. Quoting Google:
- NOINDEX – prevents the page from being included in the index.
- NOFOLLOW – prevents Googlebot from following any links on the page. (Note that this is different from the link-level NOFOLLOW attribute, which prevents Googlebot from following an individual link.)
- NOARCHIVE – prevents a cached copy of this page from being available in the search results.
- NOSNIPPET – prevents a description from appearing below the page in the search results, as well as prevents caching of the page.
- NOODP – blocks the Open Directory Project description of the page from being used in the description that appears below the page in the search results.
At times, you may want to use more than one of
these commands. I’ll get back to that. But first, how about another chart? I’ll
cover the major commands you may want to use below:
| COMMAND | Ask | Google | Microsoft | Yahoo |
|---|---|---|---|---|
| NOINDEX | | | | |
| NOFOLLOW | | | | |
| NOARCHIVE | | | | |
| NOODP | No | | | |
| NOYDIR | No | No | No | |
| NOSNIPPET | No | No | No | |
| Robot Name | TEOMA | GOOGLEBOT | MSNBOT | SLURP |
| Does Robot Specific Tag Override All Robots Tag? | ??? | No | No | No |
Several of these are already explained above, in
what I quoted from Google. They work the same way for the other major search
engines. I’ve also linked to help information from each search engine for more
specific advice.
The NOYDIR command is fully explained in my
previous Yahoo Provides NOYDIR Opt-Out Of Yahoo Directory Titles
& Descriptions post. Only Yahoo
supports this, but none of the other major search engines use Yahoo titles and
descriptions for listings, so it doesn’t really matter for them.
Now on to the topic of a meta robots tag having
multiple values. What if you wanted to keep a page from being cached by all the
major search engines and also ensure that neither Open Directory nor Yahoo Directory
descriptions are used? First, you need the values of the commands to say this.
From the table above, they are:
- NOARCHIVE
- NOODP
- NOYDIR
Next, you need to decide what robots to target.
We’ll keep it simple for now. To target ALL robots, you use this value:
- ROBOTS
Now to the meta robots format. Without the
values, it looks like this:
<meta name="NAME-OF-ROBOTS-TO-TARGET" content="COMMANDS">
We replace that NAME-OF-ROBOTS-TO-TARGET part
with the name of the robots we’re, well, targeting. As explained, that’s
ROBOTS, in order to target them all. I’ll put it in bold below:
<meta name="ROBOTS" content="COMMANDS">
Now we put in the commands we want to tell the
robots, each separated by a comma. The order doesn’t matter. Again, I’ll bold
the commands:
<meta name="ROBOTS" content="NOARCHIVE,NOODP,NOYDIR">
Voila! Put that tag ANYWHERE inside the header
area of a web page like this:
<HEAD>
<meta name="ROBOTS" content="NOARCHIVE,NOODP,NOYDIR">
</HEAD>
Then you will be telling all major search
engines not to cache the page, nor to use Open Directory or Yahoo Directory
titles or descriptions for your page listings.
Notice that in the tag above, there are no
spaces between the commands. What if I did this?
<meta name="ROBOTS" content="NOARCHIVE, NOODP, NOYDIR">
Google writes today that spaces make no
difference. Use them if you want or not, the tag means the same thing.
Microsoft tells me the same thing, as does Yahoo.
What if you did this, with no commas:
<meta name="ROBOTS" content="NOARCHIVE NOODP NOYDIR">
Microsoft tells me this is fine. I didn’t ask
Yahoo about this, and Google says commas MUST be used. So use commas and don’t
be a pain.
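To see why the commas matter, here is a minimal Python sketch of a comma-strict reader. It is my own illustration, not how any search engine actually parses the tag, and the function name parse_robots_content is made up for the example. It splits the content value on commas only, then trims spaces and lowercases each command.

```python
def parse_robots_content(content):
    """Split a meta robots content value into a set of lowercase commands.

    Assumption for illustration: commas are the only separator, and
    whitespace around each command is ignored.
    """
    commands = set()
    for part in content.split(","):
        cleaned = part.strip().lower()
        if cleaned:  # skip empty pieces such as a trailing comma
            commands.add(cleaned)
    return commands


# Spaces after the commas make no difference to the result.
with_spaces = parse_robots_content("NOARCHIVE, NOODP, NOYDIR")
without_spaces = parse_robots_content("NOARCHIVE,NOODP,NOYDIR")

# Without commas, a comma-strict reader sees one long unknown command.
no_commas = parse_robots_content("NOARCHIVE NOODP NOYDIR")
```

With this reader, the two comma-separated forms produce the same three commands, while the no-comma form collapses into a single string that matches none of them.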
Now what if you want to tell search engines
different things? Maybe you want Microsoft not to use the ODP descriptions,
Google not to cache pages, Yahoo not to follow links on a page and Ask not to
index the page at all. Maybe you want to get your head examined for being so
strange, too. But aside from your mental health, it is possible to do all this.
You need to have a robots tag for each
particular search engine you want to target. See that chart above? At the
bottom there’s a “Robot Name” row. That shows you the name of each search
engine’s “robot” or “spider” that you’ll issue a command to. With the robot
names, we then give each of them their specific commands:
<meta name="TEOMA" content="NOINDEX">
<meta name="GOOGLEBOT" content="NOARCHIVE">
<meta name="MSNBOT" content="NOODP">
<meta name="SLURP" content="NOFOLLOW">
You could also tell all robots to do one thing —
say not to follow links — while also issuing a second robots-specific command
such as telling only Google not to cache the page:
<meta name="ROBOTS" content="NOFOLLOW">
<meta name="GOOGLEBOT" content="NOARCHIVE">
But wouldn’t a search engine only follow the
specific tag written for it? In other words, if you target Google with a
specific command in the “GOOGLEBOT” tag, then wouldn’t it follow only that tag
and ignore the other?
Google, Microsoft and Yahoo say they will honor
them both. I don’t know about Ask. That’s why you see “???” in that “Does Robot
Specific Tag Override All Robots Tag?” section of the chart above. I’ll try to
get that answered.
What if you had more than one “all” robots tag
like this:
<meta name="ROBOTS" content="NOFOLLOW">
<meta name="ROBOTS" content="NOODP">
As explained, you could easily do this instead:
<meta name="ROBOTS" content="NOFOLLOW,NOODP">
But if for some reason you did do it the other
way, Microsoft and Yahoo have told me that’s just fine. They honor the
information in BOTH of the robots tags. Google’s post today says the same
thing.
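The "honor them both" behavior can be sketched in a few lines of Python. This is only an illustration under my own assumptions, not any search engine's real code: the class RobotsMetaCollector and the helper directives_for are hypothetical names, and the sketch simply unions the commands from every ROBOTS tag and every tag naming the given robot.

```python
from html.parser import HTMLParser


class RobotsMetaCollector(HTMLParser):
    """Collects commands from all meta tags that apply to one robot."""

    def __init__(self, robot_name):
        super().__init__()
        self.robot_name = robot_name.lower()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag and attribute names already lowercased.
        if tag != "meta":
            return
        attr = {key: (value or "") for key, value in attrs}
        name = attr.get("name", "").strip().lower()
        # A tag applies if it targets all robots or this robot by name.
        if name in ("robots", self.robot_name):
            for part in attr.get("content", "").split(","):
                cleaned = part.strip().lower()
                if cleaned:
                    self.directives.add(cleaned)


def directives_for(html, robot_name):
    """Return the combined commands a given robot would see on a page."""
    collector = RobotsMetaCollector(robot_name)
    collector.feed(html)
    collector.close()
    return collector.directives


page = (
    '<head>'
    '<meta name="ROBOTS" content="NOFOLLOW">'
    '<meta name="GOOGLEBOT" content="NOARCHIVE">'
    '</head>'
)
google_view = directives_for(page, "GOOGLEBOT")
yahoo_view = directives_for(page, "SLURP")
```

In this sketch, google_view holds both nofollow and noarchive, while yahoo_view holds only nofollow. Two separate ROBOTS tags would be merged the same way.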
Finally, the Google post provides reassurance
that capitalization doesn’t make a difference. I’ve shown things in various
ways above, sometimes the commands in ALL CAPS, sometimes in lowercase. As
Google says, case makes no difference. To quote their post:
Googlebot understands any combination of
lowercase and uppercase. So each of these meta tags is interpreted in exactly
the same way:
<meta name="ROBOTS" content="NOODP">
<meta name="robots" content="noodp">
<meta name="Robots" content="NoOdp">
Ah, but what about something like this:
<MeTa nAMe="RoBots" conTEnt="NooDP">
Well, Google didn’t go that far. But my
experience over the past decade has been that meta tags are not case sensitive
at all with the major search engines. So I think you’re safe in whatever case,
for all the major search engines.
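If you want to check your own tags the same case-blind way, here is a small Python sketch. It is my own illustration using the standard library's html.parser, not anything the search engines have published, and MetaTagReader and read_meta are made-up names. The parser lowercases tag and attribute names on its own; the sketch lowercases the values too.

```python
from html.parser import HTMLParser


class MetaTagReader(HTMLParser):
    """Records (name, content) pairs from meta tags, all in lowercase."""

    def __init__(self):
        super().__init__()
        self.pairs = []

    def handle_starttag(self, tag, attrs):
        # Tag and attribute names arrive lowercased from HTMLParser.
        if tag == "meta":
            attr = {key: (value or "") for key, value in attrs}
            name = attr.get("name", "").lower()
            content = attr.get("content", "").lower()
            self.pairs.append((name, content))


def read_meta(html):
    """Return the lowercase (name, content) pairs found in the markup."""
    reader = MetaTagReader()
    reader.feed(html)
    reader.close()
    return reader.pairs


variants = [
    '<meta name="ROBOTS" content="NOODP">',
    '<meta name="robots" content="noodp">',
    '<meta name="Robots" content="NoOdp">',
    '<MeTa nAMe="RoBots" conTEnt="NooDP">',
]
results = [read_meta(variant) for variant in variants]
```

Every entry in results comes out as the same single pair, robots and noodp, whatever the capitalization in the markup.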
23:12
A Braveheart