Googlebot crawling fictional URLs in JavaScript

January 8, 2010
It's no secret that Googlebot is capable of crawling JavaScript files. In fact, Googlebot goes a step further, attempting to interpret the contents of JavaScript files in order to discover new webpages.

With Google evidently trying to increase the amount of content stored in its index, the situation sounds ideal for many webmasters - however, others are not so enthusiastic. It has recently been reported that Googlebot's aggressive crawling of JavaScript results in the search engine inventing fictional URLs - web addresses that do not exist. Analytics packages, including Google Analytics itself, have been the single biggest contributors to this problem.

Example

function Tracking(){
    return("/services/submit/form/node/12345");
}


Google will see what it thinks is a new route (URL) to content, extract it and apply it to the host it is currently crawling - if those webpages don't exist, we have a problem.

If you run a tight ship and keep a close eye on Webmaster Tools, you will likely have spotted these fictional URLs being reported in the 404 Not Found section. bigmouthmedia and its technical search boffins have devised a number of solutions to this problem, each suited to different circumstances.

JavaScript join()

If you present a URL - or anything that looks like a URL - within inline JavaScript, Googlebot will likely try to crawl it or make assumptions about its existence.

Splitting the URL into an array of segments and reassembling it at runtime with join() means that no complete URL appears in the page source, so Google is unlikely to crawl it. This can be achieved using the code below (code sequence shortened):

Old: /Services/Submit...
New: '/'+['Services','Submit',...].join('/');
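
Expanded into a working function, the idea looks something like this sketch (the path segments are taken from the earlier example and are purely illustrative):

function Tracking(){
    // Assemble the path at runtime so that no complete,
    // crawlable URL string appears in the page source.
    return '/' + ['services','submit','form','node','12345'].join('/');
}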


This is likely to be the best method of deterring Googlebot, as users will be left with the same functionality as the original URL.

Robots.txt

Depending on the pattern of 404 Not Found URLs being generated, you may be able to define a disallow rule within your robots.txt that prevents Googlebot from crawling such pages. For example, if all of the fictional URLs contained _node, you could disallow any URL containing that string with a wildcard rule: Disallow: /*_node (Google-style wildcard rules should begin with a slash, and the trailing wildcard is implicit).
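
Put into a robots.txt file, the rule might look like this (a sketch that assumes the _node pattern above matches only the fictional URLs):

User-agent: Googlebot
# Block any URL whose path contains "_node"
Disallow: /*_node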

Be cautious when using robots.txt, however, as you could end up blocking access to URLs that you would prefer to have crawled. Use the robots.txt testing tool in Webmaster Tools to confirm that legitimate URLs remain accessible.

Dynamically Generate a File

You could also take the inline JavaScript out of your markup and dynamically generate a separate file containing it. Googlebot could then be blocked from accessing such files via robots.txt - however, once again, tread with caution. It could be argued that dynamically generating files increases the number of components required to load the page, and therefore page load time, which Google is introducing as a ranking factor - but we'll let you be the judge.
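
As a rough sketch of that approach (the file path here is hypothetical), the inline block moves into an external script that robots.txt can then block:

<!-- Before: URL-like strings sit inline in the page, where Googlebot reads them -->
<!-- After: the tracking logic lives in a separately generated file -->
<script src="/generated/tracking.js"></script>

A matching robots.txt rule, such as Disallow: /generated/, would then keep Googlebot away from the file's contents.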

There you have it: three simple and not-so-simple ways to prevent Googlebot from crawling fictional URLs within inline JavaScript. Of the three, the JavaScript join() approach appears to be the most effective.


Source:  bigmouthmedia.com
