
October 2006 Archives

October 3, 2006

Search Engine Indexing and Re-Indexing is Not for the Faint of Heart

For the longest time, I have emphasized the importance of keywords, editorial content, etc. to customers of my web development business, http://www.bestplainwebpages.com, who need to improve their position with search engines.

In fact, the mandatory first step is getting a search engine to index a web site correctly, and at the right time. On top of this timing issue, you will need enough interest in the web site (on the part of outside parties) to ensure that the search engine program returns after the site is initially indexed, whenever new information is added or the site contents change.

Meeting this preliminary requirement is a tough task. Once a site is "crawled" by a search engine indexing application, the only factor that will, most likely, lead to a re-crawling of the site is a volume of outside interest in the contents of the web site. To get this interest, you've got to get the incoming links to the site that everyone always talks about, but never really explains.

The incoming links that count aren't going to come from other search engine listings. They are going to come from honest interest on the part of "high rank" web sites in the information on your site.

You will either have the customer base to whom you can send an invitation to come to your site, or else you will need to build visits to the web site through Press Releases, or stories that you publish (or that are published in the name of your web business). In short, to get the exposure you will need to have the exposure. Chicken or the egg, which comes first?

Have a contingency plan, such as a Press Release campaign, at the ready should your first attempt at building visits not work out.

© Mike Blonder, 2006, All Rights Reserved

My, My have Blogs Grown Up

Getting a Feed Aggregator like http://www.technorati.com to distribute a blog feed is not as simple as it was as recently as six months ago.

I am in the process of getting Technorati to distribute the feed for this blog. Not only have I had to prove to Technorati that I have, in fact, a Blog that I publish, but I have had to go further and prove that I am the owner of the Blog. I have opted to send them my Technorati Profile in lieu of sending them other information that I found to be confidential.

Worse yet, the cached version of my Blog was a year old and pointed to a different Blog. The ping feature of Movable Type seems to have fixed this cache issue.

As to Movable Type itself, the documentation on installation is not very helpful. As well, one of the most important features for me, control over the style of the pages and the ability to change it as I require, is not easy to use at all. The StyleCatcher application did not work as represented. In fact, for a period of time I lost any style at all for my pages. Thanks to my ISP, I got the issue resolved fairly quickly once I called in.

© Mike Blonder, 2006, All Rights Reserved

October 4, 2006

Publishing Lots of Web Pages without Dynamic Pages

Dynamic pages (really just one HTML page that is filled with information in response to an immediate request from someone at your site) are the approach most widely used for

  • web sites that require regular updates, or for
  • web sites that are produced by personnel who may be heavy on images and words, but light on computer programming

These dynamic pages are supported by databases. Most of the publishing work is actually done on the database side through commands written in Structured Query Language (SQL). These commands are embedded within the scripts that are invoked through tags on the dynamic HTML page.

I don't like this approach for several reasons and have chosen to go a different way with Best Plain Web Pages. The reasons are as follows:

  • Security: business customers typically also use a database for other purposes. The same SQL calls could be run against a customer database that does not face the web, but may be reachable through a back door. Better not to have tags calling scripts with SQL calls.
  • Long URLs: Since the pages are dynamically published, information from different tables and/or rows of the database may be required to publish the same one page. The page address is written to reflect the various actions taken to publish the page, resulting in a long page address. Long URLs are not friendly for Search Engines.
  • So-called Short URL schemes don't work well: systems developers at the various Nuke Content Management Systems (CMSs), like PostNuke, have spent lots of time trying to shorten page addresses through the .htaccess file used by the Apache web server, with limited if any success. My experience has been that the number of Page Not Found errors (404 codes) goes up dramatically, which, once again, hurts Search Engine rankings.
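To make the Long URL point concrete, here is a minimal sketch that contrasts a dynamic page address with the flat file name a static page could carry instead. The address, its parameter names, and the `static_name` helper are all hypothetical and chosen only for illustration; they are not taken from PostNuke or from any real site.

```python
from urllib.parse import urlsplit, parse_qsl


def static_name(dynamic_url: str) -> str:
    """Collapse the query-string values of a dynamic URL into one flat file name."""
    query = urlsplit(dynamic_url).query
    # parse_qsl keeps the parameters in the order they appear in the address.
    values = [value.lower() for _key, value in parse_qsl(query)]
    if not values:
        return "index.html"
    return "-".join(values) + ".html"


# Hypothetical dynamic address with three parameters.
example = "http://www.example.com/index.php?module=News&func=display&sid=42"
print(static_name(example))  # prints news-display-42.html
```

The static name carries the same information in one short path segment, with no script name and no query string.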

Best Plain Web Pages uses scripts that 1) do not run on the client site and 2) produce static HTML pages. I think this is a much better way to go. The scripts we use are written with Open Source components like Vim and Python. Any large-scale changes that we need to make on client sites are made with Vim commands like argdo, or with tried-and-true simple programs such as sed.
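As a rough illustration of the static approach, the sketch below fills one template per page and writes plain .html files to disk, so nothing has to run on the web server at request time. It is not the actual Best Plain Web Pages tooling; the template, the `build_site` function, and the sample pages are assumptions made for the example.

```python
import tempfile
from html import escape
from pathlib import Path
from string import Template

# Hypothetical page template; a real site would carry far more markup.
PAGE = Template("""<!DOCTYPE html>
<html>
<head><title>$title</title></head>
<body>
<h1>$title</h1>
<p>$body</p>
</body>
</html>
""")

# Hypothetical content, keyed by the file name each page should get.
PAGES = {
    "index": {"title": "Welcome", "body": "Plain pages, built ahead of time."},
    "about": {"title": "About Us", "body": "Text & images, no database."},
}


def build_site(pages, out_dir):
    """Write one static HTML file per entry and return the paths written."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for name, fields in pages.items():
        # Escape the content so characters like & and < are valid in HTML.
        html = PAGE.substitute(
            title=escape(fields["title"]), body=escape(fields["body"])
        )
        target = out / f"{name}.html"
        target.write_text(html, encoding="utf-8")
        written.append(target)
    return written


# Build into a throwaway directory so the example leaves nothing behind.
with tempfile.TemporaryDirectory() as tmp:
    for path in build_site(PAGES, tmp):
        print(path.name)
```

Because the output is ordinary files, a site-wide change means regenerating them or editing them in bulk, which is where tools like argdo and sed come in.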

I'm happy to say that our primary client maintains very high Page Rankings for a site in a very competitive consumer market. Long live static HTML.

© Mike Blonder, 2006, All Rights Reserved

October 5, 2006

"Pay to Show"

The web has changed much, certainly since its beginnings in the late 1980s, and through its "boom town" phase in the mid 1990s. Back then, it was a rather easy process to get a web site indexed and promoted on the web by search engines, almost a "build it and they will come" (phrase is gratefully reproduced from the film "Field of Dreams") type of setting.

Now you can build it and "they" will come, but "they" will be no more than computer programs built to find blocks of text and images (along with the respective addresses for the blocks of text and images) on the Transmission Control Protocol/Internet Protocol (TCP/IP) network that carries the web. These programs will then collect and index that information without any regard for whether you're ready for them to do so, or not; therefore, test whatever you have locally before you publish on the web.

Also, you must understand that there is no correlation between these programs finding your block of text and images and useful hits from human beings on your web site. There is no imperative for anything whatsoever to happen. Most of the Search Engine Optimization rules that you will find on the web no longer hold; just keep this in mind.

If your site has been found too soon by these programs, then what has been indexed will not produce the results from searches that your site fundamentally needs to attract human beings.

Of course, if you are willing to pay someone to do something about it, then you may or may not get a search engine or two, or more, to actually show your site, but that is another story to be told at another time.

Since "web 1.0" has now matured into a pure business play, "you get what you pay for." Let me cut to the core of this: blasting your address out to tons of search engines via so-called traffic building services is generally a waste of your precious cash. Better to spend your dough on Press Releases or a Blog to get some information out there, along with your web site address that will attract human beings to your site to review the information. If these humans like what they find and start to link to your pages, then you may start to get some promotion working for you through the search engines.

I work with this stuff every day through my web businesses, Best Plain Web Pages and Industrial Strength Ethernet, and for my clients. I'm speaking from first-hand experience. I review access logs on a regular basis and can spot a hit coming from a search engine query from a mile away. They are precious. Take it from someone who knows . . .
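For readers who want to try this kind of log review themselves, here is a minimal sketch that picks search engine hits out of an access log by looking at the referrer field. It assumes the Apache "combined" log format; the sample lines are invented, and the table of search hosts and query parameters is an illustrative assumption, not a complete or authoritative list.

```python
import re
from urllib.parse import urlsplit, parse_qs

# One line of the Apache "combined" log format (assumed here).
LOG_LINE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) \S+ '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)

# Assumed mapping of search host to the parameter that holds the query.
SEARCH_HOSTS = {
    "www.google.com": "q",
    "search.yahoo.com": "p",
    "search.msn.com": "q",
}


def search_hit(line):
    """Return (search host, query terms) if the referrer is a search result page."""
    match = LOG_LINE.match(line)
    if match is None:
        return None
    referrer = urlsplit(match.group("referrer"))
    param = SEARCH_HOSTS.get(referrer.hostname or "")
    if param is None:
        return None
    terms = parse_qs(referrer.query).get(param)
    if not terms:
        return None
    return referrer.hostname, terms[0]


# Invented sample lines: one search hit, one direct visit, one ordinary link.
SAMPLE = [
    '192.0.2.10 - - [03/Oct/2006:10:15:32 -0400] "GET /index.html HTTP/1.1" '
    '200 5120 "http://www.google.com/search?q=plain+web+pages" '
    '"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"',
    '192.0.2.11 - - [03/Oct/2006:10:16:02 -0400] "GET /index.html HTTP/1.1" '
    '200 5120 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"',
    '192.0.2.12 - - [03/Oct/2006:10:17:45 -0400] "GET /about.html HTTP/1.1" '
    '200 2048 "http://www.example.org/links.html" '
    '"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"',
]

for line in SAMPLE:
    hit = search_hit(line)
    if hit is not None:
        print(hit)
```

Only the first sample line counts as a hit, because it is the only one whose referrer is a search results page carrying query terms.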

© Mike Blonder, 2006, All Rights Reserved

October 11, 2006

Acid Indigestion

A customer service representative at one of the top domain registrars in the U.S. tried to convince me this morning that none of the top search engines will place a web site, without a domain registered for at least 5 years, within the top 10 results for any search category.

A comment such as the above is indicative of how low web 1.0 has fallen. The fact is that web 1.0 has been completely consumed by pure business interests that appear to lack any interest in technology as anything other than a tool to generate revenue. There's nothing wrong with using technology to generate revenue, or even with arguing that technology for business has no other purpose than to contribute to the job of making money; however, I argue that an intelligent appreciation for technology, and a real interest in it, will lead to lots and lots more money than the other approach. I like to make money too, but I understand the valuable role that technology can play to make my money-making job easier.

If this representative's comments are true, then we can conclude that the days have passed when entrepreneurs can safely consider the internet a very low cost vehicle for starting a business. After all, pre-registering a domain name for 5 years is at least three, if not a full five, times more expensive than pre-registering the same domain for a single year. Typically, any and all closely related names will be registered at the same time. The result can be a bill that approaches $500.00, or more, for nothing more than a five-year registration for the domain names.

In my opinion, this representative was simply reading from a script. The registrar in question requires that customer service representatives push any/all customers to increase their term of domain name registration. Extending the registration term is an easy way to make more money off of the same account. It has not been my experience that the search engines are checking the whois registries for term of registration when rankings are set for web sites, but maybe I'm wrong.

On the other hand, I am noting the very disturbing fact that the search engine programs are indexing the information on the web sites but doing nothing with the information, apparently, unless/until the site owner spends some money to pay for ad clicks, or other types of ad campaigns. I say that I am noting this fact as I have reviewed raw access logs and witnessed sites being indexed, without updates on keywords, etc. I consider this practice to be very unfortunate.
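Crawler visits of the kind described above show up in a raw access log through the User-Agent field. The sketch below counts fetches per crawler and page. It assumes the Apache "combined" log format and a short list of crawler name fragments (Googlebot, Slurp, msnbot); the sample lines are invented.

```python
from collections import Counter

# Assumed name fragments that identify a few well-known crawlers.
CRAWLER_TOKENS = ("Googlebot", "Slurp", "msnbot")


def crawler_name(line):
    """Return the crawler token found in the User-Agent field, or None."""
    # In the combined format the quoted fields are request, referrer, agent,
    # so splitting on the quote character puts the agent at index 5.
    parts = line.split('"')
    if len(parts) < 6:
        return None
    agent = parts[5].lower()
    for token in CRAWLER_TOKENS:
        if token.lower() in agent:
            return token
    return None


def crawler_fetches(lines):
    """Count fetches per (crawler, requested path)."""
    counts = Counter()
    for line in lines:
        name = crawler_name(line)
        if name is None:
            continue
        request = line.split('"')[1].split()
        path = request[1] if len(request) > 1 else "?"
        counts[(name, path)] += 1
    return counts


# Invented sample lines: two crawler visits to one page, one to another,
# and one ordinary browser visit.
SAMPLE = [
    '192.0.2.20 - - [11/Oct/2006:06:01:12 -0400] "GET /index.html HTTP/1.1" '
    '200 5120 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; '
    '+http://www.google.com/bot.html)"',
    '192.0.2.20 - - [11/Oct/2006:18:44:03 -0400] "GET /index.html HTTP/1.1" '
    '200 5120 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; '
    '+http://www.google.com/bot.html)"',
    '192.0.2.21 - - [11/Oct/2006:07:12:40 -0400] "GET /about.html HTTP/1.1" '
    '200 2048 "-" "Mozilla/5.0 (compatible; Yahoo! Slurp; '
    'http://help.yahoo.com/help/us/ysearch/slurp)"',
    '192.0.2.22 - - [11/Oct/2006:09:30:00 -0400] "GET /index.html HTTP/1.1" '
    '200 5120 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"',
]

for (name, path), count in sorted(crawler_fetches(SAMPLE).items()):
    print(name, path, count)
```

A count like this shows only that a page was fetched; it says nothing about what the search engine then did with it, which is the gap described above.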

The information indexed should be incorporated into a site profile for every site indexed, and not just for the smaller number of sites that buy advertising. If the search engines continue to go this route (a kind of "denial of information"), web 1.0 will become a dinosaur, as it will lose much of the value it once had as an information repository.

© Mike Blonder, 2006, All Rights Reserved

About October 2006

This page contains all entries posted to Mike Blonder: Thoughts on Technology, and the Web in October 2006. They are listed from oldest to newest.
