Endikos

Archive for the ‘SEO’ Category

Sitemaps and Multilingual Websites

Tuesday, January 6th, 2009

[Note: This post has been moved to ThreeBit Media, my consulting website.]

When I started architecting the site I’ve been working on, I knew I would need to address two issues (among many others, but for now we’ll just cover these two): 1) how to physically structure a multilingual website, and 2) how to structure the sitemap.xml for the site as a whole.

First I had to decide how the site should be physically structured.  Would a subdomain per language be good, e.g., en.mysite.com for English, es.mysite.com for Spanish, and ru.mysite.com for Russian?  Or would it be better to use directories for the distinction, e.g., www.mysite.com/en/ for English, and so on?  If I chose the subdomain route it would be easy to build a sitemap.xml file for each subdomain.  But how would I structure the sitemap.xml if using directories?

I chose to use directories for a couple of reasons.  One, I knew that Google treats subdomains as entirely separate websites.  I didn’t want that, because semantically these were three translations of the same website, and I felt that should be reflected in their structure.  Two, I didn’t want to have multiple datasets when dealing with analytics, whether that meant multiple log files to analyze or multiple profiles in one of the myriad JavaScript-based analytics packages.  Yes, I’m fully aware that there are ways to glom datasets together, or otherwise make analytics packages aware of your structure… this was just pure personal preference.

OK, so now that I have my structure in place, how do I build the sitemap.xml?  I don’t want one huge monolithic file for the entire site.  Even though at current count there are only around 100 HTML files per translation (not huge by any means, but also not insignificant), I would personally prefer to keep each translation in its own separate sitemap file.  Those of you familiar with sitemaps will have been shouting at your monitors by now, “Use a sitemap index, dork!”, and you’d be right.  I just wasn’t sure that Google would support this.  Google didn’t seem to mention it anywhere in their webmaster tools documentation (though I could have just missed it).

I’m happy to report that Google does in fact support sitemap indexes, and I’m fairly certain that MSN and Yahoo! do as well.  So, simply build yourself a sitemap index file, e.g. sitemap_index.xml (the filename is arbitrary), that looks like this:

<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex
  xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
   <sitemap>
      <loc>http://www.mysite.com/sitemap_en.xml</loc>
   </sitemap>
   <sitemap>
      <loc>http://www.mysite.com/sitemap_es.xml</loc>
   </sitemap>
   <sitemap>
      <loc>http://www.mysite.com/sitemap_ru.xml</loc>
   </sitemap>
</sitemapindex>

Then build your individual sitemap files as you normally would.  You can find the full specifications for sitemaps at sitemaps.org, and a nifty utility to help you automatically build sitemap files at the google-sitemap_gen project.  Don’t forget to reference your new sitemap index file in your robots.txt file (a line of the form Sitemap: http://www.mysite.com/sitemap_index.xml)!  Enjoy.
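If you would rather generate the index than maintain it by hand, here is a minimal sketch in Python using only the standard library.  The function names (build_sitemap_index, robots_txt_line) and the language list are my own placeholders for illustration, not part of any sitemap tool; the per-language filenames simply follow the sitemap_en.xml pattern used above.

```python
import xml.etree.ElementTree as ET

# Namespace required by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap_index(sitemap_urls):
    """Return a sitemap index document (as a string) listing each sitemap URL."""
    root = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for url in sitemap_urls:
        # Each child sitemap gets its own <sitemap> element with a single <loc>.
        sitemap = ET.SubElement(root, "sitemap")
        ET.SubElement(sitemap, "loc").text = url
    body = ET.tostring(root, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


def robots_txt_line(index_url):
    """Return the robots.txt directive that points crawlers at the index."""
    return "Sitemap: " + index_url


if __name__ == "__main__":
    # Hypothetical language list; one sitemap file per translation directory.
    languages = ["en", "es", "ru"]
    urls = ["http://www.mysite.com/sitemap_%s.xml" % lang for lang in languages]
    print(build_sitemap_index(urls))
    print(robots_txt_line("http://www.mysite.com/sitemap_index.xml"))
```

Adding a fourth translation then only means adding one entry to the language list and regenerating the index; the individual sitemap files are still built separately.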

Linking to Internal Directories

Saturday, November 15th, 2008

[Note: This post has been moved to ThreeBit Media, my consulting website.]

I could just say “Use trailing slashes!” and be done with it.  But that would leave you, dear reader, underwhelmed and grumbly.  You may have already read this article on A List Apart regarding using trailing slashes.  In that article the author mentions three reasons for using trailing slashes when linking to directories (and I quote):

  1. We’re doing ourselves a favor, as this is the correct way to do things.
  2. We’re doing our server a favor, as this means less disk access.
  3. And most importantly, we’re doing our visitors a favor, because they’re no longer losing a few seconds while our server tries to find first a file and then a directory. And in this industry, you and I both know that a few seconds is a long, long time.

Now this article was written in 2002 when most everyone was still on dialup and servers were much slower in general. So number 3 doesn’t really apply anymore.  In this article, I’m going to give you a new reason number 3, and go into more detail on number 1, to help you understand why this is the correct way to do things.
