<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Endikos &#187; SEO</title>
	<atom:link href="http://www.endikos.com/category/seo/feed" rel="self" type="application/rss+xml" />
	<link>http://www.endikos.com</link>
	<description>Journeying through the art and science of digital media.</description>
	<lastBuildDate>Sun, 11 Jul 2010 23:53:14 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.8.2</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Sitemaps and Multilingual Websites</title>
		<link>http://www.endikos.com/seo/sitemaps-and-multilingual-websites.html</link>
		<comments>http://www.endikos.com/seo/sitemaps-and-multilingual-websites.html#comments</comments>
		<pubDate>Wed, 07 Jan 2009 03:17:10 +0000</pubDate>
		<dc:creator>Endikos</dc:creator>
				<category><![CDATA[SEO]]></category>

		<guid isPermaLink="false">http://www.endikos.com/?p=41</guid>
		<description><![CDATA[[Note: This post has been moved to ThreeBit Media, my consulting website.]
When I began architecting the site I&#8217;ve been working on, I knew I would need to address two issues (among many others, but for now we&#8217;ll just cover these two): 1) How to structure a multilingual website physically, and 2) How to address [...]]]></description>
			<content:encoded><![CDATA[<p><em>[Note: This post has been moved to ThreeBit Media, my consulting website.]</em></p>
<p>When I began architecting the site I&#8217;ve been working on, I knew I would need to address two issues (among many others, but for now we&#8217;ll just cover these two): 1) How to structure a multilingual website physically, and 2) How to address the sitemap.xml structure for the site as a whole.</p>
<p>First I had to decide how the site should be physically structured.  Would a subdomain per language be good, e.g., en.mysite.com for English, es.mysite.com for Spanish, and ru.mysite.com for Russian?  Or would it be better to use directories for the distinction, e.g., www.mysite.com/en/ for English, and so on?  If I chose the subdomain route it would be easy to build a sitemap.xml file for each subdomain.  But how would I structure the sitemap.xml if using directories?</p>
<p>I chose to use directories for a couple of reasons.  One, I knew that Google treats subdomains as entirely separate websites.  I didn&#8217;t wish to do this because semantically these were three translations of the same website, and I felt that should be reflected in their structure.  Two, I didn&#8217;t want to have multiple datasets when dealing with analytics, whether that meant multiple log files to analyze or one of the myriad JavaScript-based analytics packages.  Yes, I&#8217;m fully aware that there are ways to glom datasets together, or otherwise make analytics packages aware of your structure&#8230; this was just pure personal preference.</p>
<p>OK, so now I have my structure in place, how do I build the sitemap.xml?  I don&#8217;t want one huge monolithic file for the entire site.  Even though at current count there are only around 100 html files per translation (not huge by any means, but also not insignificant), I would just personally prefer to keep the translations in their own separate sitemap.xml files.  Those of you familiar with sitemaps will have been shouting at your monitors by now &#8220;Use a sitemap index, dork!&#8221;, and you&#8217;d be right.  I just wasn&#8217;t sure that Google would support this.  Google didn&#8217;t seem to mention it anywhere in their webmaster tools documentation (though I could have just missed it).</p>
<p>I&#8217;m happy to report that Google does in fact support sitemap indexes, and I&#8217;m fairly certain that MSN and Yahoo! do as well. So, simply build yourself a sitemap_index.xml file (the filename is arbitrary) that looks like this:</p>
<pre id="line1" class="code"><span class="pi">&lt;?xml version="1.0" encoding="UTF-8"?&gt;</span>
&lt;<span class="start-tag">sitemapindex</span><span class="attribute-name">
  xmlns</span>=<span class="attribute-value">"http://www.sitemaps.org/schemas/sitemap/0.9"</span>&gt;
   &lt;<span class="start-tag">sitemap</span>&gt;
      &lt;<span class="start-tag">loc</span>&gt;http://www.mysite.com/sitemap_en.xml&lt;/<span class="end-tag">loc</span>&gt;
   &lt;/<span class="end-tag">sitemap</span>&gt;
   &lt;<span class="start-tag">sitemap</span>&gt;
      &lt;<span class="start-tag">loc</span>&gt;http://www.mysite.com/sitemap_es.xml&lt;/<span class="end-tag">loc</span>&gt;
   &lt;/<span class="end-tag">sitemap</span>&gt;
   &lt;<span class="start-tag">sitemap</span>&gt;
      &lt;<span class="start-tag">loc</span>&gt;http://www.mysite.com/sitemap_ru.xml&lt;/<span class="end-tag">loc</span>&gt;
   &lt;/<span class="end-tag">sitemap</span>&gt;
&lt;/<span class="end-tag">sitemapindex</span>&gt;</pre>
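<p>If you would rather generate the index than type it by hand, here is a minimal Python sketch. The function name <code>build_sitemap_index</code> is my own invention for illustration, and the mysite.com URLs are the same placeholders used above. It writes one sitemap element, holding a single loc, per language file:</p>

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap_index(sitemap_urls):
    """Return a sitemap index document (as a string) listing each URL.

    Hypothetical helper for illustration; not part of any sitemap tool.
    """
    root = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for url in sitemap_urls:
        # Every child sitemap gets its own <sitemap> element with one <loc>.
        sitemap = ET.SubElement(root, "sitemap")
        ET.SubElement(sitemap, "loc").text = url
    body = ET.tostring(root, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


if __name__ == "__main__":
    languages = ["en", "es", "ru"]
    urls = ["http://www.mysite.com/sitemap_%s.xml" % lang for lang in languages]
    print(build_sitemap_index(urls))
```

<p>Adding a fourth translation then only means adding one more language code to the list.</p>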
<p>Then build your individual sitemap files as you normally would.  You can find the full specifications for sitemaps at <a rel="external" href="http://sitemaps.org/">sitemaps.org</a>, and a nifty utility to help you automatically build sitemap files at the <a rel="external" href="http://sourceforge.net/projects/goog-sitemapgen/">google-sitemap_gen</a> project.  Don&#8217;t forget to <a rel="external" href="http://www.sitemaps.org/protocol.php#informing">include your new sitemap index file in your robots.txt file</a>!  Enjoy.</p>
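<p>In robots.txt the index is announced with a line of the form <code>Sitemap:</code> followed by the full URL. As a quick sanity check, this small Python sketch reads a robots.txt body and lists the sitemap URLs it declares. The helper name <code>declared_sitemaps</code> and the sample file contents are assumptions for illustration, and <code>site_maps()</code> needs Python 3.8 or newer:</p>

```python
from urllib.robotparser import RobotFileParser


def declared_sitemaps(robots_txt):
    """Return the sitemap URLs declared in the text of a robots.txt file.

    Hypothetical helper for illustration.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # site_maps() returns None when no Sitemap line is present.
    return parser.site_maps() or []


if __name__ == "__main__":
    # Sample contents only; the URL is the placeholder used in this post.
    sample = (
        "User-agent: *\n"
        "Disallow:\n"
        "Sitemap: http://www.mysite.com/sitemap_index.xml\n"
    )
    print(declared_sitemaps(sample))
```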
]]></content:encoded>
			<wfw:commentRss>http://www.endikos.com/seo/sitemaps-and-multilingual-websites.html/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Linking to Internal Directories</title>
		<link>http://www.endikos.com/web-standards/linking-to-internal-directories.html</link>
		<comments>http://www.endikos.com/web-standards/linking-to-internal-directories.html#comments</comments>
		<pubDate>Sat, 15 Nov 2008 20:48:56 +0000</pubDate>
		<dc:creator>Endikos</dc:creator>
				<category><![CDATA[SEO]]></category>
		<category><![CDATA[Web Standards]]></category>

		<guid isPermaLink="false">http://www.endikos.com/?p=12</guid>
		<description><![CDATA[Using a trailing slash when linking to directories is more than just the correct thing to do, it's helpful to search engines and web servers too.]]></description>
			<content:encoded><![CDATA[<p><em>[Note: This post has been moved to ThreeBit Media, my consulting website.]</em></p>
<p>I could just say &#8220;Use trailing slashes!&#8221; and be done with it.  But that would leave you, dear reader, underwhelmed and grumbly.  You may have already read <a rel="external" href="http://www.alistapart.com/articles/slashforward/">this article</a> on A List Apart regarding using trailing slashes.  In that article the author mentions three reasons for using trailing slashes when linking to directories (and I quote):</p>
<ol>
<li>We’re doing ourselves a favor, as this is the correct way to do things.</li>
<li>We’re doing our server a favor, as this means less disk access.</li>
<li>And most importantly, we’re doing our visitors a favor, because they’re no longer losing a few seconds while our server tries to find first a file and then a directory. And in this industry, you and I both know that a few seconds is a long, long time.</li>
</ol>
<p>Now this article was written in 2002 when most everyone was still on dialup and servers were much slower in general. So number 3 doesn&#8217;t really apply anymore.  In this article, I&#8217;m going to give you a new reason number 3, and go into more detail on number 1, to help you understand why this is the correct way to do things.</p>
<p><span id="more-12"></span></p>
<p>Let&#8217;s first look at what happens when your browser requests a normal page.  We&#8217;ll mimic a simple browser session using telnet from the command line.</p>
<p>We first initiate the telnet session with the server:</p>
<pre class="code">Aletheia:~ wknechtel$ telnet www.sheldoncomics.com 80</pre>
<p>Once we&#8217;re connected to the server, telnet reports:</p>
<pre class="code">Trying 208.122.50.173...
Connected to sheldoncomics.com.
Escape character is '^]'.</pre>
<p>We then issue the commands that a browser would.  This is a little simplistic as a browser would also tell the server what sorts of encodings and content it can accept, but this will work:</p>
<pre class="code">GET / HTTP/1.1
Host: www.sheldoncomics.com</pre>
<p>The request we&#8217;ve just issued breaks down like this: GET is the request method. There are other request methods; you&#8217;re probably most familiar with GET and POST.  Then we specify the URI we&#8217;re requesting. In this case we use a slash to indicate that we&#8217;re looking for the top-most root document the server will hand us. Then we specify the protocol, HTTP, and its version, 1.1.</p>
<p>HTTP/1.1 introduced the Host header, which makes name-based virtual hosts possible, so that you can tie more than one domain to an IP address.  This brings us to the second line.  Since we&#8217;re using HTTP/1.1, we have to declare which host we&#8217;re looking for as well. Make sure you hit enter twice after this, so that the server knows you&#8217;re done entering the request. Now the server will begin to deliver its response:</p>
<pre class="code">HTTP/1.1 200 OK
Date: Sat, 15 Nov 2008 16:32:37 GMT
Server: Apache/2.2.10 (Unix)
Vary: Host
Content-Type: text/html; charset=UTF-8
Transfer-Encoding: chunked

1e4c

&lt;!DOCTYPE HTML PUBLIC
"-//W3C//DTD HTML 4.01 Transitional//EN"
"http://www.w3.org/TR/html4/loose.dtd"&gt;
&lt;html&gt;
...
&lt;/html&gt;

0

Connection closed by foreign host.</pre>
<p>We&#8217;re only really interested in the first few lines, called the headers, and I&#8217;ve snipped out the vast majority of the returned HTML.  Specifically we look at that &#8220;200 OK&#8221;.  The 200 status code tells us that everything is good-to-go: the document we&#8217;ve requested exists, we have permission to view it, and nothing went wrong internally while trying to retrieve it.  The headers then tell us other things, like the version of the server software, the content type, and the type of transmission to expect.  Because this response uses chunked transfer encoding, the size arrives with each chunk (the 1e4c line is a chunk length in hexadecimal) rather than in a Content-Length header.</p>
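<p>The exchange above can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the helper names <code>build_request</code> and <code>parse_response_head</code> are made up, nothing here touches the network, and the sample response is a trimmed copy of the one shown above.</p>

```python
def build_request(host, path="/"):
    """Build the same minimal HTTP/1.1 GET request typed into telnet."""
    lines = [
        "GET %s HTTP/1.1" % path,
        "Host: %s" % host,
        "",  # the two empty entries produce the blank line that
        "",  # tells the server the request is complete
    ]
    return "\r\n".join(lines).encode("ascii")


def parse_response_head(raw):
    """Split a raw HTTP response into (status_code, reason, headers)."""
    head = raw.split(b"\r\n\r\n", 1)[0].decode("iso-8859-1")
    status_line, *header_lines = head.split("\r\n")
    # Status line looks like: HTTP/1.1 200 OK
    _version, code, reason = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return int(code), reason, headers


if __name__ == "__main__":
    print(build_request("www.sheldoncomics.com"))
    sample = (
        b"HTTP/1.1 200 OK\r\n"
        b"Server: Apache/2.2.10 (Unix)\r\n"
        b"Transfer-Encoding: chunked\r\n"
        b"\r\n"
        b"1e4c\r\n"
    )
    print(parse_response_head(sample))
```

<p>Header names are lower-cased here because HTTP treats them as case-insensitive.</p>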
<p>Now let&#8217;s look at what happens when we request a directory without a trailing slash:</p>
<pre class="code">GET /store HTTP/1.1
Host: www.sheldoncomics.com</pre>
<p>This time the response from the server is different:</p>
<pre class="code">HTTP/1.1 302 Found
Date: Sat, 15 Nov 2008 18:40:42 GMT
Server: Apache/2.2.10 (Unix)
Location: http://www.sheldoncomics.com/store/
Content-Length: 334
Content-Type: text/html; charset=iso-8859-1
...</pre>
<p>This is what&#8217;s known as a 302 redirect.  It was originally implemented so that webmasters could change the structure of their website and redirect visitors from the old (perhaps bookmarked) page, to the new page that has the same content.  This allows you to actually change the names of your HTML files or change out whatever dynamic back-end you&#8217;re using and not worry about visitors getting lost during changes.</p>
<p>Now with our URI request that didn&#8217;t include a trailing slash, Apache couldn&#8217;t find what we were looking for, because it thought we were requesting a file, not a directory.  So trying to be nice before issuing a status of 404 (Not Found), Apache figured it would check &#8211; just in case &#8211; to see if there&#8217;s a directory matching the requested URI.  Since there was in this case, it issued a 302 automatically to let us know the location of what we were really looking for.  This is the reason it gave us a &#8220;Location: &#8221; header entry in its response to us.</p>
<p>So why use trailing slashes when linking to directories? Because the 302 is not supposed to compensate for an incorrectly structured link.  This, thinking back to reason number 1 above, is the correct way to do it.  Also, thinking back to number 2 above, it keeps Apache from doing unnecessary work.  And now for your new reason number 3: it&#8217;s good for the search engines.  Proper structure &#8211; everything from proper and valid (X)HTML to correctly formed links &#8211; is more easily digestible by the search engine&#8217;s spiders.  If you want good placement, you should make it as easy as possible for the engines to crawl your site.  A trailing slash may seem insignificant, but it&#8217;s easy to do and makes web servers happy :-)  Enjoy!</p>
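<p>To spot this kind of redirect in your own checks, compare the requested URL with the Location header. The Python sketch below does just that. The function name <code>is_trailing_slash_redirect</code> and the set of redirect codes it accepts are my assumptions, since a server may answer with a code other than the 302 seen above:</p>

```python
from urllib.parse import urljoin, urlsplit

# Assumption: treat any of the common redirect codes as a redirect.
REDIRECT_CODES = {301, 302, 303, 307, 308}


def is_trailing_slash_redirect(request_url, status, location):
    """True when the server only redirected us to the same path plus '/'."""
    if status not in REDIRECT_CODES or not location:
        return False
    # Location may be relative, so resolve it against the requested URL.
    target = urlsplit(urljoin(request_url, location))
    source = urlsplit(request_url)
    return (
        target.netloc.lower() == source.netloc.lower()
        and target.path == source.path + "/"
    )


if __name__ == "__main__":
    print(is_trailing_slash_redirect(
        "http://www.sheldoncomics.com/store",
        302,
        "http://www.sheldoncomics.com/store/",
    ))
```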
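<p>If you want to hunt for offending links in your own pages, a rough Python sketch follows. It relies on a heuristic that is purely my assumption: a link whose last path segment contains no dot is probably a directory. The names <code>LinkCollector</code>, <code>looks_like_directory_without_slash</code> and <code>suspect_links</code> are invented for illustration, and the results deserve a manual look, since extensionless files do exist.</p>

```python
import posixpath
from html.parser import HTMLParser
from urllib.parse import urlsplit


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def looks_like_directory_without_slash(href):
    """Heuristic (assumption): no dot in the last segment means a directory."""
    parts = urlsplit(href)
    if parts.scheme not in ("", "http", "https"):
        return False  # skip mailto:, javascript: and similar links
    if not parts.path or parts.path.endswith("/"):
        return False
    return "." not in posixpath.basename(parts.path)


def suspect_links(html):
    """Return hrefs that look like directory links missing a trailing slash."""
    collector = LinkCollector()
    collector.feed(html)
    return [h for h in collector.hrefs if looks_like_directory_without_slash(h)]


if __name__ == "__main__":
    page = '<a href="/store">Store</a> <a href="/store/">Store</a>'
    print(suspect_links(page))
```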
<p>Oh, by the way, you really should check out <a rel="external" href="http://www.sheldoncomics.com/">http://www.sheldoncomics.com/</a>, a masterfully written and illustrated comic about a boy genius, his talking duck, and their adventures in life; by Dave Kellett.  I&#8217;ve been following this strip for about six years now, and abusively used his server in my examples for this article. Tell Dave I said hello when you stop by.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.endikos.com/web-standards/linking-to-internal-directories.html/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
