This time around I thought I'd get off the standards bandwagon for a bit and just talk about some of the real-world considerations that go into running a site. I had to do some work on our server configuration today, so I figured that'd make a decent topic. It kind of ballooned from there as I worked on it, but things tend to happen that way.
To start with, give careful thought to your directory/URL structure. You want something that will be logical and easy to maintain. It should make sense to both you and your users, and truncating the URL should do something productive. If you're in /toplevel/category/subcat/page.html, then cutting back to any level should bring you to a useful page - not an error or a directory listing. People do explore sites by playing with the URL. For example, consider how you can trim this URL at each level and progress to logically 'higher level' pages:
Organize the site logically, and plan for growth. Don't just dump all the files into one 'flat' directory; create a hierarchy. Even if it seems silly at first, as your site grows you will thank yourself for your foresight. If you don't do it, you'll curse yourself endlessly. And give everything reasonable names - AddFunds, RemoveFunds, etc., and not Directory1, Directory2, etc. Even if you use a publishing tool that normally hides these things from you, someday you may need to do something by hand, or use a different tool. Or maybe your site grows and you need to hire more people to work on it, and then they have to figure out what 'Directory1' means. These names also show in the URLs, so a customer who is confused can glance at the address, see 'AddFunds', and clue into the area of the site they're in, while 'Directory1' doesn't help them at all.
The next step is to set a custom 404 error handler. I actually ran into this again Wednesday. I took over the sites at work a while back and made a number of changes, but I never did this. Today I got passed a complaint because an external site had linked to a page that hasn't existed for a long time, if ever (at least not since before I took over). So someone got a 404 by following the link, and we don't want that. So I changed our 404 handlers to redirect to the homepage of each site. Now when someone comes in via a bad link, or even typos a URL, they'll at least go to the home page and can navigate from there. And since most users following a link from a third-party site don't know exactly what to expect anyway, most of them will never even know they got an 'error', so their perception of our site is improved. This link will produce a 404.
At work we're running IIS, so I handled this by creating a PHP file that does the redirect (see below) and setting a Custom Error handler for 404 errors. First open the Properties for the server and select the Custom Errors tab. Then select the error you want to set a custom handler for, 404 in this case, and select Edit Properties.
Then select the new handler. In this case it is a URL I want to be called, and it is a file 'redirect-404.php' that I've installed at the root level of the site.
Yes, I know, most of you are probably using Apache. I usually do as well, but we don't always get to pick the platform we work on, so while I prefer Linux/Apache, I'm picking up Windows/IIS as well. You can do the same kind of thing in Apache by setting your ErrorDocument, either in httpd.conf or in a local .htaccess file. Since the conf file has an example, I'll use that instead of reinventing the wheel:
    #
    # Customizable error responses come in three flavors:
    # 1) plain text 2) local redirects 3) external redirects
    #
    # Some examples:
    #ErrorDocument 500 "The server made a boo boo."
    #ErrorDocument 404 /missing.html
    #ErrorDocument 404 "/cgi-bin/missing_handler.pl"
    #ErrorDocument 402 http://www.example.com/subscription_info.html
    #
What I've done in IIS is basically the same as a local redirect. So why didn't I just enter the URL of the page I wanted it to go to? Because when I did that IIS served the content of that page, but the URL in the browser didn't reflect the change, so relative links were FUBAR. Might there be a more elegant way to do this? Sure, but this took me 5 minutes and it worked. It took quite a bit longer to write this up than to do it, actually.
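For the Apache folks, the setup described above probably comes down to a single ErrorDocument line. This is only a sketch, assuming the same redirect-404.php script sits at the document root and that the line goes in httpd.conf or in an .htaccess file where AllowOverride permits FileInfo:

```apache
# Option A (assumption: redirect-404.php lives at the document root).
# Apache runs the script internally, and the script itself sends the
# redirect headers back to the browser.
ErrorDocument 404 /redirect-404.php

# Option B: give a full URL and Apache sends the browser a redirect itself.
# With this form the client never sees the original 404 status.
# ErrorDocument 404 http://www.paycash.us/
```

Either way the browser ends up with the home page URL in the address bar, so relative links on that page resolve correctly.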
Starting from the above points, once you have a solid structure try to make as few changes as possible. Try not to move directories or pages, especially anything you think may be linked from outside or a page a user is likely to bookmark. If you do need to move things, try to put in redirection from the old page. For example, as part of the ongoing site expansion and re-branding at work, all of the content that used to live at the PayCash site now lives at the PayCash Wallet site. If you try to go to http://www.paycash.us/consumer/ you will find yourself at http://www.paycashwallet.com/consumer/. Since the entire site structure is the same all of the pages redirect to their new locations.
When I need to redirect there are a few options. If I were on an Apache server I would probably insert a rule for mod_alias or mod_rewrite to redirect surfers to the new content. At work, as I said, I'm on IIS, so I don't have the options I would on Apache. But there are still options: I can map a resource in IIS to a redirect. That is how I handled the entire site move - I set up two virtual directories for /consumer and /merchant and made them redirects. It is also a way to put in 'fast shortcuts' to pages deeper in a site: short URLs users can enter to jump to specific pages. For example, try this one: http://www.tivo.com/adapters.
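On Apache, the site move described above might look something like the following. Treat it as a rough sketch rather than what is actually running anywhere: the /consumer target comes from the example earlier, while the /merchant target is my assumption that it keeps the same path on the new domain.

```apache
# mod_alias: anything under /consumer or /merchant is sent, with a 301,
# to the same path on the new host (the rest of the path is appended).
# Assumption: /merchant keeps the same path on the new domain.
Redirect permanent /consumer http://www.paycashwallet.com/consumer
Redirect permanent /merchant http://www.paycashwallet.com/merchant

# mod_rewrite alternative that covers both directories with one rule.
# Use one approach or the other, not both.
# RewriteEngine On
# RewriteRule ^/?(consumer|merchant)(/.*)?$ http://www.paycashwallet.com/$1$2 [R=301,L]
```

For a straight directory-to-directory move like this one, mod_alias is usually the simpler choice; mod_rewrite earns its keep when the old and new URLs don't line up so neatly.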
This is how I have IIS configured for PayCash.us:
Note the virtual directories I created for consumer and merchant, which redirect both since they now live on the new domain. For ease of site development and management, all of the other domains live under the same document root. The 'mall' directory is the content for PayCashMall.com, etc. This way when I'm working on my development machine I can work with all the domains simply by using http://localhost/mall/, http://localhost/billpay/, etc. I also like how it organizes the structure. However, for branding purposes, they don't really want people using http://www.paycash.us/wallet/ (go ahead, try it); they want people to use http://www.paycashwallet.com/. So I set up a redirect to handle those too. The directories 'add', 'remove', 'download', and 'APS' are shortcut links to specific pages.
This is how it is configured to redirect a directory:
And this is how to redirect to a specific page:
It is pretty much the same - different URL, and for a specific page check the 'The exact URL entered above' box. These are basically equivalent to the standard Apache mod_alias RedirectPermanent directive.
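To make the comparison concrete, here is roughly how the two kinds of redirect could be written with mod_alias. It's a sketch under assumptions: the /wallet mapping follows the branding redirect mentioned earlier, but the target page for the 'add' shortcut is a made-up path for illustration only.

```apache
# Directory-style redirect: /wallet and everything beneath it goes to
# the new domain, with any trailing path carried over.
RedirectPermanent /wallet http://www.paycashwallet.com

# Exact-URL shortcut: only /add or /add/ matches, and the visitor always
# lands on one fixed page. The target path below is hypothetical.
RedirectMatch permanent ^/add/?$ http://www.paycashwallet.com/consumer/add-funds.html
```

The anchored RedirectMatch plays the role of the 'exact URL' checkbox: nothing from the requested path is appended to the destination.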
If you can't set up a redirect on the server, or if you want to redirect a dynamic page (PHP, ASP, .Net, etc.), then you can simply replace the page with one that returns a redirect programmatically. In PHP this is trivial:
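    <?php
    // Send a permanent (301) redirect to the site's home page.
    header("HTTP/1.1 301 Moved Permanently");
    header("Location: http://www.paycash.us/");
    // Stop here so nothing else is sent after the headers.
    exit();
    ?>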
That's it, dead simple. That's actually the redirect-404.php file for PayCash.us that I set as the 404 handler above.
That's really the order of preference for doing a redirect: set it in the server, use a dynamic page to return redirect headers, or, as a last resort, embed it into the page delivered to the browser (a meta refresh tag, for example). It's best to avoid having to do redirects due to moving content, but they're useful tools to understand. And, in the case of the shortcut URLs, they can really improve the usability of a site.
Well, that's all I have to say this time. I was planning to write something else entirely for this entry, but then I had to deal with this stuff today and it just put the bug in my head to write it up as an entry, so I did. I guess I'll save the other stuff until next time. Until then...
Anyone reading these? :-)