The Basics Of .htaccess

Understanding a Server’s .htaccess File is a Key Component of Web Design and Development

Most amateur web hosting clients and new web developers assume that the most important part of any server setup is the installed software. They feverishly install their ASP-based or PHP-based application in hopes of creating advanced content that is controlled by popular software and extensive database tables. But amid this flurry of activity, almost every developer initially ignores one of the most important and powerful files on the server itself. That file is known as the ".htaccess" file, and it controls everything from error messages to password-protected pages, and from permalink structure to blocking users from seeing the site’s content.

The ".htaccess" file usually resides in the server’s root public directory. On many Linux hosts, that directory is called "public_html." Subfolders can carry their own ".htaccess" file when they need different behavior. Every file or folder within a directory that has an ".htaccess" file inherits that file’s rules, unless a separate ".htaccess" file deeper in the tree overrides them. If it sounds complex, that’s because it is a moderately advanced technique for controlling server behavior. It is not, however, impossible to learn. And once a developer or novice web hosting customer learns how to control a server using this file, they’ll find it impossible to return to the days when an ".htaccess" file was a nuisance that was hardly understood, let alone employed.
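As a small sketch of how that inheritance plays out, assume a hypothetical layout with one site-wide file in "public_html" and a second file in a subfolder called "downloads". The folder name and both settings are illustrative only, and whether a host honors them depends on what its main server configuration allows ".htaccess" files to override.

```apache
# public_html/.htaccess (applies to the whole site)
# Do not generate a file listing for folders that lack an index page.
Options -Indexes
# Show a custom page whenever something cannot be found.
ErrorDocument 404 /404.html

# public_html/downloads/.htaccess (applies to downloads/ and everything below it)
# Override the site-wide setting for this folder only: allow file listings.
Options +Indexes
# ErrorDocument is not repeated here, so the 404 page set above still applies.
```

The two sections above are separate files; they are shown together only to make the override easier to follow.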

Password Protecting Directories is a Snap with the .htaccess File

Web hosting server security is something that almost every customer looks into before they commit to any specific hosting company or server technology. For the most part, Windows and Linux servers are equally secure, keeping information away from those who should not have it. But that is only true as far as defending against hackers and malicious web scripts. When it comes to password protecting specific files or directories, the work is done by the .htaccess file, separately from either platform’s own technologies or security features.

The .htaccess file has its own style of protecting these directories, laid out in simple, line-by-line directives. In this case, a user places the file in the directory to be protected and then sets the required parameters for accessing that directory in the file itself. That password can apply to just one file, to an entire directory, or to the entire site when accessed via a typical web browser. The setup of a file or directory password takes only a few lines and looks like the example below when it has been completely filled out.

AuthUserFile /public_html/secure/files/.htpasswd
AuthGroupFile /dev/null
AuthName EnterPassword
AuthType Basic
require user secureUser

These lines of code are paired with a new file called ".htpasswd." This file lives at the path given on the "AuthUserFile" line, which the server reads as a full filesystem path; in this example it sits in the directory being secured. It contains the username and password which will grant users access to the protected information. In the example above, only the user "secureUser" can be given access to the directory, and they must enter the matching password. The "AuthName EnterPassword" line does not set the password; it only sets the label shown in the browser’s login prompt.

In the ".htpasswd" file, users only need to add a single line per account. The format is simple; every username and password combination is separated by a colon, in the form "user:password". In this case, it would be "secureUser:securedirectorypassword1" as an example, although Apache normally expects the password portion to be stored as a hash generated by the "htpasswd" utility rather than as plain text. Site administrators can certainly include multiple users and passwords within this list, so long as they replace the "require user secureUser" line in the ".htaccess" file with "require valid-user" or list every permitted username on that line. Giving each individual or group a separate username also means that access can be revoked at any time by deleting that user’s line from the ".htpasswd" file.
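For a directory shared by several accounts, the block might be adjusted along these lines. This is only a sketch: it reuses the file path from the example above, and the "Members Only" label is made up for illustration.

```apache
AuthType Basic
# Label shown in the browser's login prompt; quotes are needed because of the space.
AuthName "Members Only"
# Password file holding one "user:hash" line per account.
AuthUserFile /public_html/secure/files/.htpasswd
# Accept any user listed in the password file instead of one named account.
Require valid-user
```

With "Require valid-user" in place, adding or removing a person is purely a matter of editing the ".htpasswd" file; the ".htaccess" file does not need to change.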

To learn more on this issue, read: Using .htpasswd with Your Linux Shared Hosting Account.

Defining Easy-to-Remember Links to Website Content and Static Pages

One of the most widespread current uses of the ".htaccess" file is to define semantically-friendly "permalinks" for site content and static pages when using content management software like WordPress or Movable Type. This not only helps users remember and reload site content, but can also improve a website’s search ranking on major search engines. Those search engines use semantically-friendly URLs to determine what content is on any given page and whether or not the content matches the URL. A title-URL match indicates more authority and a higher likelihood that a user will find the information they’re looking for. Appropriately, the website then tends to rank higher on major search engines when using a permalink structure.

For those users who have installed WordPress to their site’s servers, the following lines are added during the installation process almost as a requirement, especially with more recent versions of the software from version 3.0 and newer. The code looks like the example below and is standard for more than 60 million self-hosted WordPress customers worldwide.

RewriteEngine On
RewriteBase /
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /index.php [L]

Again, it’s easy to see the line-by-line structure of ".htaccess" instructions. These few simple lines of code tell the server to use the WordPress "index.php" file as the entry point for all URLs handled by the content management software itself. The two "RewriteCond" lines check that the requested path is not an existing file or directory on disk; every other request is passed to "index.php", where WordPress looks up the matching entry in its database and serves it under the "friendly" URL. On Apache servers, modifying an ".htaccess" file to rewrite URLs is the most common way of constructing friendlier URLs for PHP-based content management software.
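The same rules are repeated here with a comment on each line, as a reading aid rather than a replacement. The surrounding "IfModule" wrapper is an addition not shown in the example above; it simply keeps the server from failing if the rewrite module is not loaded.

```apache
# Only apply these rules if mod_rewrite is available on the server.
<IfModule mod_rewrite.c>
    # Switch on URL rewriting for this directory.
    RewriteEngine On
    # Base URL for relative substitutions; here the site lives at the web root.
    RewriteBase /
    # Apply the rule below only if the request is not an existing file...
    RewriteCond %{REQUEST_FILENAME} !-f
    # ...and not an existing directory.
    RewriteCond %{REQUEST_FILENAME} !-d
    # Hand every remaining request to index.php and stop processing rules.
    RewriteRule . /index.php [L]
</IfModule>
```

Because real files and folders are skipped by the two conditions, images, stylesheets, and other static content continue to be served directly.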

Guiding Search Engines to Recently Relocated Content Locally or Remotely

While permalinks were designed to eliminate the problem of randomly relocated content and intensive redirects, they haven’t completely eliminated it. Indeed, permalinks can be managed within any content management solution’s control panel; the moment a user changes the permalink structure, everything within the purview of the CMS software can go missing and throw errors to search engines and users alike. This problem can be eliminated with a very simple line of ".htaccess" code which redirects visitors and sends an "invisible" status code to major search engines. The code is this:

Redirect 301 /archive/ /past-entries/

This redirect places the old and new URLs side-by-side with the old URL listed first. It instructs the server to answer browsers and search engine "spiders" alike with a 301 status code, which is never seen by the end user. Strictly speaking, a 301 is not an error at all but a redirection that occurs behind the scenes. While the user is automatically taken to the new path and new content, the browser and search engine understand that "301" means "permanently moved." Anything beneath the old path is carried across too, so a request for "/archive/2012/" is sent to "/past-entries/2012/". Browsers may cache the redirect, and search engines that encounter it will update their records to reflect the new URL and drop the old one as they re-crawl the site.
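The same directive also works for a single page. The sketch below uses two made-up file names and shows the "permanent" keyword, which Apache treats as another way of writing 301.

```apache
# Permanently redirect one page; "permanent" is equivalent to 301.
# Both file names are hypothetical.
Redirect permanent /old-page.html /new-page.html
```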

This is a great way to move to a new domain name without losing customers, as well. The ".htaccess" file on the old server can be used to actually point to content at a completely new domain name with just a little change to the line of code that was presented above. Instead of pointing to two relative paths on the same server, the 301 redirection code in this case would look like the following:

Redirect 301 / http://www.your-domain-name.com/

This instructs the server to send anything in the root directory, or any subfolders, to the new domain name at the exact same path as it was on the old server. All that changes is the actual domain name itself. Again, because it uses the 301 status code, search engines will update their records to reflect the new domain name. This is not only a great tool for usability, but it also helps prevent a website from losing its search engine rankings. Instead of starting over, search engines will understand that this is the same website at a new location. They’ll generally carry over their old perception and rankings, and the administrator will benefit greatly from employing this method of moving between domains.
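When the old and new domain names are served from the same folder, a plain "Redirect" would also catch requests for the new name and loop. One possible alternative, sketched here with mod_rewrite, redirects only requests that arrive on the old name. The host name "old-domain-name.com" is a placeholder invented for this example.

```apache
RewriteEngine On
# Match only requests that arrive on the old host name (placeholder), with or without "www".
RewriteCond %{HTTP_HOST} ^(www\.)?old-domain-name\.com$ [NC]
# Send them to the same path on the new domain with a permanent (301) redirect.
RewriteRule ^(.*)$ http://www.your-domain-name.com/$1 [R=301,L]
```

The captured "$1" carries the requested path across, so a request for "/about/team.html" on the old domain lands on the same path at the new one.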

From Invisible Errors and Redirections to Actual Error Pages for Site Visitors

It’s certainly true that the ".htaccess" file specializes in redirecting users to new content while invisibly updating search engines, but this file can also be used to display specific error pages when content is missing, cannot be displayed, or is coded in such a way that errors prevent the page from loading at all. This is done by specifying a page for display based on the server’s three-digit error code. These codes are standard HTTP status codes that the web server already produces, so there’s no real need for configuration beyond the ".htaccess" file. The example below tackles the world’s most common error page, known as the "404 error" for missing pages and directories.

ErrorDocument 404 /404.html

Whenever a user stumbles across a link that no longer exists, or a subfolder which has been deleted, they’ll automatically be shown an informative and helpful 404 error page that can be customized and specifically designed by the website’s administrator. This is a great way of bringing them back into the fold rather than sending them away with a simple and nondescript error page that offers no alternatives to the lost page.

The same process can be completed for virtually every type of server-based page-load error in existence. This includes 401, 403, and 500 server errors in addition to the typical "page not found" mishap that most users encounter. Be sure to research the meaning of each error code and present a custom-written message and site design for each error that will bring users back to existing site content with ease. It’s the best way to ensure that even a site’s navigational or logical failures are turned into opportunities and successes.
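A fuller set of error pages might look like the sketch below. The page file names are placeholders; any path on the site can be used.

```apache
# 401: login required or failed (this page must be a local path, not a full URL)
ErrorDocument 401 /401.html
# 403: access to the resource is forbidden
ErrorDocument 403 /403.html
# 404: page or directory not found
ErrorDocument 404 /404.html
# 500: internal server error, often caused by a broken script
ErrorDocument 500 /500.html
```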

When All Else Fails, Ban Users with a Few Simple Lines of Code

For the most part, the ".htaccess" file is used for things which tend to work in the average user’s favor. This includes the error pages, custom redirection methods, and "friendly" permalink URLs, among other great features. But this file can also be used to make sure that some users are simply denied access to the website entirely. It’s a great way to ban those who "spam" site comments, frequently cause disruption or arguments among fellow readers, or simply can’t handle their access privileges in a responsible way.

To maintain a site’s integrity and make sure that other readers don’t migrate to other websites out of sheer frustration with just a few bad apples, the ".htaccess" file allows for banning specific IP addresses or entire ranges (or "blocks") of IP addresses. This means that entire countries can be banned, entire ISPs can be banned, or entire states, communities, or organizations can be forced to go elsewhere to read their daily content and cause trouble. When employed in the site’s ".htaccess" file, the process of banning a user looks like this:

order allow,deny
allow from all
deny from 158.23.144.12
deny from 24.100

In the example above, the "order allow,deny" line tells the server to evaluate the "allow" rules first and then let any matching "deny" rule win; without it, the default ordering would let "allow from all" override the bans. The site therefore allows visitors from all IP addresses, except those who visit from the IP address 158.23.144.12. In addition, all visitors in the 24.100 range of IP addresses are denied access to the site. A range this wide can ban a large part of an internet service provider’s customer base from reading a site’s content. Rest assured, however, that this is sometimes necessary in dire cases.
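The "allow" and "deny" lines are the older Apache 2.2 syntax. On servers running Apache 2.4, the same ban is normally written with "Require" directives; a rough equivalent, assuming the host permits these directives in ".htaccess" files, could look like this:

```apache
# Every condition inside the block must hold for a visitor to be admitted.
<RequireAll>
    # Start by admitting everyone...
    Require all granted
    # ...except this single address...
    Require not ip 158.23.144.12
    # ...and every address that begins with 24.100.
    Require not ip 24.100
</RequireAll>
```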

Embrace the .htaccess File and Wield Some Power Over Site Functions

The great thing about the .htaccess file is that it allows server administrators to control settings that are otherwise considered very advanced or those which require "root" access to the server. This file is a great way to manage access, errors, redirections, and even URL structure and semantic sense. Mastering it is the key to protecting and empowering a site’s new and returning visitors.