Web/Ideal
This page is a small collection of the topics I care about when it comes to the World Wide Web. I will tell you what they are and why I care about them. The order in which they appear reflects how important I find them. That is, if I had to pick some topics over the others, I would remove items from the bottom up.
Content
This part should really go without saying, but if you have nothing to say, don't make a site about it!
Seriously.
There are already too many “Hey, I'm John Doe, and this is my website” pages with some animation of a construction worker sign, saying the site was last updated in 1996. Find something to write about, then start making the web site. I don't mean to say I've got something against personal web sites. My web site is a personal web site. I only care that you put something on it. Anything at all.
As you will gather from the following sections of this document, I also care very much for correct usage of the technologies of the WWW, but to be honest—they do not mean anything if your site cannot teach me anything.
Valid markup, semantics and style sheets
Just as proper grammar and spelling help communication between humans, communication on the web requires a certain set of rules and standards to work properly. You can consider a web site equivalent to a conversation: even if you do not use proper grammar and spelling, you will (in most cases) be understood by others, but people are more likely to misunderstand you or not understand you at all. By using proper spelling and grammar, you will be more easily understood. The same goes for the WWW.
HyperText Markup Language should be used as it was intended: to mark up the meaning of the text within. For instance, the headings on a web page should be within heading elements; lists of items (whether they be links, a grocery list or ingredients in a recipe) should be put in lists; text that you want to emphasize should be put within elements that indicate emphasis; and so on.
You must also keep in mind that HTML is not meant to be used for layout purposes. When you are done with your HTML, it should typically be black text on a white background, and should look like a very simple report or written article. It should have no layout and style applied apart from those applied by browsers by default.
If you want your site to have a more complicated layout and style, bring in Cascading Style Sheets. CSS describes how each element and its content should be presented to users. It is not strictly a graphical presentation tool: it can also describe how the content is to be read to those who use screen readers, and it can describe different styles for people using different devices to view your pages. One of my favorite uses of CSS is to control how a page is to be printed, thus avoiding the need for “printer friendly” pages.
Writing valid HTML and CSS isn't difficult, though. You just follow the basic rules of the languages, and use a validator to see where you make mistakes. More important than validation, however, are the semantics behind what you mark up. Validation is still important: you can easily have a page that validates but is not properly semantic, whereas it is a lot harder to have an invalid page that is properly semantic. In order to be properly semantic, you need to have a valid page.
Semantics deal with the way you use elements (and their attributes) to properly describe content. A bad practice would be to have images all over your pages which are not merely decorative but carry meaning (e.g. they contain text), while their alternative texts are set to an empty string. Such a page will validate, but it will not be correctly semantic. In order to be semantic, all non-decorative images must have an alternative text that replaces the image in the cases where the image cannot be shown to the user, as this user might be using a text-only browser, might have images turned off, or may be using braille to browse your pages. Another example of bad practice is to create a website without headings, using CSS to turn ordinary paragraphs into headings only for the users of graphical browsers. Mark up your stuff properly, and only use CSS to style it.
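As a rough illustration of the image example above, here is a minimal sketch that lists the images on a page whose alternative text is missing or empty. The names `AltTextChecker` and `check_alt_text` are made up for this example. A program cannot tell a decorative image from a meaningful one, so the result is only a list of images a human should look at, not a verdict.

```python
from html.parser import HTMLParser


class AltTextChecker(HTMLParser):
    """Collect <img> tags whose alt attribute is missing or empty."""

    def __init__(self):
        super().__init__()
        self.problems = []  # list of (src, reason) tuples

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attributes = dict(attrs)
        src = attributes.get("src", "")
        if "alt" not in attributes:
            self.problems.append((src, "missing alt"))
        elif not (attributes["alt"] or "").strip():
            # An empty alt is fine for a purely decorative image, so this
            # is a prompt for a human to check, not necessarily an error.
            self.problems.append((src, "empty alt"))


def check_alt_text(html):
    """Return the (src, reason) pairs for images worth reviewing."""
    checker = AltTextChecker()
    checker.feed(html)
    return checker.problems


sample = """
<h1>Recipe</h1>
<img src="title-text.png">
<img src="divider.png" alt="">
<img src="cake.jpg" alt="A finished chocolate cake">
"""
for src, reason in check_alt_text(sample):
    print(f"{src}: {reason}")
```

Only the first two images are reported; the third has a real alternative text and passes.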
Accessibility
This can be considered a subgroup of the aforementioned semantics, because if you create your pages properly according to the semantic rules, they will almost certainly be accessible to everyone. However, it is important to note a few things. Any element that requires more of the user than the ability to read text within HTML elements can be deemed inaccessible. For instance, Flash movies, which to my knowledge require you both to see and to be able to use a mouse or some other kind of non-keyboard pointing device, are inaccessible. This can to a certain extent be remedied by alternative content, but very few people provide such alternative content. Another accessibility trap many fall into is using JavaScript to accomplish something in a way that makes the page unusable if JavaScript is missing. This affects not only those with disabilities and those who cannot use graphical browsers, but also those using graphical browsers with JavaScript turned off as a security measure. If you are going to use JavaScript, make sure to always have some alternate method to get things done. In other words, only use JavaScript to enhance things, not to create functionality that is otherwise not accessible.
Frames
If you are a seasoned web surfer, you have without a doubt seen these monsters before. Frames, although they may seem useful and fancy, cause more problems than they can ever solve. My biggest concerns about frames have to do with usability and accessibility.
First of all, frames cause several problems for people visiting your site. For instance, if you happen to change the content of more than one frame at the same time, say, update both the content and the header frame, the user's browser's back button will not work as it should anymore. If the user is really unlucky, it will change the header back to its previous state, but leave the content the same. Not exactly usability at its best, and certainly annoying.
Second, have you ever tried to bookmark a site that uses frames? If you tried, you found that you were only able to bookmark the frameset (the skeleton that contains the frames), and when you returned to your bookmark, you did not get back to the state you bookmarked; you got back to the frameset's default. The same problem arises if you try to link to a site that is using frames. To circumvent the issue of always ending up with the frameset's default, I always link to the page in question instead. This is good for the people following the link (who are the ones I care about, naturally), but it is not very good for the owner of the site, as the user coming there from my site will not be able to get anywhere from the page I linked to, since the navigation, which is in another frame, doesn't show up.
Last, some browsers (all?) will reset to the frameset's default when you refresh the view. My browser (Firefox) has a function for refreshing only a certain frame, but it takes more effort than just pressing F5.
Frames have even more problems built into them, but these are the worst, and should be enough to keep you away from them.
The Unicode Character Set (UTF-8/UTF-16)
On a computer, everything, absolutely everything, is done with bits. Typically today, each byte has eight bits, as most of you know. Each such byte can store 256 different “states”: a number from 0 to 255. (This can be calculated as 2^8.) In typical character sets, each character is represented by one of these 256 numbers. Therein lies the problem. Consider the number 256. Do you think all the different kinds of characters in all the different alphabets ever used can fit into 256 “states”? Of course not. However, a solution was made to take care of this. It is called Unicode. In Unicode, one can theoretically store 2,097,152 different characters, since 21 bits are available (2^21 = 2,097,152); the standard itself caps the range at 1,114,112 code points. Therefore, everybody who is creating something textual today (like a web page!) should use Unicode; it obsoletes all the other character sets.
The preferred method for most people would be to use the UTF-8 encoding, which means that nearly all the normal Latin characters will be stored as they were in ASCII. This means that most English speaking users only have to change to UTF-8, and everything will still be OK. Some might need to fix up a few things if they've used non-ASCII characters in their text, but I think this is rare.
Most common symbols and derivations of Latin will take two bytes in UTF-8. For instance the ø in my last name will use two bytes. As you proceed further away from Latin, you'll use even more space. However, no more than four bytes per character will ever be used, which is very acceptable. I have not yet had the need for a four-byte character myself.
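The numbers above are easy to check for yourself. The following is a small sketch in Python, with a helper name (`utf8_length`) and sample characters that are my own choice for illustration; it prints the two powers of two and the number of bytes UTF-8 needs for a few characters, from plain ASCII to one outside the most common range.

```python
def utf8_length(character):
    """Return how many bytes the character occupies when encoded as UTF-8."""
    return len(character.encode("utf-8"))


# States in one 8-bit byte, and the size of a 21-bit space.
print(2 ** 8)
print(2 ** 21)

# One ASCII letter, one Latin derivation, one katakana, one rune, one emoji.
for character in ["A", "ø", "ア", "ᚫ", "😀"]:
    code_point = ord(character)
    print(f"U+{code_point:04X} {character} takes {utf8_length(character)} byte(s)")
```

The ASCII letter takes one byte, ø takes two, the katakana and the rune take three each, and the emoji takes four.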
To illustrate the magic of Unicode, I will now write my name in a couple of different-from-Latin character sets (Notice that you may end up getting nothing but squares, and this will in that case be due to the font in use. Especially older fonts do not contain these characters, and Unicode may not have existed when they were made):
Japanese
- Name: Alexander
- Romaji: Areguzanda
- Katakana: アレグザンダー
- Hiragana: あれぐざんだあ
- Kanji: 保護 (hogo) ("Protector")
Greek
Αλεξαντρ: “Alexantr”. This is probably how my name was originally written, as my name is originally Greek.
Russian (Cyrillic)
Александр, “Alyeksandr”.
Runes
ᚫᛚᛖᚲᛊᚫᚾᛞᛖᚱ
Of course, there are only very few times you really need to write Latin, Japanese, Greek, Russian and runes all on the very same page, but there are other uses of Unicode as well. One of my favorite reasons is the extra special characters you do not get using typical character sets. These include the em-dash, —, the en-dash, –, the “pretty” quotation marks, “”, and more. Many of these can be represented in any character set through the use of entities, but honestly, who wants to type &mdash; when one can just use the character (—) directly?
Content Coding (Compression of content during transfer)
Back when computers were slow, I can understand that this technology was not in use. After all, on the old computers it could very well take more time to compress the content before sending it than it would take to just send it. That is sad, though, because it was needed more then than now. Still, one should not forget that there are users out there on dial-up, and they will appreciate having roughly 65 to 70% less data to download on average. Even people with high-speed lines will enjoy this; a reduction of that size is quite a lot.
When you visit a web page, your browser makes a request for it. The server reads the file from disk (or perhaps generates it on the fly, through PHP or another server-side technology) and sends it back to the browser exactly as it is stored (or created). This is a waste of bandwidth. It can be compared to downloading a computer program without it being compressed in some way or another (through ZIP, RAR, 7Z, or whatever technology you prefer). There is a remedy to this waste, however. HTTP specifies something called “Content-Encoding”. First, the browser states which encodings it supports in the Accept-Encoding request header. Then the server goes through that list, checks whether it can do any of them, and if it can, picks the one it prefers (or the only one it can do) and encodes the content. This process has to be enabled at the server though, which it unfortunately normally is not. The encoded content is sent back to the browser, which reverses the process and decodes the content back to its original format.
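The exchange described above can be sketched in a few lines of Python. This is a simplified illustration, not how any particular web server does it: the function names `choose_encoding` and `respond` are invented here, the server is assumed to support only gzip, and details such as the `*` wildcard are ignored.

```python
import gzip


def choose_encoding(accept_encoding, server_preference=("gzip",)):
    """Pick the first server-preferred coding that the browser accepts.

    Returns None when nothing matches, meaning: send the content as it is.
    """
    accepted = set()
    for item in accept_encoding.split(","):
        parts = [part.strip() for part in item.split(";")]
        name = parts[0].lower()
        if not name:
            continue
        quality = 1.0
        for parameter in parts[1:]:
            if parameter.lower().startswith("q="):
                try:
                    quality = float(parameter[2:])
                except ValueError:
                    quality = 0.0
        # A quality of zero means the browser explicitly refuses the coding.
        if quality > 0:
            accepted.add(name)
    for coding in server_preference:
        if coding in accepted:
            return coding
    return None


def respond(body, accept_encoding):
    """Return (headers, body), compressing the body when the browser allows it."""
    headers = {}
    if choose_encoding(accept_encoding) == "gzip":
        body = gzip.compress(body)
        headers["Content-Encoding"] = "gzip"
    return headers, body


page = b"<p>Hello, web!</p>" * 100
headers, sent = respond(page, "gzip, deflate")
print(headers, len(page), "->", len(sent))

# The browser's side: reverse the process to get the original content back.
print(gzip.decompress(sent) == page)
```

A browser that sends no usable coding in its list simply gets the page back untouched, with no Content-Encoding header.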
The gzip content coding will on average reduce a page's size by around 65%. Consider, then, that during a day you visit pages worth about 10 MiB in size, including HTML, CSS and JavaScript, but excluding pictures and other files, which are normally already compressed. If all of that had been gzipped before being sent to you, with an average reduction of 65%, you would only have to download 3.5 MiB, roughly a third of the original content. Theoretically, this more than doubles your line speed!
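To work the arithmetic through: a 65% reduction leaves 35% of the data, so 10 MiB becomes 3.5 MiB, and the same line carries about 2.86 times as much content in the same time. The sketch below does that calculation and also gzips a made-up, very repetitive page to show how the measurement can be done. The helper names are mine, and the ratio you get depends entirely on the content, so the sample should not be read as a typical result.

```python
import gzip


def size_after_reduction(original_size, reduction):
    """Size left to download after removing the given fraction (0.65 = 65%)."""
    return original_size * (1 - reduction)


def effective_speedup(reduction):
    """How many times more content fits through the same line."""
    return 1 / (1 - reduction)


print(f"{size_after_reduction(10, 0.65):.1f} MiB left of 10 MiB")
print(f"{effective_speedup(0.65):.2f} times the effective line speed")

# Measuring a real reduction: compress some markup and compare the sizes.
page = ("<li><a href='/page'>A link in a list</a></li>\n" * 500).encode("utf-8")
compressed = gzip.compress(page)
reduction = 1 - len(compressed) / len(page)
print(f"{len(page)} bytes -> {len(compressed)} bytes ({reduction:.0%} smaller)")
```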
Get rid of that www!
A question to ask here is how it even got invented. Consider this: when you mail someone, do you write “person@mail.somedomain.com”? No, of course not. You write “person@somedomain.com”. For the same reason, you should not have to write “http://www.somedomain.com/” when visiting a web page. When you type “http://” in front of a domain name, you are going to visit the domain's web page. Just like when you send a mail to “person@somedomain.com”, you are sending it to “person”, who has a mail address at “somedomain.com”.
I can accept and respect people who have the www. domain left there as people are used to it after all, but then only as an alternative to the www.-less address. I get so frustrated when I type in an address and get “No server configured at this address” or “Connection refused” or something similar, but going to the same address with the www. gives me the actual site. You can add www to my domain when connecting to it, but if you pay attention, you will notice that it disappears as soon as you get here. That is because I despise it, and want it gone. However, due to the fact that many people seem to think it is a necessity when entering their URLs, I allow it as an entrance.
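The “allow it as an entrance, then make it disappear” behaviour amounts to a redirect from the www. address to the same address without it. How that is configured depends on the server, so the following is only a sketch of the address rewriting itself, with a function name (`without_www`) invented for this example. It returns the address to redirect to, or None when the address has no www. to remove. Any user name or password embedded in the address is dropped, which is an assumption made to keep the example short.

```python
from urllib.parse import urlsplit, urlunsplit


def without_www(url):
    """Return the www.-less version of the URL, or None if there is no www."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    if not host.startswith("www."):
        return None
    netloc = host[len("www."):]
    # Keep an explicit port if the address had one.
    if parts.port is not None:
        netloc = f"{netloc}:{parts.port}"
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))


for address in ["http://www.somedomain.com/", "http://somedomain.com/"]:
    target = without_www(address)
    if target is None:
        print(f"{address}: serve the page")
    else:
        print(f"{address}: redirect to {target}")
```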
