Azure Jane Lunatic (Azz) 🌺 (azurelunatic) wrote 2012-12-13 12:36 am

This is a screed on the basic nature of modern blog architecture, at possibly 101 level.

Why does one entry appear on many different pages? If I want to link to an entry, what page should I link to?


Most modern blog sites and formats allow a blogger to write an entry once, post it, and then have it automatically show up in several different places. Depending on how fucking stupid the blog engine is *cough*Tumblr*cough*, it may be difficult to figure out which copy is the "master" copy, and how to link to it so that people from the future can find it too.

Let's use Dreamwidth as an example. This entry, being public, will show up in a bunch of places:

In the archives of whatever site (such as search engines like google.com or archival sites like archive.org) chooses to keep a copy
In the archives of Dreamwidth's internal search engine
In this site's syndicated feeds of my journal, and on the remote sites to which my journal is syndicated, such as on LJ: azz_on_dw
The Latest Things page
The reading pages of everyone who has subscribed to me (until it has been pushed off of there by newer entries, or until two weeks have passed)
The archived by-day reading pages of those who have subscribed to me who also have paid accounts (these are hard to find; I think there's a ticket submitted to make that easier)
The front page of my journal (until it has been pushed off of there by newer entries)
By title, in the monthly archive of my journal
In full with stuff hidden behind the cut, in the single day archive for this day
In full, on the entry view itself with all the comments, if there are any.
In full, scrolled down to the point of the first cut.
In full, with only some of the comments or highlighting one comment, on a link to a particular comment thread within the entry's discussion.

Archived copies or syndicated feeds as displayed on remote sites are a special case, because those copies are out of Dreamwidth's, and therefore my, direct control. If I make a change to an entry, for example to fix a typographical error, it may take minutes, hours, days, weeks, or months for the offsite copy to change, or it may never change at all. (Dreamwidth's search engine is subject to a delay. The exact timing of the delay depends on many things, including what the search engine ate for breakfast, and how hard Robby has been hitting it with a dataspanner.)

When I correct that typo, it changes all over Dreamwidth, basically immediately. (Except in the search engine's copy. That's separate.) This is because the entry is stored in basically one place. In all of the different places it appears on Dreamwidth (the reading pages, the calendar archive pages, and its own page), it shows up because the server fetches the master copy. (This is not the discussion where we explain Memcached and its friends, which keep the servers from falling over in abject agony when something really popular gets posted.)
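For the code-minded, here is a toy sketch of the "one master copy, many pages" idea. It is not Dreamwidth's actual code: the entry number, the dictionary, and every function name are made up for illustration.

```python
# Hypothetical sketch: one stored copy of each entry, and several "pages"
# that all read from that same copy when they are built.
entries = {1001: {"title": "Blog architecture 101", "body": "Teh master copy."}}

def entry_page(entry_id):
    """The plain entry view: title plus the full body."""
    entry = entries[entry_id]
    return f"{entry['title']}\n{entry['body']}"

def reading_page(entry_ids):
    """A reading page: several entries shown one after another."""
    return "\n---\n".join(entry_page(i) for i in entry_ids)

def month_archive(entry_ids):
    """The monthly archive: titles only."""
    return [entries[i]["title"] for i in entry_ids]

# Fix the typo once, in the stored copy...
entries[1001]["body"] = "The master copy."

# ...and every page built after that shows the corrected text.
print(entry_page(1001))
print(reading_page([1001]))
```

An offsite archive, by contrast, behaves like a string that was copied out of `entries` before the fix: nothing on this side can reach in and change it.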

Even on the most ephemeral parts of the internet, it's polite to assume that not everyone is going to be reading what you post at the exact moment you post it. It could be minutes, hours, or days before someone takes a look. (Maybe they saved a tweet to look at later, even if their twitterstream is rushing past so fast that you thought surely your tweet would be buried in moments.) Sometimes it pops up months or years later, completely at random.

With that in mind, try to link to the most direct, most relevant page. Usually this is the plain entry view. On Dreamwidth, that link is built like:

http://username.dreamwidth.org/random series of numbers.html

Some blog sites have a short bit of descriptive text, called a "slug", instead of a random series of numbers, which makes things a little more friendly and human-readable, but Dreamwidth doesn't have that yet. (I think there's a ticket filed for that.)

The random series of numbers isn't truly random: it's assigned by the server at the time of posting, and involves how many entries have already been made in that journal multiplied by some randomness. This was built in the LiveJournal days to annoy script kiddies who were trying to download the entirety of LiveJournal or do brute-force poking at people's locked entries, and Dreamwidth kept that around.
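The exact arithmetic isn't spelled out above. My understanding of the LiveJournal-derived code is that the public number is the journal's internal entry counter times 256, plus a random number from 0 to 255 picked when the entry is posted. Treat that formula, and the function names below, as an assumption for illustration rather than a statement of what the server does.

```python
import random

def public_entry_number(counter, anum=None):
    """Combine an internal per-journal counter with a random byte.

    Assumed formula: counter * 256 + anum, where anum is 0..255.
    """
    if anum is None:
        anum = random.randrange(256)  # chosen once, at posting time
    return counter * 256 + anum

def split_entry_number(number):
    """Recover (counter, anum) from a public entry number."""
    return divmod(number, 256)

print(public_entry_number(5, anum=17))  # 5 * 256 + 17
print(split_entry_number(1297))
```

Under that assumption, someone guessing URLs can't just count upward by one: each counter value has 256 possible public numbers, and only one of them is a real entry.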

An entry will keep that same permanent link even if the entry is edited to change the day when it was posted. If the journal is renamed, the entry number will remain the same, but the username will change. (Many times the journal owner will choose to redirect the old username to the new one, and in this case the old link will still work.)

A lot of times you may see http://username.dreamwidth.org/random series of numbers.html#stuff.

In old-school HTML, the # (pound, hash, or octothorpe) means an "anchor" on the page, like a little in-page bookmark. Dreamwidth has anchors defined for each cut tag, the top of the comments section, and each individual comment. If you don't want to link people to a specific cut tag, to the comments section as a whole, or to a single comment or comment thread, you can safely get rid of the # and all of the stuff to the right of it when linking.
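If you'd rather have a script do the trimming, Python's standard library can split the anchor off for you. This is a minimal sketch; the journal name, entry number, and anchor name in the example URL are invented.

```python
from urllib.parse import urldefrag

def strip_anchor(url):
    """Return the URL without the # and whatever follows it."""
    plain_url, _anchor = urldefrag(url)
    return plain_url

# Example URL is made up for illustration.
print(strip_anchor("http://example.dreamwidth.org/12345.html#comments"))
```

A URL with no # in it comes back unchanged, so it's safe to run this on every link.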

Modern web servers use the ? (question mark) to include "arguments", special information on how to display the page. Dreamwidth has a few of these, like ?format=light, ?style=mine, and ?nohtml=1, that display the page in special ways. ("Light" view reduces the amount of special styling on the page no matter who's using it. "My style" shows the page in the viewer's own journal style, if the viewer has a journal. "No html" shows the entry as it was entered, which can help diagnose code problems.) If you're linking something somewhere with the intention of printing it or viewing it on a touchy browser, leaving ?format=light in place might be useful, but you can often strip out the ? and everything to the right of it when linking.
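Here is one way to do that stripping in Python, with an option to keep the arguments you actually want. It's a sketch using only the standard library; `clean_link` and its `keep` parameter are names I made up, and the example URLs are invented.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

def clean_link(url, keep=()):
    """Drop the ? arguments (and any # anchor), except arguments named in keep."""
    parts = urlsplit(url)
    kept = [(name, value) for name, value in parse_qsl(parts.query) if name in keep]
    # An empty query and empty fragment mean no "?" or "#" in the result.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))

# Strip everything after the entry's own address:
print(clean_link("http://example.dreamwidth.org/12345.html?style=mine#comments"))

# Keep the light format for a printer-friendly link:
print(clean_link("http://example.dreamwidth.org/12345.html?format=light&style=mine",
                 keep=("format",)))
```

Since the # anchor always sits to the right of the ? in a URL, "the ? and everything to the right of it" takes the anchor along too, which is why this version drops both.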

If you're on your reading page view or an archive page view, the title of the entry usually links to the full entry, even if there isn't a link that's labeled as the permanent link.
