Azure Jane Lunatic (Azz) 🌺 (azurelunatic) wrote 2012-12-13 12:36 am
This is a screed on the basic nature of modern blog architecture, at possibly 101 level.
Why does one entry appear on many different pages? If I want to link to an entry, what page should I link to?
Most modern blog sites and formats allow a blogger to write an entry once, post it, and then have it automatically show up in several different places. Depending on how fucking stupid the blog engine is *cough*Tumblr*cough*, it may be difficult to figure out which copy is the "master" copy, and how to link to it so that people from the future can find it too.
Let's use Dreamwidth as an example. This entry, being public, will show up in a bunch of places:
In the archives of whatever site (such as search engines like google.com or archival sites like archive.org) chooses to keep a copy
In the archives of Dreamwidth's internal search engine
In the syndicated feeds on this site of my journal, and in the remote sites to which my journal is syndicated, such as on LJ: azz_on_dw
The Latest Things page
The reading pages of everyone who has subscribed to me (until it has been pushed off of there by newer entries, or until two weeks have passed)
The archived by-day reading pages of those who have subscribed to me who also have paid accounts (these are hard to find; I think there's a ticket submitted to make that easier)
The front page of my journal (until it has been pushed off of there by newer entries)
By title, in the monthly archive of my journal
In full with stuff hidden behind the cut, in the single day archive for this day
In full, on the entry view itself with all the comments, if there are any.
In full, scrolled down to the point of the first cut.
In full, with only some of the comments or highlighting one comment, on a link to a particular comment thread within the entry's discussion.
Archived copies or syndicated feeds as displayed on remote sites are a special case, because those copies are out of Dreamwidth's, and therefore my, direct control. If I make a change to an entry, for example to fix a typographical error, the offsite copy may take minutes, hours, days, weeks, or months to change, or it may never change at all. (Dreamwidth's search engine is subject to a delay. The exact timing of the delay depends on many things, including what the search engine ate for breakfast, and how hard Robby has been hitting it with a dataspanner.)
When I correct that typo, it changes all over Dreamwidth, basically immediately. (Except in the search engine's copy. That's separate.) This is because the entry is stored in basically one place. Everywhere it appears on Dreamwidth (the reading pages, the calendar archive pages, and its own page), it shows up because the server fetches the master copy. (This is not the discussion where we explain Memcached and its friends, which keep the servers from falling over in abject agony when something really popular gets posted.)
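For the code-minded, here's a toy sketch of that "one master copy, many views" idea. It is not how Dreamwidth is actually written; the `entries` dictionary, the function names, and the sample entry are all made up for illustration. The point is only that every page reads from the same stored copy, so one edit shows up everywhere.

```python
# Hypothetical stand-in for the database: exactly one master copy per entry.
# The ID, title, and body here are made up for illustration.
entries = {
    12345: {"title": "An example entry", "body": "This entry has a tpyo in it."},
}

def entry_view(entry_id):
    """The entry's own page: title plus full body, read from the master copy."""
    entry = entries[entry_id]
    return f"{entry['title']}\n{entry['body']}"

def reading_page(entry_ids):
    """A reading page: several entries, each fetched from the same store."""
    return "\n---\n".join(entry_view(i) for i in entry_ids)

def month_archive(entry_ids):
    """A monthly archive: titles only, still read from the same store."""
    return [entries[i]["title"] for i in entry_ids]

def edit_entry(entry_id, new_body):
    """Editing touches only the master copy; no page keeps its own copy."""
    entries[entry_id]["body"] = new_body

# Fix the typo once...
edit_entry(12345, "This entry has a typo in it.")

# ...and every view picks up the fix, because each one re-reads the master copy.
print(entry_view(12345))
print(reading_page([12345]))
```

An offsite archive or feed reader, by contrast, would have copied the text at some point and would keep showing the old version until it fetched again.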
Even on the most ephemeral parts of the internet, it's polite to assume that not everyone is going to be reading what you post at the exact moment you post it. It could be minutes, hours, or days before someone takes a look. (Maybe they saved a tweet to look at later, even if their twitterstream is rushing past so fast that you thought surely your tweet would be buried in moments.) Sometimes it pops up months or years later at complete random.
With that in mind, try to link to the most direct, most relevant page. Usually this is the plain entry view. On Dreamwidth, that link is built like:
http://username.dreamwidth.org/random series of numbers.html
Some blog sites have a short bit of descriptive text, called a "slug", instead of a random series of numbers, which makes things a little more friendly and human-readable, but Dreamwidth doesn't have that yet. (I think there's a ticket filed for that.)
The random series of numbers isn't truly random: it's assigned by the server at the time of posting, and involves how many entries have already been made in that journal multiplied by some randomness. This was built in the LiveJournal days to annoy script kiddies who were trying to download the entirety of LiveJournal or do brute-force poking at people's locked entries, and Dreamwidth kept that around.
An entry will keep that same permanent link even if the entry is edited to change the day when it was posted. If the journal is renamed, the entry number will remain the same, but the username will change. (Many times the journal owner will choose to redirect the old username to the new one, and in this case the old link will still work.)
A lot of times you may see http://username.dreamwidth.org/random series of numbers.html#stuff.
In old-school HTML, the # (pound, hash, or octothorpe) means an "anchor" on the page, like a little in-page bookmark. Dreamwidth has anchors defined for each cut tag, the top of the comments section, and each individual comment. If you don't want to link people to a specific cut tag, to the comments section as a whole, or a single comment/comment thread, you can safely get rid of all of the stuff to the right of the # when linking.
Modern web servers use the ? (question mark) to include "arguments", special information on how to display the page. Dreamwidth has a few things, like ?format=light, ?style=mine, and ?nohtml=1 that display the page in special ways. ("Light" view reduces the amount of special styling on the page no matter who's using it. "My style" shows the page in the way that the viewer's own journal displays in, if the viewer has a journal. "No html" shows the entry as it was entered, which can help diagnose code problems.) If you're linking something somewhere with the intention of printing it or viewing it on a touchy browser, leaving ?format=light might be useful, but you can often strip out the ? and everything to the right of it when linking.
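If you'd rather let a script do the trimming, here's a minimal sketch using Python's standard `urllib.parse`. The function name `plain_entry_link` and its `keep_params` option are my own inventions for this example, and the entry number in the sample URLs is a placeholder. It drops everything after the #, and drops the ? arguments unless you ask to keep specific ones.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

def plain_entry_link(url, keep_params=()):
    """Return url without its #anchor, keeping only the named ?arguments."""
    parts = urlsplit(url)
    # Keep only the query arguments the caller explicitly asked for.
    kept = [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if name in keep_params
    ]
    # An empty string in the last slot drops the #anchor entirely.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))

# Strip both the arguments and the anchor.
print(plain_entry_link("http://username.dreamwidth.org/12345.html?style=mine#stuff"))

# Keep ?format=light for a printable or touchy-browser link.
print(plain_entry_link(
    "http://username.dreamwidth.org/12345.html?format=light#stuff",
    keep_params=("format",),
))
```

The "keep" list is there because, as noted above, stripping the arguments is only often safe, not always.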
If you're on your reading page view or an archive page view, the title of the entry usually links to the full entry, even if there isn't a link that's labeled as the permanent link.
no subject
If you 'view source' on this post, you'll see '<link rel="canonical" href="http://azurelunatic.dreamwidth.org/6972955.html" />' in there somewhere. That's Dreamwidth's way of telling places like Google that for this given set of text, http://azurelunatic.dreamwidth.org/6972955.html is the Official 'Master' Copy. Google will push this page up a little in the ranking over the other places this text might appear (like reading pages, etc).
Using a <link rel="canonical"> is the proper way to tell the internet which page is the 'master' copy, but Google don't trust it entirely since it's easy to fake or just apply incorrectly.
Those details might be a bit more 102 rather than 101.
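For anyone who wants to pull that canonical link out of a page without squinting at 'view source', here's a rough sketch using Python's built-in `html.parser`. The class and function names are made up for this example, and it assumes you already have the page's HTML as a string (fetching it is left out).

```python
from html.parser import HTMLParser

class CanonicalFinder(HTMLParser):
    """Remembers the href of the first <link rel="canonical"> it sees."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        # Self-closing tags like <link ... /> also end up here by default.
        if tag != "link" or self.canonical is not None:
            return
        attributes = dict(attrs)
        # rel can hold several space-separated values, so split before checking.
        rel_values = (attributes.get("rel") or "").lower().split()
        if "canonical" in rel_values:
            self.canonical = attributes.get("href")

def find_canonical(html_text):
    """Return the canonical URL declared in html_text, or None if there isn't one."""
    finder = CanonicalFinder()
    finder.feed(html_text)
    return finder.canonical

page = (
    '<html><head>'
    '<link rel="canonical" href="http://azurelunatic.dreamwidth.org/6972955.html" />'
    '</head><body>...</body></html>'
)
print(find_canonical(page))
```

If a page declares no canonical link, `find_canonical` returns `None` and you're back to working out the plain entry link by hand.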
no subject
Or sometimes in addition to the random series of numbers, and/or in addition to the date.
"The random series of numbers isn't truly random: it's assigned by the server at the time of posting, and involves how many entries have already been made in that journal multiplied by some randomness."
Not quite; it's a number based on how many entries have been made added to some randomness.
To be precise, it's a base number multiplied by 256 plus a random number between 0 and 255 [or possibly between 1 and 256, I forget exactly which, but the former is more likely].
The base number (internally called, if I remember correctly, the jitemid = journal item ID: 27238 for this entry) goes up by one for each post; the random number is the "anum" ("a number"? 27 in this case); and jitemid*256 + anum = ditemid (the display item ID: 6972955 in this case, which is 27238*256 + 27).
So depending on where the jitemid starts counting (0? 1?), this is roughly the 27238th entry posted in this journal. And given that your profile lists 27220 entries, you've probably deleted about 18 entries over the lifetime of the journal.
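The arithmetic above can be sketched in a few lines of Python. This is not Dreamwidth's actual code; the function names are made up, and it assumes the anum really is in the range 0 to 255 (the more likely case mentioned above), since that is what makes the split unambiguous.

```python
def make_ditemid(jitemid, anum):
    """Combine the per-journal counter and the random part into a display ID."""
    # Assumption: anum is 0..255, so it fits in the low 8 bits.
    if not 0 <= anum <= 255:
        raise ValueError("anum must be between 0 and 255")
    return jitemid * 256 + anum

def split_ditemid(ditemid):
    """Recover (jitemid, anum) from a display ID."""
    # divmod by 256 undoes "multiply by 256, then add anum".
    return divmod(ditemid, 256)

print(make_ditemid(27238, 27))   # 6972955
print(split_ditemid(6972955))    # (27238, 27)
```

This also shows why guessing entry URLs is annoying for script kiddies: knowing an entry is roughly the 27238th in a journal still leaves 256 possible numbers to try.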