azurelunatic: DW: my eloquence cannot be captured in 140 chars (twitter)
Azure Jane Lunatic (Azz) 🌺 ([personal profile] azurelunatic) wrote 2011-01-12 02:03 pm

Doing It Right: hacking out a spec for a native DW twitter importer

This is what I have so far; of course it's drafty (sketchy? something?) as all hell, but I figured I should have it out of my brain and onto bits that others can see and chatter about. Going to run through other places too.

Twitter-importer for DW, a spec [draft].

Must authenticate to Twitter properly.

Must collect all tweets, whether from a locked or unlocked account. (A locked Twitter account should not entirely prevent public posting, because some people are locked merely to keep out the spambots.) Should collect retweets too, the whole thing if possible [oldschool retweets lose the trailing characters].

Should be able to set a security for all Twitter entries that is distinct from the minimum security; some people prefer to post their Twitter imports to a custom security group even if their minimum is public. (Locked twitter account and locked entries would maintain security for people who need it.)

Must be able to have timestamps or no timestamps. Must be able to have linkback to original tweet, or no link back. (would be nice to have optional links for in-reply-to, location, and other doodads, but those seem like they'd be harder to get.)

Should be able to set tags (suggest "twitter"). (This will help with paid users' filtering.)

Should be able to set userpic (if they have a userpic called "twitter", maybe have it pre-selected for them?)

Mine counts the day's tweets. I like that. Lets you know what you're in for.

Should be able to customize the subject, with variables to insert the date if you want to, in your exact preferred configuration.
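A rough sketch of how a customizable subject might work, in Ruby. Everything here is an assumption for illustration: the helper name build_subject, the use of strftime directives as the date "variables", and the made-up {count} placeholder for the day's tweet count.

```ruby
require 'date'

# Hypothetical helper: fills in a user-supplied subject template.
# Date variables are plain strftime directives (%Y, %m, %d, %A, ...);
# {count} is an invented placeholder for the number of tweets.
def build_subject(template, date, tweet_count)
  date.strftime(template).gsub('{count}', tweet_count.to_s)
end

puts build_subject('Tweets for %A, %d %B %Y ({count})', Date.new(2011, 1, 12), 7)
```

Leaning on strftime means the user gets their exact preferred date configuration without the importer inventing its own date syntax, at the cost of exposing percent-codes to people who may not know them.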

Should be able to cut tweets, either all of them entirely, or after some small reasonable number. My service does 5; LJ's does 10. Either of these is reasonable.

Should be able to customize the cut text.
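The cutting options could be sketched something like this in Ruby. The method name format_entry, its parameters, and the use of <cut text="..."> markup in the entry body are all assumptions for illustration, not a description of how any existing importer does it.

```ruby
require 'cgi'

# Hypothetical formatter: shows the first `visible` tweets and puts the
# rest behind a cut with customizable text. Setting visible to 0 puts
# every tweet behind the cut. The <cut> markup is an assumption.
def format_entry(tweets, visible: 5, cut_text: 'More tweets')
  shown  = tweets.first(visible)
  hidden = tweets.drop(visible)
  lines  = shown.dup
  unless hidden.empty?
    # Escape the cut text so quotes in it cannot break the attribute.
    lines << %(<cut text="#{CGI.escapeHTML(cut_text)}">)
    lines.concat(hidden)
    lines << '</cut>'
  end
  lines.join("\n")
end

puts format_entry(%w[one two three], visible: 2, cut_text: 'the rest')
```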

Should be able to customize the posting time in a way that does not kill the server. Letting people pick an exact time seems likely to have too many of them choose the same time, and things would bog down. Though a reasonable effort should be made to a) have the things post at the same time for the same person, and b) get the tweets of a particular span, most popular likely to be the whole day's twittery.
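One possible way to get both properties (same time each day for the same person, but not everyone at once) is to let the user pick an hour and derive the minute from their username. This is only a sketch under that assumption; the method names and the 60-minute window are invented for illustration.

```ruby
require 'zlib'

# Hypothetical: maps a username to a stable minute offset inside a
# posting window, so load is spread out but each user keeps their slot.
def posting_offset_minutes(username, window_minutes: 60)
  Zlib.crc32(username) % window_minutes
end

# Combines the user's preferred hour with their derived minute.
def posting_time(username, preferred_hour)
  format('%02d:%02d', preferred_hour, posting_offset_minutes(username))
end

puts posting_time('azurelunatic', 12)
```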

Should be robust enough to try again if Twitter failwhales for any reason while attempting to fetch.
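A minimal sketch of the try-again behaviour in Ruby, assuming a simple retry with a growing pause between attempts. The wrapper name with_retries, the attempt count, and the delays are arbitrary choices for illustration; the block stands in for whatever actually fetches from Twitter.

```ruby
# Hypothetical wrapper: runs the block, and on failure waits and tries
# again, doubling the pause each time. The last failure is re-raised.
# `sleeper` is injectable so the sketch can be tried without waiting.
def with_retries(attempts: 3, base_delay: 2, sleeper: ->(secs) { sleep(secs) })
  tries = 0
  begin
    tries += 1
    yield
  rescue StandardError
    raise if tries >= attempts
    sleeper.call(base_delay * (2**(tries - 1)))
    retry
  end
end

# Usage: the block here only pretends to fetch and fails twice first.
calls = 0
result = with_retries(sleeper: ->(_secs) {}) do
  calls += 1
  raise 'failwhale' if calls < 3
  'tweets fetched'
end
puts result
```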

Should be able to choose whether to import all tweets, or only tweets that lack an in-reply-to. (without in-reply-to is good for not having half-a-conversation, but bad if what you wanted to do was archive it all for your own use).

Might want to import from more than one Twitter account? (paid feature)

Should expand as many as possible shortened URLs, to prevent the failure of a shortening site from making your old links useless.


[separate-but-related, having Twitter taking over the Latest page would not do at all; if it ever threatens to dominate, it should be split out or something.]
senmut: an owl that is quite large sitting on a roof (Default)

[personal profile] senmut 2011-01-12 10:56 pm (UTC)(link)
You are a beautiful person for applying brain power to this.

I like my import, but I also do understand those who do not wish to see my twittered day.
wibbble: A manipulated picture of my eye, with a blue swirling background. (Default)

[personal profile] wibbble 2011-01-13 12:06 am (UTC)(link)
Getting the 'in reply to' and location links isn't difficult at all - the required data is in all the stuff returned by the API calls.

I actually recommend giving their API docs (http://dev.twitter.com/doc) a quick look over to see what you can/can't do - it's fairly readable and the API itself is quite nice to work with. If there's anything that's not clear, gimme a shout - I'll be around on IM tomorrow. (I might still be showing as available - that'll be the iMac upstairs. :o) )
geekgirl: (Computer Geek)

[personal profile] geekgirl 2011-01-13 12:14 am (UTC)(link)
My brain aches just reading this. Need to get my head out of the Drupal books soon and back into other things.
automaticdoor: Carefully recreated screenshot of Britta from Community ep 3x08 captioned "Britta Perry, Anarchist Cat Owner" (hillz twitter)

[personal profile] automaticdoor 2011-01-13 12:46 am (UTC)(link)
This would be my dream, omg.
jeshyr: Blessed are the broken. Harry Potter. (Default)

[personal profile] jeshyr 2011-01-13 02:50 am (UTC)(link)
Looks utterly brilliant!

One thing you missed: option to turn #hashtags @names in tweets into appropriate links to twitter hashtag searches/username pages.
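A rough Ruby sketch of what that linking option might look like. The method name linkify and the link targets (twitter.com/NAME for usernames, a twitter.com/search?q=%23TAG query for hashtags) are assumptions for illustration, and the tweet text is assumed to be already safe to embed in HTML.

```ruby
# Hypothetical: wraps #hashtags and @names in links.
# The lookbehinds skip things like "foo@bar" and entities like "&#39;".
def linkify(text)
  with_tags = text.gsub(/(?<![\w&])#(\w+)/) do
    %(<a href="https://twitter.com/search?q=%23#{$1}">##{$1}</a>)
  end
  with_tags.gsub(/(?<!\w)@(\w+)/) do
    %(<a href="https://twitter.com/#{$1}">@#{$1}</a>)
  end
end

puts linkify('hi @azurelunatic, loving the #dw spec')
```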
jeshyr: Blessed are the broken. Harry Potter. (Default)

[personal profile] jeshyr 2011-01-13 02:51 am (UTC)(link)
If I could automatically set my tweets to post privately I'd probably start importing them for archive purposes!

Can I get authenticated RSS feeds of everything-except-tag-x from Dreamwidth though?
exor674: Computer Science is my girlfriend (Default)

[personal profile] exor674 2011-01-13 02:55 am (UTC)(link)
I'm assuming that this is going to be scheduled, not "ticky clicky run once" like the LJ importer?

( Maybe I should see if I can bribe whoever does the scheduled posts implementation to make that more general for other scheduled tasks )

[personal profile] andy 2011-01-13 07:07 am (UTC)(link)
It is scheduled on LJ as well; it posts at noon in the user's timezone, with no way to tweak that.
wibbble: A manipulated picture of my eye, with a blue swirling background. (Default)

[personal profile] wibbble 2011-01-13 10:13 am (UTC)(link)
Doing a one-off Twitter 'import' isn't viable - they don't make really old stuff available. :o(
matgb: Artwork of 19th century upper class anarchist, text: MatGB (Default)

[personal profile] matgb 2011-01-13 09:34 am (UTC)(link)
Should expand as many as possible shortened URLs, to prevent the failure of a shortening site from making your old links useless.

Yes, this. I know this is an issue, but my brain doesn't always process it--I currently use Delicious, but if that goes the best workaround I can create is Twitter, unless Memories are viable with the new work being done on them.

I'd like to be able to not import tweets that start with the @ symbol; if I want a tweet read by many I reword it to put the name further down.
wibbble: A manipulated picture of my eye, with a blue swirling background. (Default)

[personal profile] wibbble 2011-01-13 10:13 am (UTC)(link)
Most Twitter clients, including their website, set the 'in_reply_to_status_id' option for replies, which is how you're 'supposed' to decide if a tweet is a reply or not.

I don't think it records that option if the tweet doesn't start with '@', but I'd need to check.
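The two ways of spotting a reply discussed here could be sketched in Ruby as below. Tweets are represented as plain hashes, which is an assumption about how the data would be held; only the 'in_reply_to_status_id' field name comes from the API, and the method names and the skip option are invented for illustration.

```ruby
# True when the API marked the tweet as a reply to another tweet.
def api_reply?(tweet)
  !tweet['in_reply_to_status_id'].nil?
end

# True when the tweet text simply begins with an @name.
def starts_with_mention?(tweet)
  tweet['text'].start_with?('@')
end

# Hypothetical filter: skip can be :api_replies, :at_prefixed, or :none.
def select_tweets(tweets, skip: :none)
  case skip
  when :api_replies then tweets.reject { |t| api_reply?(t) }
  when :at_prefixed then tweets.reject { |t| starts_with_mention?(t) }
  else tweets
  end
end

sample = [
  { 'text' => 'spec drafting', 'in_reply_to_status_id' => nil },
  { 'text' => '@wibbble thanks!', 'in_reply_to_status_id' => 123 },
  { 'text' => '@azurelunatic quick question', 'in_reply_to_status_id' => nil }
]
puts select_tweets(sample, skip: :api_replies).length
puts select_tweets(sample, skip: :at_prefixed).length
```

Offering both as separate options sidesteps the question of which signal is more trustworthy: the user picks the one that matches how they actually tweet.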
matgb: Artwork of 19th century upper class anarchist, text: MatGB (Default)

[personal profile] matgb 2011-01-13 10:15 am (UTC)(link)
I'd rather it didn't do that though. I get why it does, but I've found that a bit buggy, and if I reword a reply so that it shows generally, it should.

Plus I sometimes just type @azurelunatic or whoever to ask something, and wouldn't do that as an actual reply.
jeshyr: Blessed are the broken. Harry Potter. (Default)

[personal profile] jeshyr 2011-01-13 10:33 am (UTC)(link)
Ditto to that. My client definitely didn't always set that correctly last time I looked.
jeshyr: Blessed are the broken. Harry Potter. (Default)

[personal profile] jeshyr 2011-01-13 10:35 am (UTC)(link)
Note about expanding shortened links: Depending on how you do that (checking 301/302 redirects vs using APIs) it would be cool to be able to expand URLs that point to bookmarking services like delicious, pinboard, diigo, etc. These, as we were all recently reminded, may be as transitory as the damned tinsy URL services :(
wibbble: A manipulated picture of my eye, with a blue swirling background. (Default)

[personal profile] wibbble 2011-01-13 03:57 pm (UTC)(link)
Speaking of which, that reminded me to add expansion for t.co links, which should be live from the next run.
wibbble: A manipulated picture of my eye, with a blue swirling background. (Default)

[personal profile] wibbble 2011-01-13 03:53 pm (UTC)(link)
This is the code (in Ruby) I use to expand the URLs:

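require 'curb' # gem wrapping libcurl

# Follow any redirects from url and return the final URL reached.
def expand(url)
  request = Curl::Easy.perform(url) do |curl|
    curl.follow_location = true # follow 301/302 redirects
  end
  request.last_effective_url
end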


This is using the 'curb' gem, which is basically a wrapper around libcurl. That should follow any chain of redirects until it gets to the final source - so as long as the bookmarking/shortening/whatever service just feeds a 301/302 redirect to the browser, that'll get the original source.

I think this is the best behaviour - in a case where you might not want to go all the way back to the original source (say, because there's comments on a social sharing-type site), you'd normally not be posting a link that'll redirect.

One improvement I should make is in URL-detection. At the moment it matches against specific known URL shorteners, but it wouldn't be that difficult to just feed it a generic 'this is a HTTP/HTTPS URL' matcher instead and feed all URLs in tweets through the expander. That might have unintended side-effects on sites that do redirection as part of their normal processes (like every site using the LJ codebase).
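A sketch of what the generic approach might look like, with a list of hosts to leave alone as one possible guard against those side-effects. The method name expand_all, the skip_hosts option, and the simple URL pattern are all assumptions for illustration; the expander is passed in as any callable, so the expand method above could be supplied as method(:expand).

```ruby
require 'uri'

# Hypothetical: runs every http/https URL in the text through an
# expander, except URLs on hosts the caller asks to leave untouched.
def expand_all(text, expander, skip_hosts: [])
  # Deliberately simple pattern; trailing punctuation is left out.
  url_pattern = %r{https?://[^\s<>"]*[^\s<>".,;:!?)]}
  text.gsub(url_pattern) do |url|
    host = begin
      URI.parse(url).host
    rescue URI::InvalidURIError
      nil
    end
    (host.nil? || skip_hosts.include?(host)) ? url : expander.call(url)
  end
end

# Usage with a stand-in expander, so no network access is needed here.
fake = ->(url) { url == 'http://t.co/abc' ? 'http://example.com/long' : url }
puts expand_all('see http://t.co/abc.', fake, skip_hosts: ['www.livejournal.com'])
```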
ephemera: celtic knotwork style sitting fox (Default)

[personal profile] ephemera 2011-01-13 09:58 pm (UTC)(link)
mmmm - that sounds good :D - lots of *choices* about how I want my tweets to show up in my journal, which LJ's tool seems so determined not to let us have.

Random additional option that's crossed my mind, but that would be complex if not impossible: have the day's tweets-without-@replies, cut-tagged after the first x, followed by the day's tweets-with-@replies, always behind a cut tag no matter the number (or, I guess, the option to have the 'no conversations' list in one post and the 'complete with conversations' list in another post, with different default filtering).