20090530

Wget this

I think I've mentioned my Netflix obsession before: I want to be able to receive a disc from my selected set of movies without knowing what's on it until I start to watch it.

Well, this system works fine, but occasionally I do at least want to know how long this week's movie is, and maybe even what genre it's in. This is tricky, because there's no easy way to do this on the Netflix site without learning the identity of the movie. So I decided to hack together a system to do this.

I could have tried to figure out how to use the Netflix API to do this, but all that authentication stuff seems complicated. So instead, I decided to hack together something via feedflix: this is a third-party site that shows you statistics about your usage. After logging in, it shows you a link to your current movie. So if I could just get a bot to simulate logging in and clicking on the movie, then I could scrape out the relevant details.

Well, of course, it was much more complicated than I had thought. As well as dealing with cookies and session keys and the like, there was one particularly nasty bit: the site would return an authentication code which you had to post back with the login form. Mostly this code is randomly chosen letters and numbers. But sometimes it would contain other characters like +, which caused a problem when I posted them back (in form data a bare + is read as a space, so the code no longer matched). So I had to borrow someone else's trick, and run perl on the string to URL-encode it. As a result, I think that this is quite possibly the nastiest little shell script that I have ever concocted. I'll paste it in, mainly to offend people who have any clue what it means:


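#!/bin/bash
# Log in to feedflix, follow the link to the current movie,
# and print its duration and genre without showing the title.
EMAIL='{my email address}'
# Not called PWD: bash uses that name for the current directory.
PASSWORD='{my feedflix password}'
WOPTS=(--cookies=on --load-cookies cookies.txt --keep-session-cookies --save-cookies cookies.txt -olog)
BASEURL=http://feedflix.com
LOGIN=/login/
TMPFILE=movie

# Start from a clean cookie jar (-f: no error if it doesn't exist yet).
rm -f cookies.txt

# Fetch the login page and pull the authenticity token out of the form.
wget -O"$TMPFILE" "${WOPTS[@]}" "$BASEURL$LOGIN"
AUTH1=$(grep authenticity "$TMPFILE" | cut -d\" -f12)

# URL-encode the token; passing it as an argument keeps quotes in it from breaking the perl code.
AUTH=$(perl -MURI::Escape -e 'print uri_escape($ARGV[0]);' "$AUTH1")
echo "$AUTH"

# Post the login form, then grab the first movie link from the page that comes back.
wget "$BASEURL$LOGIN" "${WOPTS[@]}" --post-data="authenticity_token=$AUTH&email=$EMAIL&password=$PASSWORD&commit=Login" -O"$TMPFILE"
MOVIE=$(grep "href=/movie" "$TMPFILE" | head -n1 | cut -d= -f2 | cut -d'>' -f1)

# Fetch the movie page and print only the duration and genre lines.
wget -O"$TMPFILE" "${WOPTS[@]}" "$BASEURL$MOVIE"
grep -A 1 Duration "$TMPFILE" | tail -n1
grep -A 1 Genre "$TMPFILE" | grep -v div
rm -f "$TMPFILE"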


Vile, huh? And it'll probably break on the next movie that comes through, so I'll have to figure a way to patch it up then.
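
As an aside, the perl call is only there to percent-encode the token, and that step could probably be done in bash alone. Here is a minimal sketch, assuming the token is plain ASCII; urlencode is a made-up helper name of mine, not something that comes with wget or feedflix, and I haven't tried it against the real site.

```shell
#!/bin/bash
# Percent-encode a string for use in form data posted with wget --post-data.
# urlencode is a hypothetical helper; it assumes plain ASCII input.
urlencode() {
    local LC_ALL=C            # handle the input one byte at a time
    local s=$1 out="" c i
    for (( i = 0; i < ${#s}; i++ )); do
        c=${s:i:1}
        case $c in
            # Unreserved characters pass through unchanged.
            [a-zA-Z0-9.~_-]) out+=$c ;;
            # Everything else becomes %XX; "'$c" makes printf use the character's numeric code.
            *) printf -v c '%%%02X' "'$c"; out+=$c ;;
        esac
    done
    printf '%s\n' "$out"
}

urlencode 'abc+123/x=='    # prints abc%2B123%2Fx%3D%3D
```

With something like this in the script, the AUTH line would become AUTH=$(urlencode "$AUTH1"), and the same function could be run over the email address and password, which the script currently posts unencoded.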
