This week's project is a web site. Unfortunately, while I got most of the development done on it over the weekend, I didn't have time to deploy it to a permanent hosting server tonight. It's running only on my development workstation at the moment. I can describe it here though and will update when it goes live.
The idea for the site was born out of a personal need. I'm planning on taking a vacation in March with my family and we were trying to figure out where to go.
Traveling with Tesla effectively rules out some options, like staying in hostels, so I was hoping to rent an apartment somewhere, like I did when I lived in Brasil for a few months in 2005. There's a site Jay Allen introduced me to called VRBO.com (Vacation Rentals By Owner). It's got thousands of listings of apartments for rent all around the globe. Unfortunately, the site looks like it's had the same UI design since last century. It's hard to get a quick idea of how much a 2BR apartment costs in various cities.
Another issue in deciding where to go is how much it costs to fly there. I was checking the airfare for various cities as I thought of them but that was tedious.
What I really wanted was a site where I could put in my home airport and it would assemble the cost of a three-week vacation in every city that had a VRBO listing, plus the cost of airfare to that city. It would then display that list sorted from cheapest to most expensive. Additionally, it would be nice if the site allowed crowd-sourced compilation of various costs in those cities (typical meals, fare from the airport to the city center, a gallon of milk, etc.) and included those in the assembled costs. This is the site I started to build. For a first iteration I limited it to South American destinations only, but eventually I will make it cover all of the VRBO listings.
The first step was scraping the apartment data from the VRBO site. For this I used the open-uri and Hpricot Ruby libraries to write a scraper. In production this scraper would run maybe once a day to get the latest listings and put them in the site database.
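In rough outline, the scraper looks something like this. The URL and CSS selectors below are placeholders (VRBO's actual markup is what the real scraper has to chase), and scrape_listings is just an illustrative name:

```ruby
# A minimal sketch of the VRBO scraper. The region_url and the CSS classes
# below are placeholders; the real ones depend on VRBO's actual markup.
require 'open-uri'
require 'hpricot'

def scrape_listings(region_url)
  doc = Hpricot(open(region_url))
  (doc / "div.listing").map do |listing|
    {
      :title       => listing.at("a.property-title").inner_text.strip,
      :url         => listing.at("a.property-title")['href'],
      :city        => listing.at("span.city").inner_text.strip,
      :weekly_rate => listing.at("span.weekly-rate").inner_text.gsub(/[^\d]/, '').to_i
    }
  end
end

# In production this would run once a day and save each hash into the listings table.
```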
The second step was getting the airfares. I had found a great site, farecompare.com, that finds low-priced airfares. I started attempting to use the Mechanize Ruby library to manipulate the search form and scrape the results, but the site's developers seemed to have gone to great lengths to make this difficult. Undaunted, I fired up Wireshark and started sniffing the network traffic. If my browser can make the form work, I can write code to do it, as long as I can analyze and reproduce all the necessary network traffic.
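For reference, the basic Mechanize pattern I was attempting is something like the sketch below. The entry URL and form field names are made up; the whole problem was that the real form doesn't submit this cleanly:

```ruby
# General Mechanize form-driving pattern (field names are hypothetical).
require 'mechanize'

agent = Mechanize.new   # older versions of the gem namespace this as WWW::Mechanize
page  = agent.get("http://www.farecompare.com/")  # hypothetical entry point

form = page.forms.first
form['origin']      = 'SFO'
form['destination'] = 'EZE'
form['depart_date'] = '03/01/2009'
form['return_date'] = '03/21/2009'

results = agent.submit(form)
puts results.body
```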
So I started getting into that, but then decided maybe I should just try another site in the interest of time. I looked at a few other sites, but they all seemed to have a similar level of difficulty for scraping. Finally I noticed one site had almost exactly what I needed. Kayak.com is an airfare search site with a search mode where you put in your home airport, select a region of the world (like South America), and it finds the lowest fares to all the major cities in that region. Perfect! Except this search flow was also designed in a way that was difficult to scrape. Kayak had another feature that made them desirable, though: they have a developer API, so network traffic analysis (not a lot of fun) wouldn't be necessary! Even though their API did not expose a way to do their regional search, I could make separate searches from a home airport to a list of other airports. It's not as elegant, but it should work.
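Emulating the regional search just means looping over a list of destination airports, something like this. The airport list here is only illustrative (in the real system it comes from the cities in the VRBO table), and kayak_lowest_fare is a hypothetical stand-in for the wrapper around the actual Kayak API call:

```ruby
# Emulating Kayak's regional search: one API query per destination airport.
# kayak_lowest_fare is a hypothetical wrapper around the real Kayak API call.
SOUTH_AMERICAN_AIRPORTS = %w[EZE GRU GIG SCL LIM BOG UIO CCS MVD LPB]

def regional_fares(home_airport, depart_date, return_date)
  SOUTH_AMERICAN_AIRPORTS.map do |dest|
    fare = kayak_lowest_fare(home_airport, dest, depart_date, return_date)
    { :destination => dest, :fare => fare }
  end.sort_by { |result| result[:fare] }   # cheapest city first
end
```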
The Kayak API has some flaws, though. First off, it seems buggy: sometimes the API calls work, but most of the time I get a bogus error saying that anonymous access to the Kayak API is denied, even though I am using a non-anonymous developer key. If I rerun the same API call several times in a row, eventually it works. This bumps me up against another problem with the Kayak API, though: they limit API queries to 1,000 per day, which works out to about 41 per hour. Since I have to make separate queries between the home airport and each city in a region, this means I can effectively only do one regional search per hour. They say you can request a higher limit, so I have an email request pending regarding that.
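My workaround for the flaky denials is simply to retry the identical call a few times before giving up. A minimal sketch, where KayakError and kayak_lowest_fare are hypothetical names for the real API wrapper and the exception it raises:

```ruby
# Retry the same Kayak query a few times, since the bogus "anonymous access
# denied" error usually clears up after a couple of identical attempts.
class KayakError < StandardError; end   # placeholder for the wrapper's error class

def with_retries(max_attempts = 5)
  attempts = 0
  begin
    yield
  rescue KayakError
    attempts += 1
    raise if attempts >= max_attempts
    sleep 2          # brief pause before retrying the identical request
    retry
  end
end

# Usage (kayak_lowest_fare is hypothetical):
# fare = with_retries { kayak_lowest_fare('SFO', 'EZE', depart_date, return_date) }
```

The downside is that each retry presumably counts against the daily quota too, which makes the limit bite even harder.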
By the way, I think we're going to Buenos Aires.
System Overview
When a user comes to the site, she enters her home airport code, the number of travelers and departure/return dates. The system fires off a BackgroundRB worker in a separate process to start hitting the Kayak API with airfare queries from the specified airport to each South American city listed in the VRBO database table. The user is shown a 'searching' web page that gets updated every 5 seconds as the airfare results come in. Once the low price is found for each city, the background worker is released and the system correlates the airfares with the VRBO data already scraped and sitting in the database. The Kayak results are cached for some time so that the queries don't need to be re-run if the same home airport is entered again. The lodging price is determined using the number of travelers to know what size apartments to look at, and the length of stay to know whether to use the nightly, weekly or monthly rates.
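The lodging-rate selection boils down to something like the sketch below. The 7- and 28-night thresholds and the fallback order are a rough cut at the rule, and the listing object is assumed to respond to nightly_rate, weekly_rate and monthly_rate (nil when the owner didn't publish that rate):

```ruby
# Rough cut of the lodging-price rule: choose nightly, weekly or monthly rates
# based on the length of stay. Thresholds are approximations; listing is assumed
# to respond to nightly_rate / weekly_rate / monthly_rate (nil if not listed).
def lodging_cost(listing, nights)
  if nights >= 28 && listing.monthly_rate
    (nights / 28.0).ceil * listing.monthly_rate
  elsif nights >= 7 && listing.weekly_rate
    (nights / 7.0).ceil * listing.weekly_rate
  else
    nights * listing.nightly_rate
  end
end
```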
Eventually I'd like to expand it to cover other housing options (hostels, hotels, chartered boats).
Anyone got any good names for such a site? It's so hard getting a decent domain name these days.