Combining HTTP and JavaScript APIs with Python on Google App Engine
In this part I will introduce the Python implementation of the IP-to-geolocation script. It is more object-oriented and hopefully easier to read. In the first part of this article I will describe how to read HTTP resources and parse their content. The second part is the same as in the PHP version. As a conclusion, I will compare the results of all five APIs with the data from the cache.
- the tutorial on how to use the IP-to-geolocation provider APIs
- the free IP-to-geolocation aggregator script
- the PHP stand-alone implementation
- the Python Google App Engine implementation
Python: asynchronous HTTP requests on Google App Engine
First I wanted to build the same multi-URL fetch function as in PHP, but on Google App Engine. Threads are not allowed there, and the urllib/httplib modules are routed through Google's urlfetch module. At first I chose the plain and simple urllib.urlopen() call, because Google's backend is fast. After that was done, I found a section in the updated URLFetch documentation (since June 18, 2009, App Engine version 1.2.3) explaining that asynchronous calls have to be made with the RPC functions of the urlfetch module, create_rpc() and make_fetch_call(). Have fun with the improved example.
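```python
import logging

from google.appengine.api import urlfetch


class InfoItem(dict):
    '''A dict that starts fetching an API URL asynchronously in __init__.

    Subclasses build the URL and implement parse(content), which fills
    the dict and returns True on success.
    '''

    def __init__(self, url):
        super(InfoItem, self).__init__()
        # Start the asynchronous fetch; this call returns immediately.
        self.rpc = urlfetch.create_rpc()
        urlfetch.make_fetch_call(self.rpc, url)

    def parse(self, content):
        '''Fill the dict from the response body; implemented by subclasses.'''
        raise NotImplementedError

    def ready(self):
        '''Wait for the asynchronous call and parse the response.
        @return True - if data was received and parsed successfully
        '''
        try:
            # Blocks until the response has arrived (or the call failed).
            result = self.rpc.get_result()
        except urlfetch.Error, ex:
            logging.error("Error while fetching: %s" % ex)
            return False
        if result.status_code != 200:
            return False
        return self.parse(result.content)
    #ready
#InfoItem
```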
For easy access, the result class is based on a Python dict. To check whether the API data has been filled into the dict, call the ready() function. You can create the InfoItem instances, do something else in the meantime, and then call ready() on each instance to find out whether the data has arrived (if it has not, ready() waits for it). Accessing the values is easy, because the item is a dict.
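The same "start now, collect later" pattern can be tried outside App Engine. The following is a minimal sketch for plain Python 3, assuming a thread pool from concurrent.futures stands in for the urlfetch RPCs (threads were not an option on App Engine). AsyncItem, BodyItem and fake_fetch are hypothetical names; fake_fetch only pretends to do HTTP, so the sketch runs offline.

```python
from concurrent.futures import ThreadPoolExecutor

# Assumption: a thread pool plays the role of App Engine's asynchronous RPCs.
_POOL = ThreadPoolExecutor(max_workers=5)


class AsyncItem(dict):
    '''Hypothetical stand-in for InfoItem: starts work in __init__, waits in ready().'''

    def __init__(self, fetch, url):
        super().__init__()
        # fetch(url) must return (status_code, content); it runs in the pool.
        self._future = _POOL.submit(fetch, url)

    def parse(self, content):
        raise NotImplementedError

    def ready(self):
        '''Block until the fetch is done; True if the data was parsed.'''
        try:
            status, content = self._future.result()
        except Exception:  # any fetch error counts as "no data"
            return False
        return status == 200 and self.parse(content)


class BodyItem(AsyncItem):
    '''Toy subclass that stores the raw response body.'''

    def parse(self, content):
        self['body'] = content
        return True


def fake_fetch(url):
    '''Pretend HTTP call: 404 for URLs ending in "/missing", else 200.'''
    if url.endswith('/missing'):
        return 404, ''
    return 200, 'data from ' + url


# Start all "requests" first, then collect the results.
items = [BodyItem(fake_fetch, 'http://example.com/' + p) for p in ('a', 'b', 'missing')]
found = [item for item in items if item.ready()]
print(found)
```

Because ready() only blocks on the first call that needs the result, creating all items before asking any of them lets the slowest API determine the total waiting time instead of the sum of all of them.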
Parsing XML data with Python
XML should be parsed with the ElementTree module. It is fast and simple to use. Based on the InfoItem class, a subclass has two jobs: building the API URL by simply appending the IP string, and parsing the content.
```python
import logging
import xml.etree.ElementTree as etree
from xml.parsers.expat import ExpatError

# Python 2.7 raises etree.ParseError; older versions (2.5/2.6) raise ExpatError.
ParseError = getattr(etree, 'ParseError', ExpatError)


class IpInfoDbItem(InfoItem):
    '''Parses the XML answer of the IpInfoDB API.'''

    def __init__(self, ip):
        '''Start fetching the IpInfoDB URL for the given ip.'''
        super(IpInfoDbItem, self).__init__(
            "http://ipinfodb.com/ip_query.php?ip=" + ip)

    def parse(self, content):
        '''Parse the IpInfoDB XML and save the values in the dict.
        @return True - if parsing was successful.
        '''
        try:
            # fromstring() parses a string directly and returns the root element.
            root = etree.fromstring(content)
            self.update({'name': 'ipinfodb',
                         'country': root.find("CountryName").text or '',
                         'city': root.find("City").text or '',
                         'lat': float(root.find("Latitude").text),
                         'long': float(root.find("Longitude").text)})
            return True
        except (ParseError, AttributeError, TypeError, ValueError), ex:
            # AttributeError: missing tag; TypeError/ValueError: bad coordinates
            logging.warning("Nothing parsed: %s" % ex)
            return False
    #parse
#IpInfoDbItem

if __name__ == '__main__':
    # Test the code directly (if the google modules are in the path)
    testing = IpInfoDbItem("127.0.0.1")
    if testing.ready():
        print testing
        # e.g. {'lat': 0.0, 'country': 'Reserved', 'name': 'ipinfodb', 'long': 0.0, 'city': ''}
```
The example starts fetching the data from the IpInfoDB API in the __init__ function (through InfoItem). When ready() is called, the XML is parsed and the values are stored in the dict with self.update().
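To try the XML parsing without the App Engine SDK or network access, here is a small standalone sketch in Python 3. It assumes an answer shaped like the one the parser expects: the tag names CountryName, City, Latitude and Longitude come from the code, while the root element name, the Ip tag and all values in the sample are made up. parse_ipinfodb is a hypothetical helper that returns a dict instead of filling an InfoItem.

```python
import xml.etree.ElementTree as etree

# Hypothetical sample answer; only the four parsed tag names are taken from
# the parser, the root name, the Ip tag and all values are invented.
SAMPLE = """<Response>
  <Ip>192.0.2.1</Ip>
  <CountryName>Germany</CountryName>
  <City>Berlin</City>
  <Latitude>52.5167</Latitude>
  <Longitude>13.4</Longitude>
</Response>"""


def parse_ipinfodb(content):
    '''Return a dict like the one IpInfoDbItem fills, or None on bad input.'''
    try:
        root = etree.fromstring(content)
        return {'name': 'ipinfodb',
                'country': root.find("CountryName").text or '',
                'city': root.find("City").text or '',  # empty tag -> ''
                'lat': float(root.find("Latitude").text),
                'long': float(root.find("Longitude").text)}
    except (etree.ParseError, AttributeError, TypeError, ValueError):
        # ParseError: not XML; AttributeError: tag missing;
        # TypeError/ValueError: coordinate empty or not a number
        return None


print(parse_ipinfodb(SAMPLE))
print(parse_ipinfodb("<Response><City>Berlin</City></Response>"))  # tags missing
```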
Parsing non-structured data with Python
The same hint as in PHP applies: use regular expressions to match the data!
```python
import re

# hostip.info answers with plain-text lines such as
# "Country: NAME (XX)", "City: ...", "Latitude: ...", "Longitude: ...".
HOSTIP_RE = re.compile(
    r"Country:\s+(.*?)\s*\(\w+\)\s*\n"
    r"City:\s+(.*?)\s*\n"
    r"Latitude:\s*(-?\d+\.\d+)\s*\n"
    r"Longitude:\s*(-?\d+\.\d+)", re.I)


class HostIpItem(InfoItem):
    '''Parses the plain-text answer of the hostip.info API.'''

    def __init__(self, ip):
        super(HostIpItem, self).__init__(
            "http://api.hostip.info/get_html.php?position=true&ip=" + ip)

    def parse(self, content):
        '''Parse the HostIp text and save the values in the dict.
        @return True - if parsing was successful.
        '''
        match = HOSTIP_RE.search(content)
        if match:
            self.update({'name': 'hostip',
                         'country': match.group(1),
                         'city': match.group(2),
                         'lat': float(match.group(3)),
                         'long': float(match.group(4))})
            return True
        return False
    #parse
#HostIpItem
```
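It works like the XML example: the URL is built in __init__, and parse() reads the values from the groups of the regular expression.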
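The regular expression can be tested offline in the same way. This Python 3 sketch uses a slightly more tolerant variant of the pattern (raw string, optional minus sign, whitespace around the values stripped, blank lines between the lines allowed). The sample answers are hypothetical and only mimic the "Key: value" lines the pattern expects; parse_hostip is an illustrative helper, not part of the original script.

```python
import re

# Tolerant variant of the hostip pattern: each group stays on its own line
# (no re.S), and trailing whitespace or blank lines before "\n" are skipped.
HOSTIP_RE = re.compile(
    r"Country:\s+(.*?)\s*\(\w+\)\s*\n"
    r"City:\s+(.*?)\s*\n"
    r"Latitude:\s*(-?\d+\.\d+)\s*\n"
    r"Longitude:\s*(-?\d+\.\d+)", re.I)


def parse_hostip(content):
    '''Return a dict like the one HostIpItem fills, or None if nothing matched.'''
    match = HOSTIP_RE.search(content)
    if not match:
        return None
    return {'name': 'hostip',
            'country': match.group(1),
            'city': match.group(2),
            'lat': float(match.group(3)),
            'long': float(match.group(4))}


# Hypothetical answer with a blank line between City and Latitude
sample = ("Country: GERMANY (DE)\nCity: Berlin\n\n"
          "Latitude: 52.5167\nLongitude: 13.4\nIP: 192.0.2.1\n")
print(parse_hostip(sample))
```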
Building a complete web application
To put it all together, you define a RequestHandler that fetches the data and produces a JavaScript file. In Django style you need the following template (saved as geo.temp next to the script); the placeholders in {{ x }} are replaced with the values from a dict.
```javascript
var com = com||{};
com.unitedCoders = com.unitedCoders||{};
com.unitedCoders.geo = com.unitedCoders.geo||{};
com.unitedCoders.geo.ll = {{ ll_json }} ;
{{ maxmind }}
{{ wipmania }}
{{ google }}
document.write('<script type="text/javascript" src="http://pyUnitedCoders.appspot.com/geo_func.js"></script>');
com.unitedCoders.geo.staticMapUrl = function(x, y) {
    var url = "http://maps.google.com/staticmap?key={{ google_key }}&size="+x+"x"+y+"&markers=";
    var colors = ["blue","green","red","yellow","white", "black"];
    for (var i=0; i<com.unitedCoders.geo.ll.length;i++) {
        var s = com.unitedCoders.geo.ll[i];
        url += s.lat+","+s.long+",mid"+colors[i]+(i+1)+"%7C";
    }
    url += this.getLat() + ","+this.getLong() + ",black";
    return url;
};
```
```python
import os

from django.utils import simplejson
from google.appengine.ext import webapp
from google.appengine.ext.webapp import template
from google.appengine.ext.webapp.util import run_wsgi_app

# IpInfoDbItem and HostIpItem are the classes defined above (same module).


class GeoScript(webapp.RequestHandler):
    def get(self):
        '''Get the location info for the calling IP (from the APIs).'''
        self.response.headers['Content-Type'] = 'text/javascript;charset=UTF-8'
        ip = self.request.remote_addr
        # result dict for the template and list of locations
        result = {}
        ll = []
        # Start fetching the API data (asynchronous, returns immediately)
        ipInfo = IpInfoDbItem(ip)
        hostIp = HostIpItem(ip)
        # Add some more JavaScript APIs while the fetches are running
        scriptTemp = "document.write('<script type=\"text/javascript\" src=\"%s\"></script>');"
        result['maxmind'] = scriptTemp % "http://j.maxmind.com/app/geoip.js"
        result['wipmania'] = scriptTemp % "http://api.wipmania.com/wip.js"
        key = self.request.get("key")
        if key:
            result['google_key'] = key
            result['google'] = scriptTemp % ("http://www.google.com/jsapi?key=" + key)
        # Collect the fetched API data (ready() waits if necessary)
        if ipInfo.ready():
            ll.append(ipInfo)
        if hostIp.ready():
            ll.append(hostIp)
        result['ll_json'] = simplejson.dumps(ll)
        # Put it all together in the JavaScript template
        path = os.path.join(os.path.dirname(__file__), 'geo.temp')
        self.response.out.write(template.render(path, result))
    #get
#GeoScript

application = webapp.WSGIApplication([
    ('/geo_data.js', GeoScript)], debug=True)


def main():
    run_wsgi_app(application)

if __name__ == '__main__':
    main()
```
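Django's template engine is not needed to see what the handler produces. The following Python 3 sketch uses a hypothetical render() helper that only imitates the {{ name }} substitution (undefined names become an empty string, which is also Django's default), together with json.dumps for ll_json. It also shows why the dict-based items serialize cleanly: JSON-encoding a dict subclass writes only its items, not extra attributes such as rpc. The Location class and the sample values are made up.

```python
import json
import re

# Shortened copy of the template's first lines, for illustration only.
TEMPLATE = """var com = com||{};
com.unitedCoders = com.unitedCoders||{};
com.unitedCoders.geo = com.unitedCoders.geo||{};
com.unitedCoders.geo.ll = {{ ll_json }} ;
{{ maxmind }}
{{ google }}"""


def render(template, context):
    '''Hypothetical stand-in for Django's {{ name }} substitution.
    Missing names render as an empty string.'''
    return re.sub(r"\{\{\s*(\w+)\s*\}\}",
                  lambda m: str(context.get(m.group(1), "")), template)


class Location(dict):
    '''dict subclass with an extra attribute, like InfoItem and its rpc.'''


loc = Location(name='hostip', country='GERMANY', city='Berlin', lat=52.5167, long=13.4)
loc.rpc = object()  # not serialized: json only writes the dict items

script_temp = "document.write('<script type=\"text/javascript\" src=\"%s\"></script>');"
context = {'ll_json': json.dumps([loc]),
           'maxmind': script_temp % "http://j.maxmind.com/app/geoip.js"}
# 'google' is missing from the context, so {{ google }} renders empty.
out = render(TEMPLATE, context)
print(out)
```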
For more information on how to set up a Python web application on Google App Engine, read Google's excellent documentation!
Conclusion
Don't mix too many languages, you will get confused! Implementing the server-side script in parallel in PHP and Python, plus one shared version of the advanced functions in JavaScript, mixes three scripting languages. My first mistakes were putting semicolons into Python and forgetting the curly braces around blocks in JavaScript.
After deploying a server-side script, the Google App Engine dashboard shows you everything: logs and API calls in detail. You can also manage several versions of the application (by default there is one) and give deploy access to other Google accounts. That's great!
What is the best service provider?
I cached the results of all five APIs and evaluated them. The table shows, for the visitors of this blog, the hit rate of each provider (the share of IPs for which it returned lat/long values, and for which it returned a city) and the distance of its locations to the center. The center is computed per IP as the average of all lat/long pairs that come with a city value.
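The evaluation described above can be sketched in a few lines of Python 3. This assumes the cache maps each IP to the list of provider answers (dicts with name, city, lat and long, as filled by the item classes) and that "hit rate" means the share of cached IPs for which a provider returned the value. The cache contents, the IPs and both helpers (center, hit_rates) are hypothetical.

```python
from collections import defaultdict

# Hypothetical cached answers per IP; addresses and values are invented.
CACHE = {
    '192.0.2.1': [
        {'name': 'ipinfodb', 'city': 'Berlin', 'lat': 52.52, 'long': 13.40},
        {'name': 'hostip', 'city': 'Potsdam', 'lat': 52.40, 'long': 13.06},
        {'name': 'wipmania', 'city': '', 'lat': 51.00, 'long': 9.00},
    ],
    '192.0.2.2': [
        {'name': 'ipinfodb', 'city': 'Hamburg', 'lat': 53.55, 'long': 10.00},
    ],
}


def center(answers):
    '''Mean lat/long of the answers that include a city, or None.'''
    with_city = [a for a in answers if a.get('city')]
    if not with_city:
        return None
    n = len(with_city)
    return (sum(a['lat'] for a in with_city) / n,
            sum(a['long'] for a in with_city) / n)


def hit_rates(cache):
    '''Per provider: (share of IPs with lat/long, share of IPs with a city).'''
    latlong, city = defaultdict(int), defaultdict(int)
    for answers in cache.values():
        for a in answers:
            if 'lat' in a and 'long' in a:
                latlong[a['name']] += 1
            if a.get('city'):
                city[a['name']] += 1
    total = len(cache)
    return {name: (latlong[name] / total, city[name] / total)
            for name in set(latlong) | set(city)}


for ip, answers in CACHE.items():
    print(ip, center(answers))
print(hit_rates(CACHE))
```

Averaging degrees directly, as described in the text, works well for nearby points but would give wrong centers for points on opposite sides of the 180° meridian.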
Service provider | IPs with lat/long | IPs with city | distance to center |
---|---|---|---|
MaxMind | 86% | 85% | 123 km |
WIPmania | 89% | 0% | 1059 km |
Google | 48% | 0% | 197 km |
IPInfoDB | 98% | 91% | 168 km |
hostip | 35% | 53% | 404 km |
HostIP and Google do not return location data for many visitors. IPInfoDB and MaxMind do not return the same positions for all IPs (as was suggested in the comments). At this time WIPmania mostly returns the center of the country, so its positions are not very accurate compared with the calculated center.
How to calculate the distance between lat/long values?
I found some nice functions in JavaScript (but please don't add functions to the prototypes of String and Number!). In Python, the distance function looks like this:
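```python
import math
def distance(lat1, long1, lat2, long2):
    '''Great-circle distance in km (spherical law of cosines, degrees in).'''
    lat1, long1, lat2, long2 = map(math.radians, (lat1, long1, lat2, long2))
    c = math.sin(lat1) * math.sin(lat2) + math.cos(lat1) * math.cos(lat2) * math.cos(long2 - long1)
    return 6378.7 * math.acos(max(-1.0, min(1.0, c)))  # clamp: rounding can leave [-1, 1]
```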
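The comments below mention the haversine form of the distance, which is numerically better behaved than the law of cosines for very close points. Here is a Python 3 sketch, assuming the same earth radius of 6378.7 km as the function above; haversine and haversine_atan2 are illustrative names, and the second one only exists to show that 2*asin(sqrt(a)) and 2*atan2(sqrt(a), sqrt(1-a)) give the same result.

```python
import math

EARTH_RADIUS_KM = 6378.7  # same radius as in the article's distance()


def haversine(lat1, long1, lat2, long2):
    '''Great-circle distance in km; all arguments in degrees.'''
    lat1, long1, lat2, long2 = map(math.radians, (lat1, long1, lat2, long2))
    a = (math.sin((lat2 - lat1) / 2) ** 2 +
         math.cos(lat1) * math.cos(lat2) * math.sin((long2 - long1) / 2) ** 2)
    # 2*asin(sqrt(a)) is the simplified form of 2*atan2(sqrt(a), sqrt(1-a));
    # min() keeps rounding errors inside asin's domain.
    return EARTH_RADIUS_KM * 2 * math.asin(min(1.0, math.sqrt(a)))


def haversine_atan2(lat1, long1, lat2, long2):
    '''The same formula in its atan2 form, for comparison.'''
    lat1, long1, lat2, long2 = map(math.radians, (lat1, long1, lat2, long2))
    a = (math.sin((lat2 - lat1) / 2) ** 2 +
         math.cos(lat1) * math.cos(lat2) * math.sin((long2 - long1) / 2) ** 2)
    return EARTH_RADIUS_KM * 2 * math.atan2(math.sqrt(a), math.sqrt(1 - a))


# Berlin -> Munich, using approximate city-center coordinates
print(round(haversine(52.52, 13.40, 48.14, 11.58), 1))
print(round(haversine_atan2(52.52, 13.40, 48.14, 11.58), 1))
```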
Comments
Anonymous (not verified) - Sun, 04/18/2010 - 08:45
"c = 2 * math.atan2(math.sqrt(a), math.sqrt(1-a))" can be simplified to "c = 2 * math.asin(math.sqrt(a))"