Bringing Large Geospatial Datasets to the Web
Last week Dale wrote about the problems surrounding large geospatial datasets. That got me thinking about large geospatial datasets from another angle: delivering them to the web browser.
The web has evolved over the past few years at a ferocious rate and the applications people are now building are phenomenal. The shift to the web within the geospatial world has also been dramatic, with web mapping frameworks such as the ArcGIS Server JavaScript API, Bing, Google Maps and OpenLayers bringing spatial capabilities to the fingertips of normal web developers. Yet as a GIS professional, I still don’t live in my web browser like I do in other areas of my work. I think one of the main issues holding GIS applications back from moving fully to the web is the size and complexity of the datasets we work with.
Google was the first mainstream company to bring large geospatial datasets to the web when they launched Google Maps back in 2005. Google Maps delivers the data seamlessly to the user by serving up many small tiles; the client (web browser) then reassembles them into one big image. All other mapping frameworks have since used similar methodologies.
Serving static map tiles is very fast but just doesn’t cut it for the GIS professional, as we often want to see and manipulate our vector data (points, polylines and polygons). Displaying vector data as overlays on our mapping framework of choice is easy: several commercial and open source solutions allow you to create a mapping service and publish your data in a web-friendly format such as KML or GeoRSS. The problem arises once this data reaches the web browser.
The limitation actually lies with the way the web browser stores and manipulates the data. When we build complex web applications we rely on JavaScript (you can use Flash, Java, etc. too, but that’s a whole other discussion) to add the data to the map and manipulate it. JavaScript has very limited access to processing power, and we always have to work with the weakest link, which is usually an old version of Internet Explorer (comparison of JavaScript processing times).
Let’s run through a quick example. Say we want to overlay 10,000 points on our map (0.00001 times the size of the datasets Dale was talking about last week). With this number of points, performing any operation on one of the point geometries would require looping through the points, and an older browser with a slower JavaScript engine would likely give you the dreaded script timeout error.
So we have established that working with large datasets in the web browser is difficult. However, there are several things we can do on both the client and the server side to help abate the issue:
- Only return data that the user needs to see.
This is the single most important point. When requesting data, the client should send its current bounding box, and the service should read that bounding box and return only the data within it or in its immediate proximity. You usually need to attach the client request to the mouse move event on the map in some way. Sending the zoom level with the request is usually required too (or you can configure this on the server): if the client sends a bounding box covering the whole of North America, you don’t want to return every point within that area. Instead, you may want to show just one point per state that contains any data, and style it differently.
Another way to limit the data returned is to allow the client to pass filter parameters, such as ID or feature class, which further narrow down the selection.
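As a rough illustration of the server side of this idea, here is a minimal Python sketch. Everything in it is an assumption made for the example: the `PointFeature` record, the `region` attribute (standing in for a state), the `detail_zoom` threshold and the 10% margin around the box. It also scans the points in memory, where a real service would normally push the bounding box down to a spatially indexed database query.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# (min_x, min_y, max_x, max_y), in the same units as the point coordinates.
BBox = Tuple[float, float, float, float]


@dataclass
class PointFeature:
    id: int
    x: float
    y: float
    region: str  # hypothetical grouping attribute, e.g. a state code


def expand(bbox: BBox, margin: float) -> BBox:
    """Grow the box by a fraction of its size to cover its immediate proximity."""
    min_x, min_y, max_x, max_y = bbox
    dx = (max_x - min_x) * margin
    dy = (max_y - min_y) * margin
    return (min_x - dx, min_y - dy, max_x + dx, max_y + dy)


def query(points: List[PointFeature], bbox: BBox, zoom: int,
          detail_zoom: int = 8, margin: float = 0.1) -> List[dict]:
    """Return only what the client needs for its current view."""
    min_x, min_y, max_x, max_y = expand(bbox, margin)
    visible = [p for p in points
               if min_x <= p.x <= max_x and min_y <= p.y <= max_y]

    if zoom >= detail_zoom:
        # Zoomed in: send the individual points.
        return [{"id": p.id, "x": p.x, "y": p.y} for p in visible]

    # Zoomed out: send one summary marker per region instead of every point.
    groups: Dict[str, List[PointFeature]] = {}
    for p in visible:
        groups.setdefault(p.region, []).append(p)
    return [
        {
            "region": region,
            "count": len(members),
            "x": sum(m.x for m in members) / len(members),
            "y": sum(m.y for m in members) / len(members),
        }
        for region, members in sorted(groups.items())
    ]
```

The client can then style the summary markers differently from individual points, for example by sizing them according to `count`.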
- Only return the exact attributes you need.
Whatever you return to the client, the browser is going to have to store in memory. Therefore, the less you send back the better. Make sure you strip out all unrequired attributes; do your users really need to see the OBJ_ID column?
- Simplify the geometry.
If you are working with polylines or polygons you may want to think about simplifying your geometry. For example, if the road network for the entire state is stored in the database as it was derived from survey points, you likely don’t need that level of detail to overlay the roads on a map. A generalization technique such as line simplification can reduce the number of points considerably.
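The last two points, trimming attributes and simplifying geometry, can be sketched together in Python. The post does not name a particular generalization algorithm, so this example assumes the Ramer-Douglas-Peucker line simplification algorithm as one common choice. The function names, the attribute names and the tolerance value are all made up for illustration.

```python
from math import hypot
from typing import Any, Dict, Iterable, List, Tuple

Point = Tuple[float, float]


def keep_attributes(attributes: Dict[str, Any],
                    wanted: Iterable[str]) -> Dict[str, Any]:
    """Return only the attributes the client actually displays."""
    wanted_set = set(wanted)
    return {k: v for k, v in attributes.items() if k in wanted_set}


def _distance_to_segment(p: Point, a: Point, b: Point) -> float:
    """Shortest distance from point p to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return hypot(px - ax, py - ay)
    # Project p onto the segment and clamp to its end points.
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return hypot(px - (ax + t * dx), py - (ay + t * dy))


def simplify(line: List[Point], tolerance: float) -> List[Point]:
    """Ramer-Douglas-Peucker: drop vertices closer than `tolerance`
    to the simplified line. Iterative, so long lines cannot hit the
    recursion limit."""
    if len(line) < 3:
        return list(line)
    keep = [False] * len(line)
    keep[0] = keep[-1] = True
    stack = [(0, len(line) - 1)]
    while stack:
        first, last = stack.pop()
        worst_dist, worst_idx = 0.0, -1
        for i in range(first + 1, last):
            d = _distance_to_segment(line[i], line[first], line[last])
            if d > worst_dist:
                worst_dist, worst_idx = d, i
        if worst_idx != -1 and worst_dist > tolerance:
            # This vertex matters: keep it and check both halves.
            keep[worst_idx] = True
            stack.append((first, worst_idx))
            stack.append((worst_idx, last))
    return [p for p, k in zip(line, keep) if k]
```

The tolerance is in the same units as the coordinates, so it makes sense to choose it per zoom level: a larger tolerance when zoomed out, a smaller one when the user zooms in.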
In my personal opinion the web is still a long way from being able to handle extremely large datasets (though companies like WeoGeo are doing some innovative things). Using the techniques outlined above, however, you can allow your users to work with large datasets by ensuring they are only working with a small subset of the data at any one time. In my next installment we will look at emerging technologies in HTML5 which will help us bring large datasets to the web and mobile devices.
These are the main techniques I use when bringing large datasets to the web. Does anyone else have any other methods?
Have you tried multi-resolution vector streaming? See http://www.idelve.com
Stewart:
Always a germane topic. My sense is that geo devs get caught up in how many features they can display (“check it out: 10,000 clickable points!”) without giving due thought to how or why a user can meaningfully interact with 10K points at any given time.
But this issue is relevant to the larger information visualization community where much thought has been expended. An outstanding recent article touched on this very point–
http://fellinlovewithdata.com/guides/how-do-you-visualize-too-much-data
I especially love the ‘Visual Information Seeking Mantra’ at the end–
Overview First, Zoom and Filter, Details-on-Demand
Giving this mantra 15 minutes of thought at the outset would improve the quality of user experience in our mapping apps exponentially.
Brian Timoney
- Brian
Thanks for the link and comments, a very interesting article. I think the Details-on-Demand component is an extremely important point. Delivering a minimal amount of information to the user is a win for both the user and the client/server architecture. This goes for both attribute information and geometry. An example of this I used recently was to represent complex polylines as points; I then requested the single polylines from the server when the user hovered over the point.
Another win for the user when we adopt the “Overview First, Zoom and Filter, Details-on-Demand” technique is that page load times are likely to be considerably faster, which is all part of the data visualization experience.
- Steve
Thanks. No, I haven’t tried that; I will take a look.
Great blog post… Generalizing has proven challenging based on the expectation of the user. Careful attention should be paid to scale vs amount of generalization. I believe most traditional gis users are still immature in the display of data on the web. Utah has done some great work (carto and speed) with their basemaps. I believe the issue of rapidly updating data within optimized web displays remains an issue, though others are much more experienced than I.
Thanks for the input, Learon. Utah are indeed doing some great things with their mapping services. Check out this parcel viewer here: http://mapserv.utah.gov/rasterindicies/Parcels.html which gives you a feel for the speed at which they are serving data up. Notice how they turn layers on/off at different zoom levels.
I just can’t see the goal. The interface, which is the web in this case, is the point of interaction with a human. Why do we need to overload a person with information? If it is all about interaction speed, we should first consider the time to load those 10k points, then handle them gently in the browser. I’d recommend finding a way to load less.
Anyway, I haven’t seen an openstreetmap change set effort – which allows you to get from the server up to 50k of points, modify them and then put back. But the amounts of data are big…
Some good points, Oleg. What is the maximum amount of time you would expect your users to wait when building a web mapping application? Half a second? One second?
If it’s points or bounding boxes, you can pretty easily do the following:
1) Draw a picture of your geometries (AGS, MapServer, whatever)
2) Use an r-tree to find which geometries the mouse is over (on mouse-move)
http://mapwrecker.wordpress.com/2008/11/03/json-rtrees-part-2/
Very interesting Bill, thanks for posting. I haven’t run into this before and will have to have a play. Have you got a working version hooked up to a mapping API displaying thousands of geometries?