FME Blog

GPS Meets Video. Do We Need a Standard for Geotagging Videos?

Stewart Harper

April 11, 2012•5 min


Photo and video have both been around for a long time. Recently we have seen growing interest in recording where photos were snapped, a practice called geotagging. There is a well-developed standard for storing location in photos using Exif. As a result of this standard, photos taken on cameras with a GPS can be uploaded to services such as Flickr and geolocated on a map without any intervention from the end user.

While the geolocation technologies surrounding photos have matured, the same cannot be said for videos, and this is what I want to focus on. Do we need a standard for geotagging videos? I think the answer is a definite yes!

Review

Videos are much more difficult to deal with than photos. A photo is a snapshot in time, which a single point represents well. A video, on the other hand, spans a longer period of time: a point can represent where the video started or finished, but not the part in between.

I have to admit I am no expert on the inner workings of video formats, but I do have a good idea of what capabilities I would like to see. Having done a bit of research, I found a number of ways that location is associated with video:

Geotagging: Similar to photos, a point is used to locate the starting point of the video. I took a look at the Exif information for video produced by the iPhone, and it contains location information for the starting point of the video:

---- Composite ----

Avg Bitrate	: 776 kbps
GPS Altitude	: 34 m
GPS Altitude Ref	: Above Sea Level
GPS Latitude	: 49 deg 15′ 38.52″ N
GPS Longitude	: 123 deg 4′ 24.60″ W
GPS Position	: 49 deg 15′ 38.52″ N, 123 deg 4′ 24.60″ W
Image Size	: 480×272
Rotation	: 90


GPX Syncing: Devices record a GPS track at the same time as recording a video. Applications then consume both the GPS track and the video; because the timestamps on the two match, you can tie them together.
Native Storage: Contour has a helmet camera with a built-in GPS that stores this information within the MOV file itself, using the NMEA format, rather than maintaining it in a separate GPX file.
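
To make the first two approaches a little more concrete, here is a minimal Python sketch. It converts a degrees/minutes/seconds string like the ones in the metadata dump above into decimal degrees, and it estimates the position of any frame by linearly interpolating a GPS track. The function names (dms_to_decimal, position_at, frame_position) and the (time, lat, lon) tuple layout are my own assumptions for illustration, not part of any existing standard or library, and the sketch assumes the video and the GPS track share the same clock.

```python
import re
from bisect import bisect_right

# Matches strings such as: 49 deg 15' 38.52" N
# Accepts straight or typographic minute/second marks.
_DMS = re.compile(
    r"""(?P<deg>\d+(?:\.\d+)?)\s*deg\s*
        (?P<min>\d+(?:\.\d+)?)\s*['′]\s*
        (?P<sec>\d+(?:\.\d+)?)\s*["″]\s*
        (?P<hemi>[NSEW])""",
    re.VERBOSE,
)


def dms_to_decimal(text):
    """Convert a degrees/minutes/seconds string to signed decimal degrees."""
    match = _DMS.search(text)
    if match is None:
        raise ValueError(f"not a recognisable DMS coordinate: {text!r}")
    value = (
        float(match["deg"])
        + float(match["min"]) / 60.0
        + float(match["sec"]) / 3600.0
    )
    # South and west are negative by convention.
    return -value if match["hemi"] in "SW" else value


def position_at(track, t):
    """Linearly interpolate (lat, lon) at time t.

    track is a list of (time_s, lat, lon) tuples sorted by time_s
    (a hypothetical layout, e.g. built from a GPX file).
    Times outside the track are clamped to the first or last point.
    """
    if not track:
        raise ValueError("empty track")
    times = [point[0] for point in track]
    if t <= times[0]:
        return track[0][1], track[0][2]
    if t >= times[-1]:
        return track[-1][1], track[-1][2]
    i = bisect_right(times, t)  # first track point strictly after t
    t0, lat0, lon0 = track[i - 1]
    t1, lat1, lon1 = track[i]
    f = (t - t0) / (t1 - t0)
    return lat0 + f * (lat1 - lat0), lon0 + f * (lon1 - lon0)


def frame_position(track, video_start_s, frame_index, fps):
    """Estimate where a frame was shot, assuming video and GPS share a clock."""
    return position_at(track, video_start_s + frame_index / fps)
```

For example, dms_to_decimal("49 deg 15′ 38.52″ N") gives roughly 49.2607. Straight-line interpolation is only a reasonable approximation when GPS fixes are close together, and it takes no account of clock drift between the two devices.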

Proposal

Contour is moving towards what I believe we need for video: a standard that allows us to tie individual frames to locations very easily. On a more technical level, I want to be able to programmatically extract the latitude and longitude for any frame in the video. If we had that, it would allow us to do some amazing things (thanks to Dave Campanas for these ideas):

Spatial Queries: Ask questions like “Which videos are under 10 km in length?” or “Which videos have a section shot within this bounding box?”. I think this is really important, as one of the issues with video is the volume of data that is produced. If the whole video were spatially aware, you could search and organise your videos much more efficiently. For example, people unfortunately decided to riot here in Vancouver last year after their beloved hockey team lost. In order to identify suspects, the police encouraged people to send in photos and videos of incidents. With over 1 million submissions, searching must have been a time-consuming process. With fully georeferenced videos, they could have queried which videos fell within the location they were investigating.
Geo-clipping: Clip video by area of interest. You would be able to clip a video not only by time but also by location or distance. Continuing the example above, you could query all videos in the area you are investigating and then clip each video to just that area. As a result, you would have to review only spatially relevant video.
Selecting/relating to other objects: Add a geographic buffer around a video and select objects that are close to where the footage was recorded. This could add huge value, allowing you to merge your existing spatial data accurately with video. An example would be to select all organisations near videos containing vandalism, so you can draw up a list of contact details.
Extract individual frames: Extract the closest frame/clip to a target point from multiple videos. For example, if you know from a photo where an individual committed an act of vandalism, you could extract the closest frames from the videos surrounding the incident.
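
If every frame (or every GPS sample) carried a position, most of the ideas above would reduce to a few lines of code. The following Python sketch is only an illustration under that assumption: it treats a video as a list of (lat, lon) pairs, one per sample, and all of the function names are hypothetical. Distances use the haversine formula on a spherical Earth, and the bounding-box test ignores boxes that cross the antimeridian.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, spherical approximation


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(radians, (a[0], a[1], b[0], b[1]))
    h = (
        sin((lat2 - lat1) / 2) ** 2
        + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    )
    return 2 * EARTH_RADIUS_KM * asin(sqrt(h))


def track_length_km(points):
    """Spatial query: total distance covered by the video's track."""
    return sum(haversine_km(p, q) for p, q in zip(points, points[1:]))


def indices_in_bbox(points, south, west, north, east):
    """Geo-clipping: indices of the samples that fall inside a bounding box."""
    return [
        i
        for i, (lat, lon) in enumerate(points)
        if south <= lat <= north and west <= lon <= east
    ]


def enters_bbox(points, south, west, north, east):
    """Spatial query: does any sample of the video fall inside the box?"""
    return bool(indices_in_bbox(points, south, west, north, east))


def within_buffer(points, obj, radius_km):
    """Relating to other objects: is obj within radius_km of any sample?"""
    return any(haversine_km(p, obj) <= radius_km for p in points)


def closest_index(points, target):
    """Frame extraction: index of the sample nearest to a target point."""
    return min(range(len(points)), key=lambda i: haversine_km(points[i], target))
```

These checks only look at the recorded samples, not the path between them, so a sparse track could pass through a small box without being detected. A real implementation over many videos would probably want a spatial index rather than scanning every point.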

Someone at OpenStreetMap has also seen the value in adding location to the entire length of video. The article walks through their attempt at video mapping using the Contour GPS. This is another great use case.

Conclusion

In my view the hardware is here, so let’s wrap a standard around it so we can start building awesome applications! It would be a missed opportunity if the different device manufacturers each used their own method to georeference video. Interoperability would be destroyed, making it difficult for applications to spring up that wrap innovative solutions around what could be a powerful piece of technology.