The Safe Software Blog
Author:
Mark Ireland

   |   May 5, 2008   |   By Mark Ireland

FME Evangelism Weekly Issue #10

Contents

  1. FME 2008 Release Updates
  2. AutoCAD Map 3D Object Data Reading Examples
  3. Shape Datasets and the Age of Innocence
  4. Google Spreadsheets Reader/Writer
  5. Creating a Shape Index with FME
  6. OIDs in PostGIS
  7. Measurement Unit Converters

1) FME 2008 Release Updates

The first of the (previously mentioned) FME2008 Update builds is now available for download. Remember, these are minor fixes made available in lieu of an FME2009 beta.

You can download the first FME2008 update online (it’s now the default 2008 download) at:

http://www.safe.com/support/downloads.php

You can find a list of the changes in each particular build (this one is build 5200) on fmepedia at:

http://www.fmepedia.com/index.php/FME2008_Post-CD_Fixes

I’d suggest you read the list of updates before automatically installing this build, since only a very small minority of our users are likely to find any benefit in it. If you do have any questions, don’t hesitate to contact support@safe.com.

2) AutoCAD Map3D Object Data Reading – Example Workspaces

AutoCAD Map3D Object Data is a very flexible and open-ended format, which is great for you users but a challenge for us when creating the reader.

What we did was add one fairly unusual setting – a reading mode – that affects how the source schema is displayed in your workspace.

In essence, you can get different views of the data model depending on what actions you want to carry out on the data.

To explain the different modes we created a set of example workspaces, based on the same dataset but each in a different reading mode.

These demonstrate the different results you can obtain with each mode and help explain when you would want to use them.

See the examples on fmepedia at: http://www.fmepedia.com/index.php/AutoCAD_Map_3D_Object_Data_Reading

3) Shape Datasets and the Age of Innocence

The Safe support team has just received their first Shape dataset more than 2GB in size. As Dale puts it, the age of innocence is over. The content below explains why.

NB: If your eyes glaze over at technical details then skip the bits marked with ###.

What is the Problem?

In theory a Shape dataset greater than 2GB in size is – if not impossible – certainly very implausible and difficult to handle. It should be noted that when I refer to a Shape dataset I mean the .shp part (the .dbf part has its own 2GB limit, which is another problem by itself…)

### In technical terms this is because internal pointers between the index (shx) and spatial data (shp) are usually stored as signed 32-bit integers, which can hold values no larger than (2^31)-1 (2^31 = 2147483648, or 2GB), so any larger size would need a bigger number than the pointer can store. So a computer application writing Shape data would find its counter overflows as it reaches 2^31 and – if it didn’t crash at that point – start writing invalid pointers. ###

However, any software that writes a Shape file might ignore the problem, and just churn out bad indexes, which is how invalid datasets more than 2GB in size come to exist.
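
### For the curious, the overflow described above is easy to reproduce. The following is only a minimal Python sketch of how a signed 32-bit counter behaves, not code from FME or any Shape writer; the helper name as_int32 is made up for illustration. ###

```python
import ctypes

INT32_MAX = 2**31 - 1  # largest value a signed 32-bit integer can hold


def as_int32(value):
    """Wrap a Python integer the way a signed 32-bit counter would."""
    return ctypes.c_int32(value).value


last_good = as_int32(INT32_MAX)    # still a valid, positive offset
wrapped = as_int32(INT32_MAX + 1)  # one byte further and the counter wraps

print(last_good)  # 2147483647
print(wrapped)    # -2147483648
```

A writer that keeps going past this point stores the wrapped values as its pointers, which is why the result is wrong but wrong in a consistent way.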

What does FME do?

When FME reaches the 2GB maximum it too ignores the problem (we were obviously still innocent when we implemented it) and continues to write data. However, while that leads to bad index pointers, they are bad in a way that is predictable. So we made a change to our reader to handle this and now (in FME2008 build 5199 or greater) FME is able to read back any oversize datasets created by FME. We’d also be able to read Shape datasets from other naive applications that wrote data in the same way.

That sounds like a workaround. Is there anything else you can do?

Good question. ### Getting technical again, Shape internal pointers are actually measured in “words”. A “word” is a unit of data that depends on a computer’s architecture. For backwards compatibility reasons a 32-bit computer usually uses 16-bit words. So FME counts bytes using a 32-bit signed integer but then divides by two to get the correct pointer. If you’re following along you might now say, “Ah! Since you divide everything by two, there’s no reason why you can’t double the size of the pointers”; and you’d be right. We just can’t do the counting using a signed 32-bit integer, as an application usually would, because that would overflow at the 2GB mark. ###

So a future version of FME (FME2009, build 5525+) will be able to write up to 4GB Shape (again that’s .shp not .dbf) files. A fixed limit of 4GB will be hard-coded so we don’t get datasets any larger, which really would be invalid.
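
### The word-based arithmetic can be sketched in a few lines of Python. This is an illustration under assumptions, not FME’s implementation: the function name shx_offset is hypothetical, and the big-endian byte order is my reading of the Shape specification. The point is simply that counting bytes in a wider integer and dividing by two lets a signed 32-bit pointer address just under 4GB. ###

```python
import struct

WORD_BYTES = 2                          # offsets are measured in 16-bit words
INT32_MAX = 2**31 - 1                   # largest signed 32-bit pointer value
MAX_SHP_BYTES = INT32_MAX * WORD_BYTES  # just under 4GB


def shx_offset(byte_position):
    """Pack a .shp byte position as a signed 32-bit word offset."""
    if byte_position % WORD_BYTES:
        raise ValueError("records must start on a 16-bit word boundary")
    if byte_position > MAX_SHP_BYTES:
        raise ValueError("beyond the limit of a word-based 32-bit pointer")
    words = byte_position // WORD_BYTES  # Python integers do not overflow here
    return struct.pack(">i", words)      # big-endian is an assumption


print(MAX_SHP_BYTES)  # 4294967294
```

A record starting 3GB into the file, for example, packs without trouble, whereas a position of 2^32 bytes is rejected.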

Not only will FME be able to read these 4GB datasets back, we believe the data is in the format most likely to be readable by other applications.

Any problems to look out for?

Yes! Technically our theory is sound – and not contrary to the official Shape specification – just complex and relatively obscure. I don’t know of another application that is using this technique, so we can’t guarantee that other applications (i.e. those still sweet and innocent) would be able to read our oversized datasets correctly, nor can we guarantee that we can read an oversized dataset created by another application (though even if they’ve created invalid data, if it’s in a predictable way, there’s a good chance we will).

Will a 64-bit computer support 4GB+ files?

Yes, but it would need a change in the Shape specification, along the same lines as recent changes in the TIFF format.

The TIFF format also had a size limit [### 4GB, so presumably they were using unsigned 32-bit pointers as opposed to signed with the 2GB-limited Shape ###] and so a new specification – BigTIFF – was created to allow Large File Support (LFS). Similar updates would need to occur to the Shape format to permit files greater than 4GB in size.

### So with (unsigned) 64-bit pointers we get 2^64 = 17 million TB – that’s Terabytes not Gigabytes folks, and even the most avid data collector is unlikely to have a Shape dataset that size. To put it in perspective, this is equivalent to 16,384 Petabytes (apparently Microsoft’s Virtual Earth imagery dataset is 14 petabytes in size) or 16 exabytes. According to wikipedia 16 exabytes of disk space would cost you $3 billion at today’s prices! ###
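
### If you want to check the sums, here is the same arithmetic as a tiny Python sketch, assuming binary units (1 TB = 2^40 bytes, and so on). ###

```python
total_bytes = 2**64  # range of an unsigned 64-bit pointer

terabytes = total_bytes // 2**40
petabytes = total_bytes // 2**50
exabytes = total_bytes // 2**60

print(terabytes)  # 16777216, i.e. roughly 17 million
print(petabytes)  # 16384
print(exabytes)   # 16
```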

For the techno-confused amongst you (that includes me) here are some useful explanatory articles on wikipedia:

http://en.wikipedia.org/wiki/Large_file_support

http://en.wikipedia.org/wiki/Gigabyte

http://en.wikipedia.org/wiki/Word_%28computing%29

Now if you ever appear on Jeopardy you can feel confident in saying, “I’d like Shape datasets for one thousand please Alex”.

4) Google Spreadsheets Reader/Writer

If you weren’t aware, Google makes a Python API available for reading and writing Google data – i.e. data on any of the Google web services such as Documents, Picasa, YouTube (I didn’t even know that was a Google service), Calendar, etc.

Pro-services dude Aaron has taken advantage of this – and FME’s ability to run Python scripts – to create an FME reader/writer (actually custom transformers) for Google Spreadsheets.

There are two versions of the Writer; full and fast.

The ‘full’ version writes features one at a time. It is slow but enables the use of multiple worksheets in a single spreadsheet.

The ‘fast’ version writes to a CSV file and uploads it as a new spreadsheet (but cannot write to an existing spreadsheet).

This is another great example of using FME to access web services.

For more details get the download from fmepedia at:

http://www.fmepedia.com/index.php/Google_Spreadsheets_Python_Read-Writelet

5) Creating a Shape Index with FME

A frequent question to the FME support team is, “How can I create a spatial index when I write a Shape dataset?” That’s something that previously wasn’t thought possible with FME, but now a combination of a Python script and ArcObjects will let you do this.

How important would it be to have indexing when your Shape file is 4GB in size, eh?!

The key is that an ArcObjects Python script can index Shape files, and FME can run Python scripts, passing into them such useful values as ‘destination dataset’. So by adding such a script as a Shutdown Python Script, it’s possible to create a spatial index for any dataset you have just written.

You can find an example workspace demonstrating this technique on fmepedia at:

http://www.fmepedia.com/index.php/Spatial_Indexes_for_ESRI_Shape_datasets

It’s simpler than it sounds because you just need to copy the Python script into the Shutdown script setting dialog (Navigator Pane > Workspace Settings > Advanced) and away you go. You can use the script in our workspace, but what’s really nice is that you can export a Python script from almost any tool in ArcToolbox and use that to do any number of things!

A few quick notes:

  • You’ll need Python installed to run this (unlike TCL it isn’t included with FME). It can be found here.
  • You’ll need ArcObjects installed (or at least the ArcGIS scripting library), which means you need an ESRI product such as ArcGIS.
  • We didn’t get great results with Python v2.5 and ArcGIS v9.2, so we’d suggest using Python v2.4.
  • A minor FME/Python fault means you can’t run a shutdown script without a startup script (at least not one that accesses parameters). So make sure to create a startup Python script in your workspace, even if it is empty.
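
To give a feel for what such a shutdown script involves, here is a rough Python sketch. It is not the script from the fmepedia workspace. The helper names shape_files and index_all are made up for illustration, and the ArcGIS wiring is left in comments because it rests on assumptions: that the arcgisscripting module from ArcGIS 9.2 is installed, that the Add Spatial Index tool is the one you want, and that the macro name matches the destination dataset parameter in your own workspace.

```python
import glob
import os


def shape_files(dataset):
    """Return the .shp files for a destination folder or a single file."""
    if os.path.isdir(dataset):
        return sorted(glob.glob(os.path.join(dataset, "*.shp")))
    return [dataset] if dataset.lower().endswith(".shp") else []


def index_all(dataset, add_index):
    """Call add_index(path) for each Shape file; return the paths handled."""
    paths = shape_files(dataset)
    for path in paths:
        add_index(path)
    return paths


# Inside a real shutdown script the wiring might look like this
# (all three names below are assumptions, check them against your setup):
#
#   import arcgisscripting
#   gp = arcgisscripting.create()
#   index_all(FME_MacroValues['DestDataset_SHAPE'],
#             gp.AddSpatialIndex_management)
```

Passing the indexing function in as an argument keeps the file-finding part testable on a machine without ArcGIS installed.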

No promises, but this functionality may even find its way into FME as a properly supported parameter! Remember, you heard it here first.

6) OIDs in PostGIS

Here’s a question and answer on the FMETalk user group that is probably worth passing on.

On 01/04/2008, Jeff wrote:

> Hi there,

> Could someone point out how to get FME to see the postgis oid column

> (created by FME when you choose ‘with oids’ in the format parameters)

> in the source features?

On 02/04/2008, Roland wrote:

> Morning Jeff,

> You can put in your own select statement in the feature properties.

> Under the ‘Parameters’ tab you’ll find inputs for WHERE clause and SELECT statement.

> In the SELECT box type in something like: select oid,* from <table_name>

> Attach an AttributeExposer with [an attribute] called ‘oid’ so that you can see it, and you’ll have OIDs (sounds painful).

Thanks to Jeff for posting the question and Roland for answering it; plus thanks to everyone else on the user group who participates in the FME community in this way.

7) Measurement Unit Converters

Safer Aaron Koning has really been busy (at salary review time – coincidence?!) – he’s also produced a series of custom transformers that translate length, area or volume attributes from one set of units into another.

This is a nice example of lots of things – using a Custom (Independent) published parameter, creating a custom transformer, use of the ValueMapper transformer, etc. I think I will add these to the FMEData sample dataset.

It’s not just yards or metres that it will convert either; there are lots of different units such as miles, chains, hectares, acres and six different types of foot measurement!

So if you ever wanted to convert your LengthCalculator results from your current units into twips, light-years or angstroms, here’s your chance!

Check out fmepedia to get these great custom transformers:

http://fmepedia.com/index.php/Measurement_Unit_Converters
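
If you are wondering how a lookup-based converter of this kind works, the idea can be sketched in Python. This is only an illustration, not the logic inside Aaron’s transformers: the dictionary stands in for the sort of lookup a ValueMapper provides, the unit names are my own, and every value is converted by way of metres.

```python
# Metres per unit. The unit names are illustrative, not the transformer's own.
METRES_PER_UNIT = {
    "metre": 1.0,
    "international_foot": 0.3048,
    "us_survey_foot": 1200.0 / 3937.0,
    "yard": 0.9144,
    "chain": 20.1168,
    "mile": 1609.344,
    "twip": 0.0254 / 1440.0,  # 1/1440 of an inch
    "angstrom": 1e-10,
}


def convert_length(value, from_unit, to_unit):
    """Convert a length by going through metres as the common base unit."""
    return value * METRES_PER_UNIT[from_unit] / METRES_PER_UNIT[to_unit]


print(round(convert_length(1, "mile", "yard"), 6))  # 1760.0
```

Going through a single base unit means each new unit needs only one factor, rather than one for every pair of units.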


Brief Notes

  • The Spring 2008 edition of the FME Insider newsletter is now available. Apologies to the southern hemisphere for whom it is anything but Spring!
  • An Austrian FME user conference will take place in Vienna on 20th May 2008. Click here for more details.
  • Interested in Mastermap and/or SQL Server Spatial? See this blog posting for a great example that uses both.
  • A month or two old, but here’s a blog posting about Don Murray’s visit to a conference on Geospatial Data Standards.

This week’s Weekly was written to the tune of…

Gerry Rafferty’s superb album North and South.

It’s my favourite of his albums, but relatively unknown. The only track I can find on YouTube is “Nothin’ Ever Happens Down Here”.