Greetings FME’ers,
You might remember a blog article from last year entitled “Code Smells and FME Golf”. The FME golf part was a workspace challenge that had occurred to me. You all rose to the challenge and submitted a series of entries, but I put them to one side for a while and never did get back to them (sorry).
Anyway, a train journey from Vancouver to Winnipeg is an excellent time to do some FME golfing so, while watching the scenery pass by, I finally worked on the problem myself and reviewed what you had all done.
This post is the first of two. In this one I’ll talk about the concept and what I did to achieve the goal; in the second I’ll list all of the strategies users came up with and reveal who had the best result!
The Contest
FME golf comes from the concept of Code Golf, the idea of which is to write the smallest amount of code that carries out a given task. For example, this snippet of code:
!(y%100<1&&y%400||y%4)
…is (so they say) JavaScript that determines whether a given year is a leap year or not. You couldn’t get much smaller than that. In fact, you couldn’t get further away from the concept of well-designed code either!
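To see why the one-liner works, it helps to put it beside a readable version. The sketch below wraps the golfed expression in a function and compares it with a spelled-out form of the Gregorian leap-year rule. The function names are mine, chosen purely for illustration.

```javascript
// Golfed version: the expression from above, wrapped in a function.
const isLeapGolfed = (y) => !(y % 100 < 1 && y % 400 || y % 4);

// Readable version of the same Gregorian rule.
function isLeapReadable(year) {
  if (year % 400 === 0) return true;  // e.g. 2000 is a leap year
  if (year % 100 === 0) return false; // other century years (e.g. 1900) are not
  return year % 4 === 0;              // otherwise every fourth year is
}

// Print both answers side by side for a few representative years.
for (const year of [1900, 2000, 2015, 2016]) {
  console.log(year, isLeapGolfed(year), isLeapReadable(year));
}
```

The golfed form relies on JavaScript treating a non-zero remainder as truthy, which is exactly the sort of trick that saves characters and costs readability.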
For FME I made a workspace available and challenged everyone to reduce that workspace in size (measured by the number of bytes) to as small as possible.
The rules:
- The aim is the smallest workspace file (number of bytes) possible. It’s not just the fewest transformers.
- You can use Workbench only. No manual editing of the fmw file contents is allowed (so don’t open the file in a text editor and strip out spaces)
- No Python or Tcl scripting is permitted – it must be pure FME. However, you may use FME functions if you wish.
- You can’t edit or manipulate the contents of the source data before it is read into FME
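If you want to check your own score, the measurement is nothing more than the size of the saved file on disk. Here is a minimal sketch in Node.js, assuming a hypothetical file called golf.fmw. The function name is mine and nothing here is part of FME.

```javascript
const fs = require("node:fs");

// Return the size of a saved workspace file in bytes.
// Any file path works; this only asks the file system for the size.
function workspaceSizeInBytes(filePath) {
  return fs.statSync(filePath).size;
}

// Example call (hypothetical file name):
// console.log(workspaceSizeInBytes("golf.fmw"));
```

Reading the size this way also keeps to the rules, because the file is never opened for editing.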
I thought it only fair to see how well I could do too – and the long time since I issued the challenge ensured I couldn’t remember the ideas I had at the time and was coming at the problem from a fresh perspective. I deliberately didn’t check anyone else’s solution until I had created my own.
So, let’s see what I did…
My Solution
At the time of the challenge we were using FME2015. Now we’re up to FME2016.1. So the first thing I did was to open the workspace in 2016.1 and save it with that version of FME. The result – the starting point of the challenge – was a file 75,835 bytes in size.
That’s slightly larger than it was in 2015, but that’s explained by extra parameters (like Writer Order and Python Compatibility) being stored in there.
Most changes I made were predicated on the fact that FME is storing information in the workspace file and the smaller in size I can make that information the smaller the workspace will be. Open the fmw in a text editor (although don’t edit in there!) and you’ll see some of the things you might change.
File paths were one obvious fix, as were file names, transformer names, and feature type names.
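The reasoning is simple byte arithmetic: every time a name or value is written into the file, a shorter replacement saves the difference in length. Here is a rough sketch of that sum, assuming the text is stored as UTF-8 (I haven't confirmed the encoding FME uses). The helper names are hypothetical, and the example values are ones that come up in this post.

```javascript
// Bytes needed to store a string as UTF-8 text (encoding is an assumption).
const byteLength = (text) => new TextEncoder().encode(text).length;

// Bytes saved each time `before` is replaced by `after` in the file.
const bytesSavedPerOccurrence = (before, after) =>
  byteLength(before) - byteLength(after);

// A long transformer name swapped for a single letter.
console.log(bytesSavedPerOccurrence("PointOnAreaOverlayer", "A"));
// A schema width of 255 swapped for 14.
console.log(bytesSavedPerOccurrence("255", "14"));
// A parameter value of 200 swapped for an empty string.
console.log(bytesSavedPerOccurrence("200", ""));
```

How many times each name actually appears in the fmw file is something you can only find out by looking, so the real saving is this figure multiplied by an unknown number of occurrences.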
Here’s a table of what I changed, in the order I thought of it. You can see how I was fighting for each individual byte.
Change | Workspace size (bytes) |
Starting workspace | 75,835 |
Renamed source files to A.csv and N.kml | 74,680 |
Moved source datasets to drive root (C:\) to reduce path names [1] | |
Renamed transformers to A,B,C,D,E | |
Renamed output attributes to A,B,C | |
Renamed output feature type to X | |
Replaced VertexCreator transformer by using CSV reader parameters | 72,280 |
Deleted unwanted user/published parameters | 70,404 |
Reduced Max Features to Log/Record parameters from 200 to Not Set [2] | 70,395 |
Hid unnecessary attributes in the reader feature type | 66,949 |
Changed the writer output folder to drive root | 66,917 |
Reduced Writer schema data types (from 255 chars to 14, 50, 50 chars) [3] | 66,911 |
Changed other FME parameters from Yes to No (where it had no effect) | 66,909 |
Expanded all the attribute lists on the canvas! | 66,714 |
Moved canvas objects closer to each other [4] | 66,704 |
Changed Tester mode from Automatic to String | 66,703 |
Changed test from != to = and reversed the passed/failed ports | 66,701 |
Turned on source schema editing, and removed unwanted source attributes [5] | 66,026 |
Turned on feature merge for the KML reader | 66,012 |
Turned on feature merge for the CSV reader | 66,008 |
Moved the final workspace to the drive root | 66,002 |
Woot! From a starting point of 75,835 bytes to a final version of 66,002 bytes. The workspace still ran the same and produced the same data, but it was about 13% smaller afterwards. It certainly doesn’t meet best practice, and it’s baffling how or why some changes work. For example, why should expanding all the attribute lists on transformers make the fmw file smaller? In fact, I did the reverse of that at first, thinking closed attribute lists would be better.
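For anyone who wants to check the sums, here is a short sketch that works out the saving at each recorded step and the overall reduction, which comes to roughly 13%. The sizes are copied from the table above. Rows with no recorded size are skipped, so a step's saving may include the unsized rows listed just before it.

```javascript
// [change, workspace size in bytes], copied from the table above.
const sizes = [
  ["Starting workspace", 75835],
  ["Renamed source files to A.csv and N.kml", 74680],
  ["Replaced VertexCreator with CSV reader parameters", 72280],
  ["Deleted unwanted user/published parameters", 70404],
  ["Reduced Max Features to Log/Record to Not Set", 70395],
  ["Hid unnecessary attributes in the reader feature type", 66949],
  ["Changed the writer output folder to drive root", 66917],
  ["Reduced Writer schema data types", 66911],
  ["Changed other parameters from Yes to No", 66909],
  ["Expanded all the attribute lists on the canvas", 66714],
  ["Moved canvas objects closer to each other", 66704],
  ["Changed Tester mode from Automatic to String", 66703],
  ["Changed test from != to = and reversed the ports", 66701],
  ["Removed unwanted source attributes", 66026],
  ["Turned on feature merge for the KML reader", 66012],
  ["Turned on feature merge for the CSV reader", 66008],
  ["Moved the final workspace to the drive root", 66002],
];

// Saving for each step = previous recorded size minus this size.
const savings = sizes.slice(1).map(([change, bytes], i) => ({
  change,
  saved: sizes[i][1] - bytes,
}));

// Overall reduction from first to last row.
const startBytes = sizes[0][1];
const finalBytes = sizes[sizes.length - 1][1];
const savedBytes = startBytes - finalBytes;
const savedPercent = (savedBytes / startBytes) * 100;

// The single most effective recorded step.
const biggest = savings.reduce((best, s) => (s.saved > best.saved ? s : best));

for (const { change, saved } of savings) console.log(`${saved}\t${change}`);
console.log(`Total: ${savedBytes} bytes (${savedPercent.toFixed(1)}%)`);
console.log(`Biggest single saving: ${biggest.change}`);
```

Laid out like this, it is clear that a handful of changes did most of the work, while the last several steps were worth only a few bytes each.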
Notes:
1. KML needed a subfolder in its input path (else it crashed FME!) so it actually ended up as C:\n\n.kml (I filed a report on that).
2. Although “<not set>” appears to be more characters than “200”, it is recorded as an empty string in the fmw file.
3. Reducing the output schema char sizes had nothing to do with the data, just the number of characters recording the schema (14 is one fewer char than 255).
4. Moving objects closer to each other was interesting. I guess FME records the positions and extents. More experimentation, like moving everything on top of each other, didn’t help. I guess if you really knew how it worked (Dale!) then you might be able to come up with a more byte-saving layout.
5. Hiding attributes was good. Removing them was better. The source CSV schema can’t be edited, though.
Things that I tried but that weren’t helpful:
- The Tester “Has Value” test. It was worth a try, but “Has Value” is obviously a longer string to store than “!=”
- Replacing the PointOnAreaOverlayer with a Clipper. It was a toss-up which transformer would take more space; the Clipper did.
- Reducing the workspace zoom, which is recorded in the workspace file. I had it under 100% already because of my laptop’s screen size, so reducing it had no effect. If I could have zoomed to 9% or less it might have saved a single character, but FME won’t let you do that. 10% is the minimum.
So, there we go. I don’t recommend doing this for a real project – it certainly isn’t about to appear in an FME training course – but it was fun and a very interesting insight into how FME records certain information.
For a geek of my age, counting bytes reminded me that – reduced as it is – at 66kb my workspace is still too big to fit onto an old Commodore 64 (and over twice as large as I could store on my Dragon 32). How times have changed!
Anyway, in a second post I will go through all the user submissions, mention the techniques they used, and report who produced the smallest workspace (and if it beat mine!)
Oh, and we might just have another challenge for you to try too!
Mark
Mark Ireland