Friday, December 7, 2007

Lightning Brain Podcast: Click here to listen to "Refactoring ExtendScripts"

Today we'll talk about cleaning up scripts, and as a bonus, we'll work on a script that adds a 'hand-written' quality to text: changing something like this:

Before Jitter



By clicking the link provided in this sentence, you can download this real-life example of a small script being refactored, from experimental form to a cleaner form - click here to download


So, the script works, everyone is happy - but maybe you are not done yet...

Imagine that you will have to pick that same script up again twelve months from now. How much time will it take you to 'get back up to speed' and rebuild a mental picture of the script and what it is doing? Probably it will take you many hours of browsing, debugging and fiddling around before you regain enough understanding of the script's inner workings to make a 'safe' change.

Now, what if the circumstances have changed, and the script needs to be adjusted to cope with a new environment - it might need to be moved from InDesign to InDesign Server, there might be new requirements...

How about spending some time now - while your head is still filled with knowledge about how it all works - making sure the script becomes more future-proof?

In this podcast, I'll list a few of the techniques I personally use to 'future-proof' my scripts.

Before diving into the techniques, I need to explain how we approach ExtendScript development here at Rorohiko; we've adopted an approach that delivers very good quality at a reasonable price - we're not cheap, but unlike many other custom developments, our solutions work and work well.

When we're developing custom scripts, we use a 'no cure, no pay' approach, for a number of reasons.

The main reason is that for this type of development, building a sufficiently accurate quote often costs us more than the development itself.

For an accurate quote, we would first and foremost need an accurate, extensive project brief.

But in our business we're often dealing with creative, fairly non-technical people, and we invariably found it very hard, even impossible, to zoom in on an accurate enough technical description of the functionality being looked for.

On top of that, what we're asked to do is often at odds with what is really needed. We're often asked to develop a specific solution, rather than being asked to solve a problem.

From experience we've learned that it pays to dig deeper, and try and find out what the underlying problem is - quite often the asked-for 'solution' only cures a symptom, and leaves the underlying problem unfixed.

The most efficient way we found to get the technical information we need from creative people is to use an iterative approach. Instead of nagging people and trying to wring a technical brief out of them, we put the cart before the horse, and we create something, anything, as good as we can, based on the still limited understanding we have of the problem at hand.

We present our customer with the attempted solution, and get their feedback - it is much easier for them to explain what is wrong with, or missing from, some tangible, real software than to come up with a blueprint.

Based on the feedback, we adjust the solution (or throw it out and start over), and we go through a few iterations, until we have the thing sussed.

Eventually, we reach a good, smooth solution, and by that time we also know exactly what the cost of that solution is.

At that point, the 'no cure, no pay' system kicks in: our customer can choose to purchase and continue to use the software, or alternatively, if our solution does not live up to its promise, the software is simply destroyed, and there is no cost to the customer.

This approach works really well, but the successive iterations cause the software to go through a few swings and roundabouts, and along the way, grow a whole collection of warts if we're not careful.

We'll typically spend some time refactoring the scripts to make sure they're future-proof and self-explanatory - making a small investment of time now in return for a substantial time-saving later.

Here are some of the things we do:

1) Don't rely on app.activeDocument

While in the heat of experimentation, iteration and script development, it's easy to assume that the functionality being created will be applied to the current document, and hence to refer to app.activeDocument.

However, there are sizable benefits to removing the reliance on the active document. Our scripts will typically contain a number of functions, and whenever a function uses app.activeDocument, we rework that function to not use 'app.activeDocument', but instead take a 'document' parameter.

The idea is that you get hold of the document 'under consideration' in one single spot in the script, and that from then on you pass a document parameter to any function that is supposed to work on that document.

The two biggest advantages are:

First, your script becomes a lot easier to convert to an InDesign Server environment (where there is no such thing as app.activeDocument), and second, all of your functions have now suddenly become much more re-usable: they can now also be used when the document to be modified is not the active document.

For example, the document might be a temporary, invisible document you're opening in the background - and by NOT using app.activeDocument, you can pass such a document to your functions as a parameter.
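
As a minimal sketch of this idea (the function names countPages and main are made up for illustration, and a plain object stands in for a real InDesign document so the snippet also runs outside InDesign), the document is obtained in one spot and passed along as a parameter:

```javascript
// Hypothetical helper: works on whatever document it is handed.
// It never looks at app.activeDocument itself.
function countPages(document) {
    return document.pages.length;
}

// The rest of the script only ever sees the 'document' parameter.
function main(document) {
    return countPages(document);
}

// Inside InDesign, this would be the single spot that picks the document,
// for example: main(app.activeDocument);
// Here a plain stand-in object is used so the sketch runs anywhere.
var fakeDocument = { pages: { length: 3 } };
var pageCount = main(fakeDocument);
```

Because countPages only depends on its parameter, the same function could be handed a document opened invisibly in the background, or a document opened by InDesign Server.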

2) Test, test, test your preconditions

When using a function, it pays to add tests for preconditions - make sure all parameters being passed are what they are supposed to be, and display an error message if they are not. Whenever possible, we leave all this testing code in the script - so if something goes wrong at the customer's end we get good, specific information about where things go off the rails.

Typically, we'll have a global constant - something like kDebugging - which can be set to true or false to indicate debugging mode.

We'll also add a messaging function similar to alert() which can display a dialog box with a message. The difference with alert() is that the dialog box is conditional on kDebugging being set to true. If kDebugging is set to false, the message is ignored.
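
A rough sketch of such a messaging function could look like this. The name debugMessage and the true/false return value are assumptions made for this example, not something taken from the downloadable script; the fallback to console.log is only there so the snippet can be tried outside InDesign.

```javascript
// Set to false to silence all debug messages.
var kDebugging = true;

// Hypothetical name: behaves like alert(), but only when kDebugging is true.
// Returns true if the message was shown, false if it was ignored.
function debugMessage(message) {
    if (!kDebugging) {
        return false;
    }
    if (typeof alert == "function") {
        // InDesign / ExtendScript: show a dialog box
        alert(message);
    } else if (typeof console != "undefined") {
        // Assumption: outside InDesign, write to the console instead
        console.log(message);
    }
    return true;
}
```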

And then we'll test all the function parameters being passed into a function. Is the document non-null? Is it instanceof Document? Is the percentage a number between 0 and 100? If any of these tests fails, a debug message is emitted, and the function 'bails out'. This guarantees that any unexpected condition can be caught early on.

This works well when we wrap most of the function body inside a do{}while(false) construct, which mis-uses the do-while loop to build a construct that allows a 'ladder-like' function construction.

Inside the do{}while(false) there is a whole series of if tests which verify if all is well, and display a debug message followed by a 'break' statement if not. The break causes the function to 'fall off' the ladder for any failing precondition. The debug message being displayed is specific enough to pinpoint the spot where things went wrong - it will include the name of the function where the problem occurs, and a short description of what is wrong.

This construct is quite similar to using try-catch, but it is 'cheaper' in a number of respects; it causes less overhead than using try-catch, and does not cause the InDesign Debug version to emit assert statements during script execution.
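
Here is a small sketch of what such a 'ladder' might look like. All names (jitterText, debugMessage, FakeDocument) are invented for this example; FakeDocument is a stand-in so the snippet runs outside InDesign, and in a real script the test would read 'document instanceof Document'.

```javascript
var kDebugging = true;

// Conditional message function, repeated here so the sketch stands alone.
function debugMessage(message) {
    if (!kDebugging) {
        return;
    }
    if (typeof alert == "function") {
        alert(message);
    } else if (typeof console != "undefined") {
        console.log(message);
    }
}

// Stand-in for InDesign's Document class (assumption for this sketch only).
function FakeDocument() {}

// Returns true if all preconditions held, false if the function bailed out.
function jitterText(document, percentage) {
    var success = false;
    do {
        if (!document) {
            debugMessage("jitterText: document is null");
            break;
        }
        if (!(document instanceof FakeDocument)) {
            debugMessage("jitterText: document is not a Document");
            break;
        }
        if (typeof percentage != "number" || isNaN(percentage)) {
            debugMessage("jitterText: percentage is not a number");
            break;
        }
        if (percentage < 0 || percentage > 100) {
            debugMessage("jitterText: percentage is not between 0 and 100");
            break;
        }
        // All preconditions hold; the real work would go here.
        success = true;
    } while (false);
    return success;
}
```

Each failing test 'falls off' the ladder through its break, and every exit path still ends up at the single return statement at the bottom.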

3) Do not spend time optimizing unless it is really necessary.

Now, you'd think that all that debug code from the previous point must cause a lot of overhead.

Well, turns out that is not true most of the time - a typical script will spend 95% of its time in 5% of the script, and all that debug code has very little impact on the script execution time.

In practice, we'll leave all our debug code in the script - all we might do is to set kDebugging (or whatever the constant is called) to false - but even that we often don't do: it's better to be informed of unexpected circumstances, than to have a script silently and mysteriously fail.

Only when there are speed issues might we consider removing some debugging code - but only if we can clearly see that this code is part of the bottleneck.

The current ExtendScript Toolkit contains a nice profiling tool that allows you to see where a script is spending most of its time. Our recommendation is to not bother with optimizing unless there is a time issue, and when optimizing, use proper profiling to solve the bottleneck - but nothing else. Any debug code that you can leave alone should be left alone; it's part of your safety net.

It is very common for our scripts to have 50% or more debugging/testing code in them.

4) Avoid global variables

While experimenting, it is very common and easy to introduce some global variables to keep track of things.

However, global variables can be a recipe for disaster - especially when you need to revisit an older script and make some modifications to it.

Global variables represent a form of communication between different areas of your script - functions can communicate with one another by stuffing data into global variables, and getting it back out again.

Problem is: that type of interaction is very easily overlooked, and causes all kinds of unexpected side effects - for example, you add an extra call to a particular function somewhere, the function changes the value of some global variable, and then other functions that rely on that same variable go off the rails.

Because functions don't clearly 'advertise' what global data they consult or modify, it becomes very hard to keep track of interactions. That makes for fun debug sessions, chasing weird bugs after making a 'tiny change' to a year-old script.

Like everyone else, during the initial phase of a project, we often start out stuffing data into globals - but unless there is good reason to, we'll rework the script and move the global variables into function parameters. If there is a lot of data, we'll introduce one or more data structures which are then passed around as a parameter.

An example: we might be parsing a text string, and keep track of where we're at in a global variable gTextPos, and store the string in a global gParseText.

During cleanup, that will be reworked (or 'refactored' as it is often called) - we'll get rid of the globals, and instead we'll put the current 'parse state' into a JavaScript object with at least two attributes: parseText and textPos.

Then we pass that object to the relevant routines using a parameter - say 'parseState'.

This way it becomes immediately clear to the human reader of the script which routines access that data (they need the parameter) and which ones don't access that data (they don't need the parameter) - it's a self-enforcing cleanup. From this moment on, each function that needs access to that data does 'advertise' the fact via its parameter list.
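
To make that concrete, here is a small sketch of the refactored form. The attribute names parseText and textPos come from the description above; the helper functions makeParseState, atEnd and nextWord are invented for illustration.

```javascript
// Replaces the globals gParseText and gTextPos with a single object.
function makeParseState(text) {
    return { parseText: text, textPos: 0 };
}

// Only reads the parse state.
function atEnd(parseState) {
    return parseState.textPos >= parseState.parseText.length;
}

// Returns the next space-delimited word and advances textPos past it.
function nextWord(parseState) {
    var text = parseState.parseText;
    var pos = parseState.textPos;
    // Skip any leading spaces
    while (pos < text.length && text.charAt(pos) == " ") {
        pos++;
    }
    var start = pos;
    // Move to the end of the word
    while (pos < text.length && text.charAt(pos) != " ") {
        pos++;
    }
    parseState.textPos = pos;
    return text.substring(start, pos);
}

var parseState = makeParseState("hand written text");
var firstWord = nextWord(parseState);
var secondWord = nextWord(parseState);
```

Any function that does not take a parseState parameter cannot touch the parse position, and that is visible at a glance from its parameter list.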

Imagine every JavaScript function as a gob of code floating in space. Then imagine what outside factors influence the function's operation and how, in return, the function influences its environment. There are the parameters coming in at the top, the return value coming out at the bottom. Most of the time these two relations are pretty easy to see.

Then there are any global variables that are modified or consulted by the function - using globals leaves a lot of room for unseen interaction between the function and its environment. Things like app.something and $.something are also globals - they are provided by InDesign, but they are still globals.

The more 'isolated' you can make your functions, the easier they will be to re-use in a different script.

Functions that interact with global data are like a beating heart - very difficult to transplant because there is a lot of stuff to disconnect and reconnect.

Functions that take data via their parameters, and return data via their return value and/or via some of their parameters are much easier to transplant: a few easy connections to their environment; they easily snap in and out.

5) Each function should do one thing well

We always try to create functions that do one thing well; during the 'frantic' phase of a project we often end up with functions that do lots of stuff. These multi-headed monsters need to be divvied up into smaller functions - each doing just one thing. Functions that are initially called something like 'ImportFileAndColorFramesAndDeleteOverrun' are split up into multiple smaller functions.

This increases the chances of making things re-usable - any 'good' function eventually ends up in our growing function library and will be reused, which cuts down our development time on future projects. Multi-headed monsters are never reusable - so cutting them up has distinct advantages.

6) Name constants and move them to the header of the file for easier customization

During the trial and error phase, you'll typically add all kinds of literal constants to the code - it is worthwhile to try and isolate these constants from the code and move them to a separate section near the top.

This makes the script easier to adjust, and it also makes it more robust.

Now, if a certain string constant is used twice in the script, there seems to be little advantage to creating a symbolic constant for the string and then using the constant instead of the literal string. Many people think this is a pedantic use of constants - on cursory inspection, the two approaches look not all that different.

However, the advantage is that with a symbolic constant, typing errors can be immediately caught by the computer, whereas with literal strings the computer would not know that these two strings are supposed to be equal.

So, if you'd type two literal strings "TextFrame" and "TextFrome", the computer would accept that - but if you typed two symbolic constants kTextFrame and kTextFrome, the second one would be undefined and cause an error.
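
The difference can be tried out with a short sketch like the one below (the variable names other than kTextFrame and kTextFrome are made up for this example). Note that in JavaScript the error is raised when the offending line is actually executed, not when the script is loaded.

```javascript
// Symbolic constant, defined once near the top of the script.
var kTextFrame = "TextFrame";

// Mistyped literal: the script runs without complaint,
// the comparison simply never matches.
var frameType = "TextFrame";
var literalMatches = (frameType == "TextFrome");

// Mistyped constant: kTextFrome was never defined, so using it
// raises an error that points straight at the typo.
var typoCaught = false;
try {
    var constantMatches = (frameType == kTextFrome);
} catch (err) {
    typoCaught = true;
}
```

With the literal, the typo only shows up as a script that quietly does the wrong thing; with the constant, the script stops at the line containing the mistake.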

By clicking the link provided in this sentence, you can download a real-life example of a small script being refactored, from experimental form to a cleaner form - click here to download