I’ve been burnt twice in the past two weeks by a strange graphics handling problem in Stata and Microsoft Office. In the spirit of presenting workarounds and warnings for obscure software problems that I stumble upon, I think I should report it here.

The basic problem is simple and very nasty: charts produced in Stata and then exported into Microsoft proprietary formats don’t work properly across platforms and, possibly, across machines. The way in which they fail is insidious as well, because it looks as if the operator has made an error: axis titles disappear, or parts of the graph are shaved off so that the graph doesn’t match the description one has written in the text. Worse still, the person who originally made the chart can’t see the error, and it doesn’t appear in printouts made from that person’s machine. This means that you can’t easily convince the person at the other end that you’ve not done something wrong.

As an example, consider this insidious cock-up from this week. One of my students sent a draft paper to a colleague last week, and he sent it back with the cryptic message “fix the charts.” We didn’t know what he wanted changed, so we changed a few things and sent them back. This week we received an angry reply, demanding that we fix the charts and asking, specifically, why we had forgotten the y-axis labels. The day that we fixed the charts, we were working on printouts, because we were rushing, and the y-axes were in the printouts – I had a distinct memory of correcting some text in the y-axes. So I asked my student to mail me the last version he’d sent to the colleague, thinking he’d stuffed up, and indeed I couldn’t see the y-axes in the charts. I asked him why he’d removed them after I had painstakingly corrected them, and he told me he hadn’t, and that he could see them – but by now he was overseas and I couldn’t check in person. So I forwarded the document to my partner, who works on a PC, and she could see them. What was going on? My colleague and I, on Macs, couldn’t see the y-axes, but my student and my partner, on PCs, could. Weird.

I asked my student how he had put the graphs in Word, and he told me he had copied the figures directly from Stata and pasted them into Word, essentially following instructions that can be found all over the web (for example here) and also, I think, in the Stata help. I did some digging and discovered that when you do this, the figure is converted automatically by Office into a new format – possibly .wmf? – and this can’t handle all of Stata’s graphics rendering; this leads to approximations in the encoding of some aspects of the graph. Mac graphics are handled in a different format – possibly .eps? – and the badly rendered parts of .wmf files are simply ignored when a Mac opens them. One of the main things that the .wmf rendering stuffs up is rotated text – such as one finds in y-axis titles. When I realized this, I asked my student to redo the figures by saving them as .png, and everything was fine: we could at least see the details of the axis labels now. The .png files looked hideous, though, so we redid them in .tif format.

I’m not sure, however, that it’s just a platform issue. A few weeks ago I had a strange graphing problem with a journal, whose staff mailed me to say that my text and the histograms I had provided didn’t match – specifically, parts of the range of values I had referred to in the text weren’t appearing in the histogram. I couldn’t understand this, because I could see the histograms clearly. I thought perhaps they were just being a bit weird, so I sent them hi-res images with an explanation, and they were fine. The original file had charts in it as .png files – I had included them as .png because they are low-res files, easy to produce, and a lot of journals like to receive low-res files until the production stage. But the hi-res files I sent were in .tif format. In light of what happened this week, I think that the same problem my student had also arose with the .png files in that article. I don’t know what platform the journal production staff were using, but I made the .png files on a Mac. So it’s possible that the problem also arises in reverse using .png files, or it’s possible that it occurs across machines as well as platforms.

The problem with this issue is that it is insidious, and when one works across email it’s impossible to work out what is happening. It also leads to questions about professionalism – leaving out y-axis labels is pretty shoddy undergraduate stuff – and those questions are exactly the kinds of issues that people try to blame on technical problems. It also creates conflict, because if you are repeatedly sending graphs that don’t work to a colleague (or a journal!) they start to get pissed. As do you, because you start to think they’re behaving like dickheads. The worst possibility is that, if everyone in your institution is working in Word on PCs, and the peer reviewers are too, but the production staff at the journal are working on Macs, they may produce a final published version of your article that has no axis titles. Anyone reading that will think you are incompetent, when in fact it was purely a technical problem.

The simple solution to this is:

  • never copy and paste graphics directly into Word (this also reduces the risk of loss of resolution)
  • don’t use .png or .wmf exports
  • only work with .tif or .eps files
  • if you get into a weird situation where you’re sure that you supplied the right file, don’t assume the other person is doing something wrong – check what platform they’re using and try sending a file in a different format
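
If you want to check a draft before it goes out, here is a rough sketch in Python, not a definitive tool. It relies on the fact that a .docx file is a ZIP archive with its embedded pictures stored under word/media/. It assumes that a chart pasted from Stata ends up there as a Windows metafile with a .wmf or .emf extension; that is my guess from the behaviour described above, not something I have confirmed. The function names are my own, and it won't work on old-style .doc files.

```python
import zipfile
from pathlib import PurePosixPath

# Assumption: pasted charts are stored as Windows metafiles (.wmf/.emf).
# These are the formats this post suggests avoiding.
RISKY_EXTENSIONS = {".wmf", ".emf"}


def list_embedded_images(docx_path):
    """Return the names of all files stored under word/media/ in a .docx."""
    with zipfile.ZipFile(docx_path) as archive:
        return sorted(
            name
            for name in archive.namelist()
            if name.startswith("word/media/") and not name.endswith("/")
        )


def find_risky_images(docx_path):
    """Return the embedded images whose extension is in RISKY_EXTENSIONS."""
    return [
        name
        for name in list_embedded_images(docx_path)
        if PurePosixPath(name).suffix.lower() in RISKY_EXTENSIONS
    ]
```

Calling find_risky_images("draft.docx") (a hypothetical file name) returns a list of the embedded metafiles; an empty list means none were found under word/media/. It only looks at file extensions, so it tells you nothing about whether a given image will actually render properly on someone else's machine.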

Preparing charts for journals can be a real hassle, and journals can be simultaneously picky about their figures and singularly unhelpful in advising non-experts on how to prepare them. This kind of cross-platform (and cross-format) silliness is really unhelpful in the production process, and it’s extremely difficult to find definitive information about it on the web. These problems don’t just arise from copy-paste laziness either, and understanding the details requires delving into the world of graphics rendering – a world that many people who work with stats and scientific data don’t know much about (nor should we have to). Stata, Microsoft and Apple all seem to be fairly silent on the issue, too. So be aware of it, and be ready to defend your work on technical grounds when colleagues or journals seem to be talking about a graph or figure that you’re sure bears no resemblance to the one you sent them.

And if you’re reading this, Bill Gates – hurry up and move to a non-proprietary graphics handling format!
