I’ve recently been building a fairly complex series of Bayesian spatial regression models in BUGS, and thought I’d share some tips based on hard won experience with the models. The various BUGS packages have the most cryptic and incoherent error messages of any stats software I have ever worked with, and although various Bayesian boosters claim that their modeling approach is intuitive, in my opinion it is the exact opposite of intuitive, and it is extremely hard to configure data for use in the packages. Furthermore, online help is hard to find – google an error message and you will find multiple websites with people asking questions that have never been answered, which is rare in the modern world. I take this as a sign that most people don’t understand the error message, and indeed the BUGS manual includes a list of errors with “possible interpretations” that reads more like the I Ching than a software guide. But Confucius say Enlightenment is not to be found in Black Box Pascal, so here is my experience of BUGS.

The models I’m running are complex, with nested conditional autoregressive structures and the higher level having more than 1000 areas with complex neighbour relationships, and millions of observations. I originally ran them on a completely hideous Hewlett Packard laptop, with 4 cores and 8Gb of RAM. I subsequently upgraded to a Dell Workstation (joy in comparison to HP’s clunky root-kitted horror) with 8 cores and 16Gb of RAM; I’m not sure that hardware is the main barrier to performance here though …

The HP machine had a secret administrator account (arseholes!) so I couldn’t install winBUGS[1], so I started off running OpenBUGS called through R’s R2OpenBUGS package running in RStudio. I use R to set up the data and initial values, because I can’t think of any other way to load millions of observations into a text file without going stir crazy. But when I call OpenBUGS it just hangs … no error messages or any other kind of indication of what is going on. I also can’t tell if it is happening at the data loading or compiling or inits stage.

Some digging around online and I found an old post by Andrew Gelman, observing that BUGS does not work well with “large datasets, multivariate structures, and regression coefficients.”

i.e. pretty much every statistical problem worth doing. Gelman also notes that “efficiently-programmed models can get really long, ugly, and bug-prone,” which seems like a contradiction in terms.

Anyway, noting that my data was large, with multivariate structures and regression coefficients, I thought maybe I should tone it down a bit so I tried using a higher level of spatial heirarchy, which reduces the adjacency matrix by an order of magnitude. Still no dice. It was at this point that I upgraded to the bigger computer.

On the bigger computer the smaller model actually worked! But it didn’t work in the sense that anything meaningful came out of it … It worked in the sense that it reported a completely incomprehensible bug, something like a node having an invalid value. I tried multiple different values and nothing worked, but somewhere on the internet I found someone hinting that you should try running BUGS directly rather than calling through R, so I tried this … having created the data in R, I killed OpenBUGS then opened the OpenBUGS interface directly and input the model, then the data, using the text files created by R[2].

When I did this I could step through the process – model was syntatically correct, then model failed to compile! Given that loading inits comes after compilation, an error telling me that I had the wrong initial value seems a bit misleading… in fact I had an “index out of range” error, and when I investigated I found I had made a mistake preparing one part of the data. So where the actual error was “the model can’t compile because you have provided the wrong data,” when called through R the problem was “you have the wrong initial values” (even though I haven’t actually loaded initial values yet).

WTF?! But let’s step back and look at this process for a moment, because it is seven shades of wrong. When you run R2OpenBUGS in R, it first turns the data and inits into a form that OpenBUGS can read; then it dumps these into a directory; then it opens OpenBUGS and gets OpenBUGS to access those files in a stepwise process – at least, that’s what I see R doing. If I decide to do the model directly in the OpenBUGS graphical interface, then what I do is I get R to make the data, then I use the task manager to kill OpenBUGS, then I call OpenBUGS directly, and get OpenBUGS to access the files R made in a stepwise process. i.e. I do exactly the same thing that R does, but I get completely different error messages.

There are various places on the internet where you might stumble on this advice, but I want to stress it: you get different error messages in OpenBUGS run natively than you do in OpenBUGS called through R. Those error messages are so different that you will get a completely different idea of what is wrong with your program.

Anyway, I fixed the index but then I ran into problems after I tried to load my initial values. Nothing seemed to work, and the errors were really cryptic. “Invalid initial value” is not very useful. But further digging on the internet showed me that OpenBUGS and WinBUGS have different approaches to initial values, and winBUGS is not as strict about the values that it accepts. Hmmm … so I installed winBUGS, and reran the model… and it worked! OpenBUGS apparently has some kind of condition on certain variables that they must sum to 0, while winBUGS doesn’t check that condition. A free tip for beginners: setting your initial values so they sum to 0 doesn’t help, but running the same model, unchanged, in winBUGS, works.

So either OpenBUGS is too strict, or winBUGS lets through a whole bunch of dodgy stuff. I am inclined to believe the former, because initial values shouldn’t be a major obstacle to a good model, but as others[3] have observed, BUGS is programmed in a completely opaque system so no one knows what it is doing.

So, multiple misleading errors, and a complex weirdness about calling external software through R, and I have a functioning model. Today I expanded that model back to the original order of magnitude of small areas, and it also worked, though there was an interesting weirdness here. When I tried to compile the model it took about three hours, and produced a Trap. But the weird thing is the Trap contained no warnings about BUGS at all, they were all warnings about windows (something called Windows.AddInteger or similar), and after I killed the Trap my model updated fine. So I think the compile problems I previously experienced may have had something to do with memory problems in Windows (I had no problems with badly designed adjacency matrices in the larger model), but OpenBUGS just doesn’t tell you what’s going on, so you have no idea …

I should also add, for intrepid readers who have got this far, that this dude provides an excellent catalogue of OpenBUGS errors with his plain English explanations of what they actually meant. He’s like this mystical interpreter of the I Ching for Bayesian spatial regressives. Also I want to add that I think the CAR spatial correlation model is super dodgy. I found this article (pdf) by Melanie Wall from the Journal of Statistical Planning and Inference (what a read!) that shows that the way we construct the spatial adjacency matrix is the primary determinant of the correlation structure, and that the correlation structure determined by this adjacency matrix is nothing like what we think we are getting. Today on my whiteboard and with the help of R I imagined a simple industrial process where each stage in the process is correlated with the one before and after it, and I showed very easily based on Wall’s work that the adjacency matrix required to describe this process is completely different to the one that you would naively set up under the framework described for CAR modeling. So I think most of the “spatial correlation” structures described using CAR models have no relationship to what the programmer thinks they’re entering into the model. But I have no proof of this, so I guess like everyone else I’ll just press on, using the adjacency matrix I think works …

So there you have it. Next time you see an opinion formed on the basis of a spatial regression model built in BUGS, remember the problems I had getting to the output, and ask yourself – do you trust that model? Really?

fn1: Well, I could copy winBUGS into the program files folder but I couldn’t patch it or install the immortality key, which, wtf? When I bought Time Series Modelling and Forecasting by Brockwell and Davis, ITSM came as a free disk with the book. When I buy the BUGS book I get to install software that comes with a big message telling me to get stuffed, and years later they finally provide a key that enables you to use it for free …

fn2: If you aren’t aware of how this works, basically when you call OpenBUGS in R, providing data from inside R, R first dumps the data into text files in the directory of your choosing, then OpenBUGS opens those files. So if you aren’t comfortable preparing data for BUGS yourself, use the list and structure commands in R, then kill OpenBUGS and go to OpenBUGS directly … the text files will remain in the directory you chose.

fn3: Barry Rowlingson does a pretty good job of showing how interesting and useful spatial analysis can be: see e.g. his post on mapping the Zaatari refugee camp in Syria.

Advertisements