Finalizing my research, the CCELIM model, in R

For the past 6 months I’ve been working on an inverse modeling project as a ‘starter’ graduate project, and today I can announce that I see the light at the end of the tunnel. While there is certainly more work ahead, especially with regard to the manuscript, the model and the code are just about where I want them in terms of both functionality and efficiency. Here I’ll share the latest steps I’ve taken to really polish up the code base.

Balancing specific functionality with general utility is always an important judgement call when it comes to programming, since either extreme is far from ideal yet no clear center exists. Consider a program that does everything for you, something where you tell it which data to use and it does all the rest. A program like that can make your life pretty easy: you don’t have to babysit the computer and help it through each little step.

But what happens when you need to change the workflow a little, or perhaps need to alter the conditions of the model? Suddenly the all-in-one program seems pretty complicated, and the code base ends up jury-rigged with code snippets and leftovers from old versions that are no longer needed. This is when a modular program starts to look pretty nice.

In a modular program, each task is broken out into its own little script or file that can be run independently. This opens up obvious flexibility, since instead of altering the program as a whole, or perhaps altering a copy of it, you can simply change one script or add a new script to the directory. The code starts to resemble building blocks, like Legos, where changes or additions in one area are quite isolated from changes anywhere else.

While modular programming seems to be the best solution, taken too far it leads to real logistical issues of its own. Imagine how many ways there are to arrange a few dozen Lego pieces. Now what if all of those were pieces of a computer program? At some point there are simply too many pieces to keep track of and use efficiently. There’s a granularity at which the program is both effectively modular and intelligently structured, and unfortunately it’s only ever found through trial and error.

For my inverse modeling code base, there have been at least 3 major revisions in which nearly every line of code was rewritten, and I hope the latest will be the last. Since I hope to share my code and work with others, the aim of the last revision has been to generalize as much of the code as possible while maintaining an effective workflow in my own research. What would be the point of writing code that I wouldn’t even end up using? With that in mind, I decided to look into the possibility of writing an R library (a package, in R’s own terminology), and it turns out that creating one is straightforward and useful. Distilling my thoughts about the program into the general-purpose format that an R package requires gave the resulting code a healthy balance of generality and utility.
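For anyone curious what the starting point of a package looks like, here is a minimal sketch using package.skeleton() from the utils package that ships with R. The two functions and the package name "mypackage" are hypothetical placeholders made up for illustration; they are not the functions in CCELIM, and this is not necessarily how my own package was set up.

```r
# Two hypothetical functions standing in for real modeling steps.
# They are placeholders for illustration only, not part of CCELIM.
build_constraints <- function(data) {
  # Placeholder: hand the input back unchanged.
  data
}

run_model <- function(constraints, n_iter = 100) {
  # Placeholder: return a list describing the requested run.
  list(constraints = constraints, n_iter = n_iter)
}

# Write a package skeleton into a temporary directory.
# Run this at the top level, since the functions are looked up in the
# global environment by default.
pkg_dir <- tempdir()
utils::package.skeleton(
  name  = "mypackage",                          # hypothetical package name
  list  = c("build_constraints", "run_model"),  # objects to include
  path  = pkg_dir,
  force = TRUE                                  # overwrite an earlier attempt
)

# Inspect what was generated.
list.files(file.path(pkg_dir, "mypackage"))
```

The generated directory holds a DESCRIPTION file, an R/ folder with one file per function, and stub help pages under man/. The DESCRIPTION and the help stubs need to be filled in by hand; after that, running R CMD build and R CMD INSTALL from a terminal produces and installs the package tarball.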

There are a number of useful websites and guides available for writing an R package, so I will not recommend any in particular here (a Google search will bring them up). The first, very rough version of my package can be downloaded from CCELIM_0.1.tar, and the latest can be found at my GitHub repo here.