I read somewhere that you know you have become a geek when you start having strong opinions about text editors. While I am, I believe, still far from that, I recently discovered there is one thing about which I have strong opinions: programming style. In particular, bad programming style.
Bad programming style is a real issue because, I believe, producing replicable and readable code is highly underappreciated. There was recently a discussion in pol-meth about which software is best for teaching statistics to undergraduates. Understandably, most people split between the Stata and the R camps. The main argument in favor of Stata, it was said, is that it is easier for students, who will not be distracted from learning basic statistics by having to learn R at the same time. My first reaction was one of incredulity: I could not believe that someone could seriously argue that Stata was easier and better than R for teaching statistics. But then I remembered the first bits of code I read back when I was learning R at the introductory level. The main problem was that they were highly unreadable! Code should be at least as readable as a .tex file. Well-written code is easy to debug and easy to go over.
I would like to put this in more positive terms, so here I suggest some tips I personally find useful:
- Uninformative and erratic naming conventions: naming must be informative. Apparently, some people find it acceptable to name their variables “x” or “y” and then add twists such as “x1”, “x1bis”. Needless to say, this produces code that is very hard to read. Instead, vCars, fWeightedMean and similar names are, if not straightforward, at least suggestive. You may notice that both names have one letter in front. This letter stands for the kind of object they are, and I find it useful not only to recognize them, but also to force myself, when writing their names again and again, to remember what kind of object they are. Also, code must be easy to write and rewrite. A practice that typically goes against this is the abuse of abbreviations. There seem to be many people out there who think that shorter names are better. I disagree. I personally do not mind if my data structures and functions have long names, as long as they are easy to remember. If you write “myVariable”, you can probably shorten it to “myVar”, but you need to be rather disciplined to remember three hundred lines of code later that you called it “myVar” and not “myVariab” or “myVari”. I feel the convention of writing full names is generally better.
- Repetitive code: functions and loops were invented to avoid repetition. Repetition is a source of error, since any modification has to be made in every place where the code is repeated, and it is easy to miss one.
- Unstructured code: good code must have blank lines separating blocks of code, just as a good text must have paragraphs. Writing things in fewer lines may sometimes speed up execution, but that does not mean you must enter an infinite nesting sequence. On top of that, in R you have magrittr, whose pipe operator lets you write beautifully readable code. Good code should have sections and subsections, and functions should be documented with comments.
- Uncommented code: This is one of my biggest beefs. Some people don’t seem to need to comment their code. I don’t need to say that it makes it totally unreadable for others, but I sometimes wonder how they can remember what each line is doing. I personally can’t.
- Not cleaning your working environment regularly: this is something specific to IDEs. If you work with the R terminal, you are likely to try stuff to see if it works. That’s just normal. But then a ton of garbage is likely to stay there. I have found that some people feel comfortable just running and re-running small chunks of code here and there, never starting the script from scratch. I think one of the advantages of knitr is that it regularly forces you to check that every part of your code works. But this practice of not starting from scratch, in conjunction with uninformative naming conventions (using x and y again and again)… I mean, God knows how and from where you are getting your results.
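
To make the points about naming, repetition and comments a bit more concrete, here is a minimal sketch in base R. The names vCars and fWeightedMean come from the examples above; the numbers, the other variable names, and my reading of the prefixes ("v" for vector, "f" for function) are only assumptions for illustration.

```r
# Hypothetical data: miles per gallon and weight for three cars.
# The leading "v" marks these objects as vectors (an assumed reading
# of the one-letter prefix convention).
vCars <- c("Mazda RX4", "Datsun 710", "Hornet Sportabout")
vMilesPerGallon <- c(21.0, 22.8, 18.7)
vCarWeights <- c(2.62, 2.32, 3.44)

# fWeightedMean: mean of vValues, weighting each entry by vWeights.
# The leading "f" marks the object as a function.
fWeightedMean <- function(vValues, vWeights) {
  # Fail early if the two vectors do not line up.
  stopifnot(length(vValues) == length(vWeights))
  sum(vValues * vWeights) / sum(vWeights)
}

# The formula is written once and reused, so a later change
# only has to be made in one place.
weightedMeanMilesPerGallon <- fWeightedMean(vMilesPerGallon, vCarWeights)
unweightedMeanMilesPerGallon <- fWeightedMean(
  vMilesPerGallon,
  rep(1, length(vMilesPerGallon))
)
```

Compare this with a version where the same objects are called x, x1 and x1bis and the formula is pasted twice: the result is identical, but it is much harder to tell what each line is supposed to do.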