How to Choose a Programming Language

Last modified at 2020-10-28 21:45+03:00

2020-10-28 21:45+03:00: Added some discussion of C and security

So you got to be the lead in a new greenfield project. You face the exciting and terrifying task of actually making it happen. One of the more significant decisions is choosing the programming language. Let me tell you what I know about that.

First, Avoid Choosing When You Can

Programming language choice usually becomes more difficult to change the longer a project has been running. Each day, new lines of code are written and a language change (usually) means rewriting all those lines, a very expensive investment. Thus, the choice of language is important.

There are two tricks of the trade to deal with important decisions that are expensive to change:

Avoid making them as long as possible.
Try to make the choice matter as little as possible.

The first trick means that you do not decide the language before you actually start writing some code. The second trick means allowing polyglot programming - having different parts of the project be in different languages. Microservice architectures make both of these easy: we can choose a language for each microservice separately. However, the tricks can be implemented in many other ways as well.
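One of those other ways, sketched here only as an illustration, is to let the parts of a system talk through a language-neutral message format such as JSON. The field names and the approval rule below are hypothetical and not taken from any real project; the point is that either function could later be rewritten in another language without touching the other.

```python
import json

# Hypothetical contract: a request carries a customer id and an amount in cents.
def encode_request(customer_id, amount_cents):
    """Caller side: serialize a request into language-neutral JSON text."""
    return json.dumps({"customer_id": customer_id, "amount_cents": amount_cents})

def handle_request(raw):
    """Service side: parse the JSON text, apply a made-up rule, reply in JSON."""
    request = json.loads(raw)
    approved = request["amount_cents"] <= 100_000  # illustrative limit only
    return json.dumps({"customer_id": request["customer_id"], "approved": approved})

reply = json.loads(handle_request(encode_request(7, 2500)))
```

Only the JSON text crosses the boundary, so the choice of language on each side stays a local decision.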

Sometimes It Makes Sense to Choose For Everybody

As developers, we like it when we are allowed to choose our own tools. However, in a large organization, complete freedom of choice can create a modern-day Tower of Babel. If every microservice in your enterprise architecture is written in a different language, how will you maintain them when the original developer leaves? If your only Haskell developer gets hired by another company, how will your C# developers take ownership?

I encourage developers to adopt a generalist mindset, learning many different tools and approaches as they progress in their careers. True generalists will be able to rise to fill any void left by departing colleagues, but it will take time. Your enterprise may not have that time, and you may not be able to hire a Prolog consultant when you need one to fight your fires for you.

Thus, your enterprise architecture, or your standard work for developers, may dictate (or suggest) a language. Similarly, a large development project may define a language as part of its software architecture. Just be aware of the cost of making the decision, as well as its benefits.

You Have to Balance Stability And the Cutting Edge

Some languages change slowly, some not so slowly. I can still compile and run my 1997 C code on a modern Linux system; my Python 1 code from the same era will not work without changes.
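As a small illustration of that kind of breakage, the sketch below asks the current Python interpreter whether it still accepts a line written in the old statement-style `print` syntax. The snippet is a made-up stand-in, not the 1997 code mentioned above, and `compiles_today` is a helper name invented for this example.

```python
# Hypothetical one-liner in the style of 1990s Python (print as a statement).
OLD_SOURCE = 'print "hello, world"\n'
# The same program in the syntax that Python 3 expects.
NEW_SOURCE = 'print("hello, world")\n'

def compiles_today(source):
    """Return True if the running interpreter accepts the source text."""
    try:
        compile(source, "<legacy>", "exec")
        return True
    except SyntaxError:
        return False

old_ok = compiles_today(OLD_SOURCE)
new_ok = compiles_today(NEW_SOURCE)
```

On a Python 3 interpreter the old form is rejected with a syntax error, while the parenthesized form compiles.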

Eager young developers tend to view old, slowly changing languages as dinosaurs to be avoided. In my youth, I vowed never to touch COBOL (an oath I have thus far not broken); in recent years, I have come across claims that Java software is (due to the use of Java) unmaintainable and must be replaced. I now think these ideas are wrong. (And I should take the time to learn COBOL, by the way.)

Anyway, the Java of the 1990s is not the same as the Java of 2020, and the COBOL of 1960 is not the same as the COBOL of 2020. Even slow-to-change languages do change.

This is another trade-off. In some situations, using the latest hip language is a marketing win; in other situations, you can expect the code you write now to survive, in one form or another, for decades. Don't be shocked: rewriting code is so expensive that working code, even if not designed for longevity, can survive a long, long time.

Most Languages Are the Same

Fifteen years ago, when I was busy teaching functional programming in Haskell, it was a radical (though not so new) thing. Mainstream languages did not boast lambdas or other convenient ways to work with higher-order functions, nor were their libraries geared around transforming stateless data. Yet it was obvious to me at the time, and I told my students, that functional programming would be mainstream in ten years, though it would be in Java, C#, and similar languages and not in Haskell. I was right.
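For readers who have not met the style, here is a minimal sketch of the difference. It is written in Python purely for illustration (the languages named above are Java and C#), and the `Order` type and its data are invented for the example. Both functions compute the same total; the second does it by passing lambdas to higher-order functions over immutable data instead of updating a variable in a loop.

```python
from typing import NamedTuple

class Order(NamedTuple):
    """Hypothetical immutable record used only for this example."""
    customer: str
    amount: float
    paid: bool

ORDERS = (
    Order("alice", 120.0, True),
    Order("bob", 80.0, False),
    Order("carol", 45.5, True),
)

def total_paid_imperative(orders):
    """Loop with a mutable accumulator."""
    total = 0.0
    for order in orders:
        if order.paid:
            total += order.amount
    return total

def total_paid_functional(orders):
    """Same result via filter and map with lambdas, no mutation."""
    paid = filter(lambda o: o.paid, orders)
    amounts = map(lambda o: o.amount, paid)
    return sum(amounts)
```

Calling either function on `ORDERS` sums the amounts of the paid orders only.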

When you progress on your generalist's path, you will notice that most languages have a common core. There seems to be a growing consensus on what the core of a good mainstream language looks like, and variations are mostly cosmetic. Even innovative new-old features like asynchronous programming and coroutines tend to spread through the language ecosystem, each language giving its own take on them at its own pace of change.
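As one concrete take on coroutines, the sketch below uses Python's standard `asyncio` module; other languages express the same idea with their own keywords and libraries. The `fetch` function and its delays are hypothetical stand-ins for real I/O such as a network call.

```python
import asyncio

async def fetch(name, delay):
    """Pretend to do slow I/O; awaiting lets other coroutines run meanwhile."""
    await asyncio.sleep(delay)
    return name + " done"

async def main():
    # Run both coroutines concurrently; gather returns results in argument order.
    return await asyncio.gather(fetch("a", 0.02), fetch("b", 0.01))

results = asyncio.run(main())
```

Although the second coroutine finishes first, `asyncio.gather` reports the results in the order the coroutines were passed in.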

Thus, if you only consider general-purpose languages with healthy active development communities, it does not matter that much which language you choose.

Some Languages Are Genuinely Problematic

Last year, in a customer project, I had the pleasure of learning DAX, Microsoft's domain-specific language for writing business intelligence data queries over tabular data sources, which is offered in Power BI, Excel Power Pivot, and SQL Server Analysis Services Tabular. In some ways, it is a very innovative and clever language, truly a pleasure to learn.

One of its nice aspects is how interactive data filters from the business intelligence application are implicitly applied over the query logic that a developer writes.

One of its most problematic aspects is how the language attempts to allow the developer to control the behavior of that implicit filtering overlay. Another problematic aspect is how it looks deceptively like Excel formulas; your first reaction to seeing DAX code is likely the same as mine: I know this already! But when even the world's foremost independent experts on the language misunderstand its behavior for years, something is wrong.

Programming education researchers talk about notional machines - the models of a language's behavior that teachers give to students. Similarly, programming language experts talk about the abstract machine that a language defines. Both ideas share the same root: when we write a program in a programming language, we pretend that this language is the native language of the machine, even though it (almost always) is not. The purpose of a programming language is to define an abstract machine (and allow teachers to define a notional machine) that is simpler for its intended purposes than the actual machine.

DAX fails that purpose, and it (unfortunately) is not alone.

That does not mean I will never use DAX in a project. Sometimes DAX is the best choice due to other architectural decisions already made - for example, if the business intelligence frontend is Power BI, you kind of have to use DAX. But it does mean I will approach that project carefully.

Another example of a problematic language is C. I love C, but I have not written anything in it in a long while. There is a good reason: C was explicitly designed to cut out almost all safety margins. You can write really great code in C, but you are always a couple of typos away from a major security hole. In most projects, security matters more than the precise low-level control that C offers. Therefore, I no longer touch C.

Science Is Not Helping

I would love to tell you that there is a scientific answer to this problem. Unfortunately, I cannot. While programming languages have been studied for decades by a lot of smart people, most of those studies have focused on discovering what we can do with programming languages in general; very few try to evaluate languages or their features in a scientific manner. The situation is, in fact, so bad that my own study found fewer than a hundred such studies published from the 1950s until 2012. There is some increasing activity, but the topic is still very much niche.