There has been a lot of hype lately around tools such as OpenAI Codex being able to replace human coders. The idea is that a non-programmer could just tell an AI application what they need an app to do, and the ‘intelligent’ tool will create the required software for them, thus negating the need for expensive, highly skilled developers!
How realistic is this? In this blog we will aim to answer the question “Will OpenAI Codex render human coders obsolete?”.
To do this we will first need to understand what OpenAI Codex is, what it can (and can’t) do and how it can be used (as exemplified by GitHub Copilot).
Who are OpenAI?
You probably won’t be entirely surprised to learn they are an AI research organisation based in California. OpenAI were founded in 2015 by several people including Elon Musk (although Elon Musk resigned from the board in early 2018, citing potential conflicts of interest with what Tesla were doing). The company was originally set up as a not-for-profit but has since transitioned to a “capped” for-profit company. In 2019 Microsoft invested $1bn in the company and is a keen supporter and user of its technology.
What is OpenAI Codex?
OpenAI Codex (or Codex for short) is a tool (and a set of public APIs) that can be used to convert natural language into source code.
Codex is built on top of a tool called GPT-3 (see below) and has successfully been interfaced to services and applications such as MailChimp, Spotify, Microsoft Word and Google Calendar.
Codex is also the underlying technology behind GitHub Copilot, an enhanced code completion tool. Like other autocompletion tools, it offers suggestions on how to complete lines of code. It is enhanced in that it can also provide code snippets or complete solutions for requirements expressed in natural language.
What is GPT-3?
No prizes for guessing GPT-3 is the successor to GPT and GPT-2. It is a neural network-based system to handle Natural Language Processing (NLP) for the specific task of predicting the ‘next word’.
That is, given a phrase or group of words, GPT-3 can predict the most likely next word to use. For example, it can be taught with input data such as:
“To be or not to be, that is the question, whether it is nobler to suffer the slings and arrows of outrageous fortune.”
GPT-3 will take the first n words, for example:
“To be or not to be, that is the question, whether it is”
And then be trained to suggest that the next word would be “nobler”.
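The idea of predicting the next word from training text can be sketched with a toy model. This is only an illustration and not how GPT-3 works internally: GPT-3 is a large neural network that looks at a long run of preceding words, whereas the sketch below simply counts which word follows each single word. The names `train_bigrams` and `predict_next` are made up for this example.

```python
from collections import Counter, defaultdict

# Training text: the quotation used above, lower-cased and without punctuation.
TEXT = (
    "to be or not to be that is the question whether it is nobler "
    "to suffer the slings and arrows of outrageous fortune"
)


def train_bigrams(text):
    """Count, for each word, which words follow it in the training text."""
    words = text.split()
    followers = defaultdict(Counter)
    for current, following in zip(words, words[1:]):
        followers[current][following] += 1
    return followers


def predict_next(followers, word):
    """Return the most frequently seen follower of word, or None if unseen."""
    if word not in followers:
        return None
    return followers[word].most_common(1)[0][0]


model = train_bigrams(TEXT)
print(predict_next(model, "not"))         # the only word seen after "not"
print(predict_next(model, "outrageous"))  # the only word seen after "outrageous"
```

Even this tiny model "predicts" plausible next words for text it has seen, without any notion of what the words mean.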
GPT-3 has been shown to be very successful at this specific task. It has been trained on a huge set of texts (a text corpus) including texts from Wikipedia articles and a wide range of books etc.
But is this intelligence? Is this something that understands what it has learned?
In practice, what GPT-3 actually learns is a conditional probability distribution over all possible next words. As such, it does not understand the text input in terms of its meaning.
This raises the question: does it need to understand the text? The answer in many cases is no; it can successfully predict the next word for a given body of text in the vast majority of cases.
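One way to write down the conditional probability idea is shown below. The notation is an assumption made for this illustration: the w terms stand for the words seen so far, V for the set of all possible words, and the hatted term for the predicted next word. Choosing the single most likely word is a simplification of how such a model can be used.

```latex
P(w_{n+1} \mid w_1, w_2, \dots, w_n),
\qquad
\hat{w}_{n+1} = \arg\max_{w \in V} P(w \mid w_1, w_2, \dots, w_n)
```

In the earlier example, the words up to “whether it is” play the role of the context, and “nobler” is the word with the highest probability given that context.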
How Codex uses GPT-3
Codex is built on top of GPT-3, so how does NLP prediction relate to code generation? Essentially, the training data used for Codex relates to programs rather than literary texts. The training data sets included well documented code (documented with comments or Python docstrings). These comments relate the associated code to actions, operations, behaviour etc. As with the basic GPT-3, the larger the training set the better, and Codex was trained on a large body of open-source software (about 159 GB of code) taken from public GitHub repositories. In addition, a further set of specially crafted training programs that closely mirrored the evaluation tasks was also used.
Codex thus learns a conditional probability distribution over all possible code samples relative to their comments / documentation strings.
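To make the link between documentation and code concrete, here is an illustration of the kind of pairing involved. It is hand-written for this article and is not actual Codex output: the function name `count_vowels` and its body are assumptions chosen purely as an example. The signature and docstring play the role of the natural language request, and the body is the sort of code that is commonly found alongside such a description.

```python
def count_vowels(text: str) -> int:
    """Return the number of vowels (a, e, i, o, u) in text, ignoring case."""
    # Everything above this comment plays the role of the request (the prompt).
    # The line below is a hand-written stand-in for a suggested completion.
    return sum(1 for ch in text.lower() if ch in "aeiou")


print(count_vowels("To be or not to be"))
```

The docstring says what is wanted; a model trained on many such pairs can propose a body that usually goes with that description.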
Issues with Codex
Codex is not without its drawbacks and limitations…
Codex does not understand code! It merely suggests previous code solutions associated with words or phrases. Of course, this can be a very useful approach and indeed many humans use exactly this approach themselves. For example, a web designer, who is a gifted artist but lacks any programming knowledge, can search the internet to find code samples that do what they need. However, they then just cut and paste the code into their web page without much understanding of what it does. This very often causes unexpected behaviour and can at worst be dangerous to the owner or user of the web page. It takes someone with knowledge of programming to verify such code.
Suboptimal or plain wrong suggestions! Of course, the problem can come when the code solution found by this web designer is sub-optimal or limited in functionality or contains bugs – or simply does not work in the new context in which it is being used.
To some extent this is all true of the code suggested by Codex. It only finds existing examples based on (sophisticated) probabilities. It can thus propose solutions that are not appropriate, do not meet the requirements or are in some way sub-optimal or limited. However, it might take an experienced programmer to identify or understand this.
You need to phrase the question correctly! In such situations it might be necessary to rephrase a request to Codex to get the correct solution, and indeed it may take several attempts to get the desired solution. The implication of this is that the user needs to have a certain level of programming knowledge in the target language to successfully use Codex!
You still need to test the solution. In addition, any solution proposed by Codex should be verified to ensure that it does what it is required to do and that it does not include any unexpected / undesirable behaviour or any bugs (relative to the user’s requirements). It is therefore necessary to write a suitable suite of tests for any Codex proposed solution. Again, this typically requires some programming knowledge and knowledge of an appropriate testing framework such as Jest or pytest etc.
Malicious Code. This was mentioned earlier, but as Codex does not understand the code it suggests, any malicious code embedded in the training set could be suggested to an unsuspecting user. Again, to identify such issues the user of Codex needs to be a knowledgeable programmer.
“A solution” versus “The best solution”. Finally, Codex only aims to propose a solution to a request, not the best solution. Of course, here the term best could mean anything from most efficient, fastest, best use of memory or best structured to most stable, safest or most secure etc. It may therefore be that any Codex suggestion should be considered a starting point for further code development rather than the final result. This again means that the user of Codex needs to be a developer.
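The points above about wrong suggestions and the need for testing can be sketched with a small example. Both functions are hypothetical and written for this article: `suggested_is_leap_year` stands in for a plausible-looking suggestion (it is not real Codex output), and `reviewed_is_leap_year` is what a developer might end up with after checking it. The test cases are plain data here to keep the sketch self-contained; in a real project they would typically live in a test function run by a framework such as pytest.

```python
def suggested_is_leap_year(year: int) -> bool:
    """Return True if year is a leap year."""
    # Hypothetical suggestion: looks reasonable, but ignores the century rule.
    return year % 4 == 0


def reviewed_is_leap_year(year: int) -> bool:
    """Return True if year is a leap year in the Gregorian calendar."""
    # Century years are leap years only when divisible by 400.
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


# (year, expected result) pairs a reviewer might write before trusting the code.
CASES = [(2024, True), (2023, False), (2000, True), (1900, False)]


def failing_cases(func):
    """Return the (year, expected) pairs for which func gives the wrong answer."""
    return [(year, expected) for year, expected in CASES if func(year) != expected]


print(failing_cases(suggested_is_leap_year))
print(failing_cases(reviewed_is_leap_year))
```

The suggested version passes most of the checks, which is exactly why it could slip through: knowing that 1900 is a case worth testing takes some understanding of the problem and of the code.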
All of the above suggests that to make the most of Codex the user needs to be a knowledgeable developer / coder. In essence this is similar to the way in which a developer might use (and benefit from) a code template, a design pattern or other code suggestion; it is an aid to an experienced programmer …not a replacement.
Interestingly, the role of Codex within the GitHub Copilot tool is as a very sophisticated code suggestion / completion tool. That is, it is intended as an aid to the developer, helping to speed up development, take some of the drudgery out of the simpler aspects of coding and leave the more complex solution generation to the programmer. As such it is a tool targeting developers and coders, not one that aims to replace them.
As of the time of writing, human software developers can breathe a sigh of relief as the answer to our original question is “No, OpenAI Codex does not render human coders obsolete!”. You still need the knowledge, experience, and skill a developer / coder has to make the most of what Codex has to offer.
Original article by Framework Training