
Generative AI CU Boulder

Using Large Language Models & Artificial Intelligences for Code

A screenshot of ChatGPT providing examples of “Hello World” programs in five different programming languages: Python, Java, C, JavaScript, & Ruby

Many LLM/AI services can generate computer code, and some, such as GitHub Copilot, are designed specifically for code generation.

While these services can be valuable for writing code, concerns about copyright and licensing have led programmers to file lawsuits, some countries to pass laws specifically addressing these services, and some companies to forbid the use of code-generation services.

Before you use these services for your own code, there are several things you should consider.

Will the code work?
There is no guarantee that the code generated by these services will work! Although these services are frequently called “AI,” there is no real “intelligence” behind them, and they may generate incorrect, poorly optimized, or insecure code. Make sure to test any code they generate for you.
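As an illustration, the sketch below tests a short function of the kind a code-generation service might suggest. The function `is_leap_year` and its test cases are hypothetical examples, not output from any particular service; the point is that tests should include edge cases (here, century years) where plausible-looking code can quietly go wrong.

```python
def is_leap_year(year: int) -> bool:
    """Hypothetical AI-generated function under test.

    Gregorian rule: a year divisible by 4 is a leap year, except
    century years, which are leap years only if divisible by 400.
    """
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


def test_ordinary_years() -> None:
    # Everyday cases that almost any suggestion would get right.
    assert is_leap_year(2024)
    assert not is_leap_year(2023)


def test_century_edge_cases() -> None:
    # A naive "year % 4 == 0" suggestion would wrongly accept 1900.
    assert not is_leap_year(1900)
    assert is_leap_year(2000)


if __name__ == "__main__":
    test_ordinary_years()
    test_century_edge_cases()
    print("All checks passed")
```

The file can be run directly with `python`, or the `test_` functions can be collected by a test runner such as pytest.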

Where did the code come from?
Open-source code is generally released under licenses that state how the code can be used, and anyone contributing to an existing project or building on existing code must follow those license requirements. Code-generation services have been “trained” on datasets of publicly available code released under many different licenses (though some, like Tabnine, are trained only on code with permissive licenses).

While stolen (or uncredited) code has been a problem for years, code generation is making it worse: LLMs have produced code identical (or “nearly verbatim”) to existing code without crediting its authors or following the terms of its license.
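One rough way to see what “nearly verbatim” means in practice is to compare a generated snippet with code you already know, using Python’s standard-library `difflib` module. The sketch below is a minimal, hypothetical example: the two snippets, the `normalize` and `similarity` helpers, and the line-by-line comparison are illustrative assumptions, and a high ratio is not a legal finding of copying.

```python
import difflib


def normalize(code: str) -> list[str]:
    """Collapse runs of whitespace on each line and drop blank lines."""
    return [" ".join(line.split()) for line in code.splitlines() if line.strip()]


def similarity(code_a: str, code_b: str) -> float:
    """Return a 0.0-1.0 ratio of matching lines between two snippets."""
    matcher = difflib.SequenceMatcher(None, normalize(code_a), normalize(code_b))
    return matcher.ratio()


# Hypothetical snippets: one from a licensed project, one AI-generated.
licensed = """
def clamp(value, low, high):
    return max(low, min(value, high))
"""

generated = """
def clamp(value, low, high):
        return max(low,  min(value, high))
"""

if __name__ == "__main__":
    # The snippets differ only in whitespace, so this prints 1.0.
    print(similarity(licensed, generated))
```

Because it compares whole lines, this check is easily defeated by renamed variables or reordered statements; dedicated code-similarity and license-scanning tools are better suited to real compliance work.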
 
Some organizations behind open-source licenses now publish specific guidance on AI-generated code. For example, the Apache Software Foundation has released a Generative Tooling Guidance document that explains how and when AI-generated code can be included in Apache-2.0 licensed projects. Some open-source developers have gone so far as to ask that derivatives of their code not be uploaded to GitHub, because GitHub Copilot is trained on code hosted there.

What are you using the code for?

  • Personal use: If you’re working on something for yourself and never intend to share it or make it public, there shouldn’t be any licensing issues.
  • Assignments: If you’re writing code for an assignment, make sure you know your instructor’s guidelines. Some instructors have rules against using generative AI for assignments and will consider its use plagiarism or academic misconduct.
  • Research projects: If you plan to publish work that relies on your code, you may need to release that code freely and openly. This could cause problems if you don’t know whether the AI-generated code you’re using complies with its original license terms. Make sure you read the license terms before you use someone else’s code!