Planning a Computational Project

Careful planning can substantially increase the sustainability of the code developed for a computational project. Based on our experience, we developed a set of questions intended to aid in the planning of new projects. The list is not exhaustive; it is meant as a starting point for discussion.

Where is the code stored?

Ideally, any code developed for a computational project should be stored in a version control system, preferably one hosted in the cloud. This could be a service like GitHub or GitLab, or an institutional repository. The main goal is to have the code versioned, backed up, and available even after a lab member has left.

Who owns a piece of code?

This should be decided before any code is developed to avoid any confusion later. Who can use a piece of code and what are the requirements to use it? Does the lab member who wrote the code own it and take it with them after they leave, or does it belong to the lab?

How is the code licensed?

Is code developed by members of the lab open or closed source? Which license applies? Can the code be modified without the consent of the original developer?

How is the code published?

Is the code published using a service like Zenodo? Is a paper expected to be written about the code itself (rather than about the research project for which the code was written)?

Who gets credit for the code?

If someone other than the original developer uses a piece of code and publishes results based on it, how is the original developer credited? Does this change if the maintainer of the code changes?

Who maintains the code?

Is the code maintained only by the original developer or is it handed off to someone else (in the lab or organization)?

What is the workflow to add new features?

How are new features added to the code? Can anyone in the lab add new features? If so, how is this organized? If not, who decides which features should be added?

How big are the datasets a project uses?

Is the data to be analysed rather small (up to a few gigabytes), or is significant storage required (hundreds of gigabytes to terabytes)? This might change during the lifetime of a project, but there should be some initial expectations. A few gigabytes are much easier to search, back up, and version than several terabytes.
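
To set those initial expectations, it can help to measure an existing or pilot dataset rather than guess. The following is a minimal sketch in Python, not a prescribed tool: the function names `dataset_size_bytes` and `size_category` are hypothetical, and the thresholds (5 GiB for "small", 100 GiB for "large") are illustrative assumptions loosely based on the ranges mentioned above. Adjust them to what your storage and backup setup can handle.

```python
import os
import tempfile

GIB = 1024 ** 3  # bytes in one gibibyte


def dataset_size_bytes(root):
    """Sum the sizes of all regular files below root (symlinks are skipped)."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):
                total += os.path.getsize(path)
    return total


def size_category(n_bytes, small_limit=5 * GIB, large_limit=100 * GIB):
    """Classify a size in bytes; both limits are illustrative assumptions."""
    if n_bytes <= small_limit:
        return "small"
    if n_bytes >= large_limit:
        return "large"
    return "medium"


if __name__ == "__main__":
    # Demonstrate on a throwaway directory so the example runs anywhere.
    with tempfile.TemporaryDirectory() as root:
        with open(os.path.join(root, "sample.bin"), "wb") as fh:
            fh.write(b"\0" * 2048)
        size = dataset_size_bytes(root)
        print(size, size_category(size))
```

In practice, `dataset_size_bytes` would be pointed at the directory holding the project data. Repeating the measurement from time to time shows whether the project is drifting from one category into the next.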

Where will datasets be stored?

Will datasets be stored on personal computers and hard drives, or can they be stored in a repository that makes them available to multiple members of a lab? What does the copyright on the data allow?