We live in a golden age in which data can be transmitted at light speed to almost anywhere on the planet. Yet modern scientific publishing, and indeed much of the research process, remains rooted in an era when the printing press ruled. These processes and publication practices are long overdue for modernization. There is no need to rely primarily on PDF tables and static images when advanced JavaScript applications can run in mobile phone browsers with hardware-accelerated 2D and 3D rendering, and when augmented and virtual reality are emerging as powerful, immersive alternatives to more traditional approaches.
It is time for a paradigm shift that embraces fully digital workflows in which every step of a research project is documented, and in which modern tools can be used to "show all steps of our research process". To that end, the authors of this paper have been developing an ambitious open source platform to realize the dream of open science in a fully connected digital age. PDFs and flat images are relics of the past, attempts to map the limitations of the printing press onto the digital world. It is time to push the scientific research enterprise forward and to drive innovation in software to the point where interactive data outputs can be published for the whole world to see. We must look to the technologies available in modern web browsers to offer seamless publication of data, code, and written results by leveraging the best open source software. Where necessary, this software should be extended to offer specialized views and entry points, linked from figures in published works, that give direct access to raw and partially processed data, to the tools that extract key information from those data, and even to research notes.
The modern scientific workflow increasingly takes place on a computer, or on a distributed set of computers, nodes, or images. This needs to be reflected in, and embraced throughout, the dissemination of research. Computational chemistry is by its very nature a prime example, and we have the tools to document every step of producing, processing, analyzing, and visualizing data, and to link those steps with the final publication. More and more research is multidisciplinary, with experimentalists, theoreticians, and computational scientists working closely together in pursuit of scientific discoveries that advance society. Modern technologies can make it much easier to integrate diverse data resources and analysis tools into complex research workflows. In this paper, we discuss our approach to handling data and metadata, storing them on a flexible data server platform, representing rich data that can be exported, integrating computational chemistry data generators (quantum chemistry and machine learning codes) with experimental data, and generating and accessing the data through the web using browser-based and desktop-based tools.
The project described here has been developed in the open on GitHub under OSI-approved open source licenses \cite{initiative}, primarily the 3-clause BSD license \cite{initiativea}. Issues were used to track planned features and bugs in existing functionality. Pull requests group a series of commits on a logical branch that adds a new feature or fixes a bug; where possible, they include automated testing and line-by-line review of code additions and removals through the web interface. A focused software repository was used for each component, such as the server code in Python \cite{pythonorg}, the web client code in JavaScript/TypeScript, and the deployment code and image build instructions. A coordinating repository within the "OpenChemistry" \cite{chemistry} organization on GitHub was also used to discuss high-level development of the platform.
The GitHub platform provides access control for registered users at both the individual and team level. Integrations with a number of other platforms create a rich ecosystem in which continuous integration jobs can be run for pull requests, when code is merged into the main development branches, and when release tags are created. Commits and release tags can be cryptographically signed, and releases offer snapshots of the software at a particular point in time. Anyone can open a pull request or review code, but only approved developers can merge code into the main development branches. This offers a level of transparency and accessibility that is the norm in much open source software development but remains uncommon when developing code for scientific research. All code developed, along with every previous version of it, remains available through these standard technologies and platforms. More recently, the same approach has been used to document the data and metadata formats, the build process for images, and the deployment of services.