Programming for Physics and Astronomy

From AstroEdWiki
Revision as of 21:28, 6 February 2013 by WikiSysop (talk | contribs)

Only 50 years ago, most physics and astronomy research relied on the analytical skills of the scientist, on the tools of classical mathematics learned as a student, and in some cases on data management and numerical analysis done by hand. Today, cutting-edge research often requires high-speed computing for simulation and data analysis, interactive tools to extract relevant information from multi-parameter databases, access to automated and robotic instrumentation, and management of incomprehensibly large data sets. The issue for the researcher in training is not whether computing skills are needed, but which ones are most critical.

Broadly classed, there are several options:

  • Packaged commercial, proprietary, licensed programs and tools (e.g. Excel, Maxim ...)
  • Licensed proprietary programming environments (e.g. IDL, Matlab, Mathematica ...)
  • Open source tools (e.g. GDL, ds9, Grace, Sage ...)
  • Programming languages (e.g. C, C++, Fortran, Java, Python ...)
  • Web resources (e.g. HTML, Javascript, PHP, Perl ...)

In order to decide which of these apply to your own research, consider a larger question of what role computer science plays in contemporary physics and astronomy, and in what direction your research field is headed. Then, pick the tools that solve the problem at hand, realizing that the skills you develop at each step raise you up to reach a solution for the next, unknown, problem. In some cases, continued reliance on an old, inefficient, but proven, method only delays the need to acquire new skills and knowledge.

An interesting perspective on the significance of large-database science was offered by Chris Mattmann in a Nature Commentary, in which he pointed out that the Square Kilometer Array (SKA), scheduled to have first light in 2020, will generate 22,000,000,000 terabytes (TB) of data per year! In the optical regime, the Large Synoptic Survey Telescope (LSST) has a 3.2-gigapixel (3200-megapixel) camera taking images in 15-second exposures throughout the night. The resulting images will offer a nightly record of nearly the entire sky in an open, publicly accessible database reaching 24th magnitude in single exposures, and 27th magnitude in stacked images of fields of 10 square degrees. The multi-dimensional data products from the LSST will require specialized tools to use effectively.
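To give a feel for these numbers, here is a minimal back-of-the-envelope sketch in Python. Only the yearly SKA total, the LSST pixel count, and the 15-second exposure time come from the text above. The function names, the 2 bytes per pixel, and the 8 hours of observing per night are assumptions made purely for illustration, and readout and slew overheads are ignored.

```python
# Back-of-the-envelope data rates for the two surveys mentioned above.
# From the text: SKA ~22e9 TB per year; LSST 3.2-gigapixel camera, 15 s exposures.
# Everything else (bytes per pixel, hours per night) is an ASSUMPTION.

SECONDS_PER_YEAR = 365.25 * 24 * 3600  # Julian year, in seconds


def ska_rate_tb_per_second(tb_per_year=22e9):
    """Average data rate in TB/s implied by a yearly total in TB."""
    return tb_per_year / SECONDS_PER_YEAR


def lsst_image_gigabytes(pixels=3.2e9, bytes_per_pixel=2):
    """Raw size of one image in GB; bytes_per_pixel=2 is an assumed value."""
    return pixels * bytes_per_pixel / 1e9


def lsst_night_terabytes(hours_observing=8.0, exposure_s=15.0,
                         pixels=3.2e9, bytes_per_pixel=2):
    """Raw volume of one night in TB, ignoring readout and slew overhead.

    hours_observing is an assumed value, not a figure from the text.
    """
    n_exposures = hours_observing * 3600 / exposure_s
    return n_exposures * lsst_image_gigabytes(pixels, bytes_per_pixel) / 1e3


if __name__ == "__main__":
    print(f"SKA average rate: {ska_rate_tb_per_second():.0f} TB/s")
    print(f"LSST raw image:   {lsst_image_gigabytes():.1f} GB")
    print(f"LSST raw night:   {lsst_night_terabytes():.1f} TB")
```

Under these assumptions the yearly SKA figure corresponds to an average of roughly 700 TB every second, and a single uncompressed LSST image to several gigabytes. Changing the assumed defaults shows how sensitive the nightly total is to them.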


Programming languages for physics and astronomy applications

Current physics and astronomy research relies on several languages and computing environments, and there is no single choice that is optimal for every problem. Typically, we would consider first what prior work can be reused, what programming skills are required to extend that work or to develop new applications, and what support is available for the individual researcher when, inevitably, they need help. Here are a few common ones.


Fortran

Fortran (from "Formula Translating") made its first appearance in research use with the availability of IBM mainframe computers on university campuses and research centers in the 1960s. It is still a popular programming language, especially for high-performance computing. Its original version was constrained by the use of punched cards for input and output, and vestiges of that remain in the language today. There are commercial optimized compilers available for most computing systems, and the effective and well-maintained GNU open-source compiler (gfortran) is available for Linux, Windows, and MacOS.

Fortran's handling of text input and output is awkward, and there is no standard graphical user interface. Its strong point today is in massively parallel computing.

C

The C programming language was developed at AT&T Bell Labs around 1970, and has become one of the most widely used programming languages today. Later languages such as C++, Java, and even Perl and Python share features of its structure and syntax. The language is highly standardized, easily commented, and consequently readable if carefully annotated. Because it underpins most graphical user interfaces, there are libraries to utilize Motif, GTK, and Qt in C programs. Free open-source C compilers are available for Windows, Linux, and MacOS. In the Linux world, the standard compiler is the GNU Compiler Collection or GCC, "gcc" on the command line. It is included in, or easily added to, most Linux distributions.

C is an excellent programming language for almost any application, and there are routines available in the public domain for many applications in physics and astronomy computing. Compiled C programs are readily optimized and execute with speeds that take advantage of the most recent hardware in desktop and large multi-core computing environments. For example, Nvidia's graphics processing unit (GPU) computing is fully supported, enabling thousands of separate processors or "cores" to be tasked to solve large problems.

The drawbacks to C are that development and debugging can be tedious in a write-compile-test-rewrite process, and that adding a GUI to an application is painstaking even in an integrated development environment (IDE). If an IDE such as Eclipse or NetBeans is used, then the resulting code cannot be easily read and debugged outside of the IDE, so the inherent advantages of a simple text file for each routine and of readable code are often lost. Compiled code runs only in the environment for which it was built, so C programs have to be compiled, and often debugged, for each target operating system.

Java

The Java programming language was developed by Sun Microsystems in the 1990s and made openly available under the GNU General Public License (GPL) in 2007. When Sun was acquired by Oracle, a public open-source version of Java known as the Open Java Development Kit, or OpenJDK, became the community standard for collaborative development.

The Java environment is designed to allow programs to be written once and run anywhere, though some say "write once, debug everywhere" is an apt description. JavaScript, despite its name, is a separate language rather than a version of Java; it runs in a browser and is almost essential for web applications. Java and JavaScript are consequently among the most widely used programming languages.

However, to date Java has not been widely used in astronomy, so when it is employed the programmer has to create tools to handle most key astronomical functions. Conveniently, Java and C are similar, and translation of common C code to Java is usually straightforward. Two new applications, AstroCC and AstroImageJ, from Karen Collins at the University of Louisville offer professionally verified code to handle fundamental astronomy, image processing, and photometry.

IDL and GDL

The Interactive Data Language or IDL is popular in astronomy as well as in medical imaging. It is a proprietary system (originally developed for astronomy) that can be very expensive to license except as a student. However, it offers a variety of well-tested routines that have been contributed by the original developers and users.

The GNU Data Language (GDL) is a free implementation of the same programming command set that, for the most part, is equivalent to IDL. The IDL Astronomy User's Library at NASA works in both IDL and GDL. GDL has a useful interface to Python, so it can be utilized together with other comprehensive programming tools.

The primary disadvantage to IDL has been its cost, which makes it difficult for users not associated with well-supported research institutions to utilize. As GDL continues to improve, its IDL-like environment may see wider use in Physics and Astronomy.

Matlab, Mathematica, and Sage

Two very well-established tools for computer-based algebra and analysis are Mathematica and Matlab. Both are proprietary and costly. Mathematica is widely used in mathematics and to a lesser degree in physics, biophysics, chemistry, and engineering. Matlab is seen in engineering and to a lesser degree in physics. Matlab is modular, so a user need purchase only the parts of the system that are needed.

Neither Matlab nor Mathematica has wide use in astronomy, and the available libraries are limited. The open-source Sage is a promising alternative that is available for free.

Mathematica offers very good support for symbolic algebra and calculus, and for interactive multi-dimensional graphics. Matlab interfaces with some instrumentation, and with the LabVIEW programming environment from National Instruments. Experience with Matlab and LabVIEW is very worthwhile for careers in engineering and related commercial disciplines.

The drawback to Mathematica or Matlab is their proprietary nature, which greatly restricts the distribution and reuse of code. For this reason, other systems are preferable for new work in astronomy and physics if they will meet the need.

IRAF