Designing and Implementing Processing Pipelines with Conductor: The HiROC Experience

Monday, September 17, 2007
Castalia (LPL, U of AZ)

The use of the Conductor pipeline procedures management system will be described in the context of experience at the HiRISE (High Resolution Imaging Science Experiment) Operations Center (HiROC). The primary focus will be the applicability of this system to other data processing production operations.

Conductor pipelines have been used at HiROC to process observation data generated by the HiRISE instrument on board the Mars Reconnaissance Orbiter spacecraft, starting with raw data downlink over the internet from the Jet Propulsion Laboratory and continuing through the production of EDR and RDR PDS image data products delivered to the public. This includes all science data processing of the image data as well as updating the database tables that provide PDS Data Node information sources. The HiROC Conductor pipelines form an extensive multi-pathway system of chained segments operating in parallel on a cluster of processing nodes. HiRISE is expected to generate more data than all previous Mars missions combined, and the HiRISE Project is attempting to minimize delays in releasing data products to the science community and the general public. Processing, and reprocessing as needed, all of this data through all of the required procedures must therefore happen quickly and reliably.

The HiRISE Project has been getting excellent results from its observation data processing pipelines. This talk will review the capabilities of Conductor that have been exploited at HiROC to achieve this success. Particular attention will be given to commonly occurring pipeline processing design requirements and to the implementation and management issues that we encountered. Our experience demonstrates how very large amounts of data, arriving in a continual and asynchronous flow from a remote source, can be put through rigorous science data processing and metadata accumulation by an automated mechanism that runs 24x7, usually unattended, with high reliability and modest cost. The Conductor mechanism can handle very complex processing networks, yet it is also suitable for quite simple tasks. It can be scaled up to consume very large data volumes, yet it remains effective for low capacity needs. Anyone who expects to be responsible for managing, at any level, a research science data processing and product generation operation would benefit from learning about the HiRISE experience.

For details about Conductor, or to obtain the software, visit the Planetary Image Research Laboratory web site at the University of Arizona.

