Compute cluster with hot/cold storage
High performance compute cluster for the Genexpress botanical genetics laboratory
The request
Genexpress is a research laboratory specializing in botanical genetics and is part of a network of laboratories distributed across the globe.
The Florence laboratory’s request was very forward-looking for the time (2006): create a high-performance compute cluster usable not only by local researchers but also by any other laboratory in the network, with a web-based preview of simulations and the option to intervene in real time.
Along with general compute resources, researchers anywhere in the network needed to be able to use the distributed storage to share results with each other and for publication. Historical data needed to be backed up to hot/cold storage: hot storage for immediate access and cold storage for long-term archival.
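As a rough illustration of the hot/cold split described above, the sketch below assigns datasets to a tier based on how recently they were accessed. It is a minimal example under stated assumptions, not the policy used in the project: the 90-day threshold and the names `HOT_RETENTION`, `choose_tier` and `plan_archival` are hypothetical.

```python
from datetime import datetime, timedelta

# Hypothetical threshold: the case study does not state a retention policy.
HOT_RETENTION = timedelta(days=90)


def choose_tier(last_accessed: datetime, now: datetime) -> str:
    """Return "hot" for recently used data and "cold" for archival candidates."""
    return "hot" if now - last_accessed <= HOT_RETENTION else "cold"


def plan_archival(datasets: dict[str, datetime], now: datetime) -> list[str]:
    """List the dataset names that should move to cold storage, sorted by name."""
    return sorted(
        name for name, last_accessed in datasets.items()
        if choose_tier(last_accessed, now) == "cold"
    )
```

A periodic job could call `plan_archival` with the last-access time of each dataset and move the returned names to the archival tier, leaving everything else on fast storage.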
Challenges faced
- High maintenance costs for desktop computers
- Differing credentials on various systems
- Labor-intensive software update procedures
Solutions implemented
- Costs concentrated on the cluster hardware
- Single Sign-On implemented via centralized LDAP server
- Applications provided by cluster as SaaS
The requirements
- Give researchers access to a distributed, network-accessible data storage system
- Build a Linux cluster that frees researchers’ desktops from compute-intensive operations
Develer’s contribution
- Selection and testing of the enterprise servers
- Configuration and testing of software
- Web application for managing work queues
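To make the idea of managing work queues more concrete, here is a minimal in-memory sketch of the kind of state such a web application might track: jobs are submitted, handed to compute nodes in first-in, first-out order, and can be cancelled while still waiting. The `Job` and `WorkQueue` classes and their methods are hypothetical and are not taken from the actual application.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Job:
    job_id: int
    owner: str            # researcher who submitted the job
    command: str          # what the compute node should run
    status: str = "queued"


class WorkQueue:
    """Hypothetical FIFO job queue; a real system would persist its state."""

    def __init__(self):
        self._pending = deque()
        self._jobs = {}
        self._next_id = 1

    def submit(self, owner, command):
        """Register a new job and place it at the back of the queue."""
        job = Job(self._next_id, owner, command)
        self._next_id += 1
        self._jobs[job.job_id] = job
        self._pending.append(job)
        return job

    def next_job(self):
        """Hand the oldest still-queued job to a free node, or return None."""
        while self._pending:
            job = self._pending.popleft()
            if job.status == "queued":  # skip jobs cancelled while waiting
                job.status = "running"
                return job
        return None

    def cancel(self, job_id):
        """Cancel a job that has not started yet; return True on success."""
        job = self._jobs.get(job_id)
        if job is not None and job.status == "queued":
            job.status = "cancelled"
            return True
        return False

    def status(self, job_id):
        return self._jobs[job_id].status
```

A web front end could expose `submit`, `cancel` and `status` as request handlers, while each compute node polls `next_job` when it becomes idle.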
Advantages of our solution
- Eased collaboration between laboratories worldwide (data could be shared directly on the cluster)
- Cost savings on the hardware front: instead of upgrading desktop machines for the entire research staff, only the three servers that make up the cluster needed upgrading
- Unification of workflow, manageable all the way from the wet lab to the dry lab
Advantages of our approach
Work was carried out in continuous contact with the client, from the first phase of development to final implementation. As is to be expected in highly experimental projects, the implementation details had to change several times to overcome the challenges that arose. Thanks to our familiarity with web-related open source projects, our staff contributed patches to the Apache web server to optimize its performance on the 64-bit processors used in the cluster.
Advantages of open source
The goals of this project wouldn’t have been achievable without open source tools. The entire software stack was open source, from the web server to the final applications. In a highly experimental project such as this, it was necessary to avoid reinventing the wheel and to lean as much as possible on existing solutions backed by the knowledge and experience of the entire community of open source programmers.
Client
National Research Council (CNR)
Staff
“An atypical but fascinating project: to create, from scratch, a Linux compute cluster with all the expected features of a modern datacenter (UPS, redundant off-site backups, etc.) and a web-based monitoring and management interface. The result was ahead of its time considering the scarcity of off-the-shelf solutions at the time (2006).”