Overview of EULER Cluster

euler.lbl.gov is a computing cluster consisting of one frontend node and ten dedicated compute nodes. The cluster runs Rocks Cluster 4.2.1, which is a collection of cluster management tools built on top of CentOS 4.4 (a standard Red Hat Enterprise Linux rebuild, like Scientific Linux). Job scheduling is through Sun Grid Engine (SGE) 6.0.

The first time you log in to the system you will be asked to generate an SSH key; the process starts automatically and prompts you for a passphrase. This key is used only to migrate jobs from the frontend to the various compute nodes. It is recommended that you enter an empty passphrase; otherwise you will need an active ssh-agent process running at all times or your batch jobs cannot be started. There are no security implications to leaving the passphrase for this particular key empty (as long as you don't distribute the private key residing in ~/.ssh, of course).
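If the automatic key setup ever needs to be repeated by hand, something along the following lines should work; this is a generic OpenSSH sketch rather than an Euler-specific procedure:

    # generate a key with an empty passphrase
    ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa
    # authorize the key for logins from the frontend to the compute nodes
    cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
    chmod 600 ~/.ssh/authorized_keys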

There is a web view of the cluster status at http://euler.lbl.gov. Access to this page is restricted to the subnets on which the regular theory computers reside.

Hardware

  • The frontend has a 2.4GHz Intel Core 2 Duo processor (two cores) with 4GB of RAM. It has a 250GB disk, plus a 750GB RAID array.
  • Seven compute nodes have a 2.4GHz Intel Core 2 Duo processor (two cores) with 4GB of RAM and a 250GB local disk.
  • Three compute nodes have two 2.5GHz quad-core Xeon processors (8 cores) with 16GB of RAM and a 750GB local disk.
  • Only the frontend is directly connected to the outside network. The compute nodes are interconnected with a dedicated Gigabit ethernet network.

Storage Configuration

  • /home -- The user home directories reside on the frontend's 250GB disk and are NFS exported to all the compute nodes.
  • /home/data -- The frontend's 750GB disk array is NFS exported to the compute nodes at /home/data. Users can create subdirectories under /home/data as on a standard scratch disk, and the contents will be accessible from any compute node.
  • /home/data1 -- A large (5.4TB) network-mounted disk with a 4Gbit/sec connection to the cluster. The disk is configured as a standard scratch disk, just like /home/data.
  • /state/partition1 -- Each compute node has a local working area mounted at /state/partition1. These areas are local to each node and are not visible to the other nodes. If your job is disk intensive, you may wish to have your jobscript copy files onto the local work space at the beginning of the job. If your job writes files into the local work space, be sure to have your jobscript move them out when it is done. This is for two reasons: first, you will not necessarily know which node your job ran on, so it may be difficult to find the files later; second, it leaves the node clean for the next user. A jobscript sketch illustrating this pattern follows this list.
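As an illustration of the copy-in / copy-out pattern described for /state/partition1 above, here is a minimal jobscript sketch; the input and output file names and the myanalysis program are placeholders:

    #!/bin/bash
    # stage input from shared scratch to the node-local disk ($JOB_ID is set by SGE)
    mkdir -p /state/partition1/$USER/$JOB_ID
    cd /state/partition1/$USER/$JOB_ID
    cp /home/data/$USER/input.dat .

    # run the (hypothetical) analysis on the fast local disk
    myanalysis input.dat output.dat

    # move the results back to shared storage and leave the node clean
    mv output.dat /home/data/$USER/
    cd /
    rm -rf /state/partition1/$USER/$JOB_ID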

Running Jobs -- Job Control

The policy regarding usage of the cluster is currently evolving. We will find out what works best and implement it based on user needs and preferences. For the moment, here is a rough outline; management has attempted to accommodate the user comments provided so far.

Interactive Jobs

You may run interactive jobs on the frontend. As long as the job is not too CPU intensive it should not disrupt the cluster. Tasks such as editing, building and debugging code are most simply accomplished by logging into euler.lbl.gov and doing your work there.

If you have a CPU-intensive interactive job, you can use the command qlogin. This will select a free compute node and log you into it. This is a good procedure for programs like ROOT, or for interactive jobs that require only occasional user input.
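For example (the ROOT macro name is just an illustration):

    qlogin                 # the scheduler picks a free interactive node and logs you in
    root -l myplots.C      # run the CPU-intensive interactive work on that node
    exit                   # log out when done to free the slot for other users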

At the moment one compute node (compute-0-0) is reserved for interactive jobs. This node has two slots, so two qlogin sessions can be open on it at one time. If there seems to be a need for more slots, let Jeff Anderson know and more will be reserved. Remember that there is no limit to the number of slots open on the frontend, as long as your interactive session is not too demanding for too long.

If you must subvert the batch system selection provided by qlogin, it is possible to ssh directly into the compute nodes. They are named compute-0-0 through compute-0-6 and are accessible only through the frontend (euler.lbl.gov). This is not recommended unless you have a good reason.

Batch Jobs

After reading this section, please visit the new examples page.

  • Batch jobs are submitted to the cluster via the command qsub.
  • There are two batch queues available: the default (don't specify one) and day.
  • If you don't specify a queue, your job will start on any available compute node. There are a total of 14 slots.

day is for jobs that will run during the working day (entirely or in part). It specifically excludes the compute node reserved for interactive slots during the day (compute-0-0), which leaves it with 12 slots.

Some examples:

  • qsub myscript.sh will start the job myscript.sh on any free compute node.
  • qsub -q day myscript.sh will start the job on some free node in compute-0-1 through compute-0-6, politely leaving compute-0-0 available in case someone wants to run an interactive job on it.

The command passed to qsub in this form must be a script (not a binary). If you must pass a binary directly to qsub, it is possible: use the syntax qsub -q QUEUENAME -b y myprogram. You can see the status of the batch queues with qstat -f, and you can suppress the listing of empty queues with qstat -f -ne.
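For reference, a jobscript is an ordinary shell script; SGE options can also be embedded in it on #$ lines, which saves typing them on the qsub command line. A minimal sketch (the program name is a placeholder):

    #!/bin/bash
    #$ -q day          # queue to submit to (omit this line for the default queue)
    #$ -cwd            # start the job in the directory it was submitted from
    #$ -j y            # merge stdout and stderr into a single output file
    ./myprogram

Submit it with qsub myscript.sh as in the examples above.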

There is also a graphical interface to the batch system. Run qmon on the frontend to start it. It may or may not be of interest.

Management is in the process of defining different batch queues to allow finer grained control (days versus nights versus weekend ... high priority vs low priority). This is a work in progress. User feedback is welcome.

Important note: The batch system does NOT automatically "parallelize" your job scripts. One qsub invocation will start your job on a single processor. To take advantage of the multiple processors available in the cluster, you must break your job up into pieces and submit each piece via qsub.
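For example, a job that splits naturally into ten independent pieces could be submitted in a loop, assuming the (hypothetical) script myscript.sh takes the piece number as an argument:

    # each qsub call becomes a separate job occupying one slot
    for i in $(seq 0 9); do
        qsub myscript.sh $i
    done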

The exception to this is if your code is actually written for a parallel environment, specifically using MPI calls. In that case, you can start it on multiple processors with the mpirun command. See the documentation to find out how to do this.
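The details depend on how MPI and the SGE parallel environments are configured on the cluster (qconf -spl lists the configured parallel environments); as a rough sketch, assuming a parallel environment named mpi exists:

    #!/bin/bash
    #$ -pe mpi 8                        # ask SGE for 8 slots in the "mpi" parallel environment
    #$ -cwd
    # NSLOTS is set by SGE to the number of slots actually granted
    mpirun -np $NSLOTS ./my_mpi_program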

Software Tools

All editing and compiling tools should be available on the frontend. If you need something that is not installed, inform Jeff Anderson.

The processors in the cluster are 64-bit, as is the Linux distribution running on them. To take full advantage of this you should compile your applications directly on the cluster. Applications built on the regular Linux workstations will be 32-bit; they should still run on the cluster, but performance will likely be better if you rebuild on the cluster whenever possible.

Various HEP software is installed on all nodes under /share/apps. All the versions listed here are 64-bit. This could cause problems if you have dynamically linked 32-bit applications imported from elsewhere; the solution is usually to rebuild your application on the cluster.

Here is a list of what is installed so far:

  • CERNLIB: CERNLIB 2005 is installed at /share/apps/cern/pro.
  • ROOT: ROOT 5.14 is installed at /share/apps/root.
  • CLHEP: CLHEP 1.8, CLHEP 1.9 and CLHEP 2.0 are installed under /share/apps. /share/apps/cern/CLHEP is a link to CLHEP 1.8. Note that after 1.8, the HepPDT and HepMC packages were removed from the official CLHEP distribution. If you need those packages, they can be found under /share/apps.
  • gsl: gsl-1.9 is installed in /share/apps/gsl-1.9.
  • MadGraph/MadEvent: MadGraph/MadEvent 4.1.31 (including pythia-pgs) is installed under /share/apps/MG_ME_V4.1.31.
  • SHERPA: SHERPA 1.0.10 is installed under /share/apps/SHERPA-MC-1.0.10, but to use it you appear to need a personal installation. To set one up, untar /share/apps/Sherpa-1.0.10.tar.gz into your home directory (or into a directory in /home/data), enter the new Sherpa directory, and run TOOLS/makeinstall -c (the commands are sketched after this list). The build process has been verified, so if it does not work for you, please let management know.
  • RunMC: RunMC 4.0 is installed under /share/apps/RunMCv4.0.
  • Herwig: The Herwig6510 source and include files are in /share/apps/herwig65.
  • Pythia: Pythia 6.3.25 and 6.4.11 are installed in /share/apps/pythia. Also, libpythia6205 is part of CERNLIB.
  • jimmy: jimmy-4.3.1 is installed in /share/apps/jimmy-4.31.
  • pandora: pandora 2.3 is installed in /share/apps/pandoraV2.3.
  • pandora_pythia: pandora_pythia 3.3 is installed in /share/apps/pandora_pythia_V3.3.
  • MC@NLO: MC@NLO V3.3 is available in /share/apps/MC@NLO.
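To spell out the SHERPA installation steps mentioned in the list above (the name of the unpacked directory is assumed from the tarball name):

    cd ~                                        # or a directory under /home/data
    tar -xzf /share/apps/Sherpa-1.0.10.tar.gz
    cd Sherpa-1.0.10                            # assumed directory name
    TOOLS/makeinstall -c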

In Progress

Please visit this page regularly to find out the latest status. All requested software has now been installed. The initial batch structure has been set up according to user comments; further feedback about how it is working is welcome.