Easier High Throughput Computing
Bosco is designed to manage high throughput computing jobs, where hundreds to thousands of jobs run simultaneously on remote clusters. Previously, researchers had to manage these jobs by hand to ensure that they completed satisfactorily. With Bosco, the effort is reduced: the researcher submits jobs from their own workstation, and Bosco manages all the remote aspects of running the jobs and collecting the data.
Submit Locally (from your workstation)
In Bosco, researchers submit jobs to computational clusters directly from their workstations, replacing the conventional method of remotely logging into a cluster to manage job submissions and data. A user installs Bosco on their desktop, and the jobs and data flow directly to and from that desktop. Bosco fulfills the researcher's requirement to submit jobs locally but run them on a remote cluster.
One Submit Model for Different Cluster Types
Bosco enables researchers to write job submission scripts in one language (based on HTCondor) and then run these jobs on clusters that may have different interfaces, such as LSF, PBS, or SGE. The researcher no longer needs to learn a separate language for each job manager.
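As a rough illustration of the single submit model, the Python sketch below renders an HTCondor-style submit description for two different cluster types. The `grid_resource = batch <type> <user@host>` line follows HTCondor's grid universe syntax; the helper name `render_submit`, the script name, and the host names are hypothetical, and the exact settings a given Bosco installation expects may differ.

```python
def render_submit(executable, batch_type, login, count=1):
    """Return an HTCondor-style submit description as a string.

    batch_type is the remote scheduler ("pbs", "lsf", "sge", ...);
    login is the user@host of the cluster login node (hypothetical here).
    """
    lines = [
        "universe = grid",
        f"grid_resource = batch {batch_type} {login}",
        f"executable = {executable}",
        "output = job.$(Cluster).$(Process).out",
        "error = job.$(Cluster).$(Process).err",
        "log = job.log",
        "should_transfer_files = YES",
        "when_to_transfer_output = ON_EXIT",
        f"queue {count}",
    ]
    return "\n".join(lines) + "\n"


# The same job description, aimed at two different schedulers.
pbs = render_submit("analyze.sh", "pbs", "alice@pbs.example.edu", count=100)
lsf = render_submit("analyze.sh", "lsf", "alice@lsf.example.edu", count=100)

# Collect the lines that differ between the two descriptions.
differing = [(a, b) for a, b in zip(pbs.splitlines(), lsf.splitlines()) if a != b]
```

In this sketch only the `grid_resource` line changes between the PBS and LSF variants; everything else the researcher writes stays the same.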
Bosco automatically takes care of the error-prone and repetitive work that researchers previously had to cope with when managing their jobs and workflows by hand. All Bosco needs is a user account and password on the remote cluster, and it does the rest. For example, it automatically identifies the maximum number of jobs that can be submitted to each cluster and throttles the jobs accordingly.
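The throttling idea can be sketched in a few lines of Python. This is not Bosco's implementation, which is not described here; it is a minimal illustration that assumes a known per-cluster cap (`max_jobs`) and a hypothetical helper named `submit_throttled`.

```python
from collections import deque


def submit_throttled(pending, in_queue, max_jobs):
    """Move jobs from `pending` to `in_queue` without exceeding `max_jobs`.

    pending  -- deque of jobs not yet sent to the cluster
    in_queue -- list of jobs currently queued or running on the cluster
    Returns the list of jobs submitted in this pass.
    """
    submitted = []
    while pending and len(in_queue) + len(submitted) < max_jobs:
        submitted.append(pending.popleft())
    in_queue.extend(submitted)
    return submitted


pending = deque(range(10))   # ten jobs waiting on the workstation
in_queue = []                # nothing on the cluster yet

first_pass = submit_throttled(pending, in_queue, max_jobs=4)

# Two jobs finish on the cluster, freeing two slots.
in_queue.remove(0)
in_queue.remove(1)
second_pass = submit_throttled(pending, in_queue, max_jobs=4)
```

Each pass submits only as many jobs as the cap allows, so the remaining jobs stay on the workstation until slots free up.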
Bosco Power User benefits
If you are a power user who has outgrown the resource you are currently running on, Bosco offers some advanced methods for bringing additional resources to your fingertips.
Simultaneous job submission to multiple clusters
Bosco enables researchers to distribute jobs to multiple clusters using a single script, leaving Bosco to decide which cluster will run each job fastest.
On a typical campus, each department may have its own cluster (e.g. Physics may have a cluster, Computer Science may have one, and Chemistry may have another). Or a computing center may have multiple clusters reflecting multiple generations of hardware. In either case, researchers are required to pick which cluster to submit jobs to, rather than submitting to whichever has the most free cores. Additionally, these clusters may have completely different job schedulers, which require the researcher to learn multiple submission scripting techniques.
Bosco makes the clusters appear uniform by overlaying each cluster with a job submission translator. In this manner, the researcher learns only one submission methodology (based on HTCondor). Jobs submitted via Bosco will then run at whichever cluster first has free cores available. Bosco does the cluster-level load balancing for you.
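To make the load-balancing idea concrete, here is a small Python sketch. It assumes a simple greedy policy (send each job to the cluster that currently has the most free cores), which is an assumption for illustration and not a description of Bosco's actual scheduling logic. The function name `place_jobs` and the cluster names are hypothetical.

```python
def place_jobs(jobs, free_cores):
    """Greedily place one-core jobs on clusters with free cores.

    jobs       -- iterable of job names
    free_cores -- dict mapping cluster name to number of free cores
    Returns (placed, waiting): a job-to-cluster dict and a list of jobs
    that could not be placed because no cluster had a free core.
    """
    free = dict(free_cores)  # work on a copy; do not mutate the caller's dict
    placed, waiting = {}, []
    for job in jobs:
        # Assumed policy: pick the cluster with the most free cores right now.
        cluster = max(free, key=free.get) if free else None
        if cluster is None or free[cluster] == 0:
            waiting.append(job)
            continue
        placed[job] = cluster
        free[cluster] -= 1
    return placed, waiting


placed, waiting = place_jobs(
    ["j0", "j1", "j2", "j3"],
    {"physics": 2, "chemistry": 1},
)
```

With three free cores across two clusters, three of the four jobs are placed and the fourth waits until a core frees up somewhere.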
To take best advantage of this method, researchers should run Bosco on workstation-class systems with stable network connections. Certain ports must also be open on these networks.
Access to National Cyber-Infrastructure Resources
Bosco supports access to large pools of resources available in the National Cyber-infrastructure such as the Open Science Grid (OSG) for academic researchers. To accomplish this, Bosco researchers can request accounts on OSG “gateway” systems. Jobs that are sent to one of these “gateway” systems are transparently resubmitted to the OSG.
Multi user capabilities
Teams of researchers can submit jobs from the same host. In this case, Bosco is installed on a server by a sysadmin, who is responsible for adding multiple clusters.
How does Bosco Work?
BOSCO is a workflow management system for researchers, built on top of HTCondor. It allows researchers to connect multiple clusters to which they have SSH access, creating an on-demand processing resource.
- No Special Cluster Configuration
The remote clusters do not require additional software or open ports, only the ability to connect to the submit node.
- Single Processing Environment
Users work only with HTCondor's submission interface, even if the remote clusters run PBS, LSF, or SGE.
- Submission through SSH
No gatekeeper software is needed: BOSCO submits jobs over SSH to the remote clusters. If you have a login to the cluster, then BOSCO can manage submissions.
- Multiple Remote Clusters
Submission to multiple clusters is automatic. Processing will be load balanced between the clusters to maximize throughput. The remote clusters can run PBS, LSF, SGE, or HTCondor, but will all appear the same to the researcher.
- Secure Execution
Jobs and data are transferred using strongly secured connections between the worker nodes and the submit node.
|Simple Job Submission||✔||✔||✔|
|Submit to remote clusters||✔||✔||✔|
|Transparent submission to multiple clusters||✔||✔|
|Automatic file transfer||✔||✔|
|Cluster Load Balancing||✔||✔|
|Single Processing Environment||✔||✔|
|Submission to other Resource Managers||✔||✔|
|No Special Cluster Configuration||✔|
|Submission through SSH||✔|
BOSCO submits to remote PBS (or LSF, HTCondor, or SGE) clusters. The job submission is handled through an SSH connection that BOSCO initiates with the remote cluster. After the job is submitted to the remote cluster, BOSCO monitors it for state changes, such as when the job begins to run.
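Monitoring for state changes can be pictured as comparing two snapshots of job states. The Python sketch below is only an illustration of that idea: the function name `state_changes` and the job identifiers are hypothetical, the state names are borrowed from HTCondor's job states (Idle, Running), and how BOSCO actually polls the remote cluster is not covered here.

```python
def state_changes(previous, current):
    """Return (job_id, old_state, new_state) for each job whose state changed.

    previous, current -- dicts mapping job id to a state string.
    Jobs that appear only in `current` are reported with old_state None.
    """
    changes = []
    for job_id, new_state in current.items():
        old_state = previous.get(job_id)
        if old_state != new_state:
            changes.append((job_id, old_state, new_state))
    return changes


before = {"12.0": "Idle", "12.1": "Idle"}
after = {"12.0": "Running", "12.1": "Idle", "12.2": "Idle"}

changes = state_changes(before, after)
```

Here job `12.0` is reported as moving from Idle to Running, and the newly seen job `12.2` is reported with no previous state.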
The cluster's own scheduler eventually starts the job on a worker node in the cluster. The job is a BOSCO Glidein. A Glidein is an HTCondor worker node that creates an on-demand overlay cluster on top of other clusters. Because of the overlay, users only need to interact with HTCondor, both for submit scripts and for file transfers.
In order to use BOSCO, you will need the following resources.
The submit node is the system that the researcher uses to submit jobs. In general, it can be the user's laptop or workstation, or it can be another system that the user logs into for submitting jobs to the cluster.
Cluster Login Node
This is the node that you normally log in to on the PBS, LSF, or HTCondor cluster. We support:
- PBS: Torque and PBSPro
- HTCondor 7.6+
The remote cluster is where jobs will execute. The cluster login node belongs to this cluster. The cluster needs:
- Shared home filesystem
- Outgoing network connectivity (the worker nodes must be able to reach the internet).