Commands and Stream Parsers

Commands are used by various parts of the control system to communicate with and control the target system's resource manager. Commands are system calls, either to a local or a remote OS, depending on the connection defined for the target system. A command is always executed through a Java API (the "process builder"), which typically results in a bash -c command (possibly via an ssh connection). The first argument of the command is the name/path of the executable; subsequent arguments are specified by an arbitrary number of arg-type elements.
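
For example, a start-up command that queries the available queues could be declared as follows (a minimal sketch; the qstat executable and its flag are placeholders for whatever the target system actually provides):

    <start-up-command name="get-queues">
        <!-- the first arg is the executable; the remaining args are its arguments -->
        <arg>qstat</arg>
        <arg>-Q</arg>
    </start-up-command>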

The following commands are supported by the control-data-type:

Element Description
start-up-command

A command that is run to initialize the configuration. This command is usually used to check for correct versions and to obtain dynamic configuration information (e.g. the list of available queues) from the target system.

submit-interactive

Command to submit a purely interactive job to the target system. An interactive job is defined as one that the user would normally run from a login shell.

submit-interactive-debug

Debug version of the submit-interactive command.

submit-batch

Command to submit a batch job to the target system. This type of job submission is normally asynchronous, i.e. the user submits a job and at some later point the job will be run.

submit-batch-debug

Debug version of the submit-batch command.

get-job-status

A user-initiated (on-demand) request to refresh the status information for a submission. Normal (polled) updates, on the other hand, are the responsibility of the monitor-type component. The status command nevertheless needs to be implemented in most cases, as it will be called internally just after submission.

terminate-job

A command to remove a job from the target system (terminating a running job if necessary). Note: if the submission type is interactive, the terminate-job command usually does not need to be implemented, as the process termination will be handled internally. However, in some cases (such as PBS -I) which require the interactive job to run as a pseudo-terminal, one may need this command in order to force its termination externally.

suspend-job

An optional command to suspend a running job.

resume-job

An optional command to resume a suspended job.

hold-job

An optional command to place a job on hold.

release-job

An optional command to release a held job.

shut-down-command

A command that is run to clean up after a job has been launched.

button-action

An arbitrary command that can be associated with a button exposed through the launch configuration Resources tab (see further below).

Note: A configuration may define either a batch or an interactive launch mode, but not both; each launch mode, however, may have two submission modes, run and debug. (Future versions may allow batch and interactive to coexist in a single configuration.)

Command Type

A command-type element is used to define a command.

[schema diagram: CommandType]

The following properties are available for a command-type element:

name
  Specifies a name for the command.
  Default: N/A

directory
  Specifies where the command will be executed. This defaults to the "home" or working directory of the remote control connection (the control.working.dir attribute).
  Default: control.working.dir

redirectStderr
  Specifies that both output and error streams are sent back on stdout.
  Default: false

streamBufferLimit
  Specifies the buffer limit for stream readers.
  Default: -1 (use the system-defined limit)

replaceEnvironment
  Specifies that the environment set on the command should entirely replace the shell environment.
  Default: false (the command environment is appended to the shell environment)

waitForId
  Specifies that the output stream for the command is parsed for an id which will appear as an attribute in the environment during the command execution, and that the execution should not return until it sees this id. Most submit commands will have these semantics.
  Default: false

ignoreExitStatus
  Prevents an error from being thrown when the command exits with a non-zero status. This is usually used for a command that incorrectly returns a non-zero exit status.
  Default: false

keepOpen
  Specifies that the command should be held open for potentially repeated redirection of input, such as when sending commands to an interactive partition that has been allocated by a batch scheduler. Only one such command can be open at a time.
  Default: false

flags
  Specifies additional flags that will be passed to the remote connection. The flags property is an OR'd string of three possible values:
    • NONE
    • ALLOCATE_PTY - allocates a pseudo-terminal
    • FORWARD_X11 - enables X11 forwarding on the connection
  Default: NONE
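
To illustrate a few of these properties, here is a hedged sketch of a status query for a PBS-like scheduler (the qstat executable and the ${ptp_rm:@jobId#name} variable reference are assumptions, not part of the schema):

    <get-job-status name="get-job-status" ignoreExitStatus="true">
        <arg>qstat</arg>
        <!-- hypothetical resolver reference to the job id captured at submission -->
        <arg>${ptp_rm:@jobId#name}</arg>
    </get-job-status>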

Commands that set the waitForId property to true are treated specially. These are job submission commands that produce a job id to replace the internally generated UUID, and they are responsible for setting the status property of the @jobId attribute.


COMMAND TYPE            TOKENIZER STATES
batch                   SUBMITTED
interactive             RUNNING
interactive, keepOpen   SUBMITTED, RUNNING

This table shows the various states that must be set depending on whether the command is interactive or batch. Commands that wait for an id must be provided with a stream tokenizer which recognizes and sets the @jobId state. Batch jobs will usually have a tokenizer which recognizes and sets the state to SUBMITTED when the job is submitted. The monitoring system will normally handle setting the job state to RUNNING (although this may also be done by the get-job-status command). Interactive jobs, which just run the command as soon as possible, can set the job status directly to RUNNING. In the case of interactive jobs that set keepOpen=true (e.g., qsub -I for PBS, which also requires a pseudo-terminal to connect remotely), there will usually be a pause before the job is actually scheduled. To allow the user to see that the job has been accepted and is pending, the tokenizer needs to set both the SUBMITTED and RUNNING states, the latter when the job has actually started.
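
The following sketch of a batch submission illustrates these semantics; the qsub invocation, the script_path attribute, the regular expression, and the stdout-parser and expression element names are assumptions (stream parsers themselves are described under Stream Parsers below). The tokenizer captures the job id into the name field of @jobId and sets its status property to SUBMITTED:

    <submit-batch name="submit-batch" waitForId="true">
        <arg>qsub</arg>
        <!-- script_path is a hypothetical attribute holding the batch script location -->
        <arg>${ptp_rm:script_path#value}</arg>
        <stdout-parser delim="\n" all="true" save="1">
            <target ref="@jobId">
                <match>
                    <!-- hypothetical pattern for a PBS-style id of the form "[digits].[chars]" -->
                    <expression>([\d]+)[.].*</expression>
                    <set field="name">
                        <entry valueGroup="0"/>
                    </set>
                    <set field="status">
                        <entry value="SUBMITTED"/>
                    </set>
                </match>
            </target>
        </stdout-parser>
    </submit-batch>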

Command arg elements, the input element, and the environment element all make use of the arg-type type for specifying arguments. For the name-value pair comprising the latter, one can, as a simple alternative, set the value attribute to a string (which will first be resolved in the current environment); finer-grained control over the resolution of the value, however, requires the use of the arg type. When input is present, it is directed to the input stream of the command. If the keepOpen attribute is true, a check is first made for an already open (and alive) process, which will then be reused; otherwise the command's arguments are executed, and then the input arguments are written to the process. With an open command/process, the input arguments can be fed repeatedly to the same process; this allows, for instance, continuous testing of an interactive job in the same interactive session.
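
For example, a keepOpen interactive submission might feed successive commands to the same session (a hypothetical sketch; the PBS-style arguments and the executablePath attribute are assumptions):

    <submit-interactive name="submit-interactive" keepOpen="true" flags="ALLOCATE_PTY">
        <arg>qsub</arg>
        <arg>-I</arg>
        <input>
            <!-- input args are written to the open process and can be re-sent repeatedly -->
            <arg>${ptp_rm:executablePath#value}</arg>
        </input>
    </submit-interactive>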

Execution Environment

The environment element allows attribute values to be passed to the command's environment prior to execution.

[schema diagram: EnvironmentType]

In the simplest form, the name and value properties are used to specify the name of the environment variable, and a corresponding value that can be resolved from the attribute map. Finer control of the environment variable can be obtained by using the arg-type type.

The preserve property can be used to override the command-type replaceEnvironment property. If this property is set to true, this environment variable will be passed to the remote command regardless of the replaceEnvironment property setting.
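
For example (a sketch; the variable names and the ${ptp_rm:...} resolver references are illustrative):

    <!-- numThreads and ldPath are hypothetical attributes defined elsewhere in the configuration -->
    <environment name="OMP_NUM_THREADS" value="${ptp_rm:numThreads#value}"/>
    <environment name="LD_LIBRARY_PATH" preserve="true" value="${ptp_rm:ldPath#value}"/>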

Stream Parsers

It is possible to attach a parser (which we also refer to as a tokenizer) to the output and error streams of any command-type in order to capture information and use it to side-effect existing attributes, or to generate new ones on the fly. While the parser is not completely general, it is capable of a wide range of tasks which would typically be required in the handling of output from batch and runtime systems.

[schema diagram: TokenizerType]

The main parser elements used by the tokenizer are target, match, and test. See the tokenizer examples demonstrating various usage scenarios.

The type element will most commonly not be set, meaning the built-in parser will be used; however, it is possible to implement a custom parser as a contribution to the org.eclipse.ptp.rm.jaxb.core.streamParserTokenizer extension point, in which case this element should be set to its extension id. Note that the extension requires the class to implement org.eclipse.ptp.rm.jaxb.control.internal.IStreamParserTokenizer, a Runnable interface with an initialize method that is passed the job id (if any) plus the current environment map; the details of such a parser's implementation are not, however, configured from the XML document.

The built-in tokenizer can read the stream in two different ways. If delim is provided, the stream is split using the indicated value. The delimiter should be a single character (escaped or non-escaped). Provision is made internally for the '\r\n' (Windows) two-character delimiter; in this case the delimiter should be set to "\r" (though, as already mentioned, PTP does not generally guarantee that system calls will work on Windows). Setting includeDelim means that the delimiter will appear as the last character of the returned stream segment.

The second way to read from the stream is to provide a maxMatchLen size; this indicates that any substring that needs to be found on the stream will not exceed this length. The stream is then read in segments of maxMatchLen, with the internal buffer set to twice this size, so that each successive read shifts the buffer to the "left" by one length. This guarantees that all such substrings will eventually be matched.

Sometimes a sort of "look-ahead" paradigm is necessary. For instance, one may need to match a segment or segments whose position is defined from the end of the output, but the actual stream length is not known in advance. In this case, one can opt to read until the end of the stream (all="true"), retaining only the last N buffer-lengths or delimited segments, as indicated by the save field. When the parser reaches the end of the stream, it will then apply the various targets to each saved segment in order.
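
A sketch of the second reading mode, scanning for queue names known not to exceed 32 characters (the attribute name, the pattern, and the stdout-parser and expression element names are hypothetical):

    <stdout-parser maxMatchLen="32">
        <target ref="available_queues">
            <match>
                <!-- hypothetical pattern; capture group 1 is added to the list-valued field -->
                <expression>queue: ([\w]+)</expression>
                <add field="value">
                    <entry valueGroup="1"/>
                </add>
            </match>
        </target>
    </stdout-parser>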

applyToAll is discussed further under Target Type below. The exit-on element indicates that the tokenizer should quit immediately when it encounters this pattern; exit-after indicates that the tokenizer should quit when it encounters this pattern, but should first apply the current segment to its targets.

Target Type

[schema diagram: TargetType]

A tokenizer may be given any number of target elements. The target denotes a particular value (object) currently in, or to be written to, the environment, which will be side-effected on the basis of the result of the tokenization. A target in turn contains match elements and test elements; the former are run as part of the stream processing, the latter after the stream processing has completed. The optional else element is applied only if there are no tests defined or if none of the defined tests succeeds.

The target object is either to be constructed at match time, or it pre-exists in the environment. If not constructed, ref points to the name of the attribute in the environment (recall that for the runtime job identifier, @jobId is used).

Note: when new targets are constructed, there is a merge operation at the end of tokenization which attempts to combine objects into a single instance identified by their name attribute. This assumes that such names will be unique and that any other values to be set on the object which are not explicitly bound in some way to that name via the match pattern will appear on the stream before a new name does (see ex. 5 in tokenizer examples). The default behavior of this merge is that it will fail if two objects with the same name but differing values are generated by the parsing. (This excludes add and put operations which create a list or map; in these cases, the two collections or maps will be combined into one. This does NOT work, however, for append actions.) To allow duplicates, set allowOverwrites to true; in this case, successive duplicates simply replace the preceding object.

The default behavior of the tokenizer read-match sequence is as follows:

  1. read from the stream either a set number of chars or until the delimiter is found;
  2. for each target, attempt the target's match patterns against the segment just read; on the first pattern that matches, apply its actions and stop processing the segment.

Only one qualifying target is processed for any given segment read, and for the given target, the first pattern matched is the one processed for its actions. This is basically the "OR" semantics of normal logic programming; hence the implementer must be careful to arrange the matches inside a target in such a way that the more specific match patterns precede the more general.

Three boolean fields allow you to modify this behavior.

  1. The applyToAll field on the tokenizer-type element means take the unmatched part of the read stream and pass it to the next target, even if there was a previous match; this allows you to capture more than one regex pattern per stream segment (see ex. 6 in tokenizer examples).
  2. The matchAll field on the target-type element means do not try to match an already matched expression until all the others are matched (i.e., a logical AND instead of OR governs the set of matches at successive calls to the target match operation); this allows one to use, for instance, .* repeatedly but set different fields of the object with the resulting match (see ex. 5 in tokenizer examples, and the sketch following this list).
  3. The moveToTop field on the match-type element indicates to the tokenizer that the matched target be promoted to first position in the list of targets. This is useful when there is an ordering which expects types of attributes to be grouped in sequence on the stream (see ex. 4 in tokenizer examples).
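
As a sketch of matchAll, loosely modeled on ex. 5 (the type attribute marking a constructed target, the expression element, and the field names are assumptions): because the first .* match is not retried until the second has also matched, the first segment read sets name and the next sets value.

    <target type="attribute" matchAll="true">
        <match>
            <expression>(.*)</expression>
            <set field="name">
                <entry valueGroup="1"/>
            </set>
        </match>
        <match>
            <!-- identical pattern: reachable only because matchAll defers re-matching -->
            <expression>(.*)</expression>
            <set field="value">
                <entry valueGroup="1"/>
            </set>
        </match>
    </target>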

When a match is found, all of the action types it contains are applied.

Match Type

[schema diagram: MatchType]

Each of these types corresponds to an action to be taken on the indicated field of the target object.

Element   Description
set       sets the value of that field
append    adds to a string buffer, whose string value will be set on the field
add       adds to a list to which the value of that field will be set
put       places a key-value pair in a map to which the value of that field will be set
throw     throws an exception and (optionally) also sets the value of the field

The actions listed here all have entry-type children, either a single one (set, throw) or potentially several. All of these except throw also allow you to force the creation of a new object (forceNewObject) each time the action is applied; the new object then replaces the current one for successive actions in the match.

Entry Type

[schema diagram: EntryType]

This value-abstraction allows one to set the key (for maps) and value either as literals or as references to other attributes resolved in the current environment. To reference parts of the matched segment: keyIndex and valueIndex are used if the regex was applied to split the segment; otherwise, keyGroup and valueGroup refer to capture groups of the regex pattern, with group 0 referring to the entire match.
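
For instance, to place colon-separated key-value pairs from each matched segment into a map-valued field (a sketch; the field name, the pattern, and the expression element are illustrative):

    <match>
        <expression>([^:]+):[\s]*(.*)</expression>
        <put field="attributes">
            <!-- capture group 1 becomes the key, group 2 the value -->
            <entry keyGroup="1" valueGroup="2"/>
        </put>
    </match>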

Test Type

[schema diagram: TestType]

As mentioned above, the test-type elements are all run after the tokenization has reached the end of the stream. This class of actions is useful for setting values based on other values produced during tokenization. A test is one or more comparison operations plus a set of actions to apply to the target fields in the case of either success or failure (the "else" element); see ex. 3 or the "get-job-status" example in tokenizer examples.

The op attribute can be one of the following comparisons:

EQ : equals
LT : less than
LE : less than or equal to
GT : greater than
GE : greater than or equal to

When the operation is set to one of these, it is expected that the two value elements will be used. As usual, these elements can be literals or can contain variables to be resolved into a string type; #FIELD refers to the value of the given field on the current target; the strings will be converted in conformity with the inferred (primitive) type of the comparison. The else element also pertains to comparison tests; the actions listed there will be taken upon failure of the comparison.

The op attribute can also be a logical operator [AND, OR, NOT], in which case the embedded test object should be used; these can be nested to an arbitrary depth, but of course must bottom out in a comparison operation.
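
The following sketch (patterned on the get-job-status usage; the state names and the #status field reference are illustrative) sets the status field to COMPLETED if it currently equals either DONE or EXITING, and to RUNNING otherwise:

    <test op="OR">
        <test op="EQ">
            <value>#status</value>
            <value>DONE</value>
        </test>
        <test op="EQ">
            <value>#status</value>
            <value>EXITING</value>
        </test>
        <set field="status">
            <entry value="COMPLETED"/>
        </set>
        <else>
            <set field="status">
                <entry value="RUNNING"/>
            </set>
        </else>
    </test>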

Contents of Tokenizer Examples (tokenizer-examples.xml)


Example   Description
1         output is a list of line-separated queue names to be assigned to the known attribute "available-queues"
2         output is to be searched for its final line, which should contain a job id of the form "[digits].[chars]"
3         indeterminate number and order of lines containing parts of attribute definitions, but each line bearing a distinct id (e.g., openMPI attribute discovery)
4         indeterminate number of definitions, but grouped by caption; uses moveToTop to promote the target to the top of the list when the caption appears
5         similar to 4, but without delimiter (implicit ordering)
6         similar to 4, but with indeterminate type order, using a buffer + DOTALL | UNIX_LINES
7         indeterminate number of attribute definitions, all on a single line
8         looking for values interspersed in the stream but which will not exceed 32 chars
9         successive names/values, in order, but staggered
10        forced merge
11        exit-on
12        exit-after
13        get-job-status (use of tests)