Command line options

This page summarizes command line options for the Shasta executable. An effort will be made to keep this page consistent with the code, but it is possible that discrepancies may temporarily appear as new features are added. In that case, the list of command line options obtained by invoking the Shasta executable with --help is, of course, more authoritative than the contents of this page.

Defaults

Default values of Shasta command line options are not recommended for any application scenario, and mostly reflect approximate compatibility with previous versions. The shasta/conf or shasta-install/conf directories provide sample configuration files (with a .conf extension) containing best known sets of assembly parameters for specific applications. See the beginning of each configuration file for comments on what scenario the file applies to. See below for information on how to use a configuration file in a Shasta assembly.

Saving some typing

Many option names are long and descriptive, and to make it easier to type them Shasta provides the ability to create a bash completion script. If you are using the bash shell (the default shell in most modern Linux distributions), you can use the commands below to save yourself some typing. This assumes that the shasta executable is named just like this - shasta - and is accessible via your PATH environment variable (if it is not you have to invoke shasta using an absolute or relative path):

shasta --command createBashCompletionScript
source shastaCompletion.sh

The first command creates a script named shastaCompletion.sh. The second command runs that script in the current shell (this is important - running the script without the source command will not work). Now, while typing a shasta option or keyword, you can press TAB, and the option name will be completed as much as can be done without ambiguity. You can also press TAB a second time to get suggestions for what could be allowed next.

The command that generates the script needs to be run only once, but you will need to rerun the source command above every time you start a new shell. Alternatively, you can put the source command in your .bashrc file or in one of the standard locations used to store bash completion scripts.

Configuration file

Some options are only allowed on the command line, but most of them can also optionally be specified using a configuration file. A configuration file is specified using command line option --config followed by the name of the desired configuration file, which can be specified as a relative or absolute path. Earlier versions of Shasta required an absolute path, but this is no longer the case.

See above for some information on Shasta default values and on available sample configuration files.

Values specified on the command line take precedence over values specified in the configuration file. This makes it easy to override specific values in a configuration file.
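
As a rough illustration of this override pattern, the Python sketch below assembles a Shasta command line that reads most parameters from a configuration file and overrides a single value. The helper name build_shasta_command and the file names are hypothetical, chosen only for this example. The command is printed rather than executed.

```python
import shlex


def build_shasta_command(input_files, config_file, overrides):
    """Build an argument list for a Shasta invocation (hypothetical helper).

    input_files: list of input file names, passed together after --input.
    config_file: name of the configuration file passed to --config.
    overrides:   dict mapping "SectionName.optionName" to a value; these
                 command line values take precedence over the config file.
    """
    command = ["shasta", "--input", *input_files, "--config", config_file]
    for name, value in overrides.items():
        command += [f"--{name}", str(value)]
    return command


# Hypothetical file names, for illustration only.
command = build_shasta_command(
    ["reads1.fasta", "reads2.fasta"],
    "myAssembly.conf",
    {"MarkerGraph.minCoverage": 0},
)
print(shlex.join(command))
```

Whatever value myAssembly.conf assigns to minCoverage in its [MarkerGraph] section, the value given on the command line is the one that would be used.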

Options that can be specified in both places are of the form --SectionName.optionName. The format of the configuration file is as follows:

[SectionA]
option1 = valueA1
option2 = valueA2
[SectionB]
option1 = valueB1
option2 = valueB2

The above is equivalent to using the following command line options:

--SectionA.option1 valueA1 
--SectionA.option2 valueA2 
--SectionB.option1 valueB1 
--SectionB.option2 valueB2
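
The correspondence between the two forms can be sketched in a few lines of Python. This is only an illustration that uses the standard library configparser module, not the parser Shasta itself uses, so details such as comment handling may differ. The function name config_to_command_line is hypothetical.

```python
import configparser


def config_to_command_line(config_text):
    """Return the command line options equivalent to a configuration file."""
    parser = configparser.ConfigParser(interpolation=None)
    # By default configparser lowercases option names; keep them as written.
    parser.optionxform = str
    parser.read_string(config_text)

    args = []
    for section in parser.sections():
        for name, value in parser.items(section):
            args += [f"--{section}.{name}", value]
    return args


example = """
[SectionA]
option1 = valueA1
option2 = valueA2
[SectionB]
option1 = valueB1
option2 = valueB2
"""

print(" ".join(config_to_command_line(example)))
```

Running the sketch on the sample file above prints the four --SectionName.optionName options in the same order as they appear in the file.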

For example, the value for option MarkerGraph.minCoverage can be specified in the [MarkerGraph] section of the configuration file as follows:

[MarkerGraph]
minCoverage = 0

A sample configuration file containing default values for all options is provided in the shasta source tree at shasta/conf/shasta.conf. It is also available in a Shasta build as shasta-install/conf/shasta.conf.

In the configuration file, blank lines and lines beginning with # are ignored and can be used to add comments and to improve readability of the configuration file.

Boolean switches

Some command line options are boolean switches, that is, control options that can be turned on or off rather than be given a value.

To turn on one of these switches on the command line, just add it to the command line without any value, for example --Assembly.storeCoverageData. To turn it off, just omit it from the command line (the default value is turned off).

To turn on one of these switches in a configuration file, you can either enter it without a value

storeCoverageData =

or assign to it one of the following values: 1, true, True, yes, Yes. To turn off one of these switches in a configuration file, assign to it one of the following values: 0, false, False, no, No.

Boolean switches are indicated as such in the Description column in the tables below.
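
The accepted spellings can be summarized with a small Python sketch. It is not taken from the Shasta source. The function name parse_switch is hypothetical, and raising an error for any other value is an assumption of this example.

```python
# Values listed above as turning a Boolean switch on or off in a
# configuration file. The empty string covers "storeCoverageData =".
ON_VALUES = {"", "1", "true", "True", "yes", "Yes"}
OFF_VALUES = {"0", "false", "False", "no", "No"}


def parse_switch(value):
    """Interpret the text to the right of '=' for a Boolean switch."""
    value = value.strip()
    if value in ON_VALUES:
        return True
    if value in OFF_VALUES:
        return False
    # Assumption of this sketch: anything else is rejected.
    raise ValueError(f"Not a recognized Boolean switch value: {value!r}")


for text in ["", "1", "Yes", "0", "No"]:
    print(repr(text), "->", parse_switch(text))
```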

Options allowed only on the command line

Option	Default value	Description
`--help` or `-h`		Use this option to obtain a summary of allowed command line options.
`--version` or `-v`		Identify the Shasta version.
`--config`		Specifies the name of a configuration file.
`--input`		Specifies the names of the input files for the assembly. This option is mandatory. At least one input file must be specified. To specify multiple input files, enter them separated by space after `--input`.
`--assemblyDirectory`	`ShastaRun`	Specifies the name of the directory where assembly output is stored. If `--command` is `assemble` (the default), this directory must not exist and is automatically created. For most other commands, this directory must exist. See here for more information on the output files created by Shasta.
`--command`	`assemble`	Specifies the assembly command to be run. Can be one of the following: `assemble` Shasta runs an assembly. `saveBinaryData` Shasta saves assembly binary data to disk. `cleanupBinaryData` Shasta cleans up binary data stored in the `Data` directory of the assembly directory. The `Data` directory contains binary data representing assembly data structures. Depending on the memory mode in use, the `Data` directory can be the mount point of a filesystem in memory, and in that case running this command requires root access via sudo. Not all memory modes actually create a `Data` directory. After `cleanupBinaryData` runs, you can no longer use the Python API or the Shasta http server to inspect assembly results. Make sure to use option `--assemblyDirectory` to specify the run directory that you want to clean up. `explore` The Shasta assembler starts in a mode that permits exploring assembly data structures using an Internet browser. `createBashCompletionScript` Shasta creates a bash completion script that can be used to save typing when invoking `shasta`.
`--memoryMode` (not supported on MacOS)	`anonymous`	Can be `anonymous` or `filesystem`. For best performance use `--memoryMode filesystem --memoryBacking 2M`. However, using these options requires root access via `sudo`. Depending on `sudo` setup, this may result in prompting for a password. Not supported on MacOS. On MacOS, Shasta operates as if `--memoryMode filesystem --memoryBacking disk` was specified.
`--memoryBacking` (not supported on MacOS)	`4K`	Can be `disk`, `4K`, or `2M`. For best performance use `--memoryMode filesystem --memoryBacking 2M`. However, using these options requires root access via `sudo`. Depending on `sudo` setup, this may result in prompting for a password. Not supported on MacOS. On MacOS, Shasta operates as if `--memoryMode filesystem --memoryBacking disk` was specified.
`--threads`	`0`	Specifies the number of threads to be used, or 0 to request one thread per virtual processor.
`--exploreAccess`	`user`	Specifies access control for command `explore`.
`--port`	`17100`	The TCP port to be used by command `explore`. If the specified port is not available, Shasta will try again after incrementing the port number a few times.

Options allowed on the command line and in the config file

See here for the format required to enter these options in a configuration file.

Option	Default value	Description
`--Reads.minReadLength`	`10000`	Read length cutoff. Reads shorter than this number of bases are discarded on input and not used in the assembly.
`--Reads.desiredCoverage`	`0`	Specifies a desired value for total coverage, in bases. If not zero, the read length cutoff specified via `--Reads.minReadLength` is increased to reduce coverage to the specified value. This value can be specified as a number of bases, or using power of 10 multipliers immediately following the numeric value. The powers of ten can be abbreviated as `G`, `Gb`, or `Gbp` for 10⁹ bases, `M`, `Mb`, or `Mbp` for 10⁶ bases, `K`, `Kb`, or `Kbp` for 10³ bases. For example, a desired coverage of 120×10⁹ bases can be requested specifying `120G`, `120Gb`, `120Gbp`, or `120000000000`. If coverage available using only reads longer than `Reads.minReadLength` is less than the value specified for `--Reads.desiredCoverage`, the assembly terminates with an error message.
`--Reads.noCache`	`False`	This is a Boolean switch. If set, requests skipping the Linux cache when loading reads. Implemented for Linux only (uses the O_DIRECT flag). Can help performance, but only use it if you know you will not need to access the input files again soon.
`--Reads.palindromicReads.skipFlagging`	`False`	Skip flagging palindromic reads. Oxford Nanopore reads should be flagged for better results.
`--Reads.palindromicReads.maxSkip`	`100`	Used for palindromic read detection.
`--Reads.palindromicReads.maxMarkerFrequency`	`10`	Used for palindromic read detection.
`--Reads.palindromicReads.alignedFractionThreshold`	`0.1`	Used for palindromic read detection.
`--Reads.palindromicReads.nearDiagonalFractionThreshold`	`0.1`	Used for palindromic read detection.
`--Reads.palindromicReads.deltaThreshold`	`100`	Used for palindromic read detection.
`--Kmers.generationMethod`	`0`	Used to select how k-mers to be used as markers are generated. Can be one of the following: 0: Random selection. 1: Random selection, excluding k-mers that are globally overenriched, as defined by their global frequency in input reads, and by the value specified as `--Kmers.enrichmentThreshold`. 2: Random selection, excluding k-mers that are overenriched even in a single read, as defined by the value specified as `--Kmers.enrichmentThreshold`. 3: Read from file. Use `--Kmers.file` to specify the file.
`--Kmers.k`	`10`	Length of marker k-mers (in run-length representation).
`--Kmers.probability`	`0.1`	Probability that a k-mer is selected to be used as a marker.
`--Kmers.enrichmentThreshold`	`100.`	If `--Kmers.suppressHighFrequencyMarkers` is set, this controls the enrichment threshold above which k-mers are considered overenriched. Enrichment is the ratio of k-mer frequency in reads to random. Only used if `--Kmers.generationMethod` is 1 or 2.
`--Kmers.file`		The absolute path of a file containing the k-mers to be used as markers, one per line. Only used if `--Kmers.generationMethod` is 3.
`--MinHash.version`	`0`	The version of the MinHash/LowHash algorithm to be used. Can be 0 (default) or 1 (experimental).
`--MinHash.m`	`4`	The number of consecutive markers that define a MinHash/LowHash feature.
`--MinHash.hashFraction`	`0.01`	Defines how low a hash has to be to be used with the LowHash algorithm.
`--MinHash.minHashIterationCount`	`10`	The number of MinHash/LowHash iterations, or 0 to let `--MinHash.alignmentCandidatesPerRead` control the number of iterations.
`--MinHash.alignmentCandidatesPerRead`	`20`	If `--MinHash.minHashIterationCount` is 0, MinHash iteration is stopped when the average number of alignment candidates that each read is involved in reaches this value. If `--MinHash.minHashIterationCount` is not 0, this is not used.
`--MinHash.minBucketSize`	`0`	The minimum size for a bucket to be used by the MinHash/LowHash algorithm.
`--MinHash.maxBucketSize`	`10`	The maximum size for a bucket to be used by the MinHash/LowHash algorithm.
`--MinHash.minFrequency`	`2`	The minimum number of times a pair of reads must be found by the MinHash/LowHash algorithm in order to be considered a candidate alignment.
`--MinHash.allPairs`	`False`	This is a Boolean switch that causes the MinHash process to be skipped. Instead, all possible read pairs are marked as alignment candidates, on both relative orientations. This should only be used for very small test assemblies as it can become prohibitively slow for large assemblies.
`--Align.alignMethod`	`3`	The alignment method to be used to compute marker alignments between reads: 0 = Old Shasta alignment method. Use this to reproduce Shasta behavior before release 0.5.0. 1 = SeqAn. This gives the best alignment results but it is slow and should only be used for testing. 3 = Banded SeqAn.
`--Align.maxSkip`	`30`	The maximum number of markers that an alignment is allowed to skip.
`--Align.maxDrift`	`30`	The maximum amount of marker drift that an alignment is allowed to tolerate between successive markers.
`--Align.maxTrim`	`30`	The maximum number of skipped markers tolerated in an alignment at the beginning and end of a read.
`--Align.maxMarkerFrequency`	`10`	Marker frequency threshold when computing alignments. Markers that occur more than this number of times in either of the two reads to be aligned are ignored.
`--Align.minAlignedMarkerCount`	`100`	The minimum number of aligned markers for an alignment to be used.
`--Align.minAlignedFraction`	`0`	The minimum fraction of aligned markers for an alignment to be used.
`--Align.matchScore`	`6`	Match score for marker alignments (only for alignment methods 1 and 3).
`--Align.mismatchScore`	`-1`	Mismatch score for marker alignments (only for alignment methods 1 and 3).
`--Align.gapScore`	`-1`	Gap score for marker alignments (only for alignment methods 1 and 3).
`--Align.downsamplingFactor`	`0.1`	Downsampling factor for downsampled marker alignments (only for alignment method 3).
`--Align.bandExtend`	`10`	Amount to extend the alignment band, in markers (only used for alignment method 3).
`--Align.maxBand`	`1000`	Maximum band width, in markers, for banded marker alignments (only used for alignment method 3).
`--Align.sameChannelReadAlignment.suppressDeltaThreshold`	`0`	If not zero, alignments between reads from the same nanopore channel and close in time are suppressed. The `read` meta data fields from the FASTA or FASTQ header are checked. If their difference, in absolute value, is less than the value of this option, the alignment is suppressed. This can help avoid assembly artifacts. This check is only done if the two reads have identical meta data fields `runid`, `sampleid`, and `ch`. If any of these meta data fields are missing, this check is suppressed and this option has no effect.
`--Align.suppressContainments`	`False`	This is a Boolean switch. If set, containment alignments are suppressed. Containment alignments are alignments in which one read is entirely contained in another read, except possibly for up to maxTrim markers at the beginning and end.
`--ReadGraph.creationMethod`	`0`	The method used to create the read graph (0 or 2).
`--ReadGraph.maxAlignmentCount`	`6`	The maximum number of alignments to be kept in the read graph for each read.
`--ReadGraph.minComponentSize`	`100`	The minimum size (number of oriented reads) of a connected component of the read graph to be kept. This is currently ignored.
`--ReadGraph.maxChimericReadDistance`	`2`	Used for chimeric read detection.
`--ReadGraph.removeConflicts`	`False`	This is a Boolean switch. Remove conflicts from the read graph. Experimental - do not use.
`--MarkerGraph.minCoverage`	`10`	The minimum coverage for a marker graph vertex. Vertices with lower coverage are not generated. Specifying 0 causes a suitable value of this parameter to be selected automatically.
`--MarkerGraph.maxCoverage`	`100`	The maximum coverage for a marker graph vertex. Vertices with higher coverage are not generated.
`--MarkerGraph.minCoveragePerStrand`	`0`	The minimum coverage per strand for a marker graph vertex. Vertices with lower coverage on either strand are not generated.
`--MarkerGraph.lowCoverageThreshold`	`0`	Used during approximate transitive reduction. Edges with coverage less than or equal to this value are unconditionally removed from the marker graph, even at the cost of breaking reachability. This never happens with the default value 0.
`--MarkerGraph.highCoverageThreshold`	`256`	Used during approximate transitive reduction. Edges with coverage greater than or equal to this value are unconditionally kept in the marker graph, even if they could be removed without breaking reachability. This never happens with the default value 256, because marker graph edge coverage is stored in one byte and saturates at 255.
`--MarkerGraph.maxDistance`	`30`	Used during approximate transitive reduction of the marker graph. It controls the length of each Breadth First Search (BFS) used to determine reachability. This length is expressed in marker graph edges.
`--MarkerGraph.edgeMarkerSkipThreshold`	`100`	Used during approximate transitive reduction of the marker graph. Edges with coverage 1 are unconditionally removed from the marker graph if the only supporting read has a marker skip of more than this number of markers on that edge. Large marker skips are indicative of artifacts or errors.
`--MarkerGraph.pruneIterationCount`	`6`	The number of marker graph prune iterations. This equals the maximum length of dead branches that are removed.
`--MarkerGraph.simplifyMaxLength`	`10,100,1000`	Used for bubble removal.
`--MarkerGraph.crossEdgeCoverageThreshold`	`0.`	Experimental. Cross edge coverage threshold. If this is not zero, assembly graph cross-edges with average edge coverage less than this value are removed, together with the corresponding marker graph edges. A cross edge is defined as an edge v0->v1 with out-degree(v0)>1, in-degree(v1)>1.
`--MarkerGraph.refineThreshold`	`0`	Experimental. Length threshold, in markers, for the marker graph refinement step, or 0 to turn off the refinement step.
`--MarkerGraph.reverseTransitiveReduction`	`False`	This is a Boolean switch. If set, approximate transitive reduction of the marker graph in the reverse direction is also performed.
`--MarkerGraph.peakFinder.minAreaFraction`	`0.08`	Used in the automatic selection of `--MarkerGraph.minCoverage` when `--MarkerGraph.minCoverage` is set to 0.
`--MarkerGraph.peakFinder.areaStartIndex`	`2`	Used in the automatic selection of `--MarkerGraph.minCoverage` when `--MarkerGraph.minCoverage` is set to 0.
`--Assembly.markerGraphEdgeLengthThresholdForConsensus`	`0`	Controls assembly of long marker graph edges.
`--Assembly.consensusCaller`	`Bayesian:guppy-2.3.5-a`	Used to select the consensus caller for repeat counts.
`--Assembly.storeCoverageData`	`False`	This is a Boolean switch used to request storing of coverage data (only useful in conjunction with `--memoryMode filesystem`).
`--Assembly.writeReadsByAssembledSegment`	`False`	This is a Boolean switch used to request writing a csv file containing all the reads that were used to assemble each segment.
`--Assembly.pruneLength`	`0`	Prune length (in markers) for pruning of the assembly graph. Assembly graph leaves shorter than this number of markers are iteratively pruned. Set to zero to suppress pruning of the assembly graph. Assembly graph pruning takes place separately and in addition to marker graph pruning.
`--Assembly.detangleMethod`	`0`	Experimental. Method used to detangle the assembly graph. Valid values: 0 = no detangling. 1 = strict detangling. 2 = less strict detangling, controlled by `Assembly.detangle.*` options.
`--Assembly.detangle.diagonalReadCountMin`	`1`	Experimental. Minimum number of reads on detangle matrix diagonal elements required for detangling. Only used with `--Assembly.detangleMethod 2`.
`--Assembly.detangle.offDiagonalReadCountMax`	`2`	Experimental. Maximum number of reads on detangle matrix off-diagonal elements allowed for detangling. Only used with `--Assembly.detangleMethod 2`.
`--Assembly.detangle.offDiagonalRatio`	`0.3`	Experimental. Maximum ratio of total off-diagonal elements over diagonal element allowed for detangling. Only used with `--Assembly.detangleMethod 2`.
`--Assembly.iterative`	`False`	This is a Boolean switch used to request iterative assembly (experimental).
`--Assembly.iterative.iterationCount`	`3`	Number of iterations for iterative assembly (experimental).
`--Assembly.iterative.pseudoPathAlignMatchScore`	`1`	Pseudopath alignment match score for iterative assembly (experimental).
`--Assembly.iterative.pseudoPathAlignMismatchScore`	`-1`	Pseudopath alignment mismatch score for iterative assembly (experimental).
`--Assembly.iterative.pseudoPathAlignGapScore`	`-1`	Pseudopath alignment gap score for iterative assembly (experimental).
`--Assembly.iterative.mismatchSquareFactor`	`3.`	Mismatch square factor for iterative assembly (experimental).
`--Assembly.iterative.minScore`	`0.`	Minimum pseudo-alignment score for iterative assembly (experimental).
`--Assembly.iterative.maxAlignmentCount`	`6`	Maximum number of read graph neighbors for iterative assembly (experimental).
`--Assembly.iterative.bridgeRemovalIterationCount`	`3`	Number of read graph bridge removal iterations for iterative assembly (experimental).
`--Assembly.iterative.bridgeRemovalMaxDistance`	`2`	Maximum distance for read graph bridge removal for iterative assembly (experimental).
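
The multiplier notation accepted by `--Reads.desiredCoverage` in the table above can be illustrated with a short Python sketch. This is not Shasta's own parser. The function name parse_desired_coverage is hypothetical, and the sketch assumes that the numeric part is a whole number and that the suffixes are spelled exactly as listed in the table.

```python
import re

# Power of 10 multipliers described for --Reads.desiredCoverage.
MULTIPLIERS = {"": 1, "K": 10**3, "M": 10**6, "G": 10**9}

# A whole number, optionally followed by K, M, or G and then "b" or "bp".
PATTERN = re.compile(r"(\d+)(?:([KMG])(?:b|bp)?)?")


def parse_desired_coverage(text):
    """Convert a value such as '120Gbp' to a number of bases."""
    match = PATTERN.fullmatch(text)
    if match is None:
        raise ValueError(f"Cannot interpret desired coverage: {text!r}")
    number, suffix = match.groups()
    return int(number) * MULTIPLIERS[suffix or ""]


for text in ["120G", "120Gb", "120Gbp", "120000000000"]:
    print(text, "->", parse_desired_coverage(text))
```

All four spellings from the example in the table give the same result, 120×10⁹ bases.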
