jloh extract

Description

Extract candidate LOH blocks from single-nucleotide polymorphisms (SNPs) called from reads mapped onto a reference genome. Alternatively, from reads derived from a hybrid mapped onto its parental genomes (at least one).

Usage

jloh extract --vcf <VCF> --ref <FASTA> --bam <BAM> [options]

Or, if using --assign-blocks:

jloh extract --assign-blocks --vcfs [<PATH_1> <PATH_2>] --refs [<PATH_1> <PATH_2>] --bams [<PATH_1> <PATH_2>] [options]

Parameters

Default mode

--vcf <PATH>

VCF file containing single-nucleotide polymorphisms (SNPs).

--bam <PATH>

BAM file containing read mapping records.

--ref <PATH>

FASTA file where reads were mapped.

Assign-blocks mode (hybrids)

--assign-blocks

Activate block assignment mode.

--vcfs [<PATH_1> <PATH_2>]

VCF file containing single-nucleotide polymorphisms (SNPs).

--bams [<PATH_1> <PATH_2>]

BAM file containing read mapping records.

--refs [<PATH_1> <PATH_2>]

FASTA file where reads were mapped.

Common Parameters

Variants

--min-snps-kbp [<INT>,<INT>]

Comma-separated set of two integer values defining heterozygous and homozygous minimum SNPs/kbp densities. See details at Modelling SNP density.

--filter-mode ["all"|"pass"]

Either “all” or “pass”. Whether to select only VCF entries that have the PASS annotation or not.

--min-af <FLOAT>

Minimum allele frequency to consider a SNP heterozygous. Useful when working with polyploid species.

--max-af <FLOAT>

Maximum allele frequency to consider a SNP heterozygous. Useful when working with polyploid species.

Blocks

--min-length <INT>

Minimum length of accepted candidate LOH blocks.

--coarseness <INT>

Minimum length of initial building blocks that are used to define LOH blocks (i.e. nothing shorter than this will be considered an interesting interval).

--min-frac-cov <FLOAT>

Minimum fraction of positions of a candidate LOH block to include it in the final list.

--hemi <FLOAT>

Threshold of coverage ratio between candidate block and surrounding up/downstream regions, below which a block is considered hemizygous (i.e. carrying only one copy).

--overhang <INT>

Size of the up/downstream region checked to define zygosity (see --hemi).

--min-overhang <FLOAT>

Fraction of the --overhang that must be present to infer zygosity (e.g. at the beginning of a chromosome).

--merge-uncov <INT>

Number of uncovered positions (bp) separating two blocks that are ignored, producing a merged block.

Misc

--sample <STR>

Sample name to include in output files.

--output-dir <PATH>

Path to the output directory.

--threads <INT>

Number of parallel operations performed.

--regions <PATH>

BED file containing regions where blocks shall be searched in. This BED file may be created via jloh g2g or it may be a custom BED file with regions of interest.