jloh onco_extract
Description
Extract candidate loss-of-heterozygosity (LOH) blocks from cancer data, based on single-nucleotide polymorphisms (SNPs) called from reads mapped onto a reference genome. Input may be a single BAM+VCF set or a matched tumor/normal pair (recommended).
Usage
jloh onco_extract --vcfs <VCF_control> <VCF_tumor> --ref <FASTA> --bams <BAM_control> <BAM_tumor> [options]
Or, if using --single-mode:
jloh onco_extract --single-mode --vcf <VCF> --ref <FASTA> --bam <BAM> [options]
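When jloh is driven from a pipeline script, a small helper can assemble either command line. This is a minimal sketch: the helper name onco_extract_argv is hypothetical, it uses only the flags documented on this page, and it assumes --single-mode must be passed explicitly when a single BAM+VCF set is given.

```python
from typing import List, Optional, Sequence


def onco_extract_argv(
    vcfs: Sequence[str],
    bams: Sequence[str],
    ref: str,
    extra: Optional[Sequence[str]] = None,
) -> List[str]:
    """Build an argument list for `jloh onco_extract` (hypothetical helper).

    Two VCFs + two BAMs -> default (paired) mode, in control, tumor order.
    One VCF + one BAM   -> single-sample mode (--single-mode assumed required).
    """
    if len(vcfs) != len(bams) or len(vcfs) not in (1, 2):
        raise ValueError("pass either 1 VCF + 1 BAM or 2 VCFs + 2 BAMs")
    argv = ["jloh", "onco_extract"]
    if len(vcfs) == 2:
        # Paired mode: order matters, control first and tumor second.
        argv += ["--vcfs", *vcfs, "--bams", *bams, "--ref", ref]
    else:
        argv += ["--single-mode", "--vcf", vcfs[0], "--bam", bams[0], "--ref", ref]
    # Any optional parameters (e.g. --sample, --output-dir) go at the end.
    argv += list(extra or [])
    return argv
```

The returned list can be handed to subprocess.run(argv, check=True); building a list rather than a shell string avoids quoting problems with paths that contain spaces.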
Parameters
Default mode
- --vcfs [<PATH_1> <PATH_2>]
VCF files (space-separated) from control & tumor, in this order.
- --bams [<PATH_1> <PATH_2>]
BAM files used to call the --vcfs (space-separated), from control & tumor, in this order.
- --ref <PATH>
FASTA file where reads were mapped.
Single sample mode
- --single-mode
Activate single-sample mode (a single BAM+VCF set, without a matched control).
- --vcf <PATH_1>
VCF file containing single-nucleotide polymorphisms (SNPs).
- --bam <PATH_1>
BAM file containing read mapping records.
- --ref <PATH_1>
FASTA file where reads were mapped.
Common Parameters
Input/Output
- --sample <STR>
Sample name for output files.
- --output-dir <PATH>
Path to an output directory (created if it does not exist).
- --regions <PATH>
Path to a BED file with regions of interest. The BED file must contain 4 columns: chromosome, start position, end position, and annotation. Annotation may be anything (gene name, transcript name, exon name, locus) as long as it is an alphanumeric string.
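A BED file can be checked against the 4-column layout described above before a run. The sketch below is a hypothetical pre-flight check, not part of jloh. It reads the "alphanumeric" requirement strictly (Python's str.isalnum), which may be stricter than what jloh itself accepts, since it rejects names containing "-", "_" or ".".

```python
from typing import Iterable, List, Tuple

Region = Tuple[str, int, int, str]  # chromosome, start, end, annotation


def read_regions(lines: Iterable[str]) -> List[Region]:
    """Parse a tab-separated, 4-column BED (hypothetical validator).

    Blank lines and '#' comment lines are skipped.
    """
    regions: List[Region] = []
    for n, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) != 4:
            raise ValueError(f"line {n}: expected 4 columns, got {len(fields)}")
        chrom, start, end, annot = fields
        start_i, end_i = int(start), int(end)  # raises ValueError if not integers
        if start_i < 0 or end_i <= start_i:
            raise ValueError(f"line {n}: invalid interval {start}-{end}")
        # Strict reading of "alphanumeric": letters and digits only.
        if not annot.isalnum():
            raise ValueError(f"line {n}: annotation {annot!r} is not alphanumeric")
        regions.append((chrom, start_i, end_i, annot))
    return regions
```

Typical use is read_regions(open(path)), which either returns the parsed regions or stops at the first malformed line.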
Variants
- --max-dist <INT>
Maximum distance (bp) allowed between consecutive SNPs for them to be retained within the same homozygous/heterozygous block.
- --filter-mode ["all"|"pass"]
Either “all” or “pass”. With “pass”, only VCF entries with the PASS annotation in the FILTER column are selected; with “all”, every entry is used regardless of its FILTER annotation.
- --min-af <FLOAT>
Minimum allele frequency to consider a SNP heterozygous. Useful when working with polyploid species.
- --max-af <FLOAT>
Maximum allele frequency to consider a SNP heterozygous. Useful when working with polyploid species.
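The variant parameters above can be illustrated with a toy Python model of how --filter-mode, --min-af/--max-af and --max-dist might interact. It is a sketch under stated assumptions (inclusive AF bounds, AF above --max-af treated as homozygous, AF below --min-af left unclassified, a run broken by a gap larger than --max-dist), not jloh's actual implementation; the Snp class and all functions are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional


@dataclass
class Snp:
    chrom: str
    pos: int            # 1-based VCF POS
    filter_status: str  # VCF FILTER column, e.g. "PASS"
    af: float           # alternative-allele frequency


def select(snps: Iterable[Snp], filter_mode: str = "all") -> List[Snp]:
    """--filter-mode: "pass" keeps only FILTER == PASS; "all" keeps every entry."""
    if filter_mode not in ("all", "pass"):
        raise ValueError("filter_mode must be 'all' or 'pass'")
    return [s for s in snps if filter_mode == "all" or s.filter_status == "PASS"]


def zygosity(snp: Snp, min_af: float, max_af: float) -> Optional[str]:
    """--min-af / --max-af: heterozygous if min_af <= AF <= max_af.

    Assumptions: bounds are inclusive, AF above max_af counts as homozygous,
    and AF below min_af is left unclassified (None).
    """
    if min_af <= snp.af <= max_af:
        return "het"
    if snp.af > max_af:
        return "hom"
    return None


def runs(snps: Iterable[Snp], min_af: float, max_af: float,
         max_dist: int) -> List[Dict]:
    """Group same-zygosity SNPs on one chromosome into runs; a gap larger
    than max_dist (--max-dist) or a change of zygosity starts a new run."""
    out: List[Dict] = []
    for s in sorted(snps, key=lambda x: (x.chrom, x.pos)):
        kind = zygosity(s, min_af, max_af)
        if kind is None:
            continue  # unclassified SNPs are skipped
        last = out[-1] if out else None
        if (last is not None and last["chrom"] == s.chrom
                and last["kind"] == kind and s.pos - last["end"] <= max_dist):
            last["end"] = s.pos
            last["n_snps"] += 1
        else:
            out.append({"chrom": s.chrom, "kind": kind,
                        "start": s.pos, "end": s.pos, "n_snps": 1})
    return out
```

In this model, switching from “pass” to “all” can change block boundaries, because a non-PASS homozygous SNP may bridge or start a run.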
Blocks
- --min-length <INT>
Minimum length of accepted candidate LOH blocks.
- --min-snps <INT>
Minimum number of homozygous SNPs to consider a block in the final results. Homozygous SNPs are an indicator of LOH when a paired normal sample is present.
- --min-snps-het <INT>
Minimum number of heterozygous SNPs required to discard a block from the final results.
- --min-frac-cov <FLOAT>
Minimum fraction of a candidate LOH block's positions that must be covered by reads for the block to be included in the final list.
- --hemi <FLOAT>
Threshold on the coverage ratio between a candidate block and its surrounding up/downstream regions, below which the block is considered hemizygous (i.e. carrying only one copy).
- --overhang <INT>
Size of the up/downstream region checked to define zygosity (see --hemi).
- --min-overhang <FLOAT>
Fraction of the --overhang that must be present to infer zygosity, for cases where the full overhang is not available (e.g. at the beginning of a chromosome).
- --merge-uncov <INT>
Number of uncovered positions (bp) between two blocks that are ignored: blocks separated by up to this many uncovered positions are merged into a single block.
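The block parameters can likewise be sketched as a toy Python model: merging blocks across short uncovered gaps, applying the length/SNP/coverage filters, and comparing block coverage against its flanks to tell hemizygous from homozygous blocks. This is an illustration, not jloh's code; the function names are hypothetical, blocks use half-open [start, end) coordinates, the het-SNP filter discards a block once it reaches --min-snps-het, and --min-overhang is assumed to apply to both flanks combined.

```python
from statistics import mean
from typing import List, Optional, Sequence, Tuple

Block = Tuple[int, int]  # half-open [start, end) on a single chromosome


def merge_blocks(blocks: List[Block], merge_uncov: int) -> List[Block]:
    """--merge-uncov: merge neighbouring blocks separated by <= merge_uncov bp."""
    merged: List[Block] = []
    for start, end in sorted(blocks):
        if merged and start - merged[-1][1] <= merge_uncov:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def keep_block(*, length: int, n_hom: int, n_het: int, frac_cov: float,
               min_length: int, min_snps: int, min_snps_het: int,
               min_frac_cov: float) -> bool:
    """--min-length, --min-snps, --min-snps-het and --min-frac-cov filters."""
    return (length >= min_length
            and n_hom >= min_snps
            and n_het < min_snps_het  # reaching min_snps_het discards the block
            and frac_cov >= min_frac_cov)


def block_zygosity(cov: Sequence[float], block: Block, overhang: int,
                   min_overhang: float, hemi: float) -> Optional[str]:
    """--hemi, --overhang, --min-overhang: compare block coverage with flanks.

    cov holds per-position read depth for one chromosome (0-based).
    Returns None when too little flank is available to decide.
    """
    start, end = block
    flank = list(cov[max(0, start - overhang):start]) + list(cov[end:end + overhang])
    # Assumption: the --min-overhang fraction applies to both flanks combined.
    if not flank or len(flank) < min_overhang * 2 * overhang or mean(flank) == 0:
        return None
    ratio = mean(cov[start:end]) / mean(flank)
    return "hemizygous" if ratio < hemi else "homozygous"
```

For a block at half the depth of its flanks, the ratio is 0.5, so any --hemi above 0.5 would label it hemizygous in this model.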
Misc
- --threads <INT>
Number of parallel operations (threads) to run.
- --regions <PATH>
BED file containing the regions in which blocks are searched. It may be created with jloh g2g, or it may be a custom BED file with regions of interest.