Usage
Minimum input (only genome-wide 𝜋 and FST will be calculated):
perl pogenom.pl --vcf_file <VCF_FILE> --out <OUTPUT_FILES_PREFIX> --genome_size <GENOME_SIZE>
or:
perl pogenom.pl --vcf_file <VCF_FILE> --out <OUTPUT_FILES_PREFIX> --fasta_file <FASTA_FILE>
If a GFF file is provided, gene-wise 𝜋 and gene-wise FST will also be calculated:
perl pogenom.pl --vcf_file <VCF_FILE> --out <OUTPUT_FILES_PREFIX> --gff_file <GFF_FILE>
If a genetic code file is also provided (such as standard_genetic_code.txt in the POGENOM distribution), amino acid frequencies will be calculated for each codon position in each gene and sample, and gene-wise 𝜋 and FST will also be calculated at the amino acid level. In addition, non-synonymous to synonymous polymorphism rates (pN/pS) will be calculated for each gene in each sample:
perl pogenom.pl --vcf_file <VCF_FILE> --out <OUTPUT_FILES_PREFIX> --gff_file <GFF_FILE> --genetic_code_file <GENETIC_CODE_FILE>
Required arguments
--vcf_file <VCF_FILE> Specify VCF file with data from a single or multiple samples.
--out <OUTPUT_FILES_PREFIX> Specify the prefix of the output file name(s). Existing files with the same names are overwritten.
--genome_size <GENOME_SIZE> Specify genome size (in bp; integer). Not required if --gff_file or --fasta_file with genome sequence is given.
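If neither a GFF file nor a FASTA file is passed to POGENOM, the value for --genome_size has to be worked out beforehand. The following is a minimal shell sketch, assuming that the intended size is the total number of bases across all sequences in a FASTA file. The function name fasta_genome_size and the file names in the usage comment are hypothetical and not part of POGENOM:

```shell
# Hypothetical helper (not part of POGENOM): total sequence length of a FASTA file.
# Assumes every line that does not start with '>' contains sequence characters only.
fasta_genome_size() {
  # Drop header lines, strip whitespace and line breaks, then count characters.
  grep -v '^>' "$1" | tr -d ' \t\r\n' | wc -c | tr -d ' '
}

# Example use (file names are placeholders):
# perl pogenom.pl --vcf_file samples.vcf --out pogenom_out \
#   --genome_size "$(fasta_genome_size genome.fasta)"
```

Passing the FASTA file directly with --fasta_file avoids this step altogether, so the helper is mainly useful for checking the value that would be used.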
Optional arguments
--gff_file <GFF_FILE> Specify GFF file. Either this, --genome_size or --fasta_file must be given.
--fasta_file <FASTA_FILE> Specify FASTA file. Either this, --genome_size or --gff_file must be given.
--genetic_code_file <GENETIC_CODE_FILE> Specify genetic code file. E.g. standard_genetic_code.txt in the POGENOM distribution.
--loci_file <LOCI_FILE> Specify file with ids of loci to include.
--sample_file <SAMPLE_FILE> Specify file with ids of samples to be included.
--min_count <MIN_COUNT> Specify minimum coverage (integer) for a locus to be included for the sample.
--min_found <MIN_FOUND_IN> Specify minimum number of samples (integer) that a locus needs to be present in to be included. If set to 0, it will be set to the number of samples in the VCF file.
--subsample <SUBSAMPLE> Specify coverage level (integer) at which to subsample.
--keep_haplotypes If this is used, POGENOM will not split haplotypes into single-nucleotide variants, which is otherwise the default behaviour.
--vcf_format <VCF_FORMAT> Specify VCF file format version. Can be set to freebayes (default) or GATK.
--fst_perm <FST_PERM> Specify number of permutations (integer) for generating randomised gene-wise FST values. If this is not set, randomised FST values are not generated. Warning: use with care, as output files can become huge.
--pi_only Set this to make POGENOM only calculate and output genome-wide pi (fast).
--help Print the help message on screen.
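The optional filters can be combined in a single run. The following is a sketch only: the file names and the thresholds (10, 2, 10) are illustrative placeholders rather than recommended values, and the DRY_RUN switch and the pogenom_filtered wrapper are hypothetical conveniences, not POGENOM features. With DRY_RUN=1 (the default in this sketch) the command is printed so that it can be checked before it is run:

```shell
# Hypothetical input names; adjust to your data.
vcf="samples.vcf"
gff="genome.gff"
prefix="pogenom_out"

# Print the command instead of running it unless DRY_RUN is set to something other than 1.
DRY_RUN="${DRY_RUN:-1}"

pogenom_filtered() {
  # Build the argument list from flags described in the argument lists above.
  set -- perl pogenom.pl \
    --vcf_file "$vcf" --out "$prefix" \
    --gff_file "$gff" \
    --genetic_code_file standard_genetic_code.txt \
    --min_count 10 --min_found 2 --subsample 10
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"   # show the command only
  else
    "$@"        # run POGENOM
  fi
}

pogenom_filtered
```

Setting DRY_RUN=0 in the environment before calling pogenom_filtered runs the command instead of printing it.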