Visualization
High resolution image of pixel level factorization results
draw-pixel-factors visualizes the results of pixel-decode.
punkst draw-pixel-factors --in ${path}/pixel.decode --in-color ${path}/color.rgb.tsv --out ${path}/pixel.png --scale 1 --xmin ${xmin} --xmax ${xmax} --ymin ${ymin} --ymax ${ymax}
--in specifies the prefix of the input data and index files created by pixel-decode. The tool will look for ${in}.bin (or ${in}.tsv) and ${in}.index.
--binary indicates the input data is in binary format (if you used punkst pixel-decode with --output-binary)
--in-tsv/--in-data can also accept a gzipped TSV (.tsv.gz) or - for stdin stream input.
--filter requires a seekable TSV with index and is not supported for stdin or gzipped stream input.
--out specifies the output png file.
--in-color specifies a TSV file with the colors for each factor. The first three columns are interpreted as R, G, B values in the range 0-255. Valid lines are assigned to factors in the order they appear in this file. An example color table can be found in punkst/ext/py/cmap.48.tsv.
--xmin, --xmax, --ymin, --ymax specify the range of the coordinates. If your data is generated by punkst pixel-decode and you want to visualize the whole area, these parameters are unnecessary.
--scale (default 1) scales input coordinates to pixels in the output image: the horizontal pixel coordinate is int((x-xmin)/scale). Since coordinates are normally in micrometers and the analysis resolution is typically 0.5 or 1, --scale 1 (default) is usually a good choice. For Visium HD data, where the data resolution is 2 µm, use --scale 2.
--min-prob (default: 1e-3) Minimum probability to consider a pixel.
--top-only Use only the top predicted factor per pixel.
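The coordinate mapping used by --scale can be checked by hand with a few lines of Python. This is only an illustration of the formula int((x-xmin)/scale) quoted above, not punkst's implementation; the function names, and the assumption that the image is just large enough to cover the requested range, are hypothetical.

```python
import math

def to_pixel(x, y, xmin, ymin, scale=1.0):
    """Map input coordinates to (column, row) using int((v - vmin) / scale)."""
    return int((x - xmin) / scale), int((y - ymin) / scale)

def image_size(xmin, xmax, ymin, ymax, scale=1.0):
    """Assumed image size: enough pixels to cover the requested range."""
    width = math.ceil((xmax - xmin) / scale)
    height = math.ceil((ymax - ymin) / scale)
    return width, height

# Visium HD-like case: 2 um data resolution, so --scale 2.
print(to_pixel(105.0, 58.0, xmin=100.0, ymin=50.0, scale=2.0))  # (2, 4)
print(image_size(100.0, 1100.0, 50.0, 550.0, scale=2.0))        # (500, 250)
```

Doubling --scale halves the image width and height, which is a quick way to estimate the output size before rendering a large region.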
If you specified --transform in topic-model, one way to create the color table is to use the helper python script
Visualize selected factors
Alternatively, you can specify a few factors and their corresponding colors directly:
--channel-list A list of factors to visualize
--color-list A list of colors in hex code (#RRGGBB).
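The two color inputs use different encodings: --in-color expects R, G, B columns in 0-255, while --color-list expects hex codes. A minimal sketch for converting between them, assuming nothing about punkst beyond those two formats, is shown here. The function names are hypothetical, and whether a header line is tolerated in the color table is not stated above, so none is written.

```python
def hex_to_rgb(code):
    """Convert 'RRGGBB' or '#RRGGBB' to an (R, G, B) tuple of ints in 0-255."""
    code = code.lstrip("#")
    if len(code) != 6:
        raise ValueError(f"expected 6 hex digits, got {code!r}")
    return tuple(int(code[i:i + 2], 16) for i in (0, 2, 4))

def rgb_to_hex(r, g, b):
    """Convert integer R, G, B values in 0-255 to '#rrggbb'."""
    return f"#{r:02x}{g:02x}{b:02x}"

def color_table_lines(hex_codes):
    """One tab-separated 'R G B' row per factor, in factor order."""
    return ["\t".join(str(v) for v in hex_to_rgb(c)) for c in hex_codes]

rows = color_table_lines(["#65ff65", "#65b2ff", "#ffb265"])
print("\n".join(rows))
```

The printed rows can be saved as a color table, and rgb_to_hex goes the other way when a few factors from an existing table should be passed to --color-list.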
Denoise
When --top-only is used, you can optionally smooth out isolated noisy pixels. This is a heuristic, and may only be meaningful when you have projected categorical cell types.
--island-smooth The number of iterations used to remove isolated noisy pixels that differ from most of their neighbors. One or two iterations are recommended.
--fill-empty-islands Fill empty pixels surrounded by consistent neighbors (only activated when --island-smooth is set).
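To make the idea of island smoothing concrete, here is a generic majority filter over a grid of top-factor labels. It is a sketch of the general technique only: the 8-pixel neighborhood, the min_agree threshold, and all function names are assumptions, and the rule punkst actually applies may differ.

```python
from collections import Counter

def smooth_once(grid, min_agree=5, fill_empty=False):
    """One pass of a majority filter over the 8-neighborhood.

    grid: list of rows; each cell is a factor label or None (empty pixel).
    min_agree: hypothetical threshold, neighbors that must share one label.
    """
    h, w = len(grid), len(grid[0])
    out = [row[:] for row in grid]  # read from grid, write to a copy
    for r in range(h):
        for c in range(w):
            counts = Counter(
                grid[rr][cc]
                for rr in range(max(r - 1, 0), min(r + 2, h))
                for cc in range(max(c - 1, 0), min(c + 2, w))
                if (rr, cc) != (r, c) and grid[rr][cc] is not None
            )
            if not counts:
                continue
            label, n = counts.most_common(1)[0]
            if n < min_agree or label == grid[r][c]:
                continue
            # Relabel noisy pixels; fill empty ones only when requested.
            if grid[r][c] is not None or fill_empty:
                out[r][c] = label
    return out

def island_smooth(grid, iterations=1, **kwargs):
    """Apply smooth_once repeatedly, like an iteration count would."""
    for _ in range(iterations):
        grid = smooth_once(grid, **kwargs)
    return grid

demo = [
    [0, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, None, 0],
    [0, 0, 0, 0],
]
print(island_smooth(demo, iterations=1))
print(island_smooth(demo, iterations=1, fill_empty=True))
```

In the demo, the lone pixel labeled 1 is replaced by its surrounding label in both calls, while the empty pixel is filled only when fill_empty is set, mirroring the relationship between the two options described above.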
Note on joined data
If your input file is generated by tile-op --merge-emb, i.e. there are multiple sets of factor inference results for each pixel/molecule, you need to specify which result set to visualize with --result-set (0-based index).
If NA or other placeholder values are kept in the joined data, use --suppress-kp-warnings to suppress warnings about invalid records.
High resolution image of selected pixel-level features
draw-pixel-features visualizes selected genes/features from an indexed pixel TSV.
The input files are the same as those generated by pts2tiles.
punkst draw-pixel-features \
--in-tsv ${pref}.tsv --in-index ${pref}.index \
--icol-x 0 --icol-y 1 --icol-feature 2 --icol-val 3 \
--feature-list Myh1 Myh8 Prkar1a \
--color-list 65ff65 65b2ff ffb265 \
--out ${pref}.genes.png
Use --in-tsv and --in-index to point to the tiled TSV and its index. Unlike draw-pixel-factors, this command currently works on TSV input.
--icol-x, --icol-y, --icol-feature, and --icol-val specify the 0-based columns for coordinates, feature name/ID, and the feature weight or count.
Feature colors can be provided either by:
--feature-list with matching --color-list
--feature-color-map, a TSV whose first column is the feature name and second column is a hex color
If the feature column in the TSV already contains feature names, --feature-dict is not required when using --feature-list or --feature-color-map. If the feature column contains numeric feature IDs, provide --feature-dict so names can be mapped to those IDs.
--xmin, --xmax, --ymin, --ymax restrict the visualization area. When the index file stores a proper global bounding box (newer binary index schema), these bounds are optional and the command can use the full indexed range automatically. Otherwise provide either explicit bounds or --range.
--range reads the plotting bounds from a file containing xmin ymin xmax ymax.
--scale (default 1) converts input coordinates to output image pixels, using int((x-xmin)/scale) and int((y-ymin)/scale).
--threads controls how many indexed tiles are processed in parallel.
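A file for --feature-color-map can be written with a short script. The sketch below follows the two-column layout described above (feature name, then hex color) and reuses the genes and colors from the example command, written without a leading # as in that command. The output file name and the function name are hypothetical, and no header line is written because header handling is not described above.

```python
import csv

def write_feature_color_map(path, feature_colors):
    """Write a two-column TSV: feature name, then hex color."""
    with open(path, "w", newline="") as fh:
        writer = csv.writer(fh, delimiter="\t", lineterminator="\n")
        for feature, color in feature_colors.items():
            writer.writerow([feature, color])

colors = {"Myh1": "65ff65", "Myh8": "65b2ff", "Prkar1a": "ffb265"}
write_feature_color_map("feature_colors.tsv", colors)
```

Keeping the mapping in a file is convenient when the same genes are drawn repeatedly, since the feature and color lists no longer need to be kept in matching order on the command line.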
HTML report for factor weights and top genes
factor_report.py generates HTML reports summarizing factor characteristics and top genes.
The script generates an interactive HTML report (${output_pref}.factor.info.html) and a TSV summary (${output_pref}.factor.info.tsv) containing factor weights, top differentially expressed genes, and visualization colors.
python ext/py/factor_report.py --de ${path}/de_bulk.tsv --pseudobulk ${path}/pseudobulk.tsv --color_table ${path}/color.rgb.tsv --output_pref ${path}/report
--de specifies the differential expression results file from de-chisq
--de_neighbor optionally specifies a de-chisq neighbor-output file (e.g. ${out}.1vsNeighbors.tsv) to add a column displaying highly specific top genes.
--pseudobulk specifies the pseudobulk count table.
--color_table specifies the RGB color table for factors. This should normally be the same file as the one used for punkst draw-pixel-factors.
--output_pref specifies the output prefix for generated files.
--feature_label specifies the column name for features (default: "Feature").
--n_top_gene maximum number of top genes to include in report (default: 20).
--min_top_gene minimum number of top genes to show per factor (default: 10).
--max_pval maximum p-value threshold for significant genes (default: 0.001).
--min_fc minimum fold change threshold for significant genes (default: 1.5).
--annotation optional file with factor annotations to display instead of the factor IDs. It is a TSV file where the first column contains factor IDs as they appear in the header of the pseudobulk table, and the second column contains the annotation.
--anchor optional file with anchor genes chosen to represent each factor. It is a TSV file where the first column contains factor IDs as they appear in the header of the pseudobulk table, and the second column contains the anchor gene names (separated by any delimiter other than a tab).
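Because the --annotation file must use the factor IDs exactly as they appear in the pseudobulk header, it can help to derive the IDs from that header instead of typing them. The sketch below assumes a tab-separated pseudobulk table whose feature column is named by --feature_label (default "Feature"). The function names, the sample header, and the annotation labels are hypothetical and for illustration only.

```python
import csv
import io

def factor_ids_from_header(header_line, feature_label="Feature"):
    """Return the pseudobulk column names other than the feature column."""
    columns = header_line.rstrip("\n").split("\t")
    return [c for c in columns if c != feature_label]

def annotation_tsv(factor_ids, annotations):
    """Two-column TSV text: factor ID, then its annotation.

    Factors without an annotation fall back to their own ID.
    """
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    for fid in factor_ids:
        writer.writerow([fid, annotations.get(fid, fid)])
    return buf.getvalue()

header = "Feature\t0\t1\t2\n"  # made-up pseudobulk header
ids = factor_ids_from_header(header)
text = annotation_tsv(ids, {"0": "Fast muscle", "2": "Fibroblast"})
print(text, end="")
```

The same pattern applies to the --anchor file, with the second column holding the anchor gene names joined by a non-tab separator such as a comma.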