pixel-decode

Overview

pixel-decode projects a trained factor model onto the spatial data and annotates each pixel/voxel/molecule with the top factors and their probabilities. In standard mode, inference is done at the pixel or voxel level. In the feature-specific modes, each feature is decoded separately, even when several features share the same location.

In all cases, the input is the x-y tiled data created by pts2tiles.

Basic usage:

punkst pixel-decode --model ${path}/model.tsv \
--in-tsv ${path}/transcripts.tiled.tsv --in-index ${path}/transcripts.tiled.index \
--temp-dir ${tmpdir} --out-pref ${path}/pixel --output-binary \
--icol-x 0 --icol-y 1 --icol-feature 2 --icol-val 3 \
--hex-grid-dist 12 --n-moves 2 \
--pixel-res 0.5 --threads ${threads} --seed 1

The main inference result in this example is written to ${path}/pixel.bin together with ${path}/pixel.index. The output is organized in regular square tiles for efficient downstream analysis (see tile-op).

Required Parameters

--in-tsv - Specifies the tiled data created by pts2tiles.

--in-index - Specifies the index file created by pts2tiles.

--icol-x, --icol-y - Specify the columns with X and Y coordinates (0-based).

--icol-feature - Specifies the column index for feature (0-based).

--icol-val - Specifies the column index for count/value (0-based).

--model - Specifies the model file where the first column contains feature names and the subsequent columns contain the parameters for each factor. The format should match that created by topic-model.

--temp-dir - Specifies the directory used for temporary files. Required unless --in-memory is set.

Output specification, one of these must be provided:

--out-pref - Specifies the output prefix for all output files.

--out - (Deprecated, for backward compatibility) Specifies the output file.

Hexagon grid parameters for 2D and thin 3D, one of these must be provided:

--hex-size - Specifies the size (side length) of the hexagons for initializing anchors.

--hex-grid-dist - Specifies center-to-center distance in the axial coordinate system used to place anchors. Equals hex-size * sqrt(3).

Anchor spacing parameters:

--anchor-dist - Specifies the distance between adjacent anchors. Required for standard 3D. In 2D and thin 3D, it can be provided directly instead of --n-moves.

--n-moves - (For 2D and thin-3D) Specifies the number of sliding moves in each axis to generate the anchors. If --n-moves is n, anchor-dist equals hex-grid-dist / n.
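The relations among --hex-size, --hex-grid-dist, --n-moves, and the resulting anchor spacing can be checked with a few lines of Python. This is only a sketch of the arithmetic stated above, not punkst code; both function names are hypothetical.

```python
import math

def hex_grid_dist_from_size(hex_size: float) -> float:
    """Center-to-center distance for hexagons with the given side length."""
    return hex_size * math.sqrt(3)

def anchor_dist_from_moves(hex_grid_dist: float, n_moves: int) -> float:
    """Anchor spacing implied by --hex-grid-dist and --n-moves."""
    if n_moves < 1:
        raise ValueError("n_moves must be a positive integer")
    return hex_grid_dist / n_moves

# Values from the basic usage example: --hex-grid-dist 12 --n-moves 2
print(anchor_dist_from_moves(12, 2))  # 6.0
```

With the basic usage settings, anchors end up 6 coordinate units apart.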

Optional Parameters

Input Parameters

--coords-are-int - If set, indicates that the coordinates are integers; otherwise, they are treated as floating point values.

--icol-z - Column index for the z coordinate (0-based). Requires either --thin-3D or --standard-3D.

--thin-3D - Activates the thin 3D path. (See the 3D modes section below)

--standard-3D - Activates the standard 3D path. (See the 3D modes section below)

--feature-is-index - If set, the values in --icol-feature are interpreted as feature indices. Otherwise, they are expected to be feature names.

--feature-weights - Specifies a file to weight each feature. The first column should contain the feature names, and the second column should contain the weights.

--default-weight - Specifies the default weight for features not present in the weights file (only if --feature-weights is specified). Default: 0.
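The lookup behavior described for --feature-weights and --default-weight can be illustrated as follows. This is a sketch under assumptions: the file is taken to be whitespace-delimited with no header, the helper names are hypothetical, and how punkst applies the weight during inference is not shown.

```python
def parse_weights(lines):
    """Build a feature -> weight map from 'name weight' lines (assumed layout)."""
    weights = {}
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or incomplete lines
        weights[fields[0]] = float(fields[1])
    return weights

def feature_weight(name, weights, default_weight=0.0):
    """Weight for a feature; features absent from the file get the default."""
    return weights.get(name, default_weight)

# Hypothetical feature names and weights
weights = parse_weights(["GeneA\t1.0", "GeneB\t0.5"])
print(feature_weight("GeneB", weights))  # 0.5
print(feature_weight("GeneC", weights))  # 0.0
```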

--anchor - Specifies a file containing fixed anchor points. This is currently only supported for 2D slda. In 3D, fixed anchors are ignored. In nmf, externally loaded anchors are not currently used.

--sample-list - Runs the same model and settings on multiple datasets listed in one TSV file. See "Process multiple samples" below.

--in-memory - Keeps boundary buffers in memory instead of writing temporary buffer files. If set, --temp-dir is not required.

Algorithm Parameters

--single-feature-pixel - Enables feature-specific decoding on collapsed pixel/voxel bins. Decoding still uses the user-specified --pixel-res / --pixel-res-z, but factor probabilities are computed separately for each feature within the same pixel or voxel.

--single-molecule - Enables raw single-molecule decoding without collapsing records to pixels or voxels. (This is not recommended for data that do not have single-molecule resolution, such as Visium HD.)

--max-iter - Maximum number of outer iterations. Default: 100.

--mean-change-tol - Convergence tolerance for the outer iterations. Default: 1e-3.

--background-model - Background profile file. If provided, background probabilities are modeled explicitly.

--bg-fraction-prior-a0, --bg-fraction-prior-b0 - Beta prior hyperparameters for the background fraction in slda mode.

Processing Parameters

--pixel-res - Resolution for the analysis, in the same unit as the input coordinates. Default: 1 (each pixel treated independently). Setting the resolution equivalent to 0.5-1μm is recommended, but it could be smaller if your data is very dense. For Visium HD (where the pixel size is 2μm), use --pixel-res 2.

--pixel-res-z - Resolution for aggregating pixels in the z dimension. Used only in 3D mode. Default: 1.

--radius - Support radius used in pixel-to-anchor weighting. In 2D and thin 3D it also controls the anchor search neighborhood. Default: anchor-dist * 1.2 in 2D. In thin 3D, a default is derived from the x-y anchor spacing and the nearest thin-3D z-level spacing. In standard 3D, --radius does not change the fixed BCC neighborhood stencil, but it does control the weight decay scale and contributes to the x-y padding.

--half-life-dist - Ratio h in (0, 1) such that an anchor at distance h * radius receives weight 0.5. Default: 0.7. The implemented weighting rule is w(d) = clamp(1 - (d / radius)^nu, 0.05, 0.95) with nu = log(0.5) / log(h).
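The weighting rule above can be written out directly. The following is a minimal sketch of the stated formula only; the function name is hypothetical, and whether anchors beyond the radius are considered at all is outside what this sketch covers.

```python
import math

def anchor_weight(d: float, radius: float, half_life: float = 0.7) -> float:
    """w(d) = clamp(1 - (d / radius)^nu, 0.05, 0.95), nu = log(0.5) / log(h)."""
    if not 0.0 < half_life < 1.0:
        raise ValueError("half_life must lie strictly between 0 and 1")
    if radius <= 0.0 or d < 0.0:
        raise ValueError("radius must be positive and d non-negative")
    nu = math.log(0.5) / math.log(half_life)
    w = 1.0 - (d / radius) ** nu
    return min(max(w, 0.05), 0.95)

# An anchor at distance h * radius receives weight 0.5 by construction.
print(round(anchor_weight(7.0, 10.0), 6))  # 0.5
print(anchor_weight(0.0, 10.0))            # 0.95 (upper clamp)
print(anchor_weight(10.0, 10.0))           # 0.05 (lower clamp)
```

Choosing nu this way makes (h)^nu equal to 0.5, which is why the weight crosses 0.5 exactly at distance h * radius.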

--min-init-count - Minimum accumulated anchor support required for an anchor to be retained during initialization. In thin 3D this support is radius-based and distance-weighted. Default: 10.

--zmin, --zmax - z range for 3D mode. Thin 3D requires both values. Standard 3D accepts them, but only uses them when --ignore-outside-zrange is set.

--thin-3d-z-levels - Explicit z coordinates for thin-3D anchor levels.

--thin-3d-n-z-levels - Number of evenly spaced thin-3D anchor levels to generate between zmin and zmax.

--ignore-outside-zrange - Drop observations outside [zmin, zmax] in 3D mode.

--threads - Number of threads to use for parallel processing. Default: 1.

--seed - Random seed for reproducibility. If not set or ≤0, a random seed will be generated.

Output Parameters

--output-binary - Writes the main output as <prefix>.bin plus <prefix>.index. This cannot be combined with --output-original.

--output-original - Writes the main output as text and includes the original feature names and counts together with the factor results.

--output-anchors - Writes anchor-level top-factor assignments to <prefix>.anchors.tsv.

--output-bg-prob-expand - In text mode, writes one line per feature per pixel to include background probabilities. Ignored if --output-original is set.

--use-ticket-system - If set, the order of pixels in the output file is deterministic across runs (though not necessarily the same as the input order). It incurs a performance penalty.

--top-k - Number of top factors to include in the output. Default: 3.
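As an illustration of what keeping the top factors means, the sketch below ranks a probability vector and keeps the k largest entries. It is not the punkst implementation; the function name is hypothetical, and tie-breaking or any renormalization done by the tool is not specified here.

```python
def top_k_factors(probs, k=3):
    """Return (factor index, probability) pairs for the k largest probabilities."""
    ranked = sorted(enumerate(probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]

# Hypothetical posterior over four factors for one pixel
probs = [0.05, 0.60, 0.10, 0.25]
print(top_k_factors(probs))  # [(1, 0.6), (3, 0.25), (2, 0.1)]
```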

--output-coord-digits - Number of decimal digits to output for coordinates in text mode (used when input coordinates are float or --output-original is not set). Default: 2.

--output-prob-digits - Number of decimal digits to output for probabilities. Default: 4.

--ext-col-ints - Additional integer columns to carry over to the output file. Format: "idx1:name1 idx2:name2 ..." where 'idx' are 0-based column indices. Example: --ext-col-ints 4:celltype 5:cluster.

--ext-col-floats - Additional float columns to carry over to the output file. Format: "idx1:name1 idx2:name2 ..." where 'idx' are 0-based column indices. Example: --ext-col-floats 6:quality.

--ext-col-strs - Additional string columns to carry over to the output file. Format: "idx1:name1:len1 idx2:name2:len2 ..." where 'idx' are 0-based column indices and 'len' are maximum lengths of strings. Example: --ext-col-strs 7:sample_id:20 8:batch:10.
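The column specification strings above follow a simple colon-separated pattern. A hypothetical parser, written only to make the format concrete (it is not taken from punkst), could look like this:

```python
def parse_ext_cols(spec: str, with_len: bool = False):
    """Parse 'idx:name' tokens, or 'idx:name:len' tokens when with_len is True."""
    expected = 3 if with_len else 2
    cols = []
    for token in spec.split():
        parts = token.split(":")
        if len(parts) != expected:
            raise ValueError(f"malformed column spec: {token!r}")
        idx = int(parts[0])  # 0-based column index in the input
        if with_len:
            cols.append((idx, parts[1], int(parts[2])))
        else:
            cols.append((idx, parts[1]))
    return cols

print(parse_ext_cols("4:celltype 5:cluster"))
# [(4, 'celltype'), (5, 'cluster')]
print(parse_ext_cols("7:sample_id:20 8:batch:10", with_len=True))
# [(7, 'sample_id', 20), (8, 'batch', 10)]
```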

Feature-Specific Modes

The two feature-specific modes share the same inference path, where different features are considered separately even if they have the same location. Consider the feature-specific modes only if your factors are closer to "transcriptional modules" than "cell types".

Caution: for Visium HD, where the resolution is 2μm, only --single-feature-pixel is valid.

Compared with the standard pixel mode:

  • standard mode jointly considers all observations in the same pixel or voxel and makes one inference result per pixel/voxel
  • --single-feature-pixel still collapses the raw data by analysis resolution, but can assign different factor probabilities to features within the same pixel or voxel
  • --single-molecule does not collapse the data before inference; each accepted input record is treated on its own
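The difference among the three modes comes down to what counts as one inference unit. The sketch below groups a few records accordingly. It assumes that a pixel bin is floor(coordinate / pixel-res), which the text above does not state, and all names and records are hypothetical.

```python
import math
from collections import defaultdict

def pixel_key(x, y, pixel_res):
    # Assumption: bins are obtained by flooring coord / pixel_res.
    return (math.floor(x / pixel_res), math.floor(y / pixel_res))

def inference_units(records, pixel_res, mode="standard"):
    """Group (x, y, feature, count) records into the units each mode decodes."""
    if mode == "single-molecule":
        # No collapsing: every record is its own unit.
        return [(x, y, feature) for x, y, feature, _ in records]
    units = defaultdict(lambda: defaultdict(int))
    for x, y, feature, count in records:
        px, py = pixel_key(x, y, pixel_res)
        # Standard: one unit per pixel; feature-specific: one per pixel and feature.
        key = (px, py) if mode == "standard" else (px, py, feature)
        units[key][feature] += count
    return {key: dict(counts) for key, counts in units.items()}

records = [
    (0.10, 0.20, "GeneA", 1),
    (0.30, 0.40, "GeneA", 2),
    (0.35, 0.45, "GeneB", 1),
    (0.90, 0.20, "GeneA", 1),
]
for mode in ("standard", "single-feature-pixel", "single-molecule"):
    print(mode, len(inference_units(records, 0.5, mode)))
# standard 2
# single-feature-pixel 3
# single-molecule 4
```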

Output differences:

  • --single-feature-pixel writes integer pixel or voxel coordinates plus one extra uint32_t featureIdx
  • --single-molecule writes raw float coordinates plus one extra uint32_t featureIdx

Current restrictions for both --single-feature-pixel and --single-molecule:

  • requires --output-binary
  • does not support --output-original
  • does not support --output-bg-prob-expand
  • does not support --ext-col-ints, --ext-col-floats, or --ext-col-strs
  • does not support extended or weighted parser modes

3D Modes

3D mode needs to be configured explicitly:

  • Use --icol-z --thin-3D for thin 3D (e.g. 10μm tissue slice from imaging-based platforms)
  • Use --icol-z --standard-3D for real, deep 3D
  • --icol-z without one of those flags is rejected
  • 3D mode does not support external fixed anchors
  • standard 3D requires positive --anchor-dist to define the BCC lattice
  • thin 3D and standard 3D both still use the optional --half-life-dist for distance-based anchor weighting

Standard 3D

Standard 3D uses a body-centered cubic (BCC) anchor lattice in the input coordinate system. This is experimental and has only been tested on one DeepSTARmap dataset.

Caution: we assume x, y, and z already use compatible units. See pts2tiles for options to scale the input coordinates differently in preprocessing.

  • --anchor-dist is the preferred way to define the BCC anchor spacing.
  • Each pixel is connected to anchors from a fixed local BCC neighborhood: the focal BCC cell plus its 14 face-adjacent neighbors.
  • --radius does not change that fixed neighborhood, but it does control the anchor weight decay scale.
  • If --radius is omitted, it defaults to 1.2 * (2 * anchor-dist / sqrt(3)).
  • Anchor weights are computed from the actual pixel-to-anchor distance and the configured support radius.
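The default support radii quoted in this document can be computed as below. This is a sketch of the stated formulas with hypothetical function names. If --anchor-dist is read as the nearest-neighbor spacing of a BCC lattice, then 2 * anchor-dist / sqrt(3) is the edge length of the cubic cell; that reading is an assumption, not something confirmed here.

```python
import math

def default_radius_2d(anchor_dist: float) -> float:
    """Default --radius in 2D: anchor-dist * 1.2."""
    return anchor_dist * 1.2

def default_radius_standard_3d(anchor_dist: float) -> float:
    """Default --radius in standard 3D: 1.2 * (2 * anchor-dist / sqrt(3))."""
    return 1.2 * (2.0 * anchor_dist / math.sqrt(3))

print(round(default_radius_2d(6), 3))            # 7.2
print(round(default_radius_standard_3d(12), 3))  # 16.628
```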

At the CLI level, standard 3D does not require --hex-size, --hex-grid-dist, or --n-moves. The required geometry control is --anchor-dist.

If --zmin and --zmax are provided in standard 3D, they are ignored unless --ignore-outside-zrange is also set.

Example:

punkst pixel-decode --model ${path}/bcc.model.tsv \
--in-tsv ${path}/transcripts.tiled.tsv --in-index ${path}/transcripts.tiled.index \
--temp-dir ${tmpdir} --out-pref ${path}/pixel_3d --output-binary \
--icol-x 0 --icol-y 1 --icol-z 2 --icol-feature 3 --icol-val 4 \
--standard-3D --anchor-dist 12 \
--pixel-res 0.5 --pixel-res-z 0.5 \
--threads ${threads} --seed 1

Thin 3D

Thin 3D keeps the anchor lattice in x-y and adds a small set of z levels across [zmin, zmax] to distribute these anchors. This is an experimental mode designed for thin tissue slices from imaging-based platforms where the range on the z axis is typically only 5-10μm.

  • --zmin and --zmax are required.
  • One of --thin-3d-z-levels or --thin-3d-n-z-levels must be provided.
  • If both are provided, --thin-3d-z-levels takes precedence and --thin-3d-n-z-levels is ignored with a warning.
  • When --thin-3d-n-z-levels is used, the z levels are placed evenly between zmin and zmax.
  • The z coordinate of each anchor is assigned deterministically but pseudorandomly from its x-y anchor key, so z levels are mixed across x-y without the old periodic stripe pattern.
  • --radius controls both thin-3D anchor initialization and the final pixel-to-anchor graph.
  • During initialization, each point contributes to every implicit thin-3D anchor within the 3D support radius, using the same distance-decay rule that is later used during iterative updates.
  • The x-y anchor coordinates are still generated from the shifted hex grids.
  • Thin-3D anchor retention is therefore radius-based rather than “nearest z planes” based.

Example:

punkst pixel-decode --model ${path}/thin3d.model.tsv \
--in-tsv ${path}/transcripts.tiled.tsv --in-index ${path}/transcripts.tiled.index \
--temp-dir ${tmpdir} --out-pref ${path}/pixel_thin3d --output-binary \
--icol-x 0 --icol-y 1 --icol-z 2 --icol-feature 3 --icol-val 4 \
--thin-3D --hex-grid-dist 18 --n-moves 3 \
--thin-3d-z-levels 0 1.5 3 4.5 6 7.5 9 \
--zmin 0 --zmax 20 --radius 7.5 \
--pixel-res 0.5 --pixel-res-z 1 \
--threads ${threads} --seed 1

Process multiple samples

If you want to project the same model onto multiple datasets/samples, you can use --sample-list to pass a TSV file containing all samples' information. (In this case, --in-tsv and --in-index are ignored.) Note: all other parameters are shared across samples, so the input files should have the same structure.

If your multi-sample data were generated by punkst multisample-prepare, that command has already created a file named *.persample_file_list.tsv. You can pass this file to --sample-list and optionally use --out-pref to specify an identifier (e.g., the model information) to add to each output file name.

If you created the input file for each sample manually, you can create a TSV file with at least three columns: sample_id, path to the transcript file created by pts2tiles, path to the index file created by pts2tiles.

Optional fourth column: output prefix.

Optional fifth column: anchor file path.

If there are headers, all header lines should start with "#".
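A manually assembled sample list with the three required columns can be written with a short script. This is a sketch only: the sample IDs, paths, and header column names are hypothetical, and the optional fourth and fifth columns are left out.

```python
import csv
import io

def write_sample_list(samples, handle):
    """Write sample_id, tiled TSV path, and index path as tab-separated rows."""
    # Header lines must start with "#"; the column names here are illustrative.
    handle.write("#sample_id\ttsv\tindex\n")
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    for sample_id, tsv_path, index_path in samples:
        writer.writerow([sample_id, tsv_path, index_path])

# Hypothetical samples; replace with the outputs of pts2tiles for each sample.
samples = [
    ("s1", "data/s1.tiled.tsv", "data/s1.tiled.index"),
    ("s2", "data/s2.tiled.tsv", "data/s2.tiled.index"),
]
buf = io.StringIO()
write_sample_list(samples, buf)
print(buf.getvalue(), end="")
```

To produce a file for --sample-list, pass an open file handle (for example `open("samples.tsv", "w", newline="")`) instead of the in-memory buffer.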

punkst pixel-decode --model model.tsv --sample-list multi.persample_file_list.tsv \
--temp-dir /tmp --out-pref results --output-binary \
--icol-x 0 --icol-y 1 --icol-feature 2 --icol-val 3 \
--hex-grid-dist 12 --n-moves 2 --threads 8

Output Files

Files are written with the prefix specified by --out-pref (or inferred from --out).

Main output

<prefix>.bin - Main pixel-level output in binary format when --output-binary is set.

When --single-feature-pixel is enabled, each binary record stores integer pixel or voxel coordinates, featureIdx, and the top-factor pairs.

When --single-molecule is enabled, each binary record stores raw float coordinates, featureIdx, and the top-factor pairs.

<prefix>.tsv - Main pixel-level output in text format when --output-binary is not set.

<prefix>.index - Binary index for the main output. This is written for both text and binary modes. In 2D binary mode, the index describes regular square tiles directly.

<prefix>.anchors.tsv - Anchor-level top-factor assignments when --output-anchors is set.

Summary statistics:

<prefix>.pseudobulk.tsv - Gene-by-factor pseudobulk matrix.

<prefix>.confusion.tsv - Factor-by-factor confusion matrix derived from per-pixel assignments.

<prefix>.denoised_pseudobulk.tsv - Denoised version of the pseudobulk matrix.