pixel-decode¶
Overview¶
pixel-decode projects a trained factor model onto the space and annotates each pixel/voxel/molecule with the top factors and their probabilities. In standard mode, the inference is done at pixel or voxel level. In the feature-specific modes, the inference is feature-specific even at the same location.
In all cases, the input is the x-y tiled data created by pts2tiles.
Basic usage:
punkst pixel-decode --model ${path}/model.tsv \
--in-tsv ${path}/transcripts.tiled.tsv --in-index ${path}/transcripts.tiled.index \
--temp-dir ${tmpdir} --out-pref ${path}/pixel --output-binary \
--icol-x 0 --icol-y 1 --icol-feature 2 --icol-val 3 \
--hex-grid-dist 12 --n-moves 2 \
--pixel-res 0.5 --threads ${threads} --seed 1
The main inference result in this example is written to ${path}/pixel.bin together with ${path}/pixel.index. The output is organized in regular square tiles for efficient downstream analysis (see tile-op).
Required Parameters¶
--in-tsv - Specifies the tiled data created by pts2tiles.
--in-index - Specifies the index file created by pts2tiles.
--icol-x, --icol-y - Specify the columns with X and Y coordinates (0-based).
--icol-feature - Specifies the column index for feature (0-based).
--icol-val - Specifies the column index for count/value (0-based).
--model - Specifies the model file where the first column contains feature names and the subsequent columns contain the parameters for each factor. The format should match that created by topic-model.
--temp-dir - Specifies the directory used for temporary files. Required unless --in-memory is set.
Output specification - One of these must be provided:
--out-pref - Specifies the output prefix for all output files.
--out - (Deprecated, for backward compatibility) Specifies the output file.
Hexagon grid parameters for 2D and thin 3D, one of these must be provided:
--hex-size - Specifies the size (side length) of the hexagons for initializing anchors.
--hex-grid-dist - Specifies center-to-center distance in the axial coordinate system used to place anchors. Equals hex-size * sqrt(3).
Anchor spacing parameters
--anchor-dist - Specifies the distance between adjacent anchors. Required for standard 3D. In 2D and thin 3D, it can be provided directly instead of --n-moves.
--n-moves - (For 2D and thin-3D) Specifies the number of sliding moves in each axis to generate the anchors. If --n-moves is n, anchor-dist equals hex-grid-dist / n.
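The relations between --hex-size, --hex-grid-dist, --n-moves, and --anchor-dist stated above can be checked with a few lines of arithmetic. This is a minimal Python sketch of those two formulas only; the function names are illustrative and are not part of punkst.

```python
import math


def hex_grid_dist_from_size(hex_size: float) -> float:
    """Center-to-center distance for hexagons with side length hex_size."""
    return hex_size * math.sqrt(3)


def hex_size_from_grid_dist(hex_grid_dist: float) -> float:
    """Inverse of hex_grid_dist_from_size."""
    return hex_grid_dist / math.sqrt(3)


def anchor_dist_from_moves(hex_grid_dist: float, n_moves: int) -> float:
    """Anchor spacing implied by --hex-grid-dist and --n-moves."""
    if n_moves < 1:
        raise ValueError("n_moves must be a positive integer")
    return hex_grid_dist / n_moves


# Values from the basic usage example: --hex-grid-dist 12 --n-moves 2
print(anchor_dist_from_moves(12.0, 2))            # 6.0
print(round(hex_size_from_grid_dist(12.0), 3))    # 6.928
```

With the basic usage settings, the anchors are therefore 6 coordinate units apart, which is the value to pass if --anchor-dist is given directly instead of --n-moves.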
Optional Parameters¶
Input Parameters¶
--coords-are-int - If set, indicates that the coordinates are integers; otherwise, they are treated as floating point values.
--icol-z - Column index for the z coordinate (0-based). Requires either
--thin-3D or --standard-3D.
--thin-3D - Activates the thin 3D path. (See the 3D modes section below)
--standard-3D - Activates the standard 3D path. (See the 3D modes section below)
--feature-is-index - If set, the values in --icol-feature are interpreted as feature indices. Otherwise, they are expected to be feature names.
--feature-weights - Specifies a file to weight each feature. The first column should contain the feature names, and the second column should contain the weights.
--default-weight - Specifies the default weight for features not present in the weights file (only if --feature-weights is specified). Default: 0.
--anchor - Specifies a file containing fixed anchor points. This is currently
only supported for 2D slda. In 3D, fixed anchors are ignored. In nmf,
externally loaded anchors are not currently used.
--sample-list - Runs the same model and settings on multiple datasets listed in one TSV file. See "Process multiple samples" below.
--in-memory - Keeps boundary buffers in memory instead of writing temporary buffer files. If set, --temp-dir is not required.
Algorithm Parameters¶
--single-feature-pixel - Enables feature-specific decoding on collapsed pixel/voxel bins. Decoding still uses the user-specified --pixel-res / --pixel-res-z, but factor probabilities are computed separately for each feature within the same pixel or voxel.
--single-molecule - Enables raw single-molecule decoding without collapsing records to pixels or voxels. (This is not recommended for data that do not have single molecule resolution, such as Visium HD)
--max-iter - Maximum number of outer iterations. Default: 100.
--mean-change-tol - Convergence tolerance for the outer iterations. Default: 1e-3.
--background-model - Background profile file. If provided, background probabilities are modeled explicitly.
--bg-fraction-prior-a0, --bg-fraction-prior-b0 - Beta prior hyperparameters for the background fraction in slda mode.
Processing Parameters¶
--pixel-res - Resolution for the analysis, in the same unit as the input coordinates. Default: 1 (each pixel treated independently). Setting the resolution equivalent to 0.5-1μm is recommended, but it could be smaller if your data is very dense. For Visium HD (where the pixel size is 2μm), use --pixel-res 2.
--pixel-res-z - Resolution for aggregating pixels in the z dimension. Used
only in 3D mode. Default: 1.
--radius - Support radius used in pixel-to-anchor weighting. In 2D and thin
3D it also controls the anchor search neighborhood. Default: anchor-dist *
1.2 in 2D. In thin 3D, a default is derived from the x-y anchor spacing and
the nearest thin-3D z-level spacing. In standard 3D it does not change the
fixed anchor stencil, but it does control the weight decay scale and
contributes to the x-y padding.
--half-life-dist - Ratio h in (0, 1) such that an anchor at distance
h * radius receives weight 0.5. Default: 0.7. The implemented weighting
rule is w(d) = clamp(1 - (d / radius)^nu, 0.05, 0.95) with
nu = log(0.5) / log(h).
--min-init-count - Minimum accumulated anchor support required for an anchor to be retained during initialization. In thin 3D this support is radius-based and distance-weighted. Default: 10.
--zmin, --zmax - z range for 3D mode. Thin 3D requires both values.
Standard 3D accepts them, but only uses them when
--ignore-outside-zrange is set.
--thin-3d-z-levels - Explicit z coordinates for thin-3D anchor levels.
--thin-3d-n-z-levels - Number of evenly spaced thin-3D anchor levels to
generate between zmin and zmax.
--ignore-outside-zrange - Drop observations outside [zmin, zmax] in 3D mode.
--threads - Number of threads to use for parallel processing. Default: 1.
--seed - Random seed for reproducibility. If not set or ≤0, a random seed will be generated.
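The --half-life-dist weighting rule listed above can be explored numerically before choosing --radius and h. The following is a minimal Python sketch that implements only the documented formula; the name anchor_weight is hypothetical, and how the tool treats anchors beyond the support radius is not covered here.

```python
import math


def anchor_weight(d: float, radius: float, half_life_dist: float = 0.7) -> float:
    """w(d) = clamp(1 - (d / radius)^nu, 0.05, 0.95), nu = log(0.5) / log(h)."""
    if not 0.0 < half_life_dist < 1.0:
        raise ValueError("half_life_dist must be in (0, 1)")
    if d < 0.0 or radius <= 0.0:
        raise ValueError("d must be non-negative and radius positive")
    nu = math.log(0.5) / math.log(half_life_dist)
    raw = 1.0 - (d / radius) ** nu
    # Clamp to the documented [0.05, 0.95] range.
    return min(max(raw, 0.05), 0.95)


radius = 7.2  # e.g. 1.2 * anchor-dist with anchor-dist = 6
for d in (0.0, 0.7 * radius, radius):
    print(d, round(anchor_weight(d, radius), 4))
```

By construction, the weight at distance h * radius is 0.5, the weight at distance 0 is capped at 0.95, and the weight at the full radius is floored at 0.05.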
Output Parameters¶
--output-binary - Writes the main output as <prefix>.bin plus <prefix>.index. This cannot be combined with --output-original.
--output-original - Writes the main output as text and includes the original feature names and counts together with the factor results.
--output-anchors - Writes anchor-level top-factor assignments to <prefix>.anchors.tsv.
--output-bg-prob-expand - In text mode, writes one line per feature per pixel to include background probabilities. Ignored if --output-original is set.
--use-ticket-system - If set, the order of pixels in the output file is deterministic across runs (though not necessarily the same as the input order). It incurs a performance penalty.
--top-k - Number of top factors to include in the output. Default: 3.
--output-coord-digits - Number of decimal digits to output for coordinates in text mode (used when input coordinates are float or --output-original is not set). Default: 2.
--output-prob-digits - Number of decimal digits to output for probabilities. Default: 4.
--ext-col-ints - Additional integer columns to carry over to the output file. Format: "idx1:name1 idx2:name2 ..." where 'idx' are 0-based column indices. Example: --ext-col-ints 4:celltype 5:cluster.
--ext-col-floats - Additional float columns to carry over to the output file. Format: "idx1:name1 idx2:name2 ..." where 'idx' are 0-based column indices. Example: --ext-col-floats 6:quality.
--ext-col-strs - Additional string columns to carry over to the output file. Format: "idx1:name1:len1 idx2:name2:len2 ..." where 'idx' are 0-based column indices and 'len' are maximum lengths of strings. Example: --ext-col-strs 7:sample_id:20 8:batch:10.
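To make the "idx:name" and "idx:name:len" formats of the --ext-col-* options concrete, here is a small Python sketch that splits such specifications into their parts. It only illustrates the documented string format; parse_ext_cols is a hypothetical helper and does not reflect how punkst parses these options internally.

```python
def parse_ext_cols(specs, with_len=False):
    """Split 'idx:name' (or 'idx:name:len') specs into tuples.

    specs    -- list of strings such as ["4:celltype", "5:cluster"]
    with_len -- True for the --ext-col-strs style, which adds a max length
    """
    expected = 3 if with_len else 2
    parsed = []
    for spec in specs:
        parts = spec.split(":")
        if len(parts) != expected:
            raise ValueError(f"expected {expected} ':'-separated fields in {spec!r}")
        idx = int(parts[0])  # 0-based column index in the input file
        if idx < 0:
            raise ValueError(f"column index must be non-negative in {spec!r}")
        if with_len:
            parsed.append((idx, parts[1], int(parts[2])))
        else:
            parsed.append((idx, parts[1]))
    return parsed


print(parse_ext_cols(["4:celltype", "5:cluster"]))
print(parse_ext_cols(["7:sample_id:20", "8:batch:10"], with_len=True))
```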
Feature-Specific Modes¶
The two feature-specific modes share the same inference path, where different features are considered separately even if they have the same location. Consider the feature-specific modes only if your factors are closer to "transcriptional modules" than "cell types".
Caution: for Visium HD, where the resolution is 2μm, only --single-feature-pixel is valid.
Compared with the standard pixel mode:
- standard mode jointly considers all observations in the same pixel or voxel and makes one inference result per pixel/voxel
- --single-feature-pixel still collapses the raw data by analysis resolution, but can assign different factor probabilities to features within the same pixel or voxel
- --single-molecule does not collapse the data before inference; each accepted input record is treated on its own
Output differences:
- --single-feature-pixel writes integer pixel or voxel coordinates plus one extra uint32_t featureIdx
- --single-molecule writes raw float coordinates plus one extra uint32_t featureIdx
Current restrictions for both --single-feature-pixel and --single-molecule:
- requires --output-binary
- does not support --output-original
- does not support --output-bg-prob-expand
- does not support --ext-col-ints, --ext-col-floats, or --ext-col-strs
- does not support extended or weighted parser modes
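When commands are assembled by a script, the flag restrictions above can be checked before launching a run. This is a minimal Python sketch of such a pre-flight check; check_feature_specific_flags is a hypothetical helper, it only looks at flag names in an argument list, and it does not cover the "extended or weighted parser modes" restriction because the text does not say which flags trigger those modes.

```python
FEATURE_SPECIFIC = {"--single-feature-pixel", "--single-molecule"}
UNSUPPORTED = (
    "--output-original",
    "--output-bg-prob-expand",
    "--ext-col-ints",
    "--ext-col-floats",
    "--ext-col-strs",
)


def check_feature_specific_flags(argv):
    """Return a list of problems for a feature-specific pixel-decode call."""
    flags = {arg for arg in argv if arg.startswith("--")}
    if not flags & FEATURE_SPECIFIC:
        return []  # standard mode: none of these restrictions apply
    problems = []
    if "--output-binary" not in flags:
        problems.append("--output-binary is required")
    for flag in UNSUPPORTED:
        if flag in flags:
            problems.append(f"{flag} is not supported")
    return problems


args = ["--single-molecule", "--output-original", "--ext-col-ints", "4:celltype"]
for problem in check_feature_specific_flags(args):
    print(problem)
```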
3D Modes¶
3D mode needs to be configured explicitly:
- Use --icol-z --thin-3D for thin 3D (e.g. 10μm tissue slice from imaging-based platforms)
- Use --icol-z --standard-3D for real, deep 3D
- --icol-z without one of those flags is rejected
- 3D mode does not support external fixed anchors
- standard 3D requires positive --anchor-dist to define the BCC lattice
- thin 3D and standard 3D both still use the optional --half-life-dist for distance-based anchor weighting
Standard 3D¶
Standard 3D uses a body-centered cubic (BCC) anchor lattice in the input coordinate system. This is experimental and has only been tested on one DeepSTARmap dataset.
Caution: we assume x, y, and z already use compatible units; see pts2tiles for options to scale the input coordinates differently in preprocessing.
- --anchor-dist is the preferred way to define the BCC anchor spacing.
- Each pixel is connected to anchors from a fixed local BCC neighborhood: the focal BCC cell plus its 14 face-adjacent neighbors.
- --radius does not change that fixed neighborhood, but it does control the anchor weight decay scale.
- If --radius is omitted, it defaults to 1.2 * (2 * anchor-dist / sqrt(3)).
- Anchor weights are computed from the actual pixel-to-anchor distance and the configured support radius.
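For reference, the default --radius values quoted in this document for 2D and for standard 3D can be computed as follows. This is a minimal Python sketch of the two stated formulas; the function names are illustrative, and the thin-3D default is omitted because its exact derivation is not given here.

```python
import math


def default_radius_2d(anchor_dist: float) -> float:
    """Documented 2D default: anchor-dist * 1.2."""
    return 1.2 * anchor_dist


def default_radius_standard_3d(anchor_dist: float) -> float:
    """Documented standard-3D default: 1.2 * (2 * anchor-dist / sqrt(3))."""
    return 1.2 * (2.0 * anchor_dist / math.sqrt(3))


# --anchor-dist 12, as in the standard 3D example
print(round(default_radius_standard_3d(12.0), 3))  # 16.628
# anchor-dist 6, as implied by --hex-grid-dist 12 --n-moves 2 in 2D
print(round(default_radius_2d(6.0), 3))            # 7.2
```

For the same --anchor-dist, the standard-3D default is larger than the 2D default by a factor of 2 / sqrt(3), about 1.155.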
At the CLI level, standard 3D does not require --hex-size,
--hex-grid-dist, or --n-moves. The required geometry control is
--anchor-dist.
If --zmin and --zmax are provided in standard 3D, they are ignored unless
--ignore-outside-zrange is also set.
Example:
punkst pixel-decode --model ${path}/bcc.model.tsv \
--in-tsv ${path}/transcripts.tiled.tsv --in-index ${path}/transcripts.tiled.index \
--temp-dir ${tmpdir} --out-pref ${path}/pixel_3d --output-binary \
--icol-x 0 --icol-y 1 --icol-z 2 --icol-feature 3 --icol-val 4 \
--standard-3D --anchor-dist 12 \
--pixel-res 0.5 --pixel-res-z 0.5 \
--threads ${threads} --seed 1
Thin 3D¶
Thin 3D keeps the anchor lattice in x-y and adds a small set of z levels across [zmin, zmax] to distribute these anchors. This is an experimental mode designed for thin tissue slices from imaging-based platforms where the range on the z axis is typically only 5-10μm.
- --zmin and --zmax are required.
- One of --thin-3d-z-levels or --thin-3d-n-z-levels must be provided.
- If both are provided, --thin-3d-z-levels takes precedence and --thin-3d-n-z-levels is ignored with a warning.
- When --thin-3d-n-z-levels is used, the z levels are placed evenly between zmin and zmax.
- The z coordinate of each anchor is assigned deterministically but pseudorandomly from its x-y anchor key, so z levels are mixed across x-y without the old periodic stripe pattern.
- --radius controls both thin-3D anchor initialization and the final pixel-to-anchor graph.
- During initialization, each point contributes to every implicit thin-3D anchor within the 3D support radius, using the same distance-decay rule that is later used during iterative updates.
- The x-y anchor coordinates are still generated from the shifted hex grids.
- Thin-3D anchor retention is therefore radius-based rather than "nearest z planes" based.
Example:
punkst pixel-decode --model ${path}/thin3d.model.tsv \
--in-tsv ${path}/transcripts.tiled.tsv --in-index ${path}/transcripts.tiled.index \
--temp-dir ${tmpdir} --out-pref ${path}/pixel_thin3d --output-binary \
--icol-x 0 --icol-y 1 --icol-z 2 --icol-feature 3 --icol-val 4 \
--thin-3D --hex-grid-dist 18 --n-moves 3 \
--thin-3d-z-levels 0 1.5 3 4.5 6 7.5 9 \
--zmin 0 --zmax 20 --radius 7.5 \
--pixel-res 0.5 --pixel-res-z 1 \
--threads ${threads} --seed 1
Process multiple samples¶
If you want to project the same model onto multiple datasets/samples, you can use --sample-list to pass a tsv file containing all samples' information. (In this case, --in-tsv and --in-index are ignored.)
Note: all other parameters are shared across samples, so the input files should have the same structure.
If your multi-sample data were generated by punkst multisample-prepare, that command has already created a file named *.persample_file_list.tsv. You can pass this file directly to --sample-list and optionally use --out-pref to specify an identifier (e.g., the model information) to add to each output file name.
If you created the input file for each sample manually, you can create a tsv file with at least three columns:
sample_id, path to the transcript file created by pts2tiles, path to the index file created by pts2tiles.
Optional fourth column: output prefix.
Optional fifth column: anchor file path.
If there are headers, all header lines should start with "#".
punkst pixel-decode --model model.tsv --sample-list multi.persample_file_list.tsv \
--temp-dir /tmp --out-pref results --output-binary \
--icol-x 0 --icol-y 1 --icol-feature 2 --icol-val 3 \
--hex-grid-dist 12 --n-moves 2 --threads 8
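For manually prepared inputs, the sample list described above can be written with a short script. The following Python sketch produces the tab-separated layout with "#"-prefixed header lines; the helper name, the header labels, and the sample paths are all hypothetical, and the sketch assumes every row carries the same number of columns.

```python
import csv
import io

COLUMNS = ["sample_id", "tsv_path", "index_path", "out_prefix", "anchor_path"]


def write_sample_list(handle, rows):
    """Write rows of 3 to 5 fields as a --sample-list style TSV."""
    rows = [tuple(r) for r in rows]
    widths = {len(r) for r in rows}
    if len(widths) != 1 or not 3 <= next(iter(widths)) <= 5:
        raise ValueError("all rows need the same number of fields, between 3 and 5")
    width = next(iter(widths))
    # Header lines must start with '#'.
    handle.write("#" + "\t".join(COLUMNS[:width]) + "\n")
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerows(rows)


# Hypothetical samples; replace with real pts2tiles outputs.
samples = [
    ("s1", "data/s1.tiled.tsv", "data/s1.tiled.index"),
    ("s2", "data/s2.tiled.tsv", "data/s2.tiled.index"),
]
buf = io.StringIO()  # use open("samples.tsv", "w") to write a real file
write_sample_list(buf, samples)
print(buf.getvalue(), end="")
```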
Output Files¶
Files are written with the prefix specified by --out-pref (or inferred from --out).
Main output¶
<prefix>.bin - Main pixel-level output in binary format when --output-binary is set.
When --single-feature-pixel is enabled, each binary record stores integer pixel or voxel coordinates, featureIdx, and the top-factor pairs.
When --single-molecule is enabled, each binary record stores raw float coordinates, featureIdx, and the top-factor pairs.
<prefix>.tsv - Main pixel-level output in text format when --output-binary is not set.
<prefix>.index - Binary index for the main output. This is written for both text and binary modes. In 2D binary mode, the index describes regular square tiles directly.
<prefix>.anchors.tsv - Anchor-level top-factor assignments when --output-anchors is set.
Summary statistics:¶
<prefix>.pseudobulk.tsv - Gene-by-factor pseudobulk matrix.
<prefix>.confusion.tsv - Factor-by-factor confusion matrix derived from per-pixel assignments.
<prefix>.denoised_pseudobulk.tsv - Denoised version of the pseudobulk matrix.