pts2tiles¶
Group points into square tiles for faster downstream access¶
pts2tiles rewrites a delimited text point file so records are grouped into non-overlapping square tiles in (x, y) space. It writes a tiled .tsv output file plus a binary index storing byte offsets for each tile to help faster region query.
The row order is preserved only at the tile level. Records inside a tile may be arbitrary.
Example usage:
punkst pts2tiles \
--in-tsv ${path}/transcripts.tsv \
--icol-x 0 --icol-y 1 --icol-feature 2 --icol-int 3 \
--tile-size 500 \
--temp-dir ${tmpdir} \
--out-prefix ${path}/transcripts.tiled \
--threads ${threads}
Example with per-axis scaling:
punkst pts2tiles \
--in-tsv ${path}/transcripts.tsv \
--icol-x 0 --icol-y 1 --icol-z 2 \
--tile-size 500 \
--scale-x 0.108 --scale-y 0.108 --scale-z 1 \
--temp-dir ${tmpdir} \
--out-prefix ${path}/transcripts.tiled
Example with CSV input:
punkst pts2tiles \
--in-tsv ${path}/transcripts.csv.gz \
--csv \
--icol-x 0 --icol-y 1 --icol-feature 2 --icol-int 3 \
--tile-size 500 \
--temp-dir ${tmpdir} \
--out-prefix ${path}/transcripts.tiled
Accepted Input Forms¶
pts2tiles currently accepts these input forms through --in-tsv:
- A plain text file on disk.
- A gzipped text file whose path ends with
.gz. - Standard input, using
--in-tsv -.
Delimiter handling:
- By default, input is parsed as tab-delimited text.
- With
--csv, input is parsed as comma-delimited text. - Output is always written as tab-delimited
.tsv, even when the input is CSV.
Current parser behavior:
- The parser splits each line on a single-character delimiter only.
- It does not implement full RFC-style CSV quoting or escaping.
- In practice,
--csvshould be used only for simple comma-delimited records where commas do not appear inside quoted fields.
Header and metadata handling:
Lines starting with # are automatically treated as metadata/header and copied directly to the output regardless of --skip, the following options are mainly for lines that do not start with # but should be treated as metadata/header.
--skip Nskips the firstNlines from parsing.- Skipped lines are copied to the output as metadata comments.
--skip-last-is-headermarks the last skipped line as the header row and rewrites it to start with a single#.- Earlier skipped lines are normalized to start with at least two
#characters.
Main Parameters¶
--in-tsv- Input text file, gzipped text file, or
-for stdin. --icol-x,--icol-y- Zero-based column indices for the
xandycoordinates used for tiling. --tile-size- Side length of each square tile, in the same units as the effective coordinates after optional scaling.
--out-prefix- Prefix for output files.
--temp-dir- Directory used for temporary per-tile files during processing.
Feature and Value Columns¶
--icol-feature- Optional zero-based feature column. If provided,
pts2tilescan also writeprefix.features.tsv. --icol-int- Optional zero-based integer column to aggregate per feature. Multiple columns can be provided as
--icol-int 3 4 5. If not provided, a dummy value of1is used assuming each record is one transcript.
Behavior of prefix.features.tsv:
- If
--icol-featureand one or more--icol-intcolumns are provided, each row contains the feature followed by aggregated integer totals for those columns.
Optional Parameters¶
--icol-z- Optional zero-based
zcoordinate column. Tiling still uses onlyxandy, but ifzis provided the command also recordszmin/zmaxand writesprefix.z_hist.tsv. --csv- Parse the input as comma-delimited text instead of tab-delimited text.
--skip- Number of initial lines to skip from parsing and preserve as metadata in the output. Default:
0. --skip-last-is-header- Treat the last skipped line as the header line. Requires
--skip > 0. --tile-buffer- Per-thread, per-tile in-memory buffer size in number of records before flushing to disk. Default:
1000. --batch-size- Batch size in number of lines when reading from stdin or a gzipped input stream. Only relevant for
--in-tsv -and inputs ending in.gz. Default:10000. --scale- Uniform scaling factor applied to
x,y, andzif present, unless axis-specific overrides are supplied. Default:1. --scale-x,--scale-y,--scale-z- Axis-specific scaling factors. If an axis-specific scale is omitted, it falls back to
--scale. --digits- Decimal precision used when rewritten scaled coordinates are written to the output. Default:
2. --threads- Number of worker threads. Default:
1. --verbose- Verbosity level for logging.
--debug- Enable additional debug output.
Scaling Notes¶
- Tile assignment is computed from scaled coordinates.
- If no scaling is requested, coordinates are copied through unchanged.
- If scaling is requested, rewritten coordinate values are emitted using
--digitsdecimal places. --scale-zhas an effect only when--icol-zis provided.
Output Files¶
prefix.tsv: tiled tab-delimited output records.prefix.index: binary tile index with per-tile byte offsets and bounding boxes.prefix.coord_range.tsv: global coordinate range containingxmin,xmax,ymin,ymax, and alsozmin,zmaxif--icol-zis provided.prefix.z_hist.tsv: two-column tab-delimited histogram ofzusing unit-width bins. The unit is the same as the effectivezafter scaling. Written only when--icol-zis provided.prefix.features.tsv: sorted feature summary file. Written only when--icol-featureis provided.