Preprocess flow data
In this notebook, we load an FCS file into the AnnData format, move the forward scatter (FSC) and side scatter (SSC) information to the .obs section of the AnnData object, and perform compensation on the data. Next, we apply different types of normalization to the data.
import readfcs
import pytometry as pm
%load_ext autoreload
%autoreload 2
Read the example data set from the readfcs package.
path_data = readfcs.datasets.example()
adata = pm.io.read_fcs(path_data)
adata
AnnData object with n_obs × n_vars = 65016 × 16
var: 'n', 'channel', 'marker', '$PnB', '$PnR', '$PnG'
uns: 'meta'
Reduce features
We split the data matrix into the marker intensity part and the FSC/SSC part. Moreover, we move all height-related features to the .obs part of the AnnData object. Notably, the function split_signal checks whether a feature name is FSC/SSC, or whether it ends with -A (area-related features) or -H (height-related features).
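The suffix rule can be sketched in plain Python. `classify_feature` is a hypothetical helper written for illustration, not pytometry's implementation. It assumes that scatter channels are recognized by an FSC/SSC prefix.

```python
def classify_feature(name: str) -> str:
    """Label a feature name as 'scatter', 'height', 'area' or 'other'.

    Hypothetical helper illustrating a suffix-based rule; not pytometry code.
    """
    upper = name.upper()
    # Scatter channels are assumed to start with FSC or SSC
    if upper.startswith(("FSC", "SSC")):
        return "scatter"
    # Height- and area-related features are recognized by their suffix
    if upper.endswith("-H"):
        return "height"
    if upper.endswith("-A"):
        return "area"
    return "other"


names = ["FSC-A", "FSC-H", "SSC-A", "B515-A", "KI67"]
print({n: classify_feature(n) for n in names})
```

Applied to names from this example, FSC-A, FSC-H and SSC-A are labeled as scatter and B515-A as area. A cleaned marker name such as KI67 falls through to other, which is why a suffix check needs names that still carry their suffix.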
Let us check the var_names of the features and the channel names. In this example, the marker var_names have been cleaned so that none of them carries the -A or -H suffix; the suffixes remain only in the channel column.
adata.var
| | n | channel | marker | $PnB | $PnR | $PnG |
|---|---|---|---|---|---|---|
| FSC-A | 1 | FSC-A | | 32 | 262207 | 1 |
| FSC-H | 2 | FSC-H | | 32 | 262207 | 1 |
| SSC-A | 3 | SSC-A | | 32 | 261588 | 1 |
| KI67 | 4 | B515-A | KI67 | 32 | 261588 | 1 |
| CD3 | 5 | R780-A | CD3 | 32 | 261588 | 1 |
| CD28 | 6 | R710-A | CD28 | 32 | 261588 | 1 |
| CD45RO | 7 | R660-A | CD45RO | 32 | 261588 | 1 |
| CD8 | 8 | V800-A | CD8 | 32 | 261588 | 1 |
| CD4 | 9 | V655-A | CD4 | 32 | 261588 | 1 |
| CD57 | 10 | V585-A | CD57 | 32 | 261588 | 1 |
| CD14 | 11 | V450-A | CD14 | 32 | 261588 | 1 |
| CCR5 | 12 | G780-A | CCR5 | 32 | 261588 | 1 |
| CD19 | 13 | G710-A | CD19 | 32 | 261588 | 1 |
| CD27 | 14 | G660-A | CD27 | 32 | 261588 | 1 |
| CCR7 | 15 | G610-A | CCR7 | 32 | 261588 | 1 |
| CD127 | 16 | G560-A | CD127 | 32 | 261588 | 1 |
We use the channel column of the adata.var data frame to split the matrix.
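Splitting on the channel column can be pictured as a boolean column mask over the data matrix. The sketch below uses NumPy on a hypothetical 4 × 5 toy matrix, and all variable names are illustrative. It handles only the FSC/SSC part, not the height-feature rule that split_signal also applies.

```python
import numpy as np

# Hypothetical toy data: 4 events x 5 features. var_names are cleaned
# marker names; `channels` mirrors an adata.var["channel"] column.
var_names = ["FSC-A", "FSC-H", "SSC-A", "KI67", "CD3"]
channels = ["FSC-A", "FSC-H", "SSC-A", "B515-A", "R780-A"]
X = np.arange(20, dtype=float).reshape(4, 5)

# Scatter features are identified from the channel name, not the var_name
is_scatter = np.array([c.upper().startswith(("FSC", "SSC")) for c in channels])

# Columns that move to the per-event annotation (.obs), keyed by feature name
obs = {name: X[:, i] for i, name in enumerate(var_names) if is_scatter[i]}

# The reduced marker matrix and its remaining feature names
X_markers = X[:, ~is_scatter]
marker_names = [n for n, s in zip(var_names, is_scatter) if not s]
```

Selecting on channel names matters here because the cleaned var_names (KI67, CD3) no longer carry the suffix information that the rule relies on.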
pm.pp.split_signal(adata, var_key="channel")
adata
AnnData object with n_obs × n_vars = 65016 × 13
obs: 'FSC-A', 'FSC-H', 'SSC-A'
var: 'n', 'channel', 'marker', '$PnB', '$PnR', '$PnG', 'signal_type'
uns: 'meta'
The data matrix was reduced by three features (FSC-A, FSC-H and SSC-A).
Compensation
Next, we compensate the data using the compensation matrix that is included in the FCS file header. Alternatively, one may provide a custom compensation matrix.
The compensate function matches the var_names of adata with the column names of the spillover matrix to compensate the correct channels.
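Compensation is commonly modeled as observed = true · S. Here S is the spillover matrix: row i holds the fraction of fluorochrome i's signal seen in each detector, with ones on the diagonal. The true signal is then recovered as observed · S⁻¹. The following is a minimal NumPy sketch that assumes this model. The helper compensate_sketch is hypothetical, and pytometry's own compensate may differ in details such as NaN handling or data types.

```python
import numpy as np


def compensate_sketch(X, var_names, spill, spill_names):
    """Return a compensated copy of X (events x features).

    Hypothetical helper. spill[i, j] is the fraction of fluorochrome i's
    signal detected in channel j (diagonal = 1). Every name in spill_names
    must appear in var_names; columns not listed in spill_names are copied
    unchanged.
    """
    X_comp = np.array(X, dtype=float)  # work on a float copy
    # Positions in X of the spillover columns, in spillover-matrix order
    idx = [var_names.index(name) for name in spill_names]
    observed = X_comp[:, idx]
    # observed = true @ spill  =>  true = observed @ inv(spill)
    X_comp[:, idx] = observed @ np.linalg.inv(spill)
    return X_comp
```

Matching by name rather than by position means the column order of the data matrix and of the spillover matrix do not have to agree, and non-fluorescence columns are left untouched.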
pm.pp.compensate(adata)
5499 NaN values found after compensation. Please adjust compensation matrix.
/home/runner/work/pytometry/pytometry/.nox/build-3-9/lib/python3.9/site-packages/pytometry/preprocessing/_process_data.py:175: RuntimeWarning: overflow encountered in cast
adata.X[:, ref_idx] = X_comp
Normalize data
In the next step, we normalize the data. By default, normalization is an in-place operation, i.e., a new AnnData object is created only if we set the argument inplace=False. We demonstrate three normalization methods that are built into pytometry:
arcsinh
logicle
bi-exponential
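The arcsinh transform maps an intensity x to arcsinh(x / cofactor). Since arcsinh(u) ≈ u for small |u| and ≈ ln(2u) for large u, the cofactor sets the width of the near-linear region around zero, beyond which the scale becomes logarithmic. The sketch below also mimics the inplace convention described above. It is a hypothetical helper operating on a float NumPy array, not pytometry's normalize_arcsinh.

```python
import numpy as np


def normalize_arcsinh_sketch(X, cofactor=150.0, inplace=True):
    """Apply y = arcsinh(x / cofactor) to a float array X.

    Hypothetical helper. With inplace=True, X is overwritten and None is
    returned; with inplace=False, a transformed copy is returned and X is
    left unchanged.
    """
    if inplace:
        # Write the result back into X (requires a float dtype)
        np.arcsinh(X / cofactor, out=X)
        return None
    return np.arcsinh(X / cofactor)


X = np.array([[0.0, 150.0], [-150.0, 1500.0]])
Y = normalize_arcsinh_sketch(X, cofactor=150.0, inplace=False)  # X unchanged
normalize_arcsinh_sketch(X, cofactor=150.0)  # X now holds the same values as Y
```

Because arcsinh is defined for negative values, events that end up below zero after compensation remain on the transformed scale rather than being dropped as they would be by a plain logarithm.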
adata_arcsinh = pm.tl.normalize_arcsinh(adata, cofactor=150, inplace=False)
adata_logicle = pm.tl.normalize_logicle(adata, inplace=False)
/home/runner/work/pytometry/pytometry/.nox/build-3-9/lib/python3.9/site-packages/pytometry/tools/_normalization.py:166: RuntimeWarning: invalid value encountered in scalar subtract
y = (ae2bx + p["f"]) - (ce2mdx + value)
adata_biex = pm.tl.normalize_biExp(adata, inplace=False)
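The warning line above (`y = (ae2bx + p["f"]) - (ce2mdx + value)`) suggests that the bi-exponential transform is evaluated by searching for the root of B(y) − x. A common form of the bi-exponential function is B(y) = a·e^(b·y) − c·e^(−d·y) + f. The generic sketch below assumes that form with illustrative parameter defaults, not pytometry's parameterization, and inverts it by bisection. Both helper names are hypothetical.

```python
import math


def biexp_forward(y, a, b, c, d, f):
    """Bi-exponential function B(y) = a*e^(b*y) - c*e^(-d*y) + f."""
    return a * math.exp(b * y) - c * math.exp(-d * y) + f


def biexp_transform(x, a=1.0, b=1.0, c=1.0, d=1.0, f=0.0, tol=1e-10):
    """Return y such that B(y) == x, found by bisection.

    Hypothetical helper with illustrative defaults. B is strictly
    increasing when a, b, c, d > 0, so the root is unique.
    """
    lo, hi = -1.0, 1.0
    # Widen the bracket until B(lo) <= x <= B(hi)
    while biexp_forward(lo, a, b, c, d, f) > x:
        lo *= 2.0
    while biexp_forward(hi, a, b, c, d, f) < x:
        hi *= 2.0
    # Halve the bracket until it is narrower than tol
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if biexp_forward(mid, a, b, c, d, f) < x:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

With a = c, b = d and f = 0, B(y) reduces to 2a·sinh(b·y), so the transform becomes a scaled arcsinh. This shows how the bi-exponential family generalizes the arcsinh transform by allowing different slopes for the positive and negative branches.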