
class lamindb.Artifact(data: UPathStr, kind: ArtifactKind | None = None, key: str | None = None, description: str | None = None, revises: Artifact | None = None, run: Run | None = None)

Bases: Record, IsVersioned, TracksRun, TracksUpdates

Datasets & models stored as files, folders, or arrays.

Artifacts manage data in local or remote storage.

Some artifacts are array-like, e.g., when stored as .parquet, .h5ad, .zarr, or .tiledb.

  • data (UPathStr) – A path to a local or remote folder or file.

  • kind (Literal["dataset", "model"] | None, default: None) – Distinguish models from datasets from other files & folders.

  • key (str | None, default: None) – A path-like key to reference the artifact in default storage, e.g., "myfolder/myfile.fcs". Artifacts with the same key form a version family.

  • description (str | None, default: None) – A description.

  • revises (Artifact | None, default: None) – Previous version of the artifact. An alternative to passing key to trigger a new version.

  • run (Run | None, default: None) – The run that creates the artifact.

Typical storage formats & their API accessors


  • Table: .csv, .tsv, .parquet, .ipc ⟷ DataFrame, pyarrow.Table

  • Annotated matrix: .h5ad, .h5mu, .zrad ⟷ AnnData, MuData

  • Generic array: HDF5 group, zarr group, TileDB store ⟷ HDF5, zarr, TileDB loaders

  • Image: .jpg, .png ⟷ np.ndarray, …

  • Fastq: .fastq ⟷ /

  • VCF: .vcf ⟷ /

  • QC: .html ⟷ /

You’ll find these values in the suffix & accessor fields.

LaminDB makes some default choices (e.g., serialize a DataFrame as a .parquet file).

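Create an artifact from a DataFrame:

>>> df = ln.core.datasets.df_iris_in_meter_batch1()
>>> artifact = ln.Artifact.from_df(df, description="Iris flower collection batch1").save()

Create an artifact from an AnnData:

>>> adata = ln.core.datasets.anndata_with_obs()
>>> artifact = ln.Artifact.from_anndata(adata, description="mini anndata with obs").save()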

Create an artifact by passing key:

>>> artifact = ln.Artifact("./my_file.parquet", key="example_datasets/my_file.parquet").save()
>>> artifact = ln.Artifact("./my_folder", key="project1/my_folder").save()

Calling .save() uploads the file to the default storage location of your lamindb instance. (If it’s a local instance, the “upload” is a mere copy operation.)

If your artifact is already in the cloud, lamindb auto-populates the key field based on the S3 key and there is no upload:

>>> artifact = ln.Artifact("s3://my_bucket/my_folder/my_file.csv").save()

You can make a new version of the artifact by passing the same key, here key="example_datasets/my_file.parquet":

>>> artifact_v2 = ln.Artifact("./my_file.parquet", key="example_datasets/my_file.parquet").save()
>>> artifact_v2.versions.df()  # see all versions

Why does the API look this way?

It’s inspired by APIs building on AWS S3.

Both boto3 and quilt select a bucket (a storage location in LaminDB) and define a target path through a key argument.

In boto3:

# signature: S3.Bucket.upload_file(filepath, key)
import boto3
s3 = boto3.resource('s3')
bucket = s3.Bucket('mybucket')
bucket.upload_file('/tmp/hello.txt', 'hello.txt')

In quilt3:

# signature: quilt3.Bucket.put_file(key, filepath)
import quilt3
bucket = quilt3.Bucket('mybucket')
bucket.put_file('hello.txt', '/tmp/hello.txt')

Sometimes you want to avoid mapping the artifact into a file hierarchy; in that case, you can populate just description instead:

>>> artifact = ln.Artifact("s3://my_bucket/my_folder", description="My folder").save()
>>> artifact = ln.Artifact("./my_local_folder", description="My local folder").save()

Because key-based versioning is then unavailable, you have to pass revises to make a new artifact version:

>>> artifact_v2 = ln.Artifact("./my_file.parquet", revises=old_artifact).save()

If an artifact with the exact same hash already exists, Artifact() returns the existing artifact. In concurrent workloads where the same artifact is created multiple times, Artifact() doesn’t yet return the existing artifact but creates a new one; .save() however detects the duplication and will return the existing artifact.


features: FeatureManager

Feature manager.

Features denote dataset dimensions, i.e., the variables that measure labels & numbers.

Annotate with features & values:

artifact.features.add_values({
     "species": organism,  # here, organism is an Organism record
     "scientist": ['Barbara McClintock', 'Edgar Anderson'],
     "temperature": 27.6,
     "study": "Candidate marker study"
})

Query for features & values:

ln.Artifact.features.filter(scientist="Barbara McClintock")

Features may or may not be part of the artifact content in storage. For instance, the Curator flow validates the columns of a DataFrame-like artifact and annotates it with features corresponding to these columns. artifact.features.add_values, by contrast, does not validate the content of the artifact.

property labels: LabelManager

Label manager.

To annotate with labels, you typically use the registry-specific accessors, for instance ulabels:

candidate_marker_study = ln.ULabel(name="Candidate marker study").save()
artifact.ulabels.add(candidate_marker_study)

Similarly, you query based on these accessors:

ln.Artifact.filter(ulabels__name="Candidate marker study").all()

Unlike the registry-specific accessors, the .labels accessor provides a way of associating labels with features:

study = ln.Feature(name="study", dtype="cat").save()
artifact.labels.add(candidate_marker_study, feature=study)

Note that the above is equivalent to:

artifact.features.add_values({"study": candidate_marker_study})

params: ParamManager

Param manager.

artifact.params.add_values({
    "hidden_size": 32,
    "bottleneck_size": 16,
    "batch_size": 32,
    "preprocess_params": {
        "normalization_type": "cool",
        "subset_highlyvariable": True,
    },
})

property path: Path


File in cloud storage, here AWS S3:

>>> artifact = ln.Artifact("s3://my-bucket/my-file.csv").save()
>>> artifact.path

File in local storage:

>>> artifact = ln.Artifact("./myfile.csv", key="myfile.csv").save()
>>> artifact.path

property stem_uid: str

Universal id characterizing the version family.

The full uid of a record is obtained via concatenating the stem uid and version information:

stem_uid = random_base62(n_char)  # a random base62 sequence of length 12 (transform) or 16 (artifact, collection)
version_uid = "0000"  # an auto-incrementing 4-digit base62 number
uid = f"{stem_uid}{version_uid}"  # concatenate the stem_uid & version_uid
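
The pseudocode above can be made concrete with a small sketch. Note that random_base62 and split_uid here are hypothetical helpers written for illustration, not LaminDB's internal implementation; the sketch only assumes the layout described above (a 16-character stem for artifacts followed by a 4-character version suffix), and the ordering of the base62 alphabet is an assumption.

```python
import secrets
import string

# Base62 alphabet: digits, then uppercase, then lowercase (ordering is an assumption).
BASE62 = string.digits + string.ascii_uppercase + string.ascii_lowercase


def random_base62(n_char: int) -> str:
    """Return a random base62 string of length n_char (hypothetical helper)."""
    return "".join(secrets.choice(BASE62) for _ in range(n_char))


def split_uid(uid: str, n_version_char: int = 4) -> tuple[str, str]:
    """Split a full uid into (stem_uid, version_uid), assuming a 4-character suffix."""
    return uid[:-n_version_char], uid[-n_version_char:]


stem_uid = random_base62(16)  # 16 characters for an artifact or collection
version_uid = "0000"  # first version in the family
uid = f"{stem_uid}{version_uid}"

print(len(uid))  # 20
print(split_uid(uid) == (stem_uid, version_uid))  # True
```

Records of the same version family share the stem, so stripping the last four characters of any uid in the family yields the same stem_uid.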

property transform: Transform | None

Transform whose run created the artifact.

property versions: QuerySet

Lists all records of the same version family.

>>> new_artifact = ln.Artifact(df2, revises=artifact).save()
>>> new_artifact.versions.df()

Simple fields

uid: str

A universal random id.

key: str | None

A (virtual) relative file path within the artifact’s storage location.

Setting a key is useful to automatically group artifacts into a version family.

LaminDB defaults to a virtual file path to make renaming of data in object storage easy.

If you register existing files in a storage location, the key equals the actual filepath on the underlying filesystem or object store.

description: str | None

A description.

suffix: str

Path suffix or empty string if no canonical suffix exists.

This is either a file suffix (".csv", ".h5ad", etc.) or the empty string “”.

kind: ArtifactKind | None

ArtifactKind (default None).

otype: str | None

Default Python object type, e.g., DataFrame, AnnData.

size: int | None

Size in bytes.

Examples: 1KB is 1e3 bytes, 1MB is 1e6, 1GB is 1e9, 1TB is 1e12 etc.
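
As a small illustration of the decimal units used above, the following sketch converts a byte count into a readable string. format_size is a hypothetical helper, not part of lamindb.

```python
def format_size(n_bytes: int) -> str:
    """Format a byte count using decimal units (1 KB = 1e3 bytes)."""
    units = ["B", "KB", "MB", "GB", "TB"]
    size = float(n_bytes)
    for unit in units:
        # Stop once the value fits the current unit or no larger unit remains.
        if size < 1000 or unit == units[-1]:
            return f"{size:.1f} {unit}"
        size /= 1000


print(format_size(1_500))          # 1.5 KB
print(format_size(2_000_000_000))  # 2.0 GB
```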

hash: str | None

Hash or pseudo-hash of artifact content.

Useful to ascertain integrity and avoid duplication.
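
To illustrate how a content hash supports both integrity checks and deduplication, here is a minimal sketch using Python's hashlib. The choice of SHA-256, the file_hash helper, and the in-memory registry dictionary are assumptions for illustration; they do not describe how lamindb computes or stores its hashes.

```python
import hashlib
import tempfile
from pathlib import Path


def file_hash(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash file content in chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


registry: dict[str, Path] = {}  # hypothetical stand-in for a database lookup by hash


def register(path: Path) -> Path:
    """Return the already registered path if identical content exists."""
    return registry.setdefault(file_hash(path), path)


with tempfile.TemporaryDirectory() as tmp:
    a = Path(tmp) / "a.csv"
    b = Path(tmp) / "b.csv"
    a.write_text("x,y\n1,2\n")
    b.write_text("x,y\n1,2\n")  # same content under a different name
    first = register(a)
    second = register(b)

print(first.name, second.name)  # a.csv a.csv
```

Because both files hash to the same value, the second registration returns the first entry instead of creating a duplicate.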

n_files: int | None

Number of files for folder-like artifacts, None for file-like artifacts.

Note that some arrays are also stored as folders, e.g., .zarr or .tiledbsoma.

Changed in version 1.0: Renamed from n_objects to n_files.

n_observations: int | None

Number of observations.

Typically, this denotes the first array dimension.

version: str | None

Version (default None).

Defines version of a family of records characterized by the same stem_uid.

Consider using semantic versioning with Python versioning.
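
One reason to prefer structured version strings is that they only order correctly when compared numerically rather than as plain strings. The sketch below shows this with a hypothetical version_key helper that handles simple dot-separated numeric versions; for full parsing of Python version strings, the third-party packaging library is a more robust option.

```python
def version_key(version: str) -> tuple[int, ...]:
    """Turn "1.10.0" into (1, 10, 0) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


versions = ["1.2.0", "1.10.0", "1.9.1"]

print(sorted(versions))                   # ['1.10.0', '1.2.0', '1.9.1']
print(sorted(versions, key=version_key))  # ['1.2.0', '1.9.1', '1.10.0']
```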

is_latest: bool

Boolean flag that indicates whether a record is the latest in its version family.

created_at: datetime

Time of creation of record.

updated_at: datetime

Time of last update to record.

Relational fields

space: Space

The space in which the record lives.

storage: Storage

Storage location, e.g. an S3 or GCP bucket or a local directory.

run: Run | None

Run that created the artifact.

schema: Schema | None

The schema that validated this artifact in a Curator.

created_by: User

Creator of record.

ulabels: ULabel

The ulabels measured in the artifact (ULabel).

input_of_runs: Run

Runs that use this artifact as an input.

feature_sets: Schema

The feature sets measured by the artifact.

collections: Collection

The collections that this artifact is part of.

references: Reference

Linked references.

projects: Project

Linked projects.

Class methods

classmethod df(include=None, features=False, limit=100)

Convert to pd.DataFrame.

By default, shows all direct fields, except updated_at.

Use the arguments include or features to include other data.

  • include (str | list[str] | None, default: None) – Related fields to include as columns. Takes strings of form "ulabels__name", "cell_types__name", etc. or a list of such strings.

  • features (bool | list[str], default: False) – If True, map all features of the Feature registry onto the resulting DataFrame. Only available for Artifact.

  • limit (int, default: 100) – Maximum number of rows to display from a Pandas DataFrame. Defaults to 100 to reduce database load.

Return type:

DataFrame



Include the name of the creator in the DataFrame:

>>> ln.ULabel.df(include="created_by__name")

Include display of features for Artifact:

>>> df = ln.Artifact.df(features=True)
>>> ln.view(df)  # visualize with type annotations

Only include select features:

>>> df = ln.Artifact.df(features=["cell_type_by_expert", "cell_type_by_model"])

classmethod filter(*queries, **expressions)

Query records.

  • queries – One or multiple Q objects.

  • expressions – Fields and values passed as Django query expressions.

Return type:



A QuerySet.

>>> ln.ULabel(name="my label").save()
>>> ln.ULabel.filter(name__startswith="my").df()

classmethod from_anndata(adata, *, key=None, description=None, run=None, revises=None, **kwargs)

Create from AnnData, validate & link features.

  • adata (AnnData | lamindb.core.types.UPathStr) – An AnnData object or a path to an AnnData-like object.

  • key (str | None, default: None) – A relative path within default storage, e.g., "myfolder/myfile.h5ad".

  • description (str | None, default: None) – A description.

  • revises (Artifact | None, default: None) – An old version of the artifact.

  • run (Run | None, default: None) – The run that creates the artifact.

Return type:

Artifact


>>> import bionty as bt
>>> bt.settings.organism = "human"
>>> adata = ln.core.datasets.anndata_with_obs()
>>> artifact = ln.Artifact.from_anndata(adata, description="mini anndata with obs")

classmethod from_df(df, *, key=None, description=None, run=None, revises=None, **kwargs)

Create from DataFrame, validate & link features.

  • df (DataFrame) – A DataFrame object.

  • key (str | None, default: None) – A relative path within default storage, e.g., "myfolder/myfile.parquet".

  • description (str | None, default: None) – A description.

  • revises (Artifact | None, default: None) – An old version of the artifact.

  • run (Run | None, default: None) – The run that creates the artifact.

Return type:

Artifact


>>> df = ln.core.datasets.df_iris_in_meter_batch1()
>>> df.head()
  sepal_length sepal_width petal_length petal_width iris_organism_code
0        0.051       0.035        0.014       0.002                 0
1        0.049       0.030        0.014       0.002                 0
2        0.047       0.032        0.013       0.002                 0
3        0.046       0.031        0.015       0.002                 0
4        0.050       0.036        0.014       0.002                 0
>>> artifact = ln.Artifact.from_df(df, description="Iris flower collection batch1")

classmethod from_dir(path, *, key=None, run=None)

Create a list of artifact objects from a directory.


If you have a high number of files (several 100k) and don’t want to track them individually, create a single Artifact via Artifact(path) for them. See, e.g., RxRx: cell imaging.

  • path (lamindb.core.types.UPathStr) – Source path of folder.

  • key (str | None, default: None) – Key for storage destination. If None and directory is in a registered location, the inferred key will reflect the relative position. If None and directory is outside of a registered storage location, the inferred key defaults to

  • run (Run | None, default: None) – A Run object.

Return type:

list[Artifact]



>>> dir_path = ln.core.datasets.generate_cell_ranger_files("sample_001",
>>> artifacts = ln.Artifact.from_dir(dir_path)

classmethod from_mudata(mdata, *, key=None, description=None, run=None, revises=None, **kwargs)

Create from MuData, validate & link features.

  • mdata (MuData | lamindb.core.types.UPathStr) – A MuData object.

  • key (str | None, default: None) – A relative path within default storage, e.g., "myfolder/myfile.h5mu".

  • description (str | None, default: None) – A description.

  • revises (Artifact | None, default: None) – An old version of the artifact.

  • run (Run | None, default: None) – The run that creates the artifact.

Return type:

Artifact


>>> import bionty as bt
>>> bt.settings.organism = "human"
>>> mdata = ln.core.datasets.mudata_papalexi21_subset()
>>> artifact = ln.Artifact.from_mudata(mdata, description="a mudata object")

classmethod from_spatialdata(sdata, *, key=None, description=None, run=None, revises=None, **kwargs)

Create from SpatialData, validate & link features.

  • sdata – A SpatialData object.

  • key (str | None, default: None) – A relative path within default storage, e.g., "myfolder/myfile.zarr".

  • description (str | None, default: None) – A description.

  • revises (Artifact | None, default: None) – An old version of the artifact.

  • run (Run | None, default: None) – The run that creates the artifact.

Return type:

Artifact


>>> artifact = ln.Artifact.from_spatialdata(sdata, key="my_dataset.zarr")

classmethod from_tiledbsoma(path, *, key=None, description=None, run=None, revises=None, **kwargs)

Create from a tiledbsoma store.

  • path (lamindb.core.types.UPathStr) – A tiledbsoma store with .tiledbsoma suffix.

  • key (str | None, default: None) – A relative path within default storage, e.g., "myfolder/mystore.tiledbsoma".

  • description (str | None, default: None) – A description.

  • revises (Artifact | None, default: None) – An old version of the artifact.

  • run (Run | None, default: None) – The run that creates the artifact.

Return type:

Artifact



>>> artifact = ln.Artifact.from_tiledbsoma("s3://mybucket/store.tiledbsoma", description="a tiledbsoma store")

classmethod get(idlike=None, **expressions)

Get a single record.

  • idlike (int | str | None, default: None) – Either a uid stub, uid or an integer id.

  • expressions – Fields and values passed as Django query expressions.

Return type:



A record.


Raises:

lamindb.errors.DoesNotExist – In case no matching record is found.

>>> ulabel = ln.ULabel.get("FvtpPJLJ")
>>> ulabel = ln.ULabel.get(name="my-label")

classmethod lookup(field=None, return_field=None)

Return an auto-complete object for a field.

  • field (str | DeferredAttribute | None, default: None) – The field to look up the values for. Defaults to first string field.

  • return_field (str | DeferredAttribute | None, default: None) – The field to return. If None, returns the whole record.

Return type:



A NamedTuple of lookup information of the field values with a dictionary converter.

>>> import bionty as bt
>>> bt.settings.organism = "human"
>>> bt.Gene.from_source(symbol="ADGB-DT").save()
>>> lookup = bt.Gene.lookup()
>>> lookup.adgb_dt
>>> lookup_dict = lookup.dict()
>>> lookup_dict['ADGB-DT']
>>> lookup_by_ensembl_id = bt.Gene.lookup(field="ensembl_gene_id")
>>> lookup_by_ensembl_id.ensg00000002745
>>> lookup_return_symbols = bt.Gene.lookup(field="ensembl_gene_id", return_field="symbol")
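
To make the idea of an auto-complete object concrete, here is a rough sketch of how field values could be exposed as attributes of a NamedTuple alongside a dictionary keyed by the original values. The build_lookup and to_identifier helpers, the example records, and the normalization rule (lowercasing and replacing non-word characters with underscores) are assumptions for illustration, not lamindb's implementation; the sketch also assumes that values are unique and start with a letter.

```python
import re
from collections import namedtuple


def to_identifier(value: str) -> str:
    """Normalize a value such as "ADGB-DT" into a valid attribute name."""
    return re.sub(r"\W", "_", value).lower()


def build_lookup(records: list[dict], field: str):
    """Build a NamedTuple whose attributes point to the records."""
    keys = [to_identifier(record[field]) for record in records]
    Lookup = namedtuple("Lookup", keys)
    return Lookup(*records)


# Hypothetical records standing in for rows of a registry.
records = [
    {"symbol": "ADGB-DT", "id": 1},
    {"symbol": "TP53", "id": 2},
]

lookup = build_lookup(records, field="symbol")
print(lookup.adgb_dt["id"])  # 1

# A dictionary keyed by the original, un-normalized values.
lookup_dict = {record["symbol"]: record for record in records}
print(lookup_dict["ADGB-DT"]["id"])  # 1
```

The attribute form is convenient for tab completion in an interactive session, while the dictionary form preserves the original spelling of each value.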

classmethod search(string, *, field=None, limit=20, case_sensitive=False)


  • string (str) – The input string to match against the field ontology values.

  • field (str | DeferredAttribute | None, default: None) – The field or fields to search. Search all string fields by default.

  • limit (int | None, default: 20) – Maximum number of top results to return.

  • case_sensitive (bool, default: False) – Whether the match is case sensitive.

Return type:



A sorted DataFrame of search results with a score in column score. If return_queryset is True, a QuerySet.

>>> ulabels = ln.ULabel.from_values(["ULabel1", "ULabel2", "ULabel3"], field="name")
>>> ln.save(ulabels)
>>> ln.ULabel.search("ULabel2")

classmethod using(instance)

Use a non-default LaminDB instance.


instance (str | None) – An instance identifier of form “account_handle/instance_name”.

Return type:



>>> ln.ULabel.using("account_handle/instance_name").search("ULabel7", field="name")
            uid    score
ULabel7  g7Hk9b2v  100.0
ULabel5  t4Jm6s0q   75.0
ULabel6  r2Xw8p1z   75.0



cache()

Download cloud artifact to local cache.

Follows syncing logic: only caches an artifact if it’s outdated in the local cache.

Returns a path to a locally cached on-disk object (say a .jpg file).

Return type:

Path



Sync file from cloud and return the local path of the cache:

>>> artifact.cache()
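
The syncing rule described above (copy only when the cached copy is missing or outdated) can be sketched with local files standing in for cloud objects. The sync_to_cache helper and the comparison of modification times are assumptions for illustration; lamindb's actual caching logic may rely on different criteria.

```python
import os
import shutil
import tempfile
from pathlib import Path


def sync_to_cache(source: Path, cache_dir: Path) -> Path:
    """Copy source into cache_dir only if the cached copy is missing or older."""
    cached = cache_dir / source.name
    if not cached.exists() or cached.stat().st_mtime < source.stat().st_mtime:
        cache_dir.mkdir(parents=True, exist_ok=True)
        shutil.copy2(source, cached)  # copy2 preserves the modification time
    return cached


with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "remote" / "image.jpg"
    source.parent.mkdir()
    source.write_bytes(b"v1")
    cache_dir = Path(tmp) / "cache"

    path = sync_to_cache(source, cache_dir)  # first call copies the file

    # Simulate an update of the source with a later modification time.
    source.write_bytes(b"v2")
    stat = source.stat()
    os.utime(source, (stat.st_atime, stat.st_mtime + 10))

    path = sync_to_cache(source, cache_dir)  # source is newer, so it is copied again
    content = path.read_bytes()

print(content)  # b'v2'
```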

delete(permanent=None, storage=None, using_key=None)

Trash or permanently delete.

A first call to .delete() puts an artifact into the trash (sets _branch_code to -1). A second call permanently deletes the artifact. For a folder artifact with multiple versions, deleting a non-latest version does not delete the underlying storage unless storage=True is specified. Deleting the latest version deletes all versions of a folder artifact.

FAQ: Storage FAQ

  • permanent (bool | None, default: None) – Permanently delete the artifact (skip trash).

  • storage (bool | None, default: None) – Indicate whether you want to delete the artifact in storage.

Return type:



For an Artifact object artifact, call:

>>> artifact = ln.Artifact.filter(key="some.csv").one()
>>> artifact.delete() # delete a single file artifact
>>> artifact = ln.Artifact.filter(key="some.tiledbsoma", is_latest=False).first()
>>> artifact.delete() # delete an old version, the data will not be deleted
>>> artifact = ln.Artifact.filter(key="some.tiledbsoma", is_latest=True).one()
>>> artifact.delete() # delete all versions, the data will be deleted or prompted for deletion.

describe()

Describe relations of record.

Return type:



>>> artifact.describe()

load(is_run_input=None, **kwargs)

Cache and load into memory.

Return type:



Load a DataFrame-like artifact:

>>> artifact.load().head()
  sepal_length sepal_width petal_length petal_width iris_organism_code
0        0.051       0.035        0.014       0.002                 0
1        0.049       0.030        0.014       0.002                 0
2        0.047       0.032        0.013       0.002                 0
3        0.046       0.031        0.015       0.002                 0
4        0.050       0.036        0.014       0.002                 0

Load an AnnData-like artifact:

>>> artifact.load()
AnnData object with n_obs × n_vars = 70 × 765

Fall back to cache() if no in-memory representation is configured:

>>> artifact.load()

open(mode='r', is_run_input=None, **kwargs)

Return a cloud-backed data object.

Works for AnnData (.h5ad and .zarr), generic hdf5 and zarr, tiledbsoma objects (.tiledbsoma), and pyarrow-compatible formats.


mode (str, default: 'r') – Can be "w" (write mode) only for tiledbsoma stores; otherwise must be "r" (read-only mode).

Return type:

AnnDataAccessor | BackedAccessor | Collection | Experiment | Measurement | Dataset


Read AnnData in backed mode from cloud:

>>> artifact = ln.Artifact.get(key="lndb-storage/pbmc68k.h5ad")
>>> artifact.open()
AnnDataAccessor object with n_obs × n_vars = 70 × 765
    constructed for the AnnData object pbmc68k.h5ad

replace(data, run=None, format=None)

Replace artifact content.

  • data (lamindb.core.types.UPathStr | DataFrame | AnnData | MuData) – A file path.

  • run (Run | None, default: None) – The run that created the artifact gets auto-linked if ln.track() was called.

Return type:



Say we made a change to the content of an artifact, e.g., edited the image paradisi05_laminopathic_nuclei.jpg.

This is how we replace the old file in storage with the new file:

>>> artifact.replace("paradisi05_laminopathic_nuclei.jpg")

Note that this neither changes the storage key nor the filename.

However, it will update the suffix if it changes.


restore()

Restore from trash.

Return type:



>>> artifact.restore()

save(upload=None, **kwargs)

Save to database & storage.


upload (bool | None, default: None) – Trigger upload to cloud storage in instances with hybrid storage mode.

Return type:

Artifact



>>> artifact = ln.Artifact("./myfile.csv", description="myfile")
>>> artifact.save()

view_lineage()

Graph of data flow.

Return type:



>>> collection.view_lineage()
>>> artifact.view_lineage()