HubMAP: scRNA-seq¶
The HubMAP (Human BioMolecular Atlas Program) consortium is an initiative mapping human cells to create a comprehensive atlas, with its Data Portal serving as the platform where researchers can access, visualize, and download (single-cell) tissue data.
Lamin mirrors most of the datasets for simplified access here: laminlabs/hubmap.
If you use the data academically, please cite the original publication Jain et al. 2023.
Here, we show how the HubMAP instance is structured and how datasets can be queried and accessed.
HubMAP groups several data products, which are the individual raw datasets, into higher-level datasets. For example, the dataset HBM983.LKMP.544 has three data products.
The laminlabs/hubmap instance registers these data products as ln.Artifact records that jointly form a ln.Collection.
Connect to the source instance:
# pip install 'lamindb[jupyter,bionty,wetlab]'
!lamin connect laminlabs/hubmap
→ connected lamindb: laminlabs/hubmap
Note
If you want to transfer artifacts or metadata into your own instance, use .using("laminlabs/hubmap") when accessing registries and then .save() (Transfer data).
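The transfer pattern from the note can be sketched as follows. This is a minimal sketch, not part of the original guide: the helper name `transfer_artifact` is hypothetical, and we assume that you are connected to your own instance (not to laminlabs/hubmap) when you call it.

```python
def transfer_artifact(key: str, source: str = "laminlabs/hubmap"):
    """Copy one artifact record from `source` into the currently connected instance.

    Hypothetical helper: assumes your own instance is the connected one.
    """
    # Imported lazily so that defining the helper does not require a connection.
    import lamindb as ln

    # Query the registry of the source instance instead of the current one.
    artifact = ln.Artifact.using(source).get(key=key)
    # Saving a record that was queried via .using() transfers it to your instance.
    artifact.save()
    return artifact


# Example call, left commented out because it needs your own connected instance:
# transfer_artifact("f6eb890063d13698feb11d39fa61e45a/raw_expr.h5ad")
```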
import lamindb as ln
→ connected lamindb: laminlabs/hubmap
Getting HubMAP datasets and data products¶
The key attribute of ln.Artifact and ln.Collection corresponds to the ID in the respective HubMAP URL.
For example, the ID in the URL https://portal.hubmapconsortium.org/browse/dataset/20ee458e5ee361717b68ca72caf6044e is the key of the corresponding collection:
small_intenstine_collection = ln.Collection.get(key="20ee458e5ee361717b68ca72caf6044e")
small_intenstine_collection
Collection(uid='QjQSiso1qPlnX6iX0000', is_latest=True, key='20ee458e5ee361717b68ca72caf6044e', description='RNAseq data from the small intestine of a 67.0-year-old white female', hash='jF6aG3Nd4qQHBvY8v8Q8dg', space_id=1, created_by_id=3, run_id=11, created_at=2025-01-28 14:17:01 UTC)
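If you start from a Data Portal URL rather than a bare ID, the key can be derived from the URL. The sketch below uses only the standard library; the helper name `key_from_portal_url` is hypothetical, and we assume that the ID is always the last path segment, as in the URL above.

```python
from urllib.parse import urlparse


def key_from_portal_url(url: str) -> str:
    """Return the last path segment of a HubMAP Data Portal dataset URL."""
    # Drop a trailing slash so that ".../<id>/" and ".../<id>" behave the same.
    path = urlparse(url).path.rstrip("/")
    return path.rsplit("/", 1)[-1]


url = "https://portal.hubmapconsortium.org/browse/dataset/20ee458e5ee361717b68ca72caf6044e"
collection_key = key_from_portal_url(url)
print(collection_key)

# With a connected instance, the key can then be passed on, for example:
# ln.Collection.get(key=collection_key)
```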
We can get all associated data products like this:
small_intenstine_collection.artifacts.all().df()
! no run & transform got linked, call `ln.track()` & re-run
id | uid | key | description | suffix | kind | otype | size | hash | n_files | n_observations | _hash_type | _key_is_virtual | _overwrite_versions | space_id | storage_id | schema_id | version | is_latest | run_id | created_at | created_by_id | _aux | _branch_code |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
28 | AzqCWQAKLMV3iTMA0000 | f6eb890063d13698feb11d39fa61e45a/raw_expr.h5ad | RNAseq data from the small intestine of a 67.0... | .h5ad | None | AnnData | 67867992 | of_TeLP6cet2JBj3o_kZmQ | None | 6000 | md5-etag | False | False | 1 | 2 | None | None | True | 11 | 2025-01-28 14:16:35.355582+00:00 | 3 | None | 1 |
29 | fWN781TxuZibkBOR0000 | f6eb890063d13698feb11d39fa61e45a/secondary_ana... | RNAseq data from the small intestine of a 67.0... | .h5ad | None | AnnData | 888111371 | ian3P5CN68AAvoDMC6sZLw | None | 5956 | md5-etag | False | False | 1 | 2 | None | None | True | 11 | 2025-01-28 14:16:39.348589+00:00 | 3 | None | 1 |
30 | enXVzwjw4voS8UCb0000 | f6eb890063d13698feb11d39fa61e45a/expr.h5ad | RNAseq data from the small intestine of a 67.0... | .h5ad | None | AnnData | 139737320 | kR476u81gwXI6rEbXzNBvQ | None | 6000 | md5-etag | False | False | 1 | 2 | None | None | True | 11 | 2025-01-28 14:16:43.385980+00:00 | 3 | None | 1 |
Note that the key of each of these three artifacts corresponds to the path of its assets URL.
For example, https://assets.hubmapconsortium.org/f6eb890063d13698feb11d39fa61e45a/expr.h5ad is the direct URL to the expr.h5ad data product.
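The relationship between an artifact key and its assets URL can be written down as a small helper. This is a sketch under the assumption that every key has the form `<asset id>/<file name>`, as in the three rows above; the function names are hypothetical.

```python
ASSETS_BASE_URL = "https://assets.hubmapconsortium.org"


def split_artifact_key(key: str) -> tuple[str, str]:
    """Split a key of the assumed form '<asset id>/<file name>' into its two parts."""
    asset_id, _, file_name = key.partition("/")
    return asset_id, file_name


def assets_url(key: str) -> str:
    """Build the direct assets URL by appending the key to the base URL."""
    return f"{ASSETS_BASE_URL}/{key.lstrip('/')}"


key = "f6eb890063d13698feb11d39fa61e45a/expr.h5ad"
print(split_artifact_key(key))
print(assets_url(key))
```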
Artifacts can be directly loaded:
small_intenstine_af = (
    small_intenstine_collection.artifacts.filter(key__icontains="raw_expr.h5ad")
    .distinct()
    .one()
)
adata = small_intenstine_af.load()
adata
AnnData object with n_obs × n_vars = 6000 × 98000
var: 'hugo_symbol'
Querying single-cell datasets¶
Currently, only the artifacts of the raw_expr.h5ad data products are labeled with metadata.
The available metadata includes ln.Reference, bt.Tissue, bt.Disease, bt.ExperimentalFactor, and many more.
Please have a look at the instance for more details.
# Get one dataset with a specific type of heart failure
heart_failure_adata = (
    ln.Artifact.filter(diseases__name="heart failure with reduced ejection fraction")
    .first()
    .load()
)
heart_failure_adata
AnnData object with n_obs × n_vars = 52534 × 60286
obs: 'cell_id'
var: 'hugo_symbol'
layers: 'spliced', 'spliced_unspliced_sum', 'unspliced'
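Before loading a dataset, it can help to list all matching artifacts first. The sketch below combines the diseases__name filter used above with a Django-style lookup on the key; the helper name `list_raw_expr_artifacts` is hypothetical, and restricting the query to keys ending in raw_expr.h5ad is our assumption, based on the statement that only those artifacts carry metadata.

```python
def list_raw_expr_artifacts(disease_name: str):
    """Return a DataFrame of raw_expr.h5ad artifacts annotated with `disease_name`."""
    # Imported lazily so that defining the helper does not require a connection.
    import lamindb as ln

    return ln.Artifact.filter(
        key__endswith="raw_expr.h5ad",  # assumption: only these artifacts are labeled
        diseases__name=disease_name,
    ).df()


# Example call, left commented out because it needs a connection to laminlabs/hubmap:
# list_raw_expr_artifacts("heart failure with reduced ejection fraction")
```

Inspecting the resulting table lets you pick a specific artifact by uid or key instead of relying on .first().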