lamindb.curators.DataFrameCatManager
- class lamindb.curators.DataFrameCatManager(df, columns=FieldAttr(Feature.name), categoricals=None, verbosity='hint', organism=None, sources=None, exclude=None)
- Bases: CatManager
- Curation flow for a DataFrame object.
- See also: Curator.
- Parameters:
- df (DataFrame | Artifact) – The DataFrame object to curate.
- columns (DeferredAttribute, default: FieldAttr(Feature.name)) – The field attribute for the feature column.
- categoricals (dict[str, DeferredAttribute] | None, default: None) – A dictionary mapping column names to registry fields.
- verbosity (str, default: 'hint') – The verbosity level.
- organism (str | None, default: None) – The organism name.
- sources (dict[str, Record] | None, default: None) – A dictionary mapping column names to Source records.
- exclude (dict | None, default: None) – A dictionary mapping column names to values to exclude from validation. When specific Source instances are pinned and may lack default values (e.g., "unknown" or "na"), listing those values under exclude ensures they are not validated.
 
- Returns:
- A curator object. 
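The interaction between categoricals and exclude can be pictured with a small standalone sketch. It does not call lamindb and is not the library's implementation: the helper `find_non_validated`, the toy table, and the vocabularies are all hypothetical, and it assumes that each exclude entry is a list of values.

```python
from __future__ import annotations

from typing import Iterable


def find_non_validated(
    columns: dict[str, list[str]],
    allowed: dict[str, set[str]],
    exclude: dict[str, Iterable[str]] | None = None,
) -> dict[str, list[str]]:
    """Return, per column, the values that are neither allowed nor excluded.

    Hypothetical helper for illustration only; not part of lamindb.
    """
    exclude = exclude or {}
    result: dict[str, list[str]] = {}
    for name, vocabulary in allowed.items():
        skipped = set(exclude.get(name, ()))
        # dict.fromkeys drops duplicates while keeping first-seen order
        unique_values = dict.fromkeys(columns.get(name, []))
        missing = [v for v in unique_values if v not in vocabulary and v not in skipped]
        if missing:
            result[name] = missing
    return result


# Toy data: column name -> values, and column name -> accepted terms
table = {
    "cell_type": ["T cell", "B cell", "unknown", "T cel"],
    "donor_id": ["D1", "D2", "D1", "D2"],
}
vocabularies = {"cell_type": {"T cell", "B cell"}, "donor_id": {"D1", "D2"}}

without_exclude = find_non_validated(table, vocabularies)
with_exclude = find_non_validated(table, vocabularies, exclude={"cell_type": ["unknown"]})
print(without_exclude)  # "unknown" and the typo "T cel" are both flagged
print(with_exclude)     # only the typo "T cel" remains flagged
```

In this sketch the placeholder value "unknown" is skipped once it is listed under exclude, while a genuine typo is still reported.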
 - Examples
>>> import lamindb as ln
>>> import bionty as bt
>>> curator = ln.Curator.from_df(
...     df,
...     categoricals={
...         "cell_type_ontology_id": bt.CellType.ontology_id,
...         "donor_id": ln.ULabel.name,
...     },
... )
 - Attributes
- property categoricals: dict
- Return the column fields to validate against.
 - property non_validated: dict[str, list[str]]
- Return the non-validated features and labels. 
 - Class methods
- classmethod from_anndata(data, var_index, categoricals=None, obs_columns=FieldAttr(Feature.name), verbosity='hint', organism=None, sources=None)
- Return type:
- AnnDataCatManager 
 
 - classmethod from_df(df, categoricals=None, columns=FieldAttr(Feature.name), verbosity='hint', organism=None)
- Return type:
 
 - classmethod from_mudata(mdata, var_index, categoricals=None, verbosity='hint', organism=None)
- Return type:
- MuDataCatManager 
 
 - classmethod from_spatialdata(sdata, var_index, categoricals=None, organism=None, sources=None, exclude=None, verbosity='hint', *, sample_metadata_key='sample')
 - classmethod from_tiledbsoma(experiment_uri, var_index, categoricals=None, obs_columns=FieldAttr(Feature.name), organism=None, sources=None, exclude=None)
- Return type:
 
 - Methods
- add_new_from(key, **kwargs)
- Add validated & new categories.
- Parameters:
- key (str) – The key referencing the slot in the DataFrame from which to draw terms.
- organism – The organism name.
- **kwargs – Additional keyword arguments to pass to create new records.
 
 
 - add_new_from_columns(organism=None, **kwargs)
 - clean_up_failed_runs()
- Clean up previous failed runs that don’t save any outputs. 
 - lookup(public=False)
- Lookup categories.
- Parameters:
- public (bool, default: False) – If True, the lookup is performed on the public reference.
- Return type:
 
 - save_artifact(*, key=None, description=None, revises=None, run=None)
- Save an annotated artifact.
- Parameters:
- key (str | None, default: None) – A path-like key to reference the artifact in default storage, e.g., "myfolder/myfile.fcs". Artifacts with the same key form a version family.
- description (str | None, default: None) – A description.
- revises (Artifact | None, default: None) – Previous version of the artifact. An alternative to passing key to trigger a new version.
- run (Run | None, default: None) – The run that creates the artifact.
 
- Return type:
- Returns:
- A saved artifact record. 
 
 - standardize(key)
- Replace synonyms with standardized values.
- Modifies the input dataset in place.
- Parameters:
- key (str) – The key referencing the column in the DataFrame to standardize.
- Return type:
- None
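What "replace synonyms in place and return None" means can be sketched without lamindb. The function `standardize_column` and the synonym table below are hypothetical stand-ins; the real method looks up synonyms in the registry rather than in a hand-written dictionary.

```python
def standardize_column(table: dict[str, list[str]], key: str, synonyms: dict[str, str]) -> None:
    """Replace known synonyms in table[key] with standardized names, in place.

    Hypothetical helper for illustration only; not part of lamindb.
    """
    values = table[key]
    for i, value in enumerate(values):
        # Values without a known synonym entry are left untouched
        values[i] = synonyms.get(value, value)


table = {"cell_type": ["T-cell", "B cell", "T lymphocyte"]}
synonyms = {"T-cell": "T cell", "T lymphocyte": "T cell"}  # assumed mapping

result = standardize_column(table, "cell_type", synonyms)
print(result)              # None: the function mutates its input instead of returning a copy
print(table["cell_type"])  # the same list object now holds the standardized names
```

Because the input is mutated, a caller who needs the original values should copy the column before standardizing it.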
 
 - validate()
- Validate variables and categorical observations.
- This method also registers the validated records in the current instance:
  - from public sources
- Parameters:
- organism – The organism name. 
- Return type:
- bool
- Returns:
- Whether the DataFrame is validated.
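A typical curation loop built from the methods above is: validate, inspect non_validated, register the remaining terms, then validate again. The sketch below imitates that loop with a toy class. `ToyCurator` is hypothetical and shares only method and attribute names with DataFrameCatManager; it touches no database and says nothing about how lamindb implements these steps.

```python
class ToyCurator:
    """Hypothetical stand-in that mirrors only the documented method names."""

    def __init__(self, table: dict[str, list[str]], registry: dict[str, set[str]]):
        self.table = table        # column name -> list of values
        self.registry = registry  # column name -> set of registered terms
        self.non_validated: dict[str, list[str]] = {}

    def validate(self) -> bool:
        """Recompute non_validated and report whether every value is registered."""
        self.non_validated = {}
        for key, terms in self.registry.items():
            missing = [v for v in dict.fromkeys(self.table[key]) if v not in terms]
            if missing:
                self.non_validated[key] = missing
        return not self.non_validated

    def add_new_from(self, key: str) -> None:
        """Register the not-yet-validated terms of one column."""
        self.registry[key].update(self.non_validated.pop(key, []))


curator = ToyCurator(
    table={"donor_id": ["D1", "D2", "D3"]},
    registry={"donor_id": {"D1", "D2"}},
)

first_pass = curator.validate()       # False: "D3" is not registered yet
pending = dict(curator.non_validated)
curator.add_new_from("donor_id")      # register the new term
second_pass = curator.validate()      # True: nothing left to validate
print(first_pass, pending, second_pass)
```

With the real class, one would usually review the terms in non_validated (and try standardize first) before registering them, since add_new_from creates new records.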