dratio.models.Dataset
- class dratio.models.Dataset(client, code: str, version: str | None = None)
Representation of a dataset in the database. This class provides access to information about the dataset and its versions, and downloads the data as a pandas or geopandas dataframe.
- Parameters:
code (str) – Unique identifier of the dataset in the database.
version (str | None) – Version of the dataset to be used. If None, the latest version is used.
client (Client) – Client object used to perform requests to the database.
**kwargs – Additional keyword arguments used to initialize the metadata information.
Examples
Retrieve a dataset from the dratio.io marketplace:
>>> from dratio import Client
>>> client = Client('YOUR_API_KEY')
>>> dataset = client.get('municipalities')
>>> dataset
Dataset('municipalities')
Access fields included in the metadata of the dataset:
>>> dataset.name
'Municipalities'
>>> dataset.description
'Municipalities of Spain according to the name under which they are registered ...'
Get a dictionary with all metadata:
>>> dataset.metadata
{'code': 'municipalities', 'name': 'Municipalities', 'description': ...}
Get the current version of the dataset:
>>> dataset.version
Version('municipalities-v1')
Download a dataset as a pandas dataframe:
>>> df = dataset.to_pandas()
Download as a geopandas dataframe (for geospatial datasets):
>>> gdf = dataset.to_geopandas()
Methods
__init__
(client, code[, version])Initializes the Dataset object
add_feature
(feature)Adds a feature to the dataset.
delete
()Deletes the object from the database.
describe
()Returns a string representation of the object's metadata.
fetch
([fail_not_found])Updates the metadata dictionary of the object by performing an HTTP request to the server.
from_dict
(metadata)Updates the internal state of the object with the provided metadata.
keys
()Returns the keys of the metadata dictionary.
list_features
([format])Returns the features associated to the object.
list_files
([filetype, format])Returns a list of files associated to the version.
list_versions
([format])List available versions of the dataset
metadata_from_pandas
(df, publisher[, ...])Automatically generates the metadata of the dataset from a pandas dataframe.
save
()Saves the object's metadata to the database.
set_version
(version)
to_geopandas
([cross_strategy])Downloads the dataset as a geopandas geodataframe.
to_pandas
()Downloads the dataset as a pandas dataframe.
upload_file
(file[, filetype, update])Upload a file to the dataset.
Attributes
Returns the categories associated to the object.
Return a list with all the columns of the dataset (List[str], read-only).
Returns the description of the object.
Dictionary with features indexed by column name (Dict[str, Feature], read-only).
Granularity of the dataset, i.e., the time between consecutive timestamps (str, read-only).
Last date of the dataset (str, read-only).
Last update of the dataset (str, read-only).
Level of the dataset (dict, read-only).
License of the dataset (str, read-only).
Retrieves the metadata associated with the object.
Number of features in the dataset (int, read-only).
Number of time slices in the dataset (int, read-only).
Number of values in the dataset (int, read-only).
Number of variables in the dataset (int, read-only).
Returns the name of the object.
Next scheduled update of the dataset (str, read-only).
Name of the publisher of the dataset (str, read-only).
Scope of the dataset (dict, read-only).
Start date of the dataset (str, read-only).
Name of the column used as timestamp (str, read-only).
Update frequency of the dataset (str, read-only).
Return the current version of the dataset (Version, read-only).
- add_feature(feature: Feature) → None
Adds a feature to the dataset.
- Parameters:
feature (Feature) – Feature to add to the dataset.
- Raises:
requests.exceptions.RequestException – If the request fails due to an HTTP or connection error.
- property columns: List[str]
Return a list with all the columns of the dataset (List[str], read-only).
- delete() → None
Deletes the object from the database.
- Raises:
requests.exceptions.RequestException – If the request fails.
- property features: List[Feature]
Dictionary with features indexed by column name (Dict[str, Feature], read-only).
- fetch(fail_not_found: bool = True) → DatabaseResource
Updates the metadata dictionary of the object by performing an HTTP request to the server.
- Parameters:
fail_not_found (bool, default True) – Whether to raise an exception if the object is not found in the database.
- Returns:
self (DatabaseResource) – The object itself.
Notes
This method modifies the object’s internal state.
- Raises:
requests.exceptions.RequestException – If the request fails.
ObjectNotFound – If the object is not found in the database.
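The effect of fail_not_found can be illustrated with a small, self-contained sketch. It does not use dratio: FakeResource, the local ObjectNotFound class and the in-memory store are hypothetical stand-ins, and leaving the metadata untouched when the object is missing is an assumption, not documented behavior.

```python
class ObjectNotFound(Exception):
    """Local stand-in for the library's ObjectNotFound exception (hypothetical)."""


class FakeResource:
    """Hypothetical stand-in for a DatabaseResource; no HTTP request is made."""

    def __init__(self, code, store):
        self.code = code
        self._store = store  # dict playing the role of the remote database
        self.metadata = {}

    def fetch(self, fail_not_found=True):
        record = self._store.get(self.code)
        if record is None:
            if fail_not_found:
                raise ObjectNotFound(f"{self.code!r} not found")
            return self  # assumption: metadata is left untouched
        self.metadata = dict(record)  # modifies the internal state
        return self  # returning self allows call chaining


store = {"municipalities": {"code": "municipalities", "name": "Municipalities"}}
found = FakeResource("municipalities", store).fetch()
missing = FakeResource("unknown", store).fetch(fail_not_found=False)
```

With the default fail_not_found=True, fetching the "unknown" code in this sketch raises ObjectNotFound instead of returning the object.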
- from_dict(metadata)
Updates the internal state of the object with the provided metadata.
- Parameters:
metadata (dict) – Dictionary containing the metadata of the object.
- Returns:
self – The object itself.
- Return type:
DatabaseResource
Notes
This method modifies the object’s internal state.
- property granularity: str | None
Granularity of the dataset, i.e., the time between consecutive timestamps (str, read-only).
- list_features(format: Literal['pandas', 'json', 'api'] = 'pandas') → pd.DataFrame | List[Dict[str, Any]] | List[Feature]
Returns the features associated to the object.
- Parameters:
format (str, optional) – Format of the output. Either “pandas”, “json” or “api”. Defaults to “pandas”. If “pandas”, the output is a pandas DataFrame. If “json”, the output is a list of dictionaries. If “api”, the output is a list of Feature objects.
- Returns:
List of features associated to the object.
- Return type:
Union[“pd.DataFrame”, List[Dict[str, Any]], List[“Feature”]]
Examples
List all features available in the database:
>>> from dratio import Client
>>> client = Client("Your API key")
>>> client.list_features()
List all features associated to the publisher “ine” (National Institute of Statistics):
>>> publisher = client.get_publisher("ine")
>>> publisher.list_features()
List all features of a dataset (its columns):
>>> dataset = client.get_dataset("municipalities")
>>> dataset.list_features()
List all features available at census level:
>>> level = client.get("census", kind="data-level")
>>> level.list_features()
- Raises:
ValueError – If the format is not “pandas”, “json” or “api”.
HTTPError – If the request to the API fails.
DratioException – If the response from the API is not valid (e.g., an invalid API key or insufficient permissions).
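The mapping between the format argument and the returned type can be sketched as follows. This is not the library's implementation: format_records and FeatureStub are hypothetical names, and the sample record is made up for illustration only.

```python
from typing import Any, Dict, List

ALLOWED_FORMATS = ("pandas", "json", "api")


class FeatureStub:
    """Hypothetical stand-in for a Feature object."""

    def __init__(self, **metadata):
        self.metadata = metadata


def format_records(records: List[Dict[str, Any]], format: str = "pandas"):
    """Convert raw records into the requested output format."""
    if format not in ALLOWED_FORMATS:
        raise ValueError(f"format must be one of {ALLOWED_FORMATS}, got {format!r}")
    if format == "json":
        return records  # list of dictionaries, unchanged
    if format == "api":
        return [FeatureStub(**record) for record in records]
    import pandas as pd  # imported lazily, only needed for the "pandas" format

    return pd.DataFrame(records)


# Made-up sample record, for illustration only.
sample = [{"code": "population", "name": "Population"}]
as_json = format_records(sample, format="json")
as_objects = format_records(sample, format="api")
```

Validating the argument before doing any work means an unsupported value fails fast with a ValueError, which matches the first entry of the Raises list above.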
- list_files(filetype: Literal['parquet', 'geoparquet'] | None = None, format: Literal['pandas', 'json', 'api'] = 'pandas') → pd.DataFrame | List[Dict[str, Any]] | List[File]
Returns a list of files associated to the version.
- Parameters:
filetype (Optional[Literal["parquet", "geoparquet"]]) – Type of file to filter. If None, all files are returned.
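format (Literal["pandas", "json", "api"]) – Format of the output. If “pandas”, the output is a pandas DataFrame. If “json”, the output is a list of dictionaries. If “api”, the output is a list of File objects. Defaults to “pandas”.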
- Returns:
List of files associated to the version.
- Return type:
Union[“pd.DataFrame”, List[Dict[str, Any]], List[“File”]]
- list_versions(format: Literal['pandas', 'json', 'api'] = 'pandas') → pd.DataFrame | List[Dict[str, Any]] | List[Version]
List available versions of the dataset.
- Returns:
List of available versions of the dataset.
- Return type:
Union[“pd.DataFrame”, List[Dict[str, Any]], List[“Version”]]
- Raises:
requests.exceptions.RequestException – If the request fails due to an HTTP or connection error.
- property metadata: Dict[str, Any]
Retrieves the metadata associated with the object.
Notes
The first time this property is accessed, a request is made to the server to fetch the metadata. Subsequent accesses return the previously loaded information. To update the metadata, create a new instance of the object.
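The load-once behavior described in this note can be sketched with a plain Python property. LazyResource and fake_loader are hypothetical names; the loader stands in for the HTTP request, and this is not the library's actual code.

```python
class LazyResource:
    """Hypothetical resource whose metadata is loaded on first access."""

    def __init__(self, code, loader):
        self.code = code
        self._loader = loader  # callable standing in for the HTTP request
        self._metadata = None

    @property
    def metadata(self):
        if self._metadata is None:  # first access: load and cache
            self._metadata = self._loader(self.code)
        return self._metadata  # later accesses reuse the cached dictionary


calls = []


def fake_loader(code):
    calls.append(code)  # record each simulated request
    return {"code": code}


resource = LazyResource("municipalities", fake_loader)
first = resource.metadata
second = resource.metadata
```

In this sketch the loader runs only once even though the property is read twice, which is why a fresh instance is needed to see newer metadata.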
- metadata_from_pandas(df: pd.DataFrame | gpd.GeoDataFrame, publisher: str | Publisher, license: str | License | None = None, timestamp_column: str = 'timestamp') → Dataset
Automatically generates the metadata of the dataset from a pandas dataframe. This method is useful for creating a dataset from a pandas dataframe, and is intended for data providers who want to upload their data to dratio.io.
- Parameters:
df (Union[pandas.DataFrame, geopandas.GeoDataFrame]) – Pandas dataframe with the data.
publisher (Union[str, Publisher]) – Publisher of the dataset.
license (Optional[Union[str, License]]) – License of the dataset.
timestamp_column (str) – Name of the column used as timestamp (if applicable).
- Returns:
Dataset object with the metadata generated from the pandas dataframe.
- Return type:
Dataset
- save() → Dataset
Saves the object’s metadata to the database.
- Returns:
self – The object itself.
- Return type:
DatabaseResource
- Raises:
requests.exceptions.RequestException – If the request fails.
- to_geopandas(cross_strategy: str = 'auto') → gpd.GeoDataFrame
Downloads the dataset as a geopandas geodataframe.
- Returns:
GeoDataFrame with the dataset.
- Return type:
gpd.GeoDataFrame
Notes
This method requires the geopandas library to be installed.
- Raises:
ImportError – If the geopandas library is not installed. You can install it using pip install dratio[geo].
requests.exceptions.RequestException – If the request fails due to an HTTP or connection error.
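An optional dependency such as geopandas is commonly guarded with an import check that points the user to the right install command. The sketch below shows that general pattern; require_module is a hypothetical helper, not part of dratio, and only the dratio[geo] extra is taken from the text above.

```python
import importlib


def require_module(name, extra):
    """Import an optional module or raise ImportError with an install hint."""
    try:
        return importlib.import_module(name)
    except ImportError as exc:
        raise ImportError(
            f"{name} is required for this operation. "
            f"Install it with: pip install dratio[{extra}]"
        ) from exc


# A standard-library module is used here so the sketch runs anywhere;
# a geospatial method would call require_module("geopandas", "geo") instead.
json_module = require_module("json", "geo")
```

Deferring the import until the method is called keeps the base package usable without the geospatial extra installed.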
- to_pandas() → pd.DataFrame
Downloads the dataset as a pandas dataframe.
- Returns:
Dataframe with the dataset.
- Return type:
pd.DataFrame
- Raises:
requests.exceptions.RequestException – If the request fails due to an HTTP or connection error.