API Documentation

Models for very easy Bayesian Regression.

class shabadoo.Normal(rng_seed: int = None)

Gaussian/normal family model for the generic regression model.

fit(df: pandas.core.frame.DataFrame, sampler: str = 'NUTS', rng_key: jax.numpy.lax_numpy.ndarray = None, sampler_kwargs: Dict[str, Any] = None, **mcmc_kwargs)

Fit the model to a DataFrame.

Parameters

df (pd.DataFrame) – Source dataframe.
sampler (str) – NumPyro sampler name. Default 'NUTS'.
rng_key (two-element ndarray) – Optional RNG key. If not provided, one is generated by a random split.
sampler_kwargs – Passed to the selected NumPyro sampler.
**mcmc_kwargs – Passed to numpyro.infer.MCMC.

Returns

The fitted model.

Return type

Model

property formula: Return a formula string describing the model.

classmethod from_dict(data: Dict[str, Any], **model_kw)

Return a pre-fitted model given a dictionary of config.

The dictionary MUST contain the following:

samples. A dictionary mapping variables to MCMC samples. Must contain all feature names and any additional model variables. Each variable’s data must have the same shape.

Any other dict keys will be added as model attributes.

Parameters

data (dict) – Model configuration, including the requirements listed above.
**model_kw – Passed to the Model() initializer.

Returns

A ready-to-use model.

Return type

Model
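The requirements on the config dictionary can be sketched in plain Python. The helper names `shape_of` and `check_samples`, and the nested-list layout of the samples, are assumptions made for illustration; they are not part of shabadoo, whose own checks live in preprocess_config_dict() and may differ.

```python
from typing import Any, Dict, Tuple


def shape_of(values: Any) -> Tuple[int, ...]:
    """Return the shape of a rectangular nested list (assumed layout)."""
    shape = []
    while isinstance(values, list):
        shape.append(len(values))
        values = values[0] if values else None
    return tuple(shape)


def check_samples(config: Dict[str, Any]) -> Tuple[int, ...]:
    """Hypothetical check: 'samples' exists and every variable shares a shape."""
    samples = config.get("samples")
    if not samples:
        raise ValueError("config must contain a non-empty 'samples' dictionary")
    shapes = {name: shape_of(data) for name, data in samples.items()}
    if len(set(shapes.values())) > 1:
        raise ValueError(f"sample shapes differ: {shapes}")
    return next(iter(shapes.values()))


# Two chains of three draws for each variable (made-up values).
config = {
    "samples": {
        "const": [[0.1, 0.2, 0.3], [0.2, 0.1, 0.4]],
        "x": [[1.0, 1.1, 0.9], [1.2, 1.0, 1.1]],
    },
    "note": "any other key would be added as a model attribute",
}
common_shape = check_samples(config)
```

A dictionary that passes a check of this kind is the sort of input from_dict() is documented to accept.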

grouped_metrics(df: pandas.core.frame.DataFrame, groupby: Union[str, List[str]], aggfunc: Callable = <function sum>, aggerrs: bool = True) → Union[pandas.core.frame.DataFrame, Dict[str, float]]

Return grouped accuracy metrics.

Parameters

df (pd.DataFrame) – Input data for the model.
groupby (str or list of str) – Groupby clause for pandas.
aggfunc (callable) – How to aggregate actuals and predictions within a group. Default sum.
aggerrs (bool) – Option to aggregate errors across groups (default True). If True, a dictionary of summary statistics is returned. If False, groupwise errors are returned as a DataFrame.

Returns

If aggerrs is True, a dictionary of summary statistics. If False, groupwise errors as a DataFrame.

Return type

dict or pd.DataFrame

likelihood_func(yhat): Return a normal likelihood with fitted sigma.

static link(x): Linear link function.

metrics(df: pandas.core.frame.DataFrame, aggerrs: bool = True) → Union[pandas.core.frame.DataFrame, Dict[str, float]]

Get prediction accuracy metrics of the model against data.

Parameters

df (pd.DataFrame) – Input data for the model.
aggerrs (bool) – Option to aggregate errors across observations (default True). If True, a dictionary of summary statistics is returned. If False, pointwise errors are returned as a DataFrame.

Returns

If aggerrs is True, a dictionary of summary statistics. If False, pointwise errors as a DataFrame.

Return type

dict or pd.DataFrame

model(df: pandas.core.frame.DataFrame)

Define and return samples from the model.

Parameters: df (pd.DataFrame) – Input data for the model.

property num_chains

Return the number of chains per variable in the model.

Assumes samples from all variables have the same shape.

property num_samples

Return the number of samples per variable.

Assumes samples from all variables have the same shape. Counts samples across all chains.
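As a rough illustration of the two counts, assume each variable's samples are stored as one list per chain. That layout and the variable names are assumptions for this sketch, not a statement about shabadoo's internal storage.

```python
# Assumed layout: one list per chain, each holding that chain's draws.
samples = {
    "const": [[0.1, 0.2, 0.3, 0.2], [0.2, 0.1, 0.4, 0.3]],
    "x": [[1.0, 1.1, 0.9, 1.0], [1.2, 1.0, 1.1, 0.8]],
}

# Any variable will do, because all variables are assumed to share a shape.
first_variable = next(iter(samples.values()))

num_chains = len(first_variable)
# Samples are counted across all chains, not per chain.
num_samples = sum(len(chain) for chain in first_variable)
```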

predict(df: pandas.core.frame.DataFrame, ci: bool = False, ci_interval: float = 0.9, aggfunc: Union[str, Callable] = 'mean') → Union[pandas.core.series.Series, pandas.core.frame.DataFrame]

Return the average posterior prediction across all samples.

Parameters

df (pd.DataFrame) – Source dataframe.
ci (bool) – Option to include a credible interval around the predictions. Returns a DataFrame if True, a Series if False. Default False.
ci_interval (float) – Credible interval width. Default 0.9.
aggfunc (string or callable) – Aggregation function called over predictions across posterior samples. Applies only to the point prediction (not the CI). Default 'mean'.

Returns

Forecasts. A Series named after the dependent variable (dv) if ci is False; a DataFrame if ci is True.

Return type

pd.Series or pd.DataFrame
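The relationship between the point prediction and the credible interval can be sketched with the standard library alone. The sketch assumes an equal-tailed interval computed from quantiles of the posterior draws; whether predict() uses exactly this construction is not stated here. The helpers `quantile` and `summarize` are hypothetical.

```python
from statistics import mean
from typing import Callable, Sequence, Tuple


def quantile(sorted_values: Sequence[float], q: float) -> float:
    """Linearly interpolated quantile of already-sorted values."""
    position = q * (len(sorted_values) - 1)
    lower = int(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = position - lower
    return sorted_values[lower] * (1 - fraction) + sorted_values[upper] * fraction


def summarize(
    draws: Sequence[float],
    ci_interval: float = 0.9,
    aggfunc: Callable[[Sequence[float]], float] = mean,
) -> Tuple[float, float, float]:
    """Return (point prediction, lower bound, upper bound) for one observation."""
    ordered = sorted(draws)
    tail = (1 - ci_interval) / 2
    # aggfunc affects only the point prediction, not the interval bounds.
    return aggfunc(draws), quantile(ordered, tail), quantile(ordered, 1 - tail)


# Posterior predictions for two observations, eleven draws each (made-up values).
predictions = [
    [float(v) for v in range(0, 11)],
    [float(v) for v in range(10, 21)],
]
summary = [summarize(row) for row in predictions]
```

Each tuple in `summary` corresponds to one input row, which mirrors the one-row-per-observation DataFrame described for ci=True.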

classmethod preprocess_config_dict(config: dict) → dict: Run checks and transformations on dicts for use in from_dict().

sample_posterior_predictive(df: pandas.core.frame.DataFrame, hdpi: bool = False, hdpi_interval: float = 0.9, rng_key: jax.numpy.lax_numpy.ndarray = None) → Union[pandas.core.series.Series, pandas.core.frame.DataFrame]

Obtain samples from the posterior predictive.

Parameters

df (pd.DataFrame) – Source dataframe.
hdpi (bool) – Option to include the lower and upper bounds of the highest posterior density interval. Returns a DataFrame if True, a Series if False. Default False.
hdpi_interval (float) – Width of the highest posterior density interval. Default 0.9.
rng_key (two-element ndarray) – Optional RNG key. If not provided, one is generated by a random split.

Returns

Forecasts. A Series named after the dependent variable (dv) if hdpi is False; a DataFrame if hdpi is True.

Return type

pd.Series or pd.DataFrame

property samples_df: Return a DataFrame of the model’s MCMC samples.

property samples_flat: Provide a 1D view of the model’s samples.

split_rand_key(n: int = 1) → jax.random.PRNGKey

Split the random key, assign a new key and return the subkeys.

Parameters: n (int) – Number of subkeys to generate. Default 1.
Returns: An array of PRNG keys or just a single key (if n=1).
Return type: random.PRNGKey
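The split-and-reassign pattern described above can be sketched with JAX directly. The class `KeyHolder` and its `rand_key` attribute are hypothetical stand-ins for a model that owns a PRNG key; only jax.random.PRNGKey and jax.random.split are real JAX calls.

```python
import jax


class KeyHolder:
    """Hypothetical stand-in for an object that owns a JAX PRNG key."""

    def __init__(self, rng_seed: int = 0):
        self.rand_key = jax.random.PRNGKey(rng_seed)

    def split_rand_key(self, n: int = 1):
        # Ask for n + 1 keys: keep the first as the new state, hand out the rest.
        keys = jax.random.split(self.rand_key, n + 1)
        self.rand_key = keys[0]
        subkeys = keys[1:]
        # Return a single key when n == 1, otherwise an array of keys.
        return subkeys[0] if n == 1 else subkeys


holder = KeyHolder(rng_seed=42)
single = holder.split_rand_key()    # one two-element key
several = holder.split_rand_key(3)  # array of three keys
```

Replacing the stored key on every call means repeated calls never hand out the same subkey twice.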

to_json() → str: Return a JSON payload of the model’s config.

classmethod transform(df: pandas.core.frame.DataFrame) → pandas.core.frame.DataFrame

Transform a dataframe for model input.

Parameters: df (pd.DataFrame) – Source dataframe to transform.
Returns: Dataframe containing transformed inputs.
Return type: pd.DataFrame

class shabadoo.Bernoulli(rng_seed: int = None)

Logistic/Bernoulli family model, for a binary response variable.

fit(df: pandas.core.frame.DataFrame, sampler: str = 'NUTS', rng_key: jax.numpy.lax_numpy.ndarray = None, sampler_kwargs: Dict[str, Any] = None, **mcmc_kwargs)

Fit the model to a DataFrame.

Parameters

df (pd.DataFrame) – Source dataframe.
sampler (str) – NumPyro sampler name. Default 'NUTS'.
rng_key (two-element ndarray) – Optional RNG key. If not provided, one is generated by a random split.
sampler_kwargs – Passed to the selected NumPyro sampler.
**mcmc_kwargs – Passed to numpyro.infer.MCMC.

Returns

The fitted model.

Return type

Model

property formula: Return a formula string describing the model.

classmethod from_dict(data: Dict[str, Any], **model_kw)

Return a pre-fitted model given a dictionary of config.

The dictionary MUST contain the following:

samples. A dictionary mapping variables to MCMC samples. Must contain all feature names and any additional model variables. Each variable’s data must have the same shape.

Any other dict keys will be added as model attributes.

Parameters

data (dict) – Model configuration, including the requirements listed above.
**model_kw – Passed to the Model() initializer.

Returns

A ready-to-use model.

Return type

Model

grouped_metrics(df: pandas.core.frame.DataFrame, groupby: Union[str, List[str]], aggfunc: Callable = <function sum>, aggerrs: bool = True) → Union[pandas.core.frame.DataFrame, Dict[str, float]]

Return grouped accuracy metrics.

Parameters

df (pd.DataFrame) – Input data for the model.
groupby (str or list of str) – Groupby clause for pandas.
aggfunc (callable) – How to aggregate actuals and predictions within a group. Default sum.
aggerrs (bool) – Option to aggregate errors across groups (default True). If True, a dictionary of summary statistics is returned. If False, groupwise errors are returned as a DataFrame.

Returns

If aggerrs is True, a dictionary of summary statistics. If False, groupwise errors as a DataFrame.

Return type

dict or pd.DataFrame
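The order of operations implied by the parameters (aggregate within each group first, then compare, then optionally summarize) can be sketched without pandas. The function `grouped_errors` and the single "mae" statistic are assumptions for illustration; the statistics shabadoo actually reports are not listed on this page.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Tuple, Union

Row = Tuple[str, float, float]  # (group, actual, predicted)


def grouped_errors(
    rows: Iterable[Row],
    aggfunc: Callable[[Iterable[float]], float] = sum,
    aggerrs: bool = True,
) -> Union[Dict[str, float], Dict[str, float]]:
    """Hypothetical sketch: groupwise errors, optionally summarized."""
    actuals = defaultdict(list)
    predicted = defaultdict(list)
    for group, actual, prediction in rows:
        actuals[group].append(actual)
        predicted[group].append(prediction)
    # Aggregate within each group first, then compare the aggregates.
    errors = {g: aggfunc(predicted[g]) - aggfunc(actuals[g]) for g in actuals}
    if not aggerrs:
        return errors
    # Illustrative summary statistic: mean absolute error across groups.
    return {"mae": sum(abs(e) for e in errors.values()) / len(errors)}


# Binary actuals with predicted probabilities (made-up values).
rows = [("a", 1.0, 0.5), ("a", 0.0, 0.25), ("b", 1.0, 0.75), ("b", 1.0, 0.75)]
per_group = grouped_errors(rows, aggerrs=False)
summary = grouped_errors(rows)
```

With a binary response, summing within a group compares the expected number of positives to the observed number, which is often more informative than pointwise errors.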

likelihood_func(probs): Return a Bernoulli likelihood.

static link(x): Logistic link function.
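For reference, a logistic function maps a real-valued linear predictor to a probability. The following is a minimal, numerically stable sketch in plain Python; it is an illustration of the mathematical function, not shabadoo's implementation.

```python
import math


def logistic(x: float) -> float:
    """Map a real-valued linear predictor to a probability."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    # For negative x, this form avoids overflow in exp(-x).
    z = math.exp(x)
    return z / (1.0 + z)


probabilities = [logistic(x) for x in (-2.0, 0.0, 2.0)]
```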

metrics(df: pandas.core.frame.DataFrame, aggerrs: bool = True) → Union[pandas.core.frame.DataFrame, Dict[str, float]]

Get prediction accuracy metrics of the model against data.

Parameters

df (pd.DataFrame) – Input data for the model.
aggerrs (bool) – Option to aggregate errors across observations (default True). If True, a dictionary of summary statistics is returned. If False, pointwise errors are returned as a DataFrame.

Returns

If aggerrs is True, a dictionary of summary statistics. If False, pointwise errors as a DataFrame.

Return type

dict or pd.DataFrame

model(df: pandas.core.frame.DataFrame)

Define and return samples from the model.

Parameters: df (pd.DataFrame) – Input data for the model.

property num_chains

Return the number of chains per variable in the model.

Assumes samples from all variables have the same shape.

property num_samples

Return the number of samples per variable.

Assumes samples from all variables have the same shape. Counts samples across all chains.

predict(df: pandas.core.frame.DataFrame, ci: bool = False, ci_interval: float = 0.9, aggfunc: Union[str, Callable] = 'mean') → Union[pandas.core.series.Series, pandas.core.frame.DataFrame]

Return the average posterior prediction across all samples.

Parameters

df (pd.DataFrame) – Source dataframe.
ci (bool) – Option to include a credible interval around the predictions. Returns a DataFrame if True, a Series if False. Default False.
ci_interval (float) – Credible interval width. Default 0.9.
aggfunc (string or callable) – Aggregation function called over predictions across posterior samples. Applies only to the point prediction (not the CI). Default 'mean'.

Returns

Forecasts. A Series named after the dependent variable (dv) if ci is False; a DataFrame if ci is True.

Return type

pd.Series or pd.DataFrame

classmethod preprocess_config_dict(config: dict) → dict: Run checks and transformations on dicts for use in from_dict().

sample_posterior_predictive(df: pandas.core.frame.DataFrame, hdpi: bool = False, hdpi_interval: float = 0.9, rng_key: jax.numpy.lax_numpy.ndarray = None) → Union[pandas.core.series.Series, pandas.core.frame.DataFrame]

Obtain samples from the posterior predictive.

Parameters

df (pd.DataFrame) – Source dataframe.
hdpi (bool) – Option to include the lower and upper bounds of the highest posterior density interval. Returns a DataFrame if True, a Series if False. Default False.
hdpi_interval (float) – Width of the highest posterior density interval. Default 0.9.
rng_key (two-element ndarray) – Optional RNG key. If not provided, one is generated by a random split.

Returns

Forecasts. A Series named after the dependent variable (dv) if hdpi is False; a DataFrame if hdpi is True.

Return type

pd.Series or pd.DataFrame
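A highest posterior density interval is the narrowest interval containing a given share of the posterior mass. One common sample-based way to estimate it is to slide a fixed-size window over the sorted draws and keep the narrowest one. The function `narrowest_interval` below is a hypothetical plain-Python sketch of that idea, not the routine shabadoo calls.

```python
import math
from typing import Sequence, Tuple


def narrowest_interval(draws: Sequence[float], interval: float = 0.9) -> Tuple[float, float]:
    """Return the narrowest window of sorted draws covering `interval` of them."""
    ordered = sorted(draws)
    n = len(ordered)
    window = max(1, math.ceil(interval * n))  # number of draws inside the interval
    # Slide the window over the sorted draws and keep the narrowest one.
    best_start = min(
        range(n - window + 1),
        key=lambda start: ordered[start + window - 1] - ordered[start],
    )
    return ordered[best_start], ordered[best_start + window - 1]


# Made-up, right-skewed draws: most mass between 1 and 2, plus one far outlier.
draws = [1.0, 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 9.0]
lower, upper = narrowest_interval(draws, interval=0.9)
```

Unlike an equal-tailed interval, this construction is free to exclude the outlier entirely, which is why the two kinds of interval can differ on skewed posteriors.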

property samples_df: Return a DataFrame of the model’s MCMC samples.

property samples_flat: Provide a 1D view of the model’s samples.

split_rand_key(n: int = 1) → jax.random.PRNGKey

Split the random key, assign a new key and return the subkeys.

Parameters: n (int) – Number of subkeys to generate. Default 1.
Returns: An array of PRNG keys or just a single key (if n=1).
Return type: random.PRNGKey

to_json() → str: Return a JSON payload of the model’s config.
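The round trip between a JSON payload and a config dictionary can be sketched with the standard json module. The keys other than "samples" and all values below are hypothetical; the exact payload that to_json() produces is not documented on this page.

```python
import json

# Hypothetical config: MCMC samples plus an extra attribute.
config = {
    "samples": {
        "const": [[0.1, 0.2], [0.3, 0.4]],
        "x": [[1.0, 1.1], [0.9, 1.2]],
    },
    "rng_seed": 7,
}

payload = json.dumps(config)    # a str, the return type documented for to_json()
restored = json.loads(payload)  # a dict with a "samples" key, as from_dict() requires
```

Samples must be plain lists (not arrays) to be JSON-serializable, so any array data would need converting before serialization.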

classmethod transform(df: pandas.core.frame.DataFrame) → pandas.core.frame.DataFrame

Transform a dataframe for model input.

Parameters: df (pd.DataFrame) – Source dataframe to transform.
Returns: Dataframe containing transformed inputs.
Return type: pd.DataFrame

class shabadoo.Poisson(rng_seed: int = None)

Exponential/Poisson family model for rate data.

fit(df: pandas.core.frame.DataFrame, sampler: str = 'NUTS', rng_key: jax.numpy.lax_numpy.ndarray = None, sampler_kwargs: Dict[str, Any] = None, **mcmc_kwargs)

Fit the model to a DataFrame.

Parameters

df (pd.DataFrame) – Source dataframe.
sampler (str) – NumPyro sampler name. Default 'NUTS'.
rng_key (two-element ndarray) – Optional RNG key. If not provided, one is generated by a random split.
sampler_kwargs – Passed to the selected NumPyro sampler.
**mcmc_kwargs – Passed to numpyro.infer.MCMC.

Returns

The fitted model.

Return type

Model

property formula: Return a formula string describing the model.

classmethod from_dict(data: Dict[str, Any], **model_kw)

Return a pre-fitted model given a dictionary of config.

The dictionary MUST contain the following:

samples. A dictionary mapping variables to MCMC samples. Must contain all feature names and any additional model variables. Each variable’s data must have the same shape.

Any other dict keys will be added as model attributes.

Parameters

data (dict) – Model configuration, including the requirements listed above.
**model_kw – Passed to the Model() initializer.

Returns

A ready-to-use model.

Return type

Model

grouped_metrics(df: pandas.core.frame.DataFrame, groupby: Union[str, List[str]], aggfunc: Callable = <function sum>, aggerrs: bool = True) → Union[pandas.core.frame.DataFrame, Dict[str, float]]

Return grouped accuracy metrics.

Parameters

df (pd.DataFrame) – Input data for the model.
groupby (str or list of str) – Groupby clause for pandas.
aggfunc (callable) – How to aggregate actuals and predictions within a group. Default sum.
aggerrs (bool) – Option to aggregate errors across groups (default True). If True, a dictionary of summary statistics is returned. If False, groupwise errors are returned as a DataFrame.

Returns

If aggerrs is True, a dictionary of summary statistics. If False, groupwise errors as a DataFrame.

Return type

dict or pd.DataFrame

likelihood_func(yhat): Return a Poisson likelihood.

static link(x): Exponential link function.
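An exponential function guarantees a positive rate for any real-valued linear predictor, and it turns additive effects on the linear scale into multiplicative effects on the rate. The short sketch below illustrates both properties in plain Python; the function name `rate` and the numbers are made up for illustration.

```python
import math


def rate(linear_predictor: float) -> float:
    """Map a real-valued linear predictor to a strictly positive rate."""
    return math.exp(linear_predictor)


baseline = rate(0.5)
with_effect = rate(0.5 + 0.2)  # add an effect of 0.2 on the linear scale

# The additive effect shows up as a multiplicative factor of exp(0.2).
ratio = with_effect / baseline
```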

metrics(df: pandas.core.frame.DataFrame, aggerrs: bool = True) → Union[pandas.core.frame.DataFrame, Dict[str, float]]

Get prediction accuracy metrics of the model against data.

Parameters

df (pd.DataFrame) – Input data for the model.
aggerrs (bool) – Option to aggregate errors across observations (default True). If True, a dictionary of summary statistics is returned. If False, pointwise errors are returned as a DataFrame.

Returns

If aggerrs is True, a dictionary of summary statistics. If False, pointwise errors as a DataFrame.

Return type

dict or pd.DataFrame
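The difference between the two return modes can be illustrated with a small sketch. The function `prediction_errors` and the statistics "mae" and "rmse" are assumptions chosen for illustration; the statistics that metrics() actually reports are not listed on this page.

```python
import math
from typing import Dict, List, Sequence, Union


def prediction_errors(
    actuals: Sequence[float],
    predictions: Sequence[float],
    aggerrs: bool = True,
) -> Union[Dict[str, float], List[float]]:
    """Hypothetical sketch: pointwise residuals or a summary of them."""
    residuals = [p - a for a, p in zip(actuals, predictions)]
    if not aggerrs:
        return residuals  # one error per observation
    n = len(residuals)
    return {
        "mae": sum(abs(r) for r in residuals) / n,
        "rmse": math.sqrt(sum(r * r for r in residuals) / n),
    }


# Observed counts and predicted rates (made-up values).
actuals = [2.0, 0.0, 5.0, 1.0]
predictions = [1.5, 0.5, 4.0, 2.0]

pointwise = prediction_errors(actuals, predictions, aggerrs=False)
summary = prediction_errors(actuals, predictions)
```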

model(df: pandas.core.frame.DataFrame)

Define and return samples from the model.

Parameters: df (pd.DataFrame) – Input data for the model.

property num_chains

Return the number of chains per variable in the model.

Assumes samples from all variables have the same shape.

property num_samples

Return the number of samples per variable.

Assumes samples from all variables have the same shape. Counts samples across all chains.

predict(df: pandas.core.frame.DataFrame, ci: bool = False, ci_interval: float = 0.9, aggfunc: Union[str, Callable] = 'mean') → Union[pandas.core.series.Series, pandas.core.frame.DataFrame]

Return the average posterior prediction across all samples.

Parameters

df (pd.DataFrame) – Source dataframe.
ci (bool) – Option to include a credible interval around the predictions. Returns a DataFrame if True, a Series if False. Default False.
ci_interval (float) – Credible interval width. Default 0.9.
aggfunc (string or callable) – Aggregation function called over predictions across posterior samples. Applies only to the point prediction (not the CI). Default 'mean'.

Returns

Forecasts. A Series named after the dependent variable (dv) if ci is False; a DataFrame if ci is True.

Return type

pd.Series or pd.DataFrame

classmethod preprocess_config_dict(config: dict) → dict: Run checks and transformations on dicts for use in from_dict().

sample_posterior_predictive(df: pandas.core.frame.DataFrame, hdpi: bool = False, hdpi_interval: float = 0.9, rng_key: jax.numpy.lax_numpy.ndarray = None) → Union[pandas.core.series.Series, pandas.core.frame.DataFrame]

Obtain samples from the posterior predictive.

Parameters

df (pd.DataFrame) – Source dataframe.
hdpi (bool) – Option to include the lower and upper bounds of the highest posterior density interval. Returns a DataFrame if True, a Series if False. Default False.
hdpi_interval (float) – Width of the highest posterior density interval. Default 0.9.
rng_key (two-element ndarray) – Optional RNG key. If not provided, one is generated by a random split.

Returns

Forecasts. A Series named after the dependent variable (dv) if hdpi is False; a DataFrame if hdpi is True.

Return type

pd.Series or pd.DataFrame

property samples_df: Return a DataFrame of the model’s MCMC samples.

property samples_flat: Provide a 1D view of the model’s samples.

split_rand_key(n: int = 1) → jax.random.PRNGKey

Split the random key, assign a new key and return the subkeys.

Parameters: n (int) – Number of subkeys to generate. Default 1.
Returns: An array of PRNG keys or just a single key (if n=1).
Return type: random.PRNGKey

to_json() → str: Return a JSON payload of the model’s config.

classmethod transform(df: pandas.core.frame.DataFrame) → pandas.core.frame.DataFrame

Transform a dataframe for model input.

Parameters: df (pd.DataFrame) – Source dataframe to transform.
Returns: Dataframe containing transformed inputs.
Return type: pd.DataFrame
