API Documentation

Models for very easy Bayesian Regression.

class shabadoo.Normal(rng_seed: int = None)

Gaussian/normal family model for the generic regression model.

fit(df: pandas.core.frame.DataFrame, sampler: str = 'NUTS', rng_key: jax.numpy.lax_numpy.ndarray = None, sampler_kwargs: Dict[str, Any] = None, **mcmc_kwargs)

Fit the model to a DataFrame.

Parameters
  • df (pd.DataFrame) – Source dataframe.

  • sampler (str) – NumPyro sampler name. Default 'NUTS'.

  • rng_key (two-element ndarray) – Optional RNG key. If not provided, a key will be split at random.

  • sampler_kwargs – Passed to the numpyro sampler selected.

  • **mcmc_kwargs – Passed to numpyro.infer.MCMC

Returns

The fitted model.

Return type

Model

property formula

Return a formula string describing the model.

classmethod from_dict(data: Dict[str, Any], **model_kw)

Return a pre-fitted model given a dictionary of config.

The dictionary MUST contain the following:

  • samples. A dictionary of variables to MCMC samples. Must contain all feature names and additional model variables. Each variable’s data must be the same shape.

Any other dict keys will be added as model attributes.

Parameters
  • data (dict) – Model configuration, including requirements listed above.

  • **model_kw – Passed to Model() init.

Returns

A ready-to-use model.

Return type

Model
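The requirements listed for from_dict() can be checked before the call. The following is a minimal sketch, not the library's own validation: check_config and sample_shape are hypothetical helpers, the variable names and the "dv" key are made up, and samples are assumed to be nested lists indexed by chain and then by draw.

```python
from typing import Any, Dict, Tuple


def sample_shape(values: Any) -> Tuple[int, ...]:
    """Return the shape of a (possibly nested) list of samples."""
    shape = []
    while isinstance(values, list):
        shape.append(len(values))
        values = values[0] if values else None
    return tuple(shape)


def check_config(data: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical pre-flight check mirroring the documented requirements."""
    if "samples" not in data:
        raise ValueError("config must contain a 'samples' key")
    shapes = {name: sample_shape(vals) for name, vals in data["samples"].items()}
    if len(set(shapes.values())) > 1:
        raise ValueError(f"sample shapes differ: {shapes}")
    return data


config = {
    "samples": {
        "x1": [[0.1, 0.2, 0.3], [0.2, 0.1, 0.4]],      # 2 chains x 3 draws
        "sigma": [[1.0, 1.1, 0.9], [1.2, 1.0, 1.1]],   # same shape as x1
    },
    "dv": "y",  # illustrative extra key; extra keys become model attributes
}
check_config(config)
```

A config that passes this check satisfies the shape requirement only; whether it names every feature depends on the model's formula.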

grouped_metrics(df: pandas.core.frame.DataFrame, groupby: Union[str, List[str]], aggfunc: Callable = <function sum>, aggerrs: bool = True) → Union[pandas.core.frame.DataFrame, Dict[str, float]]

Return grouped accuracy metrics.

Parameters
  • df (pd.DataFrame) – Input data for the model.

  • groupby (str or list of str) – Groupby clause for pandas.

  • aggfunc (callable) – How to aggregate actuals and predictions within a group. Default sum.

  • aggerrs (bool) – Option to aggregate errors across groups (default True). If True, a dictionary of summary statistics is returned. If False, groupwise errors are returned as a DataFrame.

Returns

If aggerrs is True, a dictionary of summary statistics. If False, groupwise errors as a DataFrame.

Return type

dict or pd.DataFrame
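The order of operations described for grouped_metrics (aggregate within each group first, then compare) can be sketched without any dependencies. This is an illustration only: grouped_errors and summarise are hypothetical names, and the choice of summary statistics (MAE and RMSE) is an assumption rather than the library's documented output.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Sequence


def grouped_errors(
    groups: Sequence[str],
    actuals: Sequence[float],
    predictions: Sequence[float],
    aggfunc: Callable = sum,
) -> Dict[str, float]:
    """Aggregate actuals and predictions per group, then take the difference."""
    by_group: Dict[str, Dict[str, List[float]]] = defaultdict(
        lambda: {"actual": [], "predicted": []}
    )
    for g, a, p in zip(groups, actuals, predictions):
        by_group[g]["actual"].append(a)
        by_group[g]["predicted"].append(p)
    return {
        g: aggfunc(v["predicted"]) - aggfunc(v["actual"])
        for g, v in by_group.items()
    }


def summarise(errors: Dict[str, float]) -> Dict[str, float]:
    """Collapse groupwise errors into summary statistics (illustrative choice)."""
    n = len(errors)
    return {
        "mae": sum(abs(e) for e in errors.values()) / n,
        "rmse": (sum(e * e for e in errors.values()) / n) ** 0.5,
    }


groups = ["a", "a", "b", "b"]
actuals = [1.0, 2.0, 3.0, 5.0]
predictions = [1.5, 2.5, 2.0, 4.0]

errors = grouped_errors(groups, actuals, predictions)  # analogous to aggerrs=False
summary = summarise(errors)                            # analogous to aggerrs=True
```

Because aggregation happens before the comparison, over- and under-predictions inside one group can cancel, so groupwise errors may be smaller than the pointwise errors would suggest.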

likelihood_func(yhat)

Return a normal likelihood with fitted sigma.

Linear link function.
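As a rough illustration of what a normal likelihood with a linear link computes, the sketch below evaluates the log density of one observation. It is plain Python rather than the NumPyro distribution the library returns; normal_log_likelihood and the coefficient values are hypothetical.

```python
import math


def normal_log_likelihood(y: float, yhat: float, sigma: float) -> float:
    """Log density of y under Normal(mean=yhat, sd=sigma).

    With a linear link, the linear predictor yhat is used directly as the mean.
    """
    z = (y - yhat) / sigma
    return -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2.0 * math.pi)


# Linear predictor from hypothetical coefficients: intercept 1.0, slope 2.0.
yhat = 1.0 + 2.0 * 0.5
loglik = normal_log_likelihood(y=2.0, yhat=yhat, sigma=1.0)
```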

metrics(df: pandas.core.frame.DataFrame, aggerrs: bool = True) → Union[pandas.core.frame.DataFrame, Dict[str, float]]

Get prediction accuracy metrics of the model against data.

Parameters
  • df (pd.DataFrame) – Input data for the model.

  • aggerrs (bool) – Option to aggregate errors across observations (default True). If True, a dictionary of summary statistics is returned. If False, pointwise errors are returned as a DataFrame.

Returns

If aggerrs is True, a dictionary of summary statistics. If False, pointwise errors as a DataFrame.

Return type

dict or pd.DataFrame

model(df: pandas.core.frame.DataFrame)

Define and return samples from the model.

Parameters

df (pd.DataFrame) – Input data for the model.

property num_chains

Return the number of chains per variable in the model.

Assumes samples from all variables have same shape.

property num_samples

Return the number of samples per variable.

Assumes samples from all variables have same shape. Counts samples across all chains.
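The relationship between the two counts can be shown with a small example. The layout here (each variable stored as a list of chains, each chain a list of draws) and the variable names are assumptions made for illustration, not a description of the library's internal storage.

```python
from typing import Dict, List

# Hypothetical samples: each variable holds 2 chains of 3 draws.
samples: Dict[str, List[List[float]]] = {
    "intercept": [[0.9, 1.0, 1.1], [1.0, 1.2, 0.8]],
    "x1": [[2.0, 2.1, 1.9], [2.2, 2.0, 1.8]],
}

# All variables share one shape, so inspecting any single variable is enough.
first = next(iter(samples.values()))
num_chains = len(first)
num_samples = sum(len(chain) for chain in first)  # counted across all chains
```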

predict(df: pandas.core.frame.DataFrame, ci: bool = False, ci_interval: float = 0.9, aggfunc: Union[str, Callable] = 'mean') → Union[pandas.core.series.Series, pandas.core.frame.DataFrame]

Return the average posterior prediction across all samples.

Parameters
  • df (pd.DataFrame) – Source dataframe.

  • ci (bool) – Option to include a credible interval around the predictions. Returns a DataFrame if True, a Series if False. Default False.

  • ci_interval (float) – Credible interval width. Default 0.9.

  • aggfunc (string or callable) – Aggregation function called over predictions across posterior samples. Applies only to the point prediction (not the CI).

Returns

Forecasts. A Series named after the dependent variable (dv) if ci is False; a DataFrame if ci is True.

Return type

pd.Series or pd.DataFrame
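The way a point prediction and a credible interval are derived from posterior draws can be sketched as follows. This is a dependency-free illustration under stated assumptions: summarise_predictions and quantile are hypothetical helpers, the interval is taken to be equal-tailed, and quantiles use linear interpolation; the library may compute its interval differently.

```python
from statistics import mean
from typing import Callable, Dict, List, Sequence


def quantile(sorted_vals: Sequence[float], q: float) -> float:
    """Linearly interpolated quantile of already sorted values."""
    pos = q * (len(sorted_vals) - 1)
    lower = int(pos)
    upper = min(lower + 1, len(sorted_vals) - 1)
    frac = pos - lower
    return sorted_vals[lower] * (1.0 - frac) + sorted_vals[upper] * frac


def summarise_predictions(
    draws: List[List[float]],
    ci_interval: float = 0.9,
    aggfunc: Callable = mean,
) -> List[Dict[str, float]]:
    """Summarise per-draw predictions; draws[i][j] is row j under draw i."""
    tail = (1.0 - ci_interval) / 2.0
    out = []
    for row in zip(*draws):  # collect every draw for one input row
        ordered = sorted(row)
        out.append(
            {
                "prediction": aggfunc(row),  # aggfunc affects only the point value
                "ci_lower": quantile(ordered, tail),
                "ci_upper": quantile(ordered, 1.0 - tail),
            }
        )
    return out


# 11 hypothetical posterior draws for 2 input rows.
draws = [[float(i), 2.0 * i] for i in range(11)]
result = summarise_predictions(draws, ci_interval=0.9)
```

Passing a different aggfunc, such as statistics.median, changes only the "prediction" entry, which matches the note that aggfunc does not apply to the interval.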

classmethod preprocess_config_dict(config: dict) → dict

Run checks and transformations on dicts for use in from_dict().

sample_posterior_predictive(df: pandas.core.frame.DataFrame, hdpi: bool = False, hdpi_interval: float = 0.9, rng_key: jax.numpy.lax_numpy.ndarray = None) → Union[pandas.core.series.Series, pandas.core.frame.DataFrame]

Obtain samples from the posterior predictive.

Parameters
  • df (pd.DataFrame) – Source dataframe.

  • hdpi (bool) – Option to include lower/upper bound of the highest posterior density interval. Returns a dataframe if true, a series if false. Default False.

  • hdpi_interval (float) – HDPI width. Default 0.9.

  • rng_key (two-element ndarray) – Optional RNG key. If not provided, a key will be split at random.

Returns

Forecasts. A Series named after the dependent variable (dv) if hdpi is False; a DataFrame if hdpi is True.

Return type

pd.Series or pd.DataFrame
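A highest posterior density interval is the narrowest interval that holds the requested share of the draws. The sketch below shows one generic way to find it from a list of samples; the hpdi function here is a hypothetical helper written for illustration and is not the routine the library or NumPyro uses.

```python
import math
from typing import List, Tuple


def hpdi(draws: List[float], interval: float = 0.9) -> Tuple[float, float]:
    """Return the narrowest window of sorted draws covering `interval` of them."""
    ordered = sorted(draws)
    n = len(ordered)
    # Smallest number of draws that covers at least `interval` of the sample.
    window = max(1, math.ceil(interval * n))
    # Slide a fixed-size window over the sorted draws and keep the narrowest.
    start = min(
        range(n - window + 1),
        key=lambda i: ordered[i + window - 1] - ordered[i],
    )
    return ordered[start], ordered[start + window - 1]


# Skewed draws: most of the mass sits near zero with a long right tail.
draws = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 5.0, 9.0]
bounds = hpdi(draws, interval=0.8)  # (0.0, 0.7)
```

For skewed draws like these, the interval hugs the dense region and excludes the tail, which is where it differs from an equal-tailed interval.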

property samples_df

Return a DataFrame of the model’s MCMC samples.

property samples_flat

Provide a 1D view of the model’s samples.

split_rand_key(n: int = 1) → jax.random.PRNGKey

Split the random key, assign a new key and return the subkeys.

Parameters

n (int) – Number of subkeys to generate. Default 1.

Returns

An array of PRNG keys or just a single key (if n=1).

Return type

random.PRNGKey

to_json() → str

Return a JSON payload of the model’s config.

classmethod transform(df: pandas.core.frame.DataFrame) → pandas.core.frame.DataFrame

Transform a dataframe for model input.

Parameters

df (pd.DataFrame) – Source dataframe to transform.

Returns

Dataframe containing transformed inputs.

Return type

pd.DataFrame

class shabadoo.Bernoulli(rng_seed: int = None)

Logistic/Bernoulli family model for a binary response variable.

fit(df: pandas.core.frame.DataFrame, sampler: str = 'NUTS', rng_key: jax.numpy.lax_numpy.ndarray = None, sampler_kwargs: Dict[str, Any] = None, **mcmc_kwargs)

Fit the model to a DataFrame.

Parameters
  • df (pd.DataFrame) – Source dataframe.

  • sampler (str) – NumPyro sampler name. Default 'NUTS'.

  • rng_key (two-element ndarray) – Optional RNG key. If not provided, a key will be split at random.

  • sampler_kwargs – Passed to the numpyro sampler selected.

  • **mcmc_kwargs – Passed to numpyro.infer.MCMC

Returns

The fitted model.

Return type

Model

property formula

Return a formula string describing the model.

classmethod from_dict(data: Dict[str, Any], **model_kw)

Return a pre-fitted model given a dictionary of config.

The dictionary MUST contain the following:

  • samples. A dictionary of variables to MCMC samples. Must contain all feature names and additional model variables. Each variable’s data must be the same shape.

Any other dict keys will be added as model attributes.

Parameters
  • data (dict) – Model configuration, including requirements listed above.

  • **model_kw – Passed to Model() init.

Returns

A ready-to-use model.

Return type

Model

grouped_metrics(df: pandas.core.frame.DataFrame, groupby: Union[str, List[str]], aggfunc: Callable = <function sum>, aggerrs: bool = True) → Union[pandas.core.frame.DataFrame, Dict[str, float]]

Return grouped accuracy metrics.

Parameters
  • df (pd.DataFrame) – Input data for the model.

  • groupby (str or list of str) – Groupby clause for pandas.

  • aggfunc (callable) – How to aggregate actuals and predictions within a group. Default sum.

  • aggerrs (bool) – Option to aggregate errors across groups (default True). If True, a dictionary of summary statistics is returned. If False, groupwise errors are returned as a DataFrame.

Returns

If aggerrs is True, a dictionary of summary statistics. If False, groupwise errors as a DataFrame.

Return type

dict or pd.DataFrame

likelihood_func(probs)

Return a Bernoulli likelihood.

Logistic link function.
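To make the logistic link concrete, the sketch below maps a linear predictor to a probability and evaluates the Bernoulli log probability of one outcome. It is plain Python for illustration, not the NumPyro distribution the library returns; inverse_logit, bernoulli_log_likelihood and the value of eta are hypothetical.

```python
import math


def inverse_logit(eta: float) -> float:
    """Map a linear predictor on the real line to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-eta))


def bernoulli_log_likelihood(y: int, prob: float) -> float:
    """Log probability of a binary outcome y given success probability prob."""
    return math.log(prob) if y == 1 else math.log(1.0 - prob)


eta = 0.0  # hypothetical linear predictor
prob = inverse_logit(eta)
loglik = bernoulli_log_likelihood(1, prob)
```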

metrics(df: pandas.core.frame.DataFrame, aggerrs: bool = True) → Union[pandas.core.frame.DataFrame, Dict[str, float]]

Get prediction accuracy metrics of the model against data.

Parameters
  • df (pd.DataFrame) – Input data for the model.

  • aggerrs (bool) – Option to aggregate errors across observations (default True). If True, a dictionary of summary statistics is returned. If False, pointwise errors are returned as a DataFrame.

Returns

If aggerrs is True, a dictionary of summary statistics. If False, pointwise errors as a DataFrame.

Return type

dict or pd.DataFrame

model(df: pandas.core.frame.DataFrame)

Define and return samples from the model.

Parameters

df (pd.DataFrame) – Input data for the model.

property num_chains

Return the number of chains per variable in the model.

Assumes samples from all variables have same shape.

property num_samples

Return the number of samples per variable.

Assumes samples from all variables have same shape. Counts samples across all chains.

predict(df: pandas.core.frame.DataFrame, ci: bool = False, ci_interval: float = 0.9, aggfunc: Union[str, Callable] = 'mean') → Union[pandas.core.series.Series, pandas.core.frame.DataFrame]

Return the average posterior prediction across all samples.

Parameters
  • df (pd.DataFrame) – Source dataframe.

  • ci (bool) – Option to include a credible interval around the predictions. Returns a DataFrame if True, a Series if False. Default False.

  • ci_interval (float) – Credible interval width. Default 0.9.

  • aggfunc (string or callable) – Aggregation function called over predictions across posterior samples. Applies only to the point prediction (not the CI).

Returns

Forecasts. A Series named after the dependent variable (dv) if ci is False; a DataFrame if ci is True.

Return type

pd.Series or pd.DataFrame

classmethod preprocess_config_dict(config: dict) → dict

Run checks and transformations on dicts for use in from_dict().

sample_posterior_predictive(df: pandas.core.frame.DataFrame, hdpi: bool = False, hdpi_interval: float = 0.9, rng_key: jax.numpy.lax_numpy.ndarray = None) → Union[pandas.core.series.Series, pandas.core.frame.DataFrame]

Obtain samples from the posterior predictive.

Parameters
  • df (pd.DataFrame) – Source dataframe.

  • hdpi (bool) – Option to include lower/upper bound of the highest posterior density interval. Returns a dataframe if true, a series if false. Default False.

  • hdpi_interval (float) – HDPI width. Default 0.9.

  • rng_key (two-element ndarray) – Optional RNG key. If not provided, a key will be split at random.

Returns

Forecasts. A Series named after the dependent variable (dv) if hdpi is False; a DataFrame if hdpi is True.

Return type

pd.Series or pd.DataFrame

property samples_df

Return a DataFrame of the model’s MCMC samples.

property samples_flat

Provide a 1D view of the model’s samples.

split_rand_key(n: int = 1) → jax.random.PRNGKey

Split the random key, assign a new key and return the subkeys.

Parameters

n (int) – Number of subkeys to generate. Default 1.

Returns

An array of PRNG keys or just a single key (if n=1).

Return type

random.PRNGKey

to_json() → str

Return a JSON payload of the model’s config.
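Since to_json() returns a string and from_dict() accepts a dictionary, a saved model can in principle be restored by decoding the payload first. The sketch below shows only that JSON round trip with the standard library. The payload layout is an assumption: "samples" is the one key from_dict() is documented to require, and the variable names and the "dv" key are made up.

```python
import json

# Hypothetical config; only the "samples" key is a documented requirement.
config = {
    "samples": {
        "x1": [[0.1, 0.2], [0.3, 0.4]],
        "sigma": [[1.0, 1.1], [0.9, 1.2]],
    },
    "dv": "y",
}

payload = json.dumps(config)    # a str, like the return value of to_json()
restored = json.loads(payload)  # a plain dict that could be passed to from_dict()
```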

classmethod transform(df: pandas.core.frame.DataFrame) → pandas.core.frame.DataFrame

Transform a dataframe for model input.

Parameters

df (pd.DataFrame) – Source dataframe to transform.

Returns

Dataframe containing transformed inputs.

Return type

pd.DataFrame

class shabadoo.Poisson(rng_seed: int = None)

Exponential/Poisson family model for rate data.

fit(df: pandas.core.frame.DataFrame, sampler: str = 'NUTS', rng_key: jax.numpy.lax_numpy.ndarray = None, sampler_kwargs: Dict[str, Any] = None, **mcmc_kwargs)

Fit the model to a DataFrame.

Parameters
  • df (pd.DataFrame) – Source dataframe.

  • sampler (str) – NumPyro sampler name. Default 'NUTS'.

  • rng_key (two-element ndarray) – Optional RNG key. If not provided, a key will be split at random.

  • sampler_kwargs – Passed to the numpyro sampler selected.

  • **mcmc_kwargs – Passed to numpyro.infer.MCMC

Returns

The fitted model.

Return type

Model

property formula

Return a formula string describing the model.

classmethod from_dict(data: Dict[str, Any], **model_kw)

Return a pre-fitted model given a dictionary of config.

The dictionary MUST contain the following:

  • samples. A dictionary of variables to MCMC samples. Must contain all feature names and additional model variables. Each variable’s data must be the same shape.

Any other dict keys will be added as model attributes.

Parameters
  • data (dict) – Model configuration, including requirements listed above.

  • **model_kw – Passed to Model() init.

Returns

A ready-to-use model.

Return type

Model

grouped_metrics(df: pandas.core.frame.DataFrame, groupby: Union[str, List[str]], aggfunc: Callable = <function sum>, aggerrs: bool = True) → Union[pandas.core.frame.DataFrame, Dict[str, float]]

Return grouped accuracy metrics.

Parameters
  • df (pd.DataFrame) – Input data for the model.

  • groupby (str or list of str) – Groupby clause for pandas.

  • aggfunc (callable) – How to aggregate actuals and predictions within a group. Default sum.

  • aggerrs (bool) – Option to aggregate errors across groups (default True). If True, a dictionary of summary statistics is returned. If False, groupwise errors are returned as a DataFrame.

Returns

If aggerrs is True, a dictionary of summary statistics. If False, groupwise errors as a DataFrame.

Return type

dict or pd.DataFrame

likelihood_func(yhat)

Return a Poisson likelihood.

Exponential link function.
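For a Poisson model, exponentiating the linear predictor keeps the rate positive. The sketch below evaluates the log probability of one count under that transform. It is plain Python for illustration, not the NumPyro distribution the library returns; poisson_log_likelihood and the value of eta are hypothetical.

```python
import math


def poisson_log_likelihood(y: int, eta: float) -> float:
    """Log probability of count y when the rate is exp(eta)."""
    rate = math.exp(eta)  # the exponential transform keeps the rate positive
    # log(rate**y * exp(-rate) / y!) with log(rate) == eta
    return y * eta - rate - math.lgamma(y + 1)


eta = math.log(3.0)  # hypothetical linear predictor, i.e. an expected count of 3
loglik = poisson_log_likelihood(y=2, eta=eta)
```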

metrics(df: pandas.core.frame.DataFrame, aggerrs: bool = True) → Union[pandas.core.frame.DataFrame, Dict[str, float]]

Get prediction accuracy metrics of the model against data.

Parameters
  • df (pd.DataFrame) – Input data for the model.

  • aggerrs (bool) – Option to aggregate errors across observations (default True). If True, a dictionary of summary statistics is returned. If False, pointwise errors are returned as a DataFrame.

Returns

If aggerrs is True, a dictionary of summary statistics. If False, pointwise errors as a DataFrame.

Return type

dict or pd.DataFrame

model(df: pandas.core.frame.DataFrame)

Define and return samples from the model.

Parameters

df (pd.DataFrame) – Input data for the model.

property num_chains

Return the number of chains per variable in the model.

Assumes samples from all variables have same shape.

property num_samples

Return the number of samples per variable.

Assumes samples from all variables have same shape. Counts samples across all chains.

predict(df: pandas.core.frame.DataFrame, ci: bool = False, ci_interval: float = 0.9, aggfunc: Union[str, Callable] = 'mean') → Union[pandas.core.series.Series, pandas.core.frame.DataFrame]

Return the average posterior prediction across all samples.

Parameters
  • df (pd.DataFrame) – Source dataframe.

  • ci (bool) – Option to include a credible interval around the predictions. Returns a DataFrame if True, a Series if False. Default False.

  • ci_interval (float) – Credible interval width. Default 0.9.

  • aggfunc (string or callable) – Aggregation function called over predictions across posterior samples. Applies only to the point prediction (not the CI).

Returns

Forecasts. A Series named after the dependent variable (dv) if ci is False; a DataFrame if ci is True.

Return type

pd.Series or pd.DataFrame

classmethod preprocess_config_dict(config: dict) → dict

Run checks and transformations on dicts for use in from_dict().

sample_posterior_predictive(df: pandas.core.frame.DataFrame, hdpi: bool = False, hdpi_interval: float = 0.9, rng_key: jax.numpy.lax_numpy.ndarray = None) → Union[pandas.core.series.Series, pandas.core.frame.DataFrame]

Obtain samples from the posterior predictive.

Parameters
  • df (pd.DataFrame) – Source dataframe.

  • hdpi (bool) – Option to include lower/upper bound of the highest posterior density interval. Returns a dataframe if true, a series if false. Default False.

  • hdpi_interval (float) – HDPI width. Default 0.9.

  • rng_key (two-element ndarray) – Optional RNG key. If not provided, a key will be split at random.

Returns

Forecasts. A Series named after the dependent variable (dv) if hdpi is False; a DataFrame if hdpi is True.

Return type

pd.Series or pd.DataFrame

property samples_df

Return a DataFrame of the model’s MCMC samples.

property samples_flat

Provide a 1D view of the model’s samples.

split_rand_key(n: int = 1) → jax.random.PRNGKey

Split the random key, assign a new key and return the subkeys.

Parameters

n (int) – Number of subkeys to generate. Default 1.

Returns

An array of PRNG keys or just a single key (if n=1).

Return type

random.PRNGKey

to_json() → str

Return a JSON payload of the model’s config.

classmethod transform(df: pandas.core.frame.DataFrame) → pandas.core.frame.DataFrame

Transform a dataframe for model input.

Parameters

df (pd.DataFrame) – Source dataframe to transform.

Returns

Dataframe containing transformed inputs.

Return type

pd.DataFrame