API Documentation
Models for very easy Bayesian Regression.
-
class shabadoo.Normal(rng_seed: int = None)
Gaussian/normal family model for the generic regression model.
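As a rough illustration of what a Gaussian family regression scores, the sketch below computes the log-likelihood of a response under a normal distribution centred on a linear predictor (identity link). It assumes the generic regression model is a weighted sum of features plus a scale parameter `sigma`; `normal_log_likelihood` and its arguments are hypothetical names, not part of shabadoo's API.

```python
import math


def normal_log_likelihood(y, X, coefs, sigma):
    """Log-likelihood of y under y_i ~ Normal(sum_j X[i][j] * coefs[j], sigma).

    Hypothetical helper illustrating the Gaussian family, not shabadoo's code.
    """
    total = 0.0
    for row, y_i in zip(X, y):
        # Identity link: the mean is the linear predictor itself
        mu = sum(x * b for x, b in zip(row, coefs))
        z = (y_i - mu) / sigma
        total += -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2 * math.pi)
    return total


# First column is a constant (intercept) feature, second is a regular feature
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]]
y = [1.0, 3.0, 5.0]
ll_exact = normal_log_likelihood(y, X, coefs=[1.0, 2.0], sigma=1.0)
ll_flat = normal_log_likelihood(y, X, coefs=[0.0, 0.0], sigma=1.0)
```

Coefficients that reproduce the data exactly (`ll_exact`) score higher than an all-zero model (`ll_flat`); MCMC explores coefficient values in proportion to this likelihood times the priors.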
-
fit(df: pandas.core.frame.DataFrame, sampler: str = 'NUTS', rng_key: jax.numpy.lax_numpy.ndarray = None, sampler_kwargs: Dict[str, Any] = None, **mcmc_kwargs)
Fit the model to a DataFrame.
- Parameters
df (pd.DataFrame) – Source dataframe.
sampler (str) – NumPyro sampler name. Default 'NUTS'.
rng_key (two-element ndarray) – Optional RNG key. If not provided, a new key is split off the model's random key.
sampler_kwargs – Keyword arguments passed to the selected NumPyro sampler.
**mcmc_kwargs – Keyword arguments passed to numpyro.infer.MCMC.
- Returns
The fitted model.
- Return type
Model
-
property formula
Return a formula string describing the model.
-
classmethod from_dict(data: Dict[str, Any], **model_kw)
Return a pre-fitted model given a dictionary of config.
The dictionary MUST contain the following key:
samples: A dictionary mapping variable names to MCMC samples. It must contain all feature names and any additional model variables, and each variable’s data must have the same shape.
Any other dict keys will be added as model attributes.
- Parameters
data (dict) – Model configuration, including the requirements listed above.
**model_kw – Passed to the Model() initializer.
- Returns
A ready-to-use model.
- Return type
Model
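The requirements on the config dictionary can be checked mechanically before building a model. The sketch below is a hypothetical validator (`validate_config` is not a shabadoo function) that enforces the rules stated above: a `samples` key, every feature present, one shared shape, and all remaining keys treated as attributes. The (chains, draws) layout of the example samples is an assumption.

```python
import numpy as np


def validate_config(data, feature_names):
    """Check a from_dict-style config (hypothetical helper, not shabadoo's code)."""
    if "samples" not in data:
        raise KeyError("config must contain a 'samples' key")
    samples = {name: np.asarray(values) for name, values in data["samples"].items()}
    missing = set(feature_names) - set(samples)
    if missing:
        raise KeyError(f"samples are missing features: {sorted(missing)}")
    shapes = {arr.shape for arr in samples.values()}
    if len(shapes) != 1:
        raise ValueError(f"all sample arrays must share one shape, got {shapes}")
    # Every remaining key would become a model attribute
    attributes = {key: value for key, value in data.items() if key != "samples"}
    return samples, attributes


config = {
    # 1 chain x 2 draws per variable; "sigma" is an extra model variable
    "samples": {"const": [[0.9, 1.1]], "x": [[1.9, 2.1]], "sigma": [[0.5, 0.6]]},
    "dv": "y",  # hypothetical extra key, kept as an attribute
}
samples, attributes = validate_config(config, feature_names=["const", "x"])
```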
-
grouped_metrics(df: pandas.core.frame.DataFrame, groupby: Union[str, List[str]], aggfunc: Callable = <function sum>, aggerrs: bool = True) → Union[pandas.core.frame.DataFrame, Dict[str, float]]
Return grouped accuracy metrics.
- Parameters
df (pd.DataFrame) – Input data for the model.
groupby (str or list of str) – Groupby clause for pandas.
aggfunc (callable) – How to aggregate actuals and predictions within a group. Default sum.
aggerrs (bool) – Option to aggregate errors across groups (default True). If True, a dictionary of summary statistics is returned. If False, groupwise errors are returned as a DataFrame.
- Returns
If aggerrs is True, a dictionary of summary statistics. If False, a DataFrame of groupwise errors.
- Return type
dict or pd.DataFrame
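The grouping step can be pictured with plain pandas: actuals and predictions are aggregated within each group first, and errors are computed on the aggregated values. This is a minimal sketch under assumptions; `grouped_errors` and the `error`/`abs_error`/`mae` names are hypothetical and may not match the statistics shabadoo reports. The string `"sum"` stands in for the documented default aggregation.

```python
import pandas as pd


def grouped_errors(df, groupby, actual_col, pred_col, aggfunc="sum"):
    """Aggregate actuals and predictions within groups, then compute errors.

    Hypothetical helper; the error columns are assumptions, not shabadoo's output.
    """
    grouped = df.groupby(groupby)[[actual_col, pred_col]].agg(aggfunc)
    grouped["error"] = grouped[pred_col] - grouped[actual_col]
    grouped["abs_error"] = grouped["error"].abs()
    return grouped


df = pd.DataFrame({
    "region": ["a", "a", "b", "b"],
    "y": [1.0, 2.0, 3.0, 4.0],
    "pred": [1.5, 2.0, 2.0, 4.5],
})
# aggerrs=False would correspond to this per-group frame
per_group = grouped_errors(df, "region", "y", "pred")
# aggerrs=True would correspond to a summary across groups
summary = {"mae": float(per_group["abs_error"].mean())}
```

Row-level errors within group "b" (-1.0 and +0.5) partly cancel once summed, which is why grouped and pointwise metrics can differ.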
-
metrics(df: pandas.core.frame.DataFrame, aggerrs: bool = True) → Union[pandas.core.frame.DataFrame, Dict[str, float]]
Get prediction accuracy metrics of the model against data.
- Parameters
df (pd.DataFrame) – Input data for the model.
aggerrs (bool) – Option to aggregate errors across observations (default True). If True, a dictionary of summary statistics is returned. If False, pointwise errors are returned as a DataFrame.
- Returns
If aggerrs is True, a dictionary of summary statistics. If False, a DataFrame of pointwise errors.
- Return type
dict or pd.DataFrame
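The `aggerrs` switch can be sketched as follows: with `aggerrs=False` one row of errors per observation is returned, otherwise the errors are reduced to a dictionary. The chosen statistics (MAE, RMSE, bias) and the name `pointwise_metrics` are assumptions for illustration only.

```python
import numpy as np
import pandas as pd


def pointwise_metrics(actual, predicted, aggerrs=True):
    """Hypothetical sketch of the aggerrs switch; metric names are assumptions."""
    errors = pd.DataFrame({"actual": actual, "predicted": predicted})
    errors["error"] = errors["predicted"] - errors["actual"]
    if not aggerrs:
        # One row of errors per observation
        return errors
    err = errors["error"].to_numpy()
    return {
        "mae": float(np.mean(np.abs(err))),
        "rmse": float(np.sqrt(np.mean(err ** 2))),
        "bias": float(np.mean(err)),
    }


summary = pointwise_metrics([1.0, 2.0, 3.0], [1.0, 3.0, 5.0])
rows = pointwise_metrics([1.0, 2.0, 3.0], [1.0, 3.0, 5.0], aggerrs=False)
```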
-
model(df: pandas.core.frame.DataFrame)
Define and return samples from the model.
- Parameters
df (pd.DataFrame) – Input data for the model.
-
property num_chains
Return the number of chains per variable in the model.
Assumes samples from all variables have the same shape.
-
property num_samples
Return the number of samples per variable.
Assumes samples from all variables have the same shape. Counts samples across all chains.
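Both counts can be read off the shape of any one variable's samples, which is why they assume a shared shape. The sketch below assumes samples are stored per variable as arrays of shape (chains, draws per chain), the layout NumPyro uses when samples are grouped by chain; the function names mirror the properties but are standalone hypothetical helpers.

```python
import numpy as np

# Hypothetical samples: 4 chains x 250 draws for each variable
samples = {
    "const": np.zeros((4, 250)),
    "x": np.zeros((4, 250)),
}


def num_chains(samples):
    # Assumes every variable shares the same (chains, draws) shape
    first = next(iter(samples.values()))
    return first.shape[0]


def num_samples(samples):
    # Counts draws across all chains: chains * draws per chain
    first = next(iter(samples.values()))
    return first.shape[0] * first.shape[1]
```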
-
predict(df: pandas.core.frame.DataFrame, ci: bool = False, ci_interval: float = 0.9, aggfunc: Union[str, Callable] = 'mean') → Union[pandas.core.series.Series, pandas.core.frame.DataFrame]
Return the posterior prediction aggregated across all samples (the mean, by default).
- Parameters
df (pd.DataFrame) – Source dataframe.
ci (bool) – Option to include a credible interval around the predictions. Returns a DataFrame if True, a Series if False. Default False.
ci_interval (float) – Credible interval width. Default 0.9.
aggfunc (string or callable) – Aggregation function called over predictions across posterior samples. Applies only to the point prediction (not the CI).
- Returns
Forecasts: a Series named after the dependent variable if ci is False, or a DataFrame if ci is True.
- Return type
pd.Series or pd.DataFrame
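Conceptually, prediction collapses a (draws x rows) array of posterior predictions into one point per row, plus optional interval bounds. This sketch assumes an equal-tailed (quantile-based) credible interval and that `aggfunc` is applied along the draws axis; `summarize_predictions` is a hypothetical helper, and the simulated draws stand in for real posterior output.

```python
import numpy as np


def summarize_predictions(pred_samples, ci_interval=0.9, aggfunc=np.mean):
    """Collapse posterior predictions (draws x rows) into a point and a CI.

    Sketch only: assumes an equal-tailed (quantile) credible interval.
    """
    # aggfunc applies to the point prediction only, not to the interval
    point = aggfunc(pred_samples, axis=0)
    tail = (1.0 - ci_interval) / 2.0
    lower = np.quantile(pred_samples, tail, axis=0)
    upper = np.quantile(pred_samples, 1.0 - tail, axis=0)
    return point, lower, upper


rng = np.random.default_rng(0)
# 1,000 simulated posterior draws of the prediction for 3 rows
draws = rng.normal(loc=[0.0, 5.0, 10.0], scale=1.0, size=(1000, 3))
point, lower, upper = summarize_predictions(draws)
median_point, _, _ = summarize_predictions(draws, aggfunc=np.median)
```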
-
classmethod preprocess_config_dict(config: dict) → dict
Run checks and transformations on config dicts for use in from_dict().
-
sample_posterior_predictive(df: pandas.core.frame.DataFrame, hdpi: bool = False, hdpi_interval: float = 0.9, rng_key: jax.numpy.lax_numpy.ndarray = None) → Union[pandas.core.series.Series, pandas.core.frame.DataFrame]
Obtain samples from the posterior predictive distribution.
- Parameters
df (pd.DataFrame) – Source dataframe.
hdpi (bool) – Option to include the lower and upper bounds of the highest posterior density interval. Returns a DataFrame if True, a Series if False. Default False.
hdpi_interval (float) – Width (probability mass) of the highest posterior density interval. Default 0.9.
rng_key (two-element ndarray) – Optional RNG key. If not provided, a new key is split off the model's random key.
- Returns
Forecasts: a Series named after the dependent variable if hdpi is False, or a DataFrame if hdpi is True.
- Return type
pd.Series or pd.DataFrame
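A highest posterior density interval is the shortest interval that contains the requested share of draws, so for skewed posteriors it differs from an equal-tailed interval. The sketch below is a standalone NumPy version of that idea (`hpdi` here is a hypothetical helper); NumPyro ships its own `numpyro.diagnostics.hpdi`, but whether shabadoo calls it is not stated here.

```python
import numpy as np


def hpdi(draws, prob=0.9):
    """Return the shortest interval containing a `prob` share of 1D `draws`."""
    if not 0.0 < prob < 1.0:
        raise ValueError("prob must be strictly between 0 and 1")
    x = np.sort(np.asarray(draws, dtype=float))
    n = x.size
    k = int(prob * n)  # index span the interval must cover
    # Width of every candidate window x[i] .. x[i + k]
    widths = x[k:] - x[: n - k]
    i = int(np.argmin(widths))
    return float(x[i]), float(x[i + k])


# A right-skewed sample: the interval hugs the dense region near zero
skewed = [0, 0, 0, 0, 1, 1, 2, 5, 10, 50]
skewed_interval = hpdi(skewed, prob=0.5)
uniform_interval = hpdi(np.arange(100), prob=0.9)
```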
-
property samples_df
Return a DataFrame of the model’s MCMC samples.
-
property samples_flat
Provide a 1D view of the model’s samples.
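The relationship between per-chain samples, a flat view and a DataFrame can be sketched with NumPy and pandas. This assumes samples are stored per variable with shape (chains, draws) and that flattening concatenates chains in order; the variable names below reuse the property names purely for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical grouped-by-chain samples: 2 chains x 3 draws per variable
samples = {
    "const": np.array([[1.0, 1.1, 1.2], [0.9, 1.0, 1.3]]),
    "x": np.array([[2.0, 2.1, 1.9], [2.2, 1.8, 2.0]]),
}

# Concatenate chains into a single 1D vector of draws per variable
samples_flat = {name: arr.reshape(-1) for name, arr in samples.items()}

# One column per variable, one row per draw
samples_df = pd.DataFrame(samples_flat)
```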
-
split_rand_key(n: int = 1) → jax.random.PRNGKey
Split the random key, assign a new key and return the subkeys.
- Parameters
n (int) – Number of subkeys to generate. Default 1.
- Returns
An array of PRNG keys or just a single key (if n=1).
- Return type
random.PRNGKey
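JAX random keys are never reused: a key is split, one part is kept as the new state, and the rest are handed out. A minimal sketch of that pattern, assuming the model stores its key on an attribute (`KeyHolder` and `rng_key` are hypothetical stand-ins, not shabadoo's internals); it relies only on `jax.random.PRNGKey` and `jax.random.split`.

```python
import jax


class KeyHolder:
    """Hypothetical stand-in showing the split-and-replace pattern."""

    def __init__(self, seed=0):
        self.rng_key = jax.random.PRNGKey(seed)

    def split_rand_key(self, n=1):
        # Split into n + 1 keys: keep one as the new state, hand out the rest
        keys = jax.random.split(self.rng_key, n + 1)
        self.rng_key = keys[0]
        return keys[1] if n == 1 else keys[1:]


holder = KeyHolder(seed=42)
single = holder.split_rand_key()
several = holder.split_rand_key(3)
```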
-
to_json() → str
Return a JSON payload of the model’s config.
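Sample arrays are not directly JSON-serializable, so a config payload typically converts them to nested lists and back. The round trip below is a hypothetical sketch (`config_to_json` and `config_from_json` are not shabadoo functions), using only the standard `json` module and NumPy; the `dv` key is an illustrative extra attribute.

```python
import json

import numpy as np


def config_to_json(samples, **attrs):
    """Hypothetical serializer: arrays become nested lists so json can encode them."""
    payload = {"samples": {k: np.asarray(v).tolist() for k, v in samples.items()}}
    payload.update(attrs)
    return json.dumps(payload)


def config_from_json(text):
    """Inverse of config_to_json: turn nested lists back into arrays."""
    data = json.loads(text)
    data["samples"] = {k: np.asarray(v) for k, v in data["samples"].items()}
    return data


samples = {"const": np.array([[1.0, 1.1]]), "x": np.array([[2.0, 2.1]])}
text = config_to_json(samples, dv="y")
restored = config_from_json(text)
```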
-
classmethod transform(df: pandas.core.frame.DataFrame) → pandas.core.frame.DataFrame
Transform a dataframe for model input.
- Parameters
df (pd.DataFrame) – Source dataframe to transform.
- Returns
Dataframe containing transformed inputs.
- Return type
pd.DataFrame
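One way to picture the transformation step is a mapping from each model input to a function of the source frame. The `transformers` dict and `transform` function below are hypothetical illustrations, not shabadoo's feature specification; a scalar such as the constant term is broadcast to every row.

```python
import numpy as np
import pandas as pd

# Hypothetical mapping from model input name to how it is computed from the source frame
transformers = {
    "const": lambda df: 1.0,  # scalar is broadcast to every row
    "x": lambda df: df["x"],
    "log_z": lambda df: np.log(df["z"]),
}


def transform(df):
    out = pd.DataFrame(index=df.index)
    for name, fn in transformers.items():
        out[name] = fn(df)
    return out


source = pd.DataFrame({"x": [1.0, 2.0], "z": [1.0, np.e]})
inputs = transform(source)
```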
-
-
class shabadoo.Bernoulli(rng_seed: int = None)
Logistic/Bernoulli family model for a binary response variable.
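In a logistic/Bernoulli model the linear predictor is read as log-odds and mapped to a probability with the inverse logit. The sketch below scores binary outcomes that way; it assumes the standard logistic link, and `bernoulli_log_likelihood` is a hypothetical helper rather than shabadoo's code. It is not numerically hardened (very negative log-odds can overflow `math.exp`).

```python
import math


def bernoulli_log_likelihood(y, X, coefs):
    """Log-likelihood of binary y under a logistic link (hypothetical helper)."""
    total = 0.0
    for row, y_i in zip(X, y):
        eta = sum(x * b for x, b in zip(row, coefs))  # linear predictor (log-odds)
        p = 1.0 / (1.0 + math.exp(-eta))  # inverse logit gives P(y = 1)
        total += math.log(p) if y_i == 1 else math.log(1.0 - p)
    return total


X = [[1.0, 0.0], [1.0, 2.0], [1.0, -2.0]]
y = [1, 1, 0]
ll_flat = bernoulli_log_likelihood(y, X, coefs=[0.0, 0.0])  # every p is 0.5
ll_fit = bernoulli_log_likelihood(y, X, coefs=[0.5, 1.5])
```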
-
fit(df: pandas.core.frame.DataFrame, sampler: str = 'NUTS', rng_key: jax.numpy.lax_numpy.ndarray = None, sampler_kwargs: Dict[str, Any] = None, **mcmc_kwargs)
Fit the model to a DataFrame.
- Parameters
df (pd.DataFrame) – Source dataframe.
sampler (str) – NumPyro sampler name. Default 'NUTS'.
rng_key (two-element ndarray) – Optional RNG key. If not provided, a new key is split off the model's random key.
sampler_kwargs – Keyword arguments passed to the selected NumPyro sampler.
**mcmc_kwargs – Keyword arguments passed to numpyro.infer.MCMC.
- Returns
The fitted model.
- Return type
Model
-
property formula
Return a formula string describing the model.
-
classmethod from_dict(data: Dict[str, Any], **model_kw)
Return a pre-fitted model given a dictionary of config.
The dictionary MUST contain the following key:
samples: A dictionary mapping variable names to MCMC samples. It must contain all feature names and any additional model variables, and each variable’s data must have the same shape.
Any other dict keys will be added as model attributes.
- Parameters
data (dict) – Model configuration, including the requirements listed above.
**model_kw – Passed to the Model() initializer.
- Returns
A ready-to-use model.
- Return type
Model
-
grouped_metrics(df: pandas.core.frame.DataFrame, groupby: Union[str, List[str]], aggfunc: Callable = <function sum>, aggerrs: bool = True) → Union[pandas.core.frame.DataFrame, Dict[str, float]]
Return grouped accuracy metrics.
- Parameters
df (pd.DataFrame) – Input data for the model.
groupby (str or list of str) – Groupby clause for pandas.
aggfunc (callable) – How to aggregate actuals and predictions within a group. Default sum.
aggerrs (bool) – Option to aggregate errors across groups (default True). If True, a dictionary of summary statistics is returned. If False, groupwise errors are returned as a DataFrame.
- Returns
If aggerrs is True, a dictionary of summary statistics. If False, a DataFrame of groupwise errors.
- Return type
dict or pd.DataFrame
-
metrics(df: pandas.core.frame.DataFrame, aggerrs: bool = True) → Union[pandas.core.frame.DataFrame, Dict[str, float]]
Get prediction accuracy metrics of the model against data.
- Parameters
df (pd.DataFrame) – Input data for the model.
aggerrs (bool) – Option to aggregate errors across observations (default True). If True, a dictionary of summary statistics is returned. If False, pointwise errors are returned as a DataFrame.
- Returns
If aggerrs is True, a dictionary of summary statistics. If False, a DataFrame of pointwise errors.
- Return type
dict or pd.DataFrame
-
model(df: pandas.core.frame.DataFrame)
Define and return samples from the model.
- Parameters
df (pd.DataFrame) – Input data for the model.
-
property num_chains
Return the number of chains per variable in the model.
Assumes samples from all variables have the same shape.
-
property num_samples
Return the number of samples per variable.
Assumes samples from all variables have the same shape. Counts samples across all chains.
-
predict(df: pandas.core.frame.DataFrame, ci: bool = False, ci_interval: float = 0.9, aggfunc: Union[str, Callable] = 'mean') → Union[pandas.core.series.Series, pandas.core.frame.DataFrame]
Return the posterior prediction aggregated across all samples (the mean, by default).
- Parameters
df (pd.DataFrame) – Source dataframe.
ci (bool) – Option to include a credible interval around the predictions. Returns a DataFrame if True, a Series if False. Default False.
ci_interval (float) – Credible interval width. Default 0.9.
aggfunc (string or callable) – Aggregation function called over predictions across posterior samples. Applies only to the point prediction (not the CI).
- Returns
Forecasts: a Series named after the dependent variable if ci is False, or a DataFrame if ci is True.
- Return type
pd.Series or pd.DataFrame
-
classmethod preprocess_config_dict(config: dict) → dict
Run checks and transformations on config dicts for use in from_dict().
-
sample_posterior_predictive(df: pandas.core.frame.DataFrame, hdpi: bool = False, hdpi_interval: float = 0.9, rng_key: jax.numpy.lax_numpy.ndarray = None) → Union[pandas.core.series.Series, pandas.core.frame.DataFrame]
Obtain samples from the posterior predictive distribution.
- Parameters
df (pd.DataFrame) – Source dataframe.
hdpi (bool) – Option to include the lower and upper bounds of the highest posterior density interval. Returns a DataFrame if True, a Series if False. Default False.
hdpi_interval (float) – Width (probability mass) of the highest posterior density interval. Default 0.9.
rng_key (two-element ndarray) – Optional RNG key. If not provided, a new key is split off the model's random key.
- Returns
Forecasts: a Series named after the dependent variable if hdpi is False, or a DataFrame if hdpi is True.
- Return type
pd.Series or pd.DataFrame
-
property samples_df
Return a DataFrame of the model’s MCMC samples.
-
property samples_flat
Provide a 1D view of the model’s samples.
-
split_rand_key(n: int = 1) → jax.random.PRNGKey
Split the random key, assign a new key and return the subkeys.
- Parameters
n (int) – Number of subkeys to generate. Default 1.
- Returns
An array of PRNG keys or just a single key (if n=1).
- Return type
random.PRNGKey
-
to_json() → str
Return a JSON payload of the model’s config.
-
classmethod transform(df: pandas.core.frame.DataFrame) → pandas.core.frame.DataFrame
Transform a dataframe for model input.
- Parameters
df (pd.DataFrame) – Source dataframe to transform.
- Returns
Dataframe containing transformed inputs.
- Return type
pd.DataFrame
-
-
class shabadoo.Poisson(rng_seed: int = None)
Exponential/Poisson family model for rate data.
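For a Poisson model the linear predictor is usually passed through an exponential (the inverse of a log link) so the rate stays positive. The sketch below scores count outcomes under that assumption; `poisson_log_likelihood` is a hypothetical helper, and whether shabadoo uses exactly this link is an assumption inferred from the "Exponential/Poisson" description.

```python
import math


def poisson_log_likelihood(y, X, coefs):
    """Log-likelihood of counts y under a log link (hypothetical helper)."""
    total = 0.0
    for row, y_i in zip(X, y):
        eta = sum(x * b for x, b in zip(row, coefs))  # linear predictor = log(rate)
        rate = math.exp(eta)  # exponential inverse link keeps the rate positive
        # log P(y) = y * log(rate) - rate - log(y!)
        total += y_i * eta - rate - math.lgamma(y_i + 1)
    return total


# A single observation with a constant feature and zero coefficient (rate = 1)
ll_zero = poisson_log_likelihood([0], [[1.0]], [0.0])
ll_two = poisson_log_likelihood([2], [[1.0]], [0.0])
```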
-
fit(df: pandas.core.frame.DataFrame, sampler: str = 'NUTS', rng_key: jax.numpy.lax_numpy.ndarray = None, sampler_kwargs: Dict[str, Any] = None, **mcmc_kwargs)
Fit the model to a DataFrame.
- Parameters
df (pd.DataFrame) – Source dataframe.
sampler (str) – NumPyro sampler name. Default 'NUTS'.
rng_key (two-element ndarray) – Optional RNG key. If not provided, a new key is split off the model's random key.
sampler_kwargs – Keyword arguments passed to the selected NumPyro sampler.
**mcmc_kwargs – Keyword arguments passed to numpyro.infer.MCMC.
- Returns
The fitted model.
- Return type
Model
-
property formula
Return a formula string describing the model.
-
classmethod from_dict(data: Dict[str, Any], **model_kw)
Return a pre-fitted model given a dictionary of config.
The dictionary MUST contain the following key:
samples: A dictionary mapping variable names to MCMC samples. It must contain all feature names and any additional model variables, and each variable’s data must have the same shape.
Any other dict keys will be added as model attributes.
- Parameters
data (dict) – Model configuration, including the requirements listed above.
**model_kw – Passed to the Model() initializer.
- Returns
A ready-to-use model.
- Return type
Model
-
grouped_metrics(df: pandas.core.frame.DataFrame, groupby: Union[str, List[str]], aggfunc: Callable = <function sum>, aggerrs: bool = True) → Union[pandas.core.frame.DataFrame, Dict[str, float]]
Return grouped accuracy metrics.
- Parameters
df (pd.DataFrame) – Input data for the model.
groupby (str or list of str) – Groupby clause for pandas.
aggfunc (callable) – How to aggregate actuals and predictions within a group. Default sum.
aggerrs (bool) – Option to aggregate errors across groups (default True). If True, a dictionary of summary statistics is returned. If False, groupwise errors are returned as a DataFrame.
- Returns
If aggerrs is True, a dictionary of summary statistics. If False, a DataFrame of groupwise errors.
- Return type
dict or pd.DataFrame
-
metrics(df: pandas.core.frame.DataFrame, aggerrs: bool = True) → Union[pandas.core.frame.DataFrame, Dict[str, float]]
Get prediction accuracy metrics of the model against data.
- Parameters
df (pd.DataFrame) – Input data for the model.
aggerrs (bool) – Option to aggregate errors across observations (default True). If True, a dictionary of summary statistics is returned. If False, pointwise errors are returned as a DataFrame.
- Returns
If aggerrs is True, a dictionary of summary statistics. If False, a DataFrame of pointwise errors.
- Return type
dict or pd.DataFrame
-
model(df: pandas.core.frame.DataFrame)
Define and return samples from the model.
- Parameters
df (pd.DataFrame) – Input data for the model.
-
property num_chains
Return the number of chains per variable in the model.
Assumes samples from all variables have the same shape.
-
property num_samples
Return the number of samples per variable.
Assumes samples from all variables have the same shape. Counts samples across all chains.
-
predict(df: pandas.core.frame.DataFrame, ci: bool = False, ci_interval: float = 0.9, aggfunc: Union[str, Callable] = 'mean') → Union[pandas.core.series.Series, pandas.core.frame.DataFrame]
Return the posterior prediction aggregated across all samples (the mean, by default).
- Parameters
df (pd.DataFrame) – Source dataframe.
ci (bool) – Option to include a credible interval around the predictions. Returns a DataFrame if True, a Series if False. Default False.
ci_interval (float) – Credible interval width. Default 0.9.
aggfunc (string or callable) – Aggregation function called over predictions across posterior samples. Applies only to the point prediction (not the CI).
- Returns
Forecasts: a Series named after the dependent variable if ci is False, or a DataFrame if ci is True.
- Return type
pd.Series or pd.DataFrame
-
classmethod preprocess_config_dict(config: dict) → dict
Run checks and transformations on config dicts for use in from_dict().
-
sample_posterior_predictive(df: pandas.core.frame.DataFrame, hdpi: bool = False, hdpi_interval: float = 0.9, rng_key: jax.numpy.lax_numpy.ndarray = None) → Union[pandas.core.series.Series, pandas.core.frame.DataFrame]
Obtain samples from the posterior predictive distribution.
- Parameters
df (pd.DataFrame) – Source dataframe.
hdpi (bool) – Option to include the lower and upper bounds of the highest posterior density interval. Returns a DataFrame if True, a Series if False. Default False.
hdpi_interval (float) – Width (probability mass) of the highest posterior density interval. Default 0.9.
rng_key (two-element ndarray) – Optional RNG key. If not provided, a new key is split off the model's random key.
- Returns
Forecasts: a Series named after the dependent variable if hdpi is False, or a DataFrame if hdpi is True.
- Return type
pd.Series or pd.DataFrame
-
property samples_df
Return a DataFrame of the model’s MCMC samples.
-
property samples_flat
Provide a 1D view of the model’s samples.
-
split_rand_key(n: int = 1) → jax.random.PRNGKey
Split the random key, assign a new key and return the subkeys.
- Parameters
n (int) – Number of subkeys to generate. Default 1.
- Returns
An array of PRNG keys or just a single key (if n=1).
- Return type
random.PRNGKey
-
to_json() → str
Return a JSON payload of the model’s config.
-
classmethod transform(df: pandas.core.frame.DataFrame) → pandas.core.frame.DataFrame
Transform a dataframe for model input.
- Parameters
df (pd.DataFrame) – Source dataframe to transform.
- Returns
Dataframe containing transformed inputs.
- Return type
pd.DataFrame
-