Usage¶
Synopsis¶
haggregate [--traceback] config_file
Description and quick start¶
haggregate
gets the data of time series from files and creates time
series of a larger time step, storing the result in files. The
details of its operation are specified in the configuration file
specified on the command line.
Installation¶
pip install haggregate
How to run it¶
First, you need to create a configuration file with a text editor such
as vim
, emacs
, notepad
, or whatever. Create such a file
and name it, for example, /var/tmp/haggregate.conf
, with
the following contents (the contents don’t matter at this stage, just
copy and paste them from below):
[General]
loglevel = INFO
Then, open a command prompt and give it this command:
haggregate /var/tmp/haggregate.conf
If you have done everything correctly, it should output an error message complaining that something in its configuration file isn’t right.
Configuration file example¶
Take a look at the following example configuration file and read the explanatory comments that follow it:
[General]
loglevel = INFO
logfile = /var/log/haggregate/haggregate.log
base_dir = /var/cache/timeseries/
target_step = 1H
min_count = 2
missing_flag = DATEINSERT
[temperature]
source_file = temperature-10min.hts
target_file = temperature-hourly.hts
method = mean
[rainfall]
source_file = rainfall-10min.hts
target_file = rainfall-hourly.hts
method = sum
With the above configuration file, haggregate
will log information
in the file specified by logfile
. It will aggregate the
specified time series into hourly (1H
). The filenames specified with
source_file
and target_file
are relative to
base_dir
. For the temperature, source records will be
averaged, whereas for rainfall they will be summed.
Configuration file reference¶
The configuration file has the format of INI files. There is a
[General]
section with general parameters, and any number of other
sections, which we will call “time series sections”, each time series
section referring to one time series.
General parameters¶
- loglevel¶
Optional. Can have the values
ERROR
,WARNING
,INFO
,DEBUG
. The default isWARNING
.
- logfile¶
Optional. The full pathname of a log file. If unspecified, log messages will go to the standard error.
- base_dir¶
Optional.
haggregate
will change directory to this directory, so any relative filenames will be relative to this directory. If unspecified, relative filenames will be relative to the directory from whichhaggregate
was started.
- target_step¶
A string specifying the target time step, as a pandas “frequency”. Examples of steps are “1D” for day, “1H” for hour, “1T” or “1min” for minute. You can also use larger multipliers, like “30T” for 30 minutes. The program hasn’t been tested for monthly or larger time steps.
- target_timestamp_offset¶
Optional. A string specifying the resulting timestamp offset, as a pandas “frequency”. For example, for
target_timestamp_offset=1D
, if we settarget_timestamp_offset=1min
, the resulting time stamps will be ending in 23:59. This does not modify the calculations; it only offsets the timestamp. For example, if withouttarget_timestamp_offset
one of the resulting timeseries records is2019-12-05 00:00, 3.14
, then withtarget_timestamp_offset=-10min
the same processing will result in2019-12-05 00:10, 3.14
.
- min_count¶
- missing_flag¶
If some of the source records corresponding to a destination record are missing,
min_count
specifies what will be done. If there are fewer thanmin_count
source records corresponding to a destination record, the resulting destination record is null; otherwise, the destination record is derived even though some records are missing. In that case, the flag specified bymissing_flag
is raised in the destination record.
Time series sections¶
The name of the section is ignored.
- source_file¶
The filename of the source file with the time series, in file format; it must be absolute or relative to
base_dir
.
- target_file¶
The filename of the target file, which will be written in file format; it must be absolute or relative to
base_dir
. In this version ofhaggregate
, all the aggregation is repeated even if it or part of it has been done in the past, and the file is entirely overwritten if it already exists.
- method¶
How the aggregation will be performed; one of “mean”, “sum”, “max” and “min”.
How the aggregation is performed¶
The aggregation is performed in two steps: Regularization and aggregation. For the regularization, see How regularization is performed. The mode used is “instantaneous” for mean, and “interval” for sum, max and min.
After regularization is complete, aggregation is trivial. The timestamp in an aggregated record is the end of the interval.
For example, if you aggregate a ten-minute time series to hourly, the
record with timestamp 11:00
is the average or sum or max or min of
time stamps 10:10
, 10:20
, …, 10:50
, 11:00
.
Likewise, if you aggregate an hourly time series to daily, the record
with timestamp 2020-01-25 00:00
is the average or sum or max or min
of time stamps 2020-01-24 00:10
, …, 2020-01-25 00:00
.
Thus, the daily time series with timestamp 2020-01-25 00:00
is
actually aggregated from 2020-01-24 (the previous day). This can be
confusing, so it may be a good idea to use 2020-01-24 23:59
as the
resulting timestamp instead. This can be achieved by setting
target_timestamp_offset
to 1min
.