.. _usage:

=====
Usage
=====

Synopsis
========

``haggregate [--traceback] config_file``

Description and quick start
===========================

``haggregate`` gets the data of time series from files and creates time
series of a larger time step, storing the result in files.  The
details of its operation are specified in the configuration file
specified on the command line.

Installation
------------

``pip install haggregate``

How to run it
-------------

First, you need to create a configuration file with a text editor such
as ``vim``, ``emacs``, ``notepad``, or whatever. Create such a file
and name it, for example, :file:`/var/tmp/haggregate.conf`, with
the following contents (the contents don't matter at this stage, just
copy and paste them from below)::

    [General]
    loglevel = INFO

Then, open a command prompt and give it this command::

    haggregate /var/tmp/haggregate.conf

If you have done everything correctly, it should output an error message
complaining that something in its configuration file isn't right.

Configuration file example
--------------------------

Take a look at the following example configuration file and read the
explanatory comments that follow it:

.. code-block:: ini

    [General]
    loglevel = INFO
    logfile = /var/log/haggregate/haggregate.log
    base_dir = /var/cache/timeseries/
    target_step = 1H
    min_count = 2
    missing_flag = DATEINSERT

    [temperature]
    source_file = temperature-10min.hts
    target_file = temperature-hourly.hts
    method = mean

    [rainfall]
    source_file = rainfall-10min.hts
    target_file = rainfall-hourly.hts
    method = sum

With the above configuration file, ``haggregate`` will log information
in the file specified by :option:`logfile`. It will aggregate the
specified time series into hourly (``1H``). The filenames specified with
:option:`source_file` and :option:`target_file` are relative to
:option:`base_dir`. For the temperature, source records will be
averaged, whereas for rainfall they will be summed.

Configuration file reference
============================

The configuration file has the format of INI files. There is a
``[General]`` section with general parameters, and any number of other
sections, which we will call "time series sections", each time series
section referring to one time series.

General parameters
------------------

.. option:: loglevel

   Optional. Can have the values ``ERROR``, ``WARNING``, ``INFO``,
   ``DEBUG``.  The default is ``WARNING``.

.. option:: logfile

   Optional. The full pathname of a log file. If unspecified, log
   messages will go to the standard error.

.. option:: base_dir

   Optional. ``haggregate`` will change directory to this directory, so
   any relative filenames will be relative to this directory. If
   unspecified, relative filenames will be relative to the directory
   from which ``haggregate`` was started.

.. option:: target_step

   A string specifying the target time step, as a pandas "frequency".
   Examples of steps are "1D" for day, "1H" for hour, "1T" or "1min" for
   minute. You can also use larger multipliers, like "30T" for 30 minutes.
   The program hasn't been tested for monthly or larger time steps.

.. option:: target_timestamp_offset

   Optional. A string specifying the resulting timestamp offset, as a
   pandas "frequency". For example, for ``target_timestamp_offset=1D``,
   if we set ``target_timestamp_offset=1min``, the resulting time stamps
   will be ending in 23:59. This does not modify the calculations; it
   only offsets the timestamp. For example, if without
   ``target_timestamp_offset`` one of the resulting timeseries records
   is ``2019-12-05 00:00, 3.14``, then with
   ``target_timestamp_offset=-10min`` the same processing will result in
   ``2019-12-05 00:10, 3.14``.

.. option:: min_count
            missing_flag

   If some of the source records corresponding to a destination record
   are missing, :option:`min_count` specifies what will be done. If
   there are fewer than :option:`min_count` source records corresponding
   to a destination record, the resulting destination record is null;
   otherwise, the destination record is derived even though some records
   are missing. In that case, the flag specified by
   :option:`missing_flag` is raised in the destination record.

Time series sections
--------------------

The name of the section is ignored.

.. option:: source_file

   The filename of the source file with the time series, in `file
   format`_; it must be absolute or relative to :option:`base_dir`.

.. option:: target_file

   The filename of the target file, which will be written in `file
   format`_; it must be absolute or relative to :option:`base_dir`. In
   this version of ``haggregate``, all the aggregation is repeated even
   if it or part of it has been done in the past, and the file is
   entirely overwritten if it already exists.

.. option:: method

   How the aggregation will be performed; one of "mean", "sum",
   "max" and "min".

.. _file format: https://github.com/openmeteo/htimeseries/#file-format

How the aggregation is performed
================================

The aggregation is performed in two steps: Regularization and
aggregation. For the regularization, see
:ref:`regularization-algorithm`. The mode used is "instantaneous" for
mean, and "interval" for sum, max and min.

After regularization is complete, aggregation is trivial. The timestamp
in an aggregated record is the end of the interval.

For example, if you aggregate a ten-minute time series to hourly, the
record with timestamp ``11:00`` is the average or sum or max or min of
time stamps ``10:10``, ``10:20``, ..., ``10:50``, ``11:00``.

Likewise, if you aggregate an hourly time series to daily, the record
with timestamp ``2020-01-25 00:00`` is the average or sum or max or min
of time stamps ``2020-01-24 00:10``, ..., ``2020-01-25 00:00``.

Thus, the daily time series with timestamp ``2020-01-25 00:00`` is
actually aggregated from 2020-01-24 (the previous day). This can be
confusing, so it may be a good idea to use ``2020-01-24 23:59`` as the
resulting timestamp instead. This can be achieved by setting
``target_timestamp_offset`` to ``1min``.