.. _stat_tool_syntax:

.. |leq|   unicode:: U+02264
.. |geq|   unicode:: U+02265

.. testsetup:: *

    from openalea.stat_tool import *


.. .. include:: alias.rst

File Syntax, SEQUENCE package
##############################

.. contents::


An ASCII file format is defined for each of the following object types of the STAT module:


type HIDDEN_MARKOV
========================
A hidden Markov chain is constructed from an underlying Markov chain and nonparametric observation (or state-dependent) distributions. Consider the following example::

    HIDDEN_MARKOV_CHAIN

    2 STATES
    ORDER 1

    INITIAL_PROBABILITIES
    0.8  0.2

    TRANSITION_PROBABILITIES
    0.6  0.4
    0.1  0.9

    OBSERVATION_PROBABILITIES

    2 SPACES

    SPACE 1

    STATE 0
    OBSERVATION 0 : 1.0

    STATE 1
    OBSERVATION 0 : 0.2
    OBSERVATION 1 : 0.8

    SPACE 2

    STATE 0
    OBSERVATION 0 : 0.2
    OBSERVATION 1 : 0.4
    OBSERVATION 2 : 0.4

    STATE 1
    OBSERVATION 0 : 0.8
    OBSERVATION 1 : 0.1
    OBSERVATION 2 : 0.1

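The first line gives the object type. The underlying Markov chain is then defined on the subsequent lines according to the syntactic form defined for the type MARKOV. The observation (or state-dependent) probabilities relating the output processes to the non-observable state process are then defined: the ``SPACES`` line gives the number of output processes and, for each of them, the observation distribution of each state is given, its probabilities summing to one. Since the process is 'hidden', at least one possible output should be observable in more than one state (e.g. output 0 of space 1, which can be observed in both states 0 and 1).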
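
The 'hidden' condition, and the requirement that each observation distribution sums to one, can be checked mechanically once the observation probabilities have been read. The following Python sketch is illustrative only and is not part of ``openalea.stat_tool``: the nested dictionary layout ``{space: {state: {output: probability}}}`` and the function names are assumptions, and the data are copied by hand from the example above.

.. code-block:: python

    # Observation probabilities of the HIDDEN_MARKOV_CHAIN example above, entered
    # by hand as {space: {state: {output: probability}}} (hypothetical layout).
    observation = {
        1: {0: {0: 1.0}, 1: {0: 0.2, 1: 0.8}},
        2: {0: {0: 0.2, 1: 0.4, 2: 0.4}, 1: {0: 0.8, 1: 0.1, 2: 0.1}},
    }


    def distributions_sum_to_one(observation, tol=1e-9):
        """Each state-dependent distribution should sum to one."""
        return all(
            abs(sum(dist.values()) - 1.0) <= tol
            for space in observation.values()
            for dist in space.values()
        )


    def is_hidden(observation):
        """True if, in some space, an output can be observed in more than one state."""
        for space in observation.values():
            states_of = {}  # output -> states in which it can be observed
            for state, dist in space.items():
                for output, prob in dist.items():
                    if prob > 0:
                        states_of.setdefault(output, set()).add(state)
            if any(len(states) > 1 for states in states_of.values()):
                return True
        return False


    print(distributions_sum_to_one(observation), is_hidden(observation))  # True True

The check is done space by space, which is one possible reading of the condition stated above.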

type HIDDEN_SEMI-MARKOV
=============================

A hidden semi-Markov chain is constructed from an underlying semi-Markov chain (a first-order Markov chain representing transitions between distinct states, and state occupancy distributions associated with the nonabsorbing states) and nonparametric observation (or state-dependent) distributions. The state occupancy distributions are defined as objects of type DISTRIBUTION with the additional constraint that the minimum time spent in a given state is at least 1 (INF_BOUND |geq| 1). Consider the following example::

    HIDDEN_SEMI-MARKOV_CHAIN

    4 STATES

    INITIAL_PROBABILITIES
    0.8  0.2  0.0  0.0

    TRANSITION_PROBABILITIES
    0.0  0.6  0.4  0.0
    0.0  0.0  0.7  0.3
    0.0  0.2  0.0  0.8
    0.0  0.0  0.0  1.0

    STATE 0 OCCUPANCY_DISTRIBUTION
    NEGATIVE_BINOMIAL  INF_BOUND : 2  PARAMETER : 3.2  PROBABILITY : 0.4

    STATE 1 OCCUPANCY_DISTRIBUTION
    BINOMIAL  INF_BOUND : 1  SUP_BOUND : 12  PROBABILITY : 0.6

    STATE 2 OCCUPANCY_DISTRIBUTION
    POISSON  INF_BOUND : 1  PARAMETER : 5.4

    OBSERVATION_PROBABILITIES

    1 SPACE

    SPACE 1

    STATE 0
    OBSERVATION 0 : 1.0

    STATE 1
    OBSERVATION 0 : 0.3
    OBSERVATION 1 : 0.6
    OBSERVATION 2 : 0.1

    STATE 2
    OBSERVATION 0 : 0.2
    OBSERVATION 1 : 0.4
    OBSERVATION 2 : 0.4

    STATE 3
    OBSERVATION 2 : 1.0

Note that absorbing states such as state 3 :math:`(p_{33}=1)` are by nature Markovian. It is also possible to define nonabsorbing Markovian states, that is, states :math:`i` with :math:`0 < p_{ii} < 1` (in the above example, the nonabsorbing states all have :math:`p_{ii}=0` and are therefore semi-Markovian). In this case, the resulting model is a hybrid hidden Markov/semi-Markov chain.

The first line gives the object type. The underlying semi-Markov chain (embedded first-order Markov chain and state occupancy distributions associated with the nonabsorbing states) is then defined on the subsequent lines according to the syntactic form defined for the type SEMI-MARKOV. The observation (or state-dependent) probabilities relating the output processes to the non-observable state process are then defined. Since the process is 'hidden', at least one possible output should be observable in more than one state.
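
The distinction drawn in the note above between absorbing, Markovian and semi-Markovian states depends only on the diagonal of the transition probability matrix. A minimal Python sketch, with a hypothetical ``classify_states`` helper (not a ``stat_tool`` function), that labels each state of the example and tells whether the model is hybrid:

.. code-block:: python

    # Transition probability matrix of the HIDDEN_SEMI-MARKOV_CHAIN example above.
    transition = [
        [0.0, 0.6, 0.4, 0.0],
        [0.0, 0.0, 0.7, 0.3],
        [0.0, 0.2, 0.0, 0.8],
        [0.0, 0.0, 0.0, 1.0],
    ]


    def classify_states(transition):
        """Classify each state from its self-transition probability p_ii."""
        kinds = []
        for i, row in enumerate(transition):
            p_ii = row[i]
            if p_ii == 1.0:
                kinds.append("absorbing")  # Markovian by nature
            elif p_ii > 0.0:
                kinds.append("markovian")  # nonabsorbing Markovian state
            else:
                kinds.append("semi-markovian")  # p_ii = 0: occupancy distribution given
        return kinds


    kinds = classify_states(transition)
    is_hybrid = "markovian" in kinds  # at least one nonabsorbing Markovian state
    print(kinds, is_hybrid)
    # ['semi-markovian', 'semi-markovian', 'semi-markovian', 'absorbing'] False

Replacing the third row by ``[0.0, 0.2, 0.3, 0.5]`` would turn state 2 into a nonabsorbing Markovian state (:math:`p_{22}=0.3`), and ``is_hybrid`` would then be ``True``.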


type MARKOV
=================
Consider the following example of a homogeneous Markov chain::

    MARKOV_CHAIN

    2 STATES
    ORDER 2

    INITIAL_PROBABILITIES
    0.8  0.2

    TRANSITION_PROBABILITIES
    0.6  0.4
    0.1  0.9
    0.3  0.7
    0.2  0.8

The first line gives the object type. Then, the number of states (between 2 and 15) and the order (between 1 and 4) are defined on the two subsequent lines. The initial probabilities and the transition probabilities are given on the following lines, with one line of transition probabilities per memory (hence :math:`2^2=4` lines for this two-state, order-2 chain). Since the initial probabilities and the transition probabilities for a given memory constitute distributions, the elements of each line should sum to one.
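
The constraints stated above can be checked as in the following Python sketch. ``check_markov_chain`` is a hypothetical helper, not part of ``stat_tool``; it assumes, consistently with the example, that an order-:math:`k` chain with :math:`n` states has :math:`n^k` lines of transition probabilities.

.. code-block:: python

    import math

    # Parameters of the order-2 MARKOV_CHAIN example above.
    nb_state, order = 2, 2
    initial = [0.8, 0.2]
    transition = [[0.6, 0.4], [0.1, 0.9], [0.3, 0.7], [0.2, 0.8]]


    def check_markov_chain(nb_state, order, initial, transition, tol=1e-9):
        """Return the list of violated constraints (empty if none)."""
        errors = []
        if not 2 <= nb_state <= 15:
            errors.append("the number of states must be between 2 and 15")
        if not 1 <= order <= 4:
            errors.append("the order must be between 1 and 4")
        if len(initial) != nb_state or not math.isclose(sum(initial), 1.0, abs_tol=tol):
            errors.append("the initial probabilities must sum to one")
        # Assumption: one line of transition probabilities per memory.
        if len(transition) != nb_state ** order:
            errors.append("expected %d lines of transition probabilities" % nb_state ** order)
        if any(len(row) != nb_state or not math.isclose(sum(row), 1.0, abs_tol=tol)
               for row in transition):
            errors.append("each line of transition probabilities must sum to one")
        return errors


    print(check_markov_chain(nb_state, order, initial, transition))  # []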

It is also possible to define observation (or state-dependent) probabilities, provided that each possible output can be observed in only one state. With this restriction, the state space corresponds to a partition of the output space and the overall process is a lumped process::

    OBSERVATION_PROBABILITIES

    2 SPACES

    SPACE 1

    STATE 0
    OBSERVATION 0 : 1.0

    STATE 1
    OBSERVATION 1 : 0.2
    OBSERVATION 2 : 0.8

    SPACE 2

    STATE 0
    OBSERVATION 0 : 0.7
    OBSERVATION 1 : 0.3

    STATE 1
    OBSERVATION 2 : 0.6
    OBSERVATION 3 : 0.4
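
The lumped-process restriction amounts to requiring that, within each space, the outputs observable in different states form disjoint sets. A hypothetical Python check (not a ``stat_tool`` function), with the observation probabilities of the fragment above copied by hand into an assumed ``{space: {state: {output: probability}}}`` dictionary:

.. code-block:: python

    # Observation probabilities of the lumped example above, entered by hand
    # as {space: {state: {output: probability}}} (illustrative layout only).
    observation = {
        1: {0: {0: 1.0}, 1: {1: 0.2, 2: 0.8}},
        2: {0: {0: 0.7, 1: 0.3}, 1: {2: 0.6, 3: 0.4}},
    }


    def is_lumped(observation):
        """True if, in every space, each output is observable in a single state."""
        for space in observation.values():
            owner = {}  # output -> the state in which it can be observed
            for state, dist in space.items():
                for output, prob in dist.items():
                    if prob > 0 and owner.setdefault(output, state) != state:
                        return False
        return True


    print(is_lumped(observation))  # True

Applied to the observation probabilities of the HIDDEN_MARKOV_CHAIN example, the same function returns ``False``, since output 0 of space 1 can be observed in both states.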

Consider the following example of a non-homogeneous Markov chain::

    NON-HOMOGENEOUS_MARKOV_CHAIN

    3 STATES
    ORDER 1

    INITIAL_PROBABILITIES
    0.5  0.3  0.2

    TRANSITION_PROBABILITIES
    0.6  0.2  0.2
    0.1  0.8  0.1
    0.2  0.1  0.7

    STATE 0 HOMOGENEOUS

    STATE 1 NON-HOMOGENEOUS
    MONOMOLECULAR FUNCTION  PARAMETER 1 : 0.99  PARAMETER 2 : -0.34  PARAMETER 3 : 0.3

    STATE 2 NON-HOMOGENEOUS
    LOGISTIC FUNCTION  PARAMETER 1 : 0.99  PARAMETER 2 : 2.8  PARAMETER 3 : 0.2

The first line gives the object type. Then, the initial probabilities and the transition probabilities are given in the same way as for a homogeneous Markov chain. The homogeneous or non-homogeneous character is then defined state by state. In the case of a non-homogeneous transition distribution, the function :math:`p_{ii}(t)` gives the self-transition probability of state :math:`i` as a function of the index parameter :math:`t`. The corresponding row of the transition probability matrix then gives the relative weights of the probabilities of leaving state :math:`i`.

For a MONOMOLECULAR function, :math:`p_{ii}(t)=a+b\exp(-ct)` with :math:`a`, :math:`b` and :math:`c` given by PARAMETER 1, PARAMETER 2 and PARAMETER 3, the following constraints apply:

* 0 |leq| PARAMETER 1 |leq| 1
* 0 |leq| PARAMETER 1 + PARAMETER 2 |leq| 1
* PARAMETER 3 > 0

For a LOGISTIC function, :math:`p_{ii}(t)=a/\{1+b\exp(-ct)\}`, the following constraints apply:

* 0 |leq| PARAMETER 1 |leq| 1
* 0 |leq| PARAMETER 1 / (1 + PARAMETER 2) |leq| 1
* PARAMETER 3 > 0
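
The two parametric forms and their constraints can be evaluated directly. In the Python sketch below, the function names are hypothetical, and :math:`a`, :math:`b` and :math:`c` stand for PARAMETER 1, PARAMETER 2 and PARAMETER 3, a correspondence inferred from the constraints listed above.

.. code-block:: python

    import math


    def monomolecular(t, a, b, c):
        """p_ii(t) = a + b exp(-c t)."""
        return a + b * math.exp(-c * t)


    def logistic(t, a, b, c):
        """p_ii(t) = a / (1 + b exp(-c t))."""
        return a / (1.0 + b * math.exp(-c * t))


    def monomolecular_is_valid(a, b, c):
        return 0 <= a <= 1 and 0 <= a + b <= 1 and c > 0


    def logistic_is_valid(a, b, c):
        # The 1 + b != 0 test only guards against a division by zero.
        return 0 <= a <= 1 and 1.0 + b != 0 and 0 <= a / (1.0 + b) <= 1 and c > 0


    # PARAMETER 1, 2 and 3 of states 1 and 2 in the NON-HOMOGENEOUS_MARKOV_CHAIN example.
    state1 = (0.99, -0.34, 0.3)  # MONOMOLECULAR
    state2 = (0.99, 2.8, 0.2)    # LOGISTIC
    print(monomolecular_is_valid(*state1), logistic_is_valid(*state2))  # True True
    for t in (0, 5, 20):
        print(t, round(monomolecular(t, *state1), 3), round(logistic(t, *state2), 3))

Assuming the index parameter starts at :math:`t=0`, both functions start from :math:`a+b` (respectively :math:`a/(1+b)`) and tend to :math:`a` as :math:`t` grows when :math:`c>0`; the constraints thus keep the self-transition probability within [0, 1].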

SEMI-MARKOV
======================
A semi-Markov chain is constructed from a first-order Markov chain representing transitions between distinct states and from state occupancy distributions associated with the nonabsorbing states. The state occupancy distributions are defined as objects of type DISTRIBUTION with the additional constraint that the minimum time spent in a given state is at least 1 (INF_BOUND |geq| 1). Consider the following example::

    SEMI-MARKOV_CHAIN

    4 STATES

    INITIAL_PROBABILITIES
    0.8  0.2  0.0  0.0

    TRANSITION_PROBABILITIES
    0.0  0.6  0.4  0.0
    0.0  0.0  0.7  0.3
    0.0  0.2  0.0  0.8
    0.0  0.0  0.0  1.0

    STATE 0 OCCUPANCY_DISTRIBUTION
    NEGATIVE_BINOMIAL  INF_BOUND : 2  PARAMETER : 3.2  PROBABILITY : 0.4

    STATE 1 OCCUPANCY_DISTRIBUTION
    BINOMIAL  INF_BOUND : 1  SUP_BOUND : 12  PROBABILITY : 0.6

    STATE 2 OCCUPANCY_DISTRIBUTION
    POISSON  INF_BOUND : 1  PARAMETER : 5.4

The first line gives the object type, while the second line gives the number of states (between 2 and 15). The embedded first-order Markov chain is then defined on the subsequent lines by its initial probabilities and its transition probabilities (note that, unlike for the type MARKOV, the order is not specified). Since this embedded Markov chain represents only transitions between distinct states, the self-transitions (i.e. the elements of the main diagonal) should be equal to zero, except for absorbing states, whose self-transitions are equal to one (e.g. state 3 in the above example). The state occupancy distributions are then defined for each nonabsorbing state according to the syntactic form defined for the type DISTRIBUTION, with the additional constraint that the time spent in a given state is at least 1 (INF_BOUND |geq| 1). As for the type MARKOV, observation (or state-dependent) probabilities can be defined in order to specify a lumped process (with the restriction that each possible output can be observed in only one state).

Note that absorbing states such as state 3 :math:`(p_{33}=1)` are by nature Markovian. It is also possible to define nonabsorbing Markovian states, that is, states :math:`i` with :math:`0 < p_{ii} < 1` (in the above example, the nonabsorbing states all have :math:`p_{ii}=0` and are therefore semi-Markovian). In this case, the resulting model is a hybrid Markov/semi-Markov chain.
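
The structural constraints of the type SEMI-MARKOV can be checked as in the following Python sketch. The function names are hypothetical, and the parsing of a distribution line into a name and a dictionary of keyword values is a simplifying assumption: it only handles the ``KEYWORD : value`` layout used in the example.

.. code-block:: python

    # Embedded Markov chain and occupancy distributions of the SEMI-MARKOV_CHAIN example.
    transition = [
        [0.0, 0.6, 0.4, 0.0],
        [0.0, 0.0, 0.7, 0.3],
        [0.0, 0.2, 0.0, 0.8],
        [0.0, 0.0, 0.0, 1.0],
    ]
    occupancy = {
        0: "NEGATIVE_BINOMIAL  INF_BOUND : 2  PARAMETER : 3.2  PROBABILITY : 0.4",
        1: "BINOMIAL  INF_BOUND : 1  SUP_BOUND : 12  PROBABILITY : 0.6",
        2: "POISSON  INF_BOUND : 1  PARAMETER : 5.4",
    }


    def parse_distribution(line):
        """Split 'NAME  KEY : value  KEY : value ...' into (NAME, {KEY: float})."""
        tokens = line.split()
        name, fields = tokens[0], tokens[1:]
        # The fields come in triples: KEY, ':', value.
        params = {fields[i]: float(fields[i + 2]) for i in range(0, len(fields), 3)}
        return name, params


    def check_semi_markov(transition, occupancy):
        """Return the list of violated constraints (empty if none)."""
        errors = []
        for i, row in enumerate(transition):
            if row[i] != 0.0:  # absorbing or Markovian state: no occupancy distribution checked
                continue
            if i not in occupancy:
                errors.append("state %d: missing occupancy distribution" % i)
                continue
            _, params = parse_distribution(occupancy[i])
            # Every distribution line of the example gives INF_BOUND.
            if params["INF_BOUND"] < 1:
                errors.append("state %d: INF_BOUND must be >= 1" % i)
        return errors


    print(check_semi_markov(transition, occupancy))  # []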


SEQUENCES
=====================
The syntactic form of the type SEQUENCES consists of a header giving the number of variables and the type of each variable, followed by the sequences themselves. Consider the following example of univariate sequences::

    1 VARIABLE

    VARIABLE 1 : STATE

    1 0 0 0 1 1 2 0 2 2 2 1 1 0 1 0 1 1 1 1 0 1 1 1 \
    0 1 2 2 2 1

    0 0 0 1 1 0 2 0 2 2 2 1 1 1 1 0 1 0 0 0 0 0

The type STATE is the generic type. The character ``\``, placed at the end of a line, continues the current sequence on the following line.

Consider the following example of multivariate sequences::

    2 VARIABLES

    VARIABLE 1 : STATE
    VARIABLE 2 : STATE

    1 0 | 0 0 | 1 0 | 2 0 | 2 1 | 2 1 | 1 0 | 1 0 | 1 0 | 0 1 | 0 1 | 1 1 \
    0 1 | 2 0 | 2 1

    0 0 | 0 0 | 1 0 | 2 0 | 2 1 | 1 1 | 1 0 | 1 0 | 0 0 | 0 0

The character ``|`` separates successive vectors.
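
The line structure described above can be read with a few lines of Python. This is only an illustrative sketch with a hypothetical ``parse_sequences`` function, not the ``stat_tool`` reader: it assumes that the header has already been skipped, that a sequence ends with the first line not terminated by ``\``, and it handles only variables of type STATE (no comments, no explicit index parameter).

.. code-block:: python

    def parse_sequences(lines, nb_variable):
        """Split the body of a SEQUENCES file into lists of vectors."""
        sequences, current = [], []
        for line in lines:
            line = line.strip()
            if not line:  # blank lines are skipped
                continue
            continued = line.endswith("\\")  # '\' continues the sequence
            if continued:
                line = line[:-1]
            if nb_variable == 1:
                vectors = [[int(v)] for v in line.split()]
            else:  # '|' separates successive vectors
                vectors = [[int(v) for v in chunk.split()]
                           for chunk in line.split("|") if chunk.strip()]
            for vector in vectors:
                if len(vector) != nb_variable:
                    raise ValueError("expected %d values, got %r" % (nb_variable, vector))
            current.extend(vectors)
            if not continued:  # assumption: the sequence ends here
                sequences.append(current)
                current = []
        return sequences


    # Body of the multivariate example above (raw string keeps the backslash).
    body = r"""
    1 0 | 0 0 | 1 0 | 2 0 | 2 1 | 2 1 | 1 0 | 1 0 | 1 0 | 0 1 | 0 1 | 1 1 \
    0 1 | 2 0 | 2 1

    0 0 | 0 0 | 1 0 | 2 0 | 2 1 | 1 1 | 1 0 | 1 0 | 0 0 | 0 0
    """
    sequences = parse_sequences(body.splitlines(), nb_variable=2)
    print([len(seq) for seq in sequences])  # [15, 10]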

Consider the following example of sequences with an explicit index parameter of type POSITION::

    2 VARIABLES

    VARIABLE 1 : POSITION
    VARIABLE 2 : STATE

    10 1 | 12 0 | 13 1 | 14 2 | 15 2 | 20 2 | 22 1 | 23 1 | 27 1 | 30 0 | 31 0 | 32 1 \
    35 1 | 37 0 | 40 1 | 45

    5 0 | 7 0 | 10 0 | 11 0 | 15 1 | 18 1 | 20 0 | 21 0 | 22 0 | 25 0 | 25

This explicit index parameter is given as the first variable, and the other variables (at least one) should be of type STATE. The index values should be non-decreasing along each sequence, and each sequence ends with a final index value (45 and 25 in the above example).

The explicit index parameter of type POSITION can be replaced by inter-position intervals::

    2 VARIABLES

    VARIABLE 1 : POSITION_INTERVAL
    VARIABLE 2 : STATE

    10 1 | 2 0 | 1 1 | 1 2 | 1 2 | 5 2 | 2 1 | 1 1 | 4 1 | 3 0 | 1 0 | 1 1 \
    3 1 | 2 0 | 3 1 | 5

    5 0 | 2 0 | 3 0 | 1 0 | 4 1 | 3 1 | 2 0 | 1 0 | 1 0 | 3 0 | 0
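
The correspondence between the two forms is a matter of differences and cumulative sums. Judging from the two examples above (an inference, not a stated rule), the first interval is measured from position 0 and the final index value becomes the distance from the last position. The Python sketch below uses hypothetical helper names and represents a sequence as a list of (position, state) pairs plus its final index value.

.. code-block:: python

    def positions_to_intervals(vectors, final):
        """Convert [(position, state), ...] and a final index value to interval form."""
        intervals, previous = [], 0  # first interval measured from position 0
        for position, state in vectors:
            if position < previous:
                raise ValueError("positions must not decrease along a sequence")
            intervals.append((position - previous, state))
            previous = position
        if final < previous:
            raise ValueError("final index value smaller than the last position")
        return intervals, final - previous


    def intervals_to_positions(intervals, final_interval):
        """Inverse conversion: cumulative sums of the intervals."""
        vectors, position = [], 0
        for interval, state in intervals:
            position += interval
            vectors.append((position, state))
        return vectors, position + final_interval


    # Second sequence of the POSITION example above.
    seq = [(5, 0), (7, 0), (10, 0), (11, 0), (15, 1), (18, 1),
           (20, 0), (21, 0), (22, 0), (25, 0)]
    intervals, last = positions_to_intervals(seq, 25)
    print(intervals, last)
    # [(5, 0), (2, 0), (3, 0), (1, 0), (4, 1), (3, 1), (2, 0), (1, 0), (1, 0), (3, 0)] 0

Applying ``intervals_to_positions`` to the result recovers the original positions and the final index value 25.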

Consider the following example of sequences with an explicit index parameter of type TIME::

    2 VARIABLES

    VARIABLE 1 : TIME
    VARIABLE 2 : STATE

    3 1 | 7 4 | 10 8 | 14 10 | 18 15 | 21 16 | 25 18 | 28 19 | 31 20 | 35 22 | 39 23 | 42 24 \
    45 25 | 49 25

    3 1 | 7 2 | 10 6 | 14 9 | 18 13 | 21 14 | 25 15 | 28 16 | 31 17 | 35 17

The only differences from the explicit index parameter of type POSITION are that the index values should be strictly increasing along sequences and that no final index value is required.

The explicit index parameter of type TIME can be replaced by time intervals::

    2 VARIABLES

    VARIABLE 1 : TIME_INTERVAL
    VARIABLE 2 : STATE

    3 1 | 4 4 | 3 8 | 4 10 | 4 15 | 3 16 | 4 18 | 3 19 | 3 20 | 4 22 | 4 23 | 3 24 \
    3 25 | 4 25

    3 1 | 4 2 | 3 6 | 4 9 | 4 13 | 3 14 | 4 15 | 3 16 | 3 17 | 4 17
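
As for the type POSITION, the conversion amounts to taking differences between successive index values; judging from the examples (an inference), the first interval is again measured from time 0. A minimal Python sketch with a hypothetical helper name, which also enforces the strict increase required for the type TIME:

.. code-block:: python

    def times_to_intervals(vectors):
        """Convert [(time, value), ...] into [(time interval, value), ...]."""
        intervals, previous = [], None
        for time, value in vectors:
            if previous is not None and time <= previous:
                raise ValueError("times must be strictly increasing along a sequence")
            start = 0 if previous is None else previous  # first interval measured from 0
            intervals.append((time - start, value))
            previous = time
        return intervals  # no final index value for the type TIME


    # Second sequence of the TIME example above.
    seq = [(3, 1), (7, 2), (10, 6), (14, 9), (18, 13), (21, 14),
           (25, 15), (28, 16), (31, 17), (35, 17)]
    print(times_to_intervals(seq))
    # [(3, 1), (4, 2), (3, 6), (4, 9), (4, 13), (3, 14), (4, 15), (3, 16), (3, 17), (4, 17)]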

TIME_EVENTS
=======================
The syntactic form of data of type {time interval between two observation dates, number of events occurring between these two observation dates} consists of three columns: the time interval between two observation dates (length of the observation period), the number of events occurring between these two dates, and the corresponding frequency. The time intervals should be given in increasing order and, for each time interval, the numbers of events should be given in increasing order. This is equivalent to giving successively the frequency distribution of the number of events for each possible time interval between two observation dates, ranked in increasing order.

::

    # frequency distribution of the number of events for an observation period of length 20
    20  2   1
    20  3   2
    20  4   4
    20  5   12
    20  6   14
    20  7   6
    20  8   2
    20  9   1

::

    # frequency distribution of the number of events for an observation period of length 30
    30  3   1
    30  5   2
    30  6   4
    30  7   12
    30  8   14
    30  9   6
    30  10  2
    30  12  1
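
Read this way, each block of lines is one frequency distribution. The Python sketch below (hypothetical function names, not the ``stat_tool`` reader) groups the triples by time interval, checks the required ordering and computes the mean number of events per observation period; it assumes that ``#`` introduces a comment, as in the examples.

.. code-block:: python

    DATA = """
    # frequency distribution of the number of events for an observation period of length 20
    20  2   1
    20  3   2
    20  4   4
    20  5   12
    20  6   14
    20  7   6
    20  8   2
    20  9   1
    # frequency distribution of the number of events for an observation period of length 30
    30  3   1
    30  5   2
    30  6   4
    30  7   12
    30  8   14
    30  9   6
    30  10  2
    30  12  1
    """


    def read_time_events(text):
        """Group (time interval, nb events, frequency) triples by time interval."""
        triples = []
        for line in text.splitlines():
            line = line.split("#", 1)[0].strip()  # '#' starts a comment
            if line:
                triples.append(tuple(int(v) for v in line.split()))
        keys = [(t, n) for t, n, _ in triples]
        # Increasing time intervals, then increasing numbers of events (duplicates rejected).
        if any(a >= b for a, b in zip(keys, keys[1:])):
            raise ValueError("lines are not ranked in increasing order")
        distributions = {}
        for t, n, frequency in triples:
            distributions.setdefault(t, {})[n] = frequency
        return distributions


    def mean_nb_event(distribution):
        """Mean of a frequency distribution {nb events: frequency}."""
        total = sum(distribution.values())
        return sum(n * f for n, f in distribution.items()) / total


    distributions = read_time_events(DATA)
    print({t: round(mean_nb_event(d), 2) for t, d in distributions.items()})
    # {20: 5.6, 30: 7.6}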

TOPS
================
Consider the following example::

    2 VARIABLES

    VARIABLE 1 : POSITION
    VARIABLE 2 : NB_INTERNODE

    10 5 | 12 5 | 13 6 | 13 8 | 15 7 | 20 10 | 22 11 | 23 11 | 27 15 | 30 16 | 31 15 | 32 17 \
    35 16 | 37 18 | 40 19 | 45

    5 2 | 7 4 | 10 5 | 11 6 | 15 7 | 18 8 | 20 9 | 21 11 | 22 11 | 25 12 | 25

The syntactic form of the type TOPS is a variant of the syntactic form of the type SEQUENCES: 'tops' can be seen as sequences with an explicit index parameter of type POSITION. This index parameter represents the position of successive offspring shoots along the parent shoot, and the final index value gives the number of internodes of the parent shoot. The second variable, of type NB_INTERNODE, gives the number of internodes of the offspring shoots.
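
Under this reading, a top in POSITION form can be turned into a small record: the number of internodes of the parent shoot (the final index value) and, for each offspring shoot, its position and number of internodes. The ``Top`` class and ``parse_top`` function below are a hypothetical Python representation, for illustration only; the sketch assumes that continuation lines have already been joined.

.. code-block:: python

    from dataclasses import dataclass, field
    from typing import List, Tuple


    @dataclass
    class Top:
        """A parent shoot and its offspring shoots (hypothetical representation)."""
        parent_nb_internode: int  # final index value
        offsprings: List[Tuple[int, int]] = field(default_factory=list)  # (position, nb internodes)


    def parse_top(line):
        """Parse 'pos nb | pos nb | ... | final' written on a single line."""
        chunks = [chunk.split() for chunk in line.split("|")]
        final = chunks.pop()
        if len(final) != 1:
            raise ValueError("a top should end with a single final index value")
        offsprings = [(int(position), int(nb)) for position, nb in chunks]
        parent = int(final[0])
        # Index values do not decrease, so no offspring lies beyond the final value.
        if any(position > parent for position, _ in offsprings):
            raise ValueError("offspring shoot positioned beyond the parent shoot")
        return Top(parent, offsprings)


    # Second top of the example above.
    top = parse_top("5 2 | 7 4 | 10 5 | 11 6 | 15 7 | 18 8 | 20 9 | 21 11 | 22 11 | 25 12 | 25")
    print(top.parent_nb_internode, len(top.offsprings))  # 25 10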

The explicit index parameter of type POSITION can be replaced by inter-position intervals::

    2 VARIABLES

    VARIABLE 1 : POSITION_INTERVAL
    VARIABLE 2 : NB_INTERNODE

    10 5 | 2 5 | 1 6 | 0 8 | 2 7 | 5 10 | 2 11 | 1 11 | 4 15 | 3 16 | 1 15 | 1 17 \
    3 16 | 2 18 | 3 19 | 5

    5 2 | 2 4 | 3 5 | 1 6 | 4 7 | 3 8 | 2 9 | 1 11 | 1 11 | 3 12 | 0

TOP_PARAMETERS
==========================
A model of 'tops' is defined by three parameters, namely the growth probability of the parent shoot, the growth probability of the offspring shoots (both in the sense of Bernoulli processes) and the ratio of the growth rhythm of the offspring shoots to that of the parent shoot. Consider the following example::

    TOP_PARAMETERS

    PROBABILITY : 0.7
    AXILLARY_PROBABILITY : 0.6
    RHYTHM_RATIO : 0.8

The following constraints apply to the parameters:

* 0.05 |leq| PROBABILITY |leq| 1
* 0.05 |leq| AXILLARY_PROBABILITY |leq| 1
* 1/3 |leq| RHYTHM_RATIO |leq| 3
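
These constraints are simple interval checks. A minimal Python sketch with hypothetical function names (not part of ``stat_tool``), which reads the ``KEY : value`` lines of the example above and applies the three constraints:

.. code-block:: python

    TEXT = """
    TOP_PARAMETERS

    PROBABILITY : 0.7
    AXILLARY_PROBABILITY : 0.6
    RHYTHM_RATIO : 0.8
    """


    def parse_top_parameters(text):
        """Read the 'KEY : value' lines of a TOP_PARAMETERS object."""
        params = {}
        for line in text.splitlines():
            if ":" in line:
                key, value = line.split(":", 1)
                params[key.strip()] = float(value)
        return params


    def check_top_parameters(params):
        """Apply the constraints listed above."""
        return (0.05 <= params["PROBABILITY"] <= 1
                and 0.05 <= params["AXILLARY_PROBABILITY"] <= 1
                and 1 / 3 <= params["RHYTHM_RATIO"] <= 3)


    params = parse_top_parameters(TEXT)
    print(params, check_top_parameters(params))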