Process mining is a relatively new discipline whose research and practical purpose is to extract process models from data recorded as event logs, to check existing models for conformance with actual processes, and to improve them.
Transition systems are extensively used to formalize processes extracted from event logs.
A transition system can be constructed from an event log by using prefix-based techniques in a very natural way.
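As a minimal sketch of the prefix-based idea, the following Python fragment builds a transition system in which every distinct trace prefix becomes a state. The function name `build_prefix_ts`, the tuple-based state encoding, and the toy log are illustrative assumptions, not part of the project's tooling.

```python
def build_prefix_ts(log):
    """Build a transition system whose states are the prefixes of the traces.

    `log` is an iterable of traces; each trace is a sequence of activity names.
    States are tuples of activities (an assumed encoding); the empty tuple is
    the initial state.
    """
    initial = ()
    states = {initial}
    transitions = set()  # triples (source state, activity, target state)
    for trace in log:
        state = initial
        for activity in trace:
            target = state + (activity,)
            states.add(target)
            transitions.add((state, activity, target))
            state = target
    return states, transitions, initial


# Toy event log (hypothetical): three traces, two distinct variants.
log = [("a", "b", "c"), ("a", "c", "b"), ("a", "b", "c")]
states, transitions, initial = build_prefix_ts(log)
print(len(states), len(transitions))  # 6 5
```

Because each distinct prefix gets its own state, the model replays the log exactly, but the number of states grows with the number of distinct prefixes, which is why such models become large on real-life logs.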
We consider several metrics that describe a model's quality. Replay fitness quantifies the extent to which a process model can reproduce the behavior recorded in a log. Simplicity reflects the structural complexity (size) of the model, and precision shows how closely the model's behavior matches the event log, that is, how little extra behavior the model allows.
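To make replay fitness concrete, here is a deliberately coarse sketch that measures it at the trace level: the fraction of traces a deterministic transition system can replay from its initial state. Fitness measures used in practice are usually finer-grained (for example, event-level or alignment-based); the names `can_replay` and `trace_fitness`, the absence of accepting states, and the toy model are assumptions made for illustration.

```python
def can_replay(transitions, initial, trace):
    """Return True if `trace` can be replayed step by step from `initial`.

    `transitions` is a set of (source, activity, target) triples and is
    assumed to be deterministic. Accepting states are not modelled here.
    """
    step = {(src, act): dst for src, act, dst in transitions}
    state = initial
    for activity in trace:
        if (state, activity) not in step:
            return False
        state = step[(state, activity)]
    return True


def trace_fitness(transitions, initial, log):
    """Fraction of traces in `log` that the model can replay (1.0 if empty)."""
    if not log:
        return 1.0
    replayed = sum(can_replay(transitions, initial, trace) for trace in log)
    return replayed / len(log)


# Hypothetical model: s0 -a-> s1 -b-> s2
model = {("s0", "a", "s1"), ("s1", "b", "s2")}
log = [("a", "b"), ("a", "b"), ("a", "c"), ("a",)]
print(trace_fitness(model, "s0", log))  # 0.75
```

A value of 1.0 under this measure corresponds to perfect fitness: every recorded trace is reproduced by the model.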
The major weakness of models constructed from real-life event logs is their size.
Although a number of approaches aim to reduce the size of transition systems, applying them yields models that are either too large or too small. In the former case the model is too big to be readable. Furthermore, it becomes difficult or even impossible to apply existing transition system analysis techniques that are sensitive to the size of input models. For example, the state-based region algorithm has complexity exponential in the size of the input model, so its applicability is limited to fairly small models. In the latter case, because states have been merged, a rather small model allows a considerable amount of extra behavior, which makes the model less precise and thus less applicable.
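The trade-off can be illustrated with one common reduction, assumed here purely for illustration: identifying each state with only the last k activities of a prefix (a fixed-size window). This is not the method developed in this project; the names `build_window_ts` and `can_replay` and the toy log are hypothetical.

```python
def build_window_ts(log, k):
    """Build a transition system whose states are the last `k` activities seen.

    A small `k` merges many prefixes into one state (smaller model, more
    extra behavior); a `k` at least as long as the longest trace gives the
    full prefix-based model.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    initial = ()
    states = {initial}
    transitions = set()  # triples (source state, activity, target state)
    for trace in log:
        state = initial
        for activity in trace:
            target = (state + (activity,))[-k:]  # keep only the last k activities
            states.add(target)
            transitions.add((state, activity, target))
            state = target
    return states, transitions, initial


def can_replay(transitions, initial, trace):
    """Return True if the (deterministic) model can replay `trace`."""
    step = {(src, act): dst for src, act, dst in transitions}
    state = initial
    for activity in trace:
        if (state, activity) not in step:
            return False
        state = step[(state, activity)]
    return True


log = [("a", "b", "c"), ("a", "c", "b")]
unseen = ("a", "b", "c", "b")  # a trace that does not occur in the log

for k in (1, 3):
    states, transitions, initial = build_window_ts(log, k)
    print(k, len(states), can_replay(transitions, initial, unseen))
# 1 4 True
# 3 6 False
```

With k = 1 the model shrinks from six states to four, yet it now accepts the unseen trace, which is exactly the loss of precision described above. Both models still replay every trace of the log, so fitness is unaffected.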
The main goal of this project is to develop a number of approaches for reducing the size of a transition system mined from an event log in a flexible manner.
Reduction with Balancing between Precision and Simplicity
This method uses an original three-step algorithm that achieves the goal with a variable-size window based on a state frequency characteristic. The approach preserves (perfect) fitness of the model and balances its simplicity against its precision through a set of adjustable parameters.