Are These Processes the Same?


Tomasz Miksa (SBA)

During the iPres14 conference we presented the paper "VPlan – An Ontology for Storing Verification Data" and won the Best Paper Award. If you would like to understand where the problem in the verification and validation of processes lies and how we address it, read on. I will also share some insights into the behind-the-scenes discussions at iPres14.

Greetings from the iPres14 conference in Melbourne, Australia! This year's edition of one of the most important conferences in the digital preservation community was very special to Timbus. We had the opportunity to familiarize experts from all over the world with the tools and solutions that Timbus has developed over the past few years. We were visible not only during the conference days, but also during the workshops and tutorials that complemented the main event.


Tomasz Miksa and Andreas Rauber at iPres14

I, personally, am still very moved by the gala dinner yesterday. This is not because I enjoyed the food so much (which in fact was great), but because our hard work on the project was acknowledged and rewarded when we won the Best Paper Award for our work on the verification and validation of preserved processes. This is the second year in a row that Timbus has had the honour of receiving this prestigious award!


The Best Paper Award is especially valuable to us, because preparing the paper involved countless discussions, drafts, changes and hard decisions before we agreed on a final solution. This year's paper presents the ontology for storing verification data (VPlan) and builds on the verification framework (VFramework) that we presented at iPres13 in Lisbon, Portugal. The enthusiastic reception of our work by the community gives us double satisfaction, because it crowns our work on verification and validation as a whole. In this blog post I will try to explain in simple words why the verification and validation of preserved processes is so challenging and how we want to address it.


Figure 1 illustrates a typical scenario in which a process running in its original environment is to be preserved. The process is extracted from the system and deposited into a repository. Often, after a long period of time, the process is redeployed, that is, installed in a new environment that may differ from the original one. Verification and validation provides an analysis that establishes whether the original process and the redeployed process perform in the same way, or whether the differences in their performance are acceptable in view of the user's requirements.

Figure 1. The main aim of verification and validation is to state whether the preserved and redeployed processes are the same (or whether the differences are acceptable in view of the user's requirements)


Verification and validation is always part of the software development lifecycle and is performed both to ensure the quality of the product and to confirm that the user's requirements are satisfied. Unfortunately, the existing solutions and standards focus on performing this evaluation "here and now". This means that both the requirements and the system must be available at the same time. This conflicts with the nature of digital preservation, in which the evaluation has to be performed at different moments in time. This situation is depicted in Figure 2.

The only way to resolve this conflict is to preserve, in addition to the process, the data needed for its later verification and validation. This data describes the significant properties of the process and the way they were measured. It can only be collected while the process is operational. Furthermore, owners of processes usually have a wide range of motivations for preserving and later redeploying them, and these motivations affect the assessment criteria. For example, when the process needs to be re-played in a litigation case to prove the correctness of its results, it must be redeployed using exactly the same components and must deliver exactly the same performance. However, when a scientific process is to be broken into parts for reuse in a new experiment, substituting some of its components and changing its functionality (e.g. improving the precision of results) are acceptable.
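To make the idea of scenario-dependent assessment criteria more concrete, here is a minimal sketch, not the Timbus implementation. It assumes, purely for illustration, that each significant property is captured as a named numeric value in both environments, and that a scenario is expressed as a single relative tolerance: 0.0 for the litigation case (results must be identical) and a positive value for a reuse scenario where some change is acceptable. The names Measurement and compare are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Measurement:
    """A significant property value captured in one environment (hypothetical structure)."""
    property_name: str
    value: float


def compare(original: list[Measurement], redeployed: list[Measurement],
            tolerance: float) -> list[str]:
    """Return the names of properties that fail the check for the given tolerance.

    tolerance = 0.0 models a litigation-style scenario (values must be identical);
    a positive tolerance models a reuse scenario where some deviation is acceptable.
    """
    redeployed_by_name = {m.property_name: m.value for m in redeployed}
    violations = []
    for m in original:
        if m.property_name not in redeployed_by_name:
            # The property was not measured after redeployment: evidence is missing.
            violations.append(m.property_name)
            continue
        new_value = redeployed_by_name[m.property_name]
        # Relative deviation; fall back to absolute deviation when the original value is 0.
        baseline = abs(m.value) if m.value != 0 else 1.0
        if abs(new_value - m.value) / baseline > tolerance:
            violations.append(m.property_name)
    return violations
```

The same measurements can pass under one scenario and fail under another, which is why the requirements behind each measurement need to be preserved together with the data.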

Figure 2. Direct comparison of processes may not be possible. We need to preserve evidence along with the process and use it later to perform the verification and validation.

For this reason we devised the VFramework to structure the verification and validation process. The VFramework, depicted in Figure 3, consists of two sequences of actions. The first (shown in blue) is performed in the original environment, and the result of each step is stored in the VPlan. The second (shown in green) is performed in the redeployment environment, and the information needed for each of its steps is obtained from the VPlan. The VPlan is an ontology designed for organizing and storing the data collected for the verification of preserved processes. It also includes descriptions of the actions taken to collect the data and a clear breakdown of the requirements that led to its collection.


Ontologies enable us to substantially automate the process of verification and validation. For example, we can check the completeness of the VPlan by running a set of predefined queries. We can also use it to feed data directly into tools that calculate the metrics used during the evaluation. Moreover, the VPlan ontology can be integrated with other ontologies, specifically with the Timbus Context Model. Thanks to this integration we can easily show the part of the process from which the data used to compute the verification metrics was collected. Last but not least, parts of the VPlan can be generated automatically using dedicated software tools. For more information please refer to our papers on the VFramework and VPlan, or to the White Paper on Verification and Validation.
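The completeness check mentioned above can be illustrated with a small sketch. Against a real ontology such a check would typically be a SPARQL query (for example, one using FILTER NOT EXISTS to find individuals lacking a required property); here, to keep the example self-contained, the VPlan is approximated by an in-memory set of (subject, predicate, object) triples. The class and property names (Requirement, hasMetric, Metric, measuredAt) are hypothetical and are not the published VPlan vocabulary.

```python
# Hypothetical VPlan fragment as (subject, predicate, object) triples.
# Class and property names are illustrative assumptions only.
TRIPLES = {
    ("req1", "type", "Requirement"),
    ("req1", "hasMetric", "metric1"),
    ("metric1", "type", "Metric"),
    ("metric1", "measuredAt", "mp1"),
    ("req2", "type", "Requirement"),
}

# Each predefined query: every individual of `cls` must have at least one `predicate` statement.
COMPLETENESS_QUERIES = [
    ("Requirement", "hasMetric"),
    ("Metric", "measuredAt"),
]


def missing(triples: set, cls: str, predicate: str) -> list[str]:
    """Return individuals of class `cls` that have no statement with `predicate`."""
    members = {s for (s, p, o) in triples if p == "type" and o == cls}
    with_predicate = {s for (s, p, _) in triples if p == predicate}
    return sorted(members - with_predicate)


def check_completeness(triples: set) -> dict:
    """Run all predefined queries and report only the ones that found gaps."""
    report = {}
    for cls, predicate in COMPLETENESS_QUERIES:
        gaps = missing(triples, cls, predicate)
        if gaps:
            report[(cls, predicate)] = gaps
    return report
```

Running check_completeness(TRIPLES) flags req2 as a requirement with no associated metric, i.e. a gap that must be closed while the original process is still available.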

Figure 3. The VFramework for verification and validation of processes


To conclude, I would like to return to the iPres14 conference once more. The prevailing opinion expressed behind the scenes was that in past years the community has put a lot of effort into developing ways to perform digital preservation, e.g. by creating repository systems, devising metadata vocabularies, and inventing frameworks and standards. According to a common belief, we now need to shift our focus, that is, we need to start applying and using these solutions.


This should be relatively easy for memory institutions and other settings where static digital objects such as documents or scans are preserved, because these objects have received most of the attention in past years and the existing solutions have had enough time to mature. The shift will be more challenging, though still much needed, for the relatively young field of dynamic content (and process) preservation. There, application matters all the more, because only a broad application of the solutions devised by Timbus will fully evaluate their usefulness and, if everything goes our way, make process preservation a reality.


We are very glad to have received such positive feedback at the conferences and other meetings where we presented the project's outcomes. We are also very enthusiastic about this shift of interest within the community, because the broad adoption of our solutions and their application to real-life preservation challenges would give us the feeling of a job well done!