RCAAP Artefact Extractor

RCAAP is a web portal that gathers information from several active DSpace instances. By periodically indexing each instance's documents, the portal provides a search engine that integrates thousands of scientific papers, doctoral theses and other document types from the institutions' own repositories.

This extractor accesses the target machine through SSH and crawls the RCAAP Portal's directories for files that may have been customized. These files can contain extra configuration information, new HTML or JSP pages, or even new scripts.

It then creates a JSON object listing every file found and its corresponding system path, for future preservation.
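
As an illustration of the crawl-and-record idea, the following shell sketch turns a list of file paths into JSON. It is not the extractor's actual implementation or output format: the function name paths_to_json and the "name" and "path" keys are assumptions made for this example only.

```shell
# Hypothetical sketch: the real extractor's JSON layout is not documented here.
# Reads one file path per line on stdin and prints a JSON array of
# {"name": ..., "path": ...} objects on stdout.
paths_to_json() {
    first=1
    printf '['
    while IFS= read -r path; do
        # Skip empty lines.
        [ -n "$path" ] || continue
        # Escape backslashes and double quotes so the output stays valid JSON.
        escaped=$(printf '%s' "$path" | sed 's/\\/\\\\/g; s/"/\\"/g')
        # The file name is everything after the last slash.
        name=${escaped##*/}
        [ "$first" -eq 1 ] || printf ','
        first=0
        printf '{"name":"%s","path":"%s"}' "$name" "$escaped"
    done
    printf ']\n'
}
```

A remote listing could be piped through it, for example with ssh user@rcaap-host 'find /path/to/rcaap -type f' | paths_to_json, where the user, host and directory are placeholders. The sketch assumes one path per line and does not escape control characters.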

Caixa Magica Software

This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

Find at opensourceprojects.eu

Go to the RCAAP Artefact Extractor's page on opensourceprojects.eu for more details about requirements, interaction and source code by following the link above.

How to install RCAAP Artefact Extractor

Tools:
- Git – Git is a distributed revision control and source code management (SCM) system with an emphasis on speed, data integrity, and support for distributed, non-linear workflows. The Timbus project uses Git to facilitate distributed development and cooperation between partners. All tools are available on the www.opensourceprojects.eu website. To get this particular tool, once Git is installed, run the following on the command line:
  git clone https://opensourceprojects.eu/git/p/timbus/context-population/extractors/linux-hw
  This saves the project in a new folder.
- Java 1.7 or later – A simple tutorial on how to install the required Java version on different platforms is available at https://opensourceprojects.eu/p/timbus/wiki/How%20to%20install%3A%20Java/
- Maven – Apache Maven is a software project management and comprehension tool. Based on the concept of a project object model (POM), Maven can manage a project's build, reporting and documentation from a central piece of information. Here it manages the project's dependencies and build: besides fetching all required dependencies during the build, it allows fine-grained control over the whole build process.
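
Before building, it can help to confirm that the three tools are present. The sketch below is one way to do that, under stated assumptions: the helper names (java_version_ok, check_tools, installed_java_version) are invented for this example, and the version parsing assumes the usual 'version "x.y.z"' line printed by java -version.

```shell
# Hypothetical helper: succeeds when a Java version string denotes 1.7 or later.
# Accepts both the legacy "1.x" scheme (1.7.0_80) and the newer one (11.0.2).
java_version_ok() {
    version="$1"
    major=${version%%.*}
    if [ "$major" = "1" ]; then
        rest=${version#1.}
        major=${rest%%.*}
    fi
    # A non-numeric value makes the test fail, which is treated as "not ok".
    [ "$major" -ge 7 ] 2>/dev/null
}

# Report which of the required tools are missing from PATH.
check_tools() {
    missing=0
    for tool in git java mvn; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}

# Extract the quoted version from "java -version", which prints to stderr.
installed_java_version() {
    java -version 2>&1 | sed -n 's/.*version "\([^"]*\)".*/\1/p' | head -n 1
}
```

For example, check_tools && java_version_ok "$(installed_java_version)" succeeds only when git, java and mvn are all on the PATH and the installed Java is recent enough.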

Artifacts:
- Maven pom parents: osgi (2-beta-4) and core (2-beta-3) – Because various projects within Timbus share common dependencies and have similar build behaviour, a series of Maven parents was created to facilitate the creation of new tools. Declaring a pom parent in the main project pom file states that the current project extends its parent and therefore follows its behaviour. However, certain build options can be overridden to adapt the parent to the context of the tool being developed. Like any other tool or artifact in Timbus, the parents are stored in the opensourceprojects repository and can be fetched by running the following command:
  git clone https://opensourceprojects.eu/git/p/timbus/support/maven-parents timbus-support-maven-parents
- Extractors-core (currently version 0.0.3-RELEASE) – This module sets the behaviour for all remote extractors in TIMBUS, and it is necessary to invoke this artifact within the Virgo environment (for more detail, see the description of this tool). You can get it by running the command:
  git clone https://@opensourceprojects.eu/git/p/timbus/context-population/extractors-core

The project has other dependencies, but all of them are fetched automatically from remote repositories during the build process.

Step-by-step compilation

Once all requirements are met, the following steps are necessary to properly compile the project:
- Install the Maven parents in the local Maven repository. All other project dependencies are available in remotely accessible repositories, but the parents have to be installed locally so that Maven can find them when compiling:
  • Go to timbus-support-maven-parents/core
  • Run “mvn install”. This command recognizes the pom.xml file in the folder and installs it in the local repository.
  • Go to timbus-support-maven-parents/osgi and run the same command.
- Install the extractors-core artifact dependency in your local repository. To do so, go to the root directory of the previously fetched extractors-core project and run “mvn install”.
- Go to the SSH Wrapper project's main folder and run “mvn clean package”. This command builds the project and places the output in a “target” folder. The “clean” phase deletes any existing “target” folder before the build.
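
The compilation steps can be sketched as a single shell function. This is only an outline under assumptions that the text does not state: the maven-parents and extractors-core repositories are assumed to be cloned side by side in one workspace directory, the extractor's own folder is passed as a second argument, and the function name build_extractor and the MVN override variable are invented for this example.

```shell
# Hypothetical wrapper around the build steps listed above.
#   $1  workspace containing timbus-support-maven-parents/ and extractors-core/
#   $2  folder of the extractor project to package
# The MVN variable may point to an alternative Maven executable (default: mvn).
build_extractor() {
    workspace="$1"
    extractor_dir="$2"
    mvn_cmd="${MVN:-mvn}"

    # Each step runs in a subshell so the caller's working directory is kept.
    # The && chain stops at the first step that fails.
    (cd "$workspace/timbus-support-maven-parents/core" && "$mvn_cmd" install) &&
    (cd "$workspace/timbus-support-maven-parents/osgi" && "$mvn_cmd" install) &&
    (cd "$workspace/extractors-core" && "$mvn_cmd" install) &&
    (cd "$extractor_dir" && "$mvn_cmd" clean package)
}
```

It would be called as build_extractor /path/to/workspace /path/to/extractor, with both paths adjusted to wherever the repositories were cloned.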

If everything was successful, the compiled .jar file can now be found in the “target” folder. You can deploy this jar file into a configured Virgo installation (see THIS GUIDE for instructions on how to install and set up Virgo)

Create a New Extractor
To learn how to develop an extractor, follow this link to a tutorial by Caixa Magica Software: https://opensourceprojects.eu/p/timbus/context-population/extractors/wiki/How%20to%20create%20a%20new%20Extractor/

Learn more about the extractors from the video about the Linux Hardware Extractor.
