Workshops

Register for workshops directly with the organizers of each workshop. Please note that the number of participants may be limited.

Developing a repository and suite of tools for Scandinavian literature

Register to: madsrt@cc.au.dk

The goal of the workshop is to set benchmarks for the further development of a machine-readable corpus of Scandinavian literary texts. The corpus is part of a project that continues two Carnegie-Mellon Foundation sponsored conferences on computational approaches to Scandinavian literature. A third conference is planned for UCLA in November 2017.

At the workshop the practical implementation of the following goals will be discussed:
1) A preprocessed benchmark corpus of selected literary texts in the Scandinavian languages.

2) A wider machine-readable Scandinavian corpus. The corpora will be assembled from DSL, Litteraturbanken and Norwegian libraries.

3) A portfolio of tools with documentation.

The workshop is focused on aligning the needs of literary scholars with the technical solutions that can be developed by the core group members.

Organizers

Professor with Special Responsibilities Mads Rosendahl Thomsen (madsrt@cc.au.dk, Aarhus University), Professor Timothy Tangherlini (tango@humnet.ucla.edu, UCLA) and Associate Professor Kristoffer L. Nielbo (kln@cas.au.dk, Aarhus University) will chair the workshop. We invite 10-12 other participants from Scandinavia and the US, including scholars from Gothenburg University and Oslo University who have taken part in prior meetings.

Data Management for Humanities Scholars – an introduction to Data Management Plans and the Cultural Heritage Data Reuse Charter

Register to: charles.riondet@inria.fr

With the growth of the Open Science movement in the past few years, researchers have been increasingly encouraged by their home institutions, their funders, and by society at large, to share the data they produce. Significantly, the Horizon 2020 Research and Innovation Programme has undertaken to open the research data produced by H2020 funded projects. A new model of data sharing is emerging, and the challenges this new model raises are having an increasingly dramatic impact on the research ecosystem.

Rather than seeing it as an additional constraint, scholars can benefit from the advantages that this model of openness offers. Sharing their data allows them to collaborate with fellow researchers within the same discipline or with colleagues from other disciplines, to reduce costs by avoiding duplication of data collection, to make validation of results easier, and to increase the impact and visibility of their research outputs.

Opening research data requires not only a change in mentality, but also a change in working methods. Data management has to be seen as the baseline of the research lifecycle. It should therefore be considered as early as possible in a research project, and should be flexible enough to evolve throughout the project. For researchers, this means planning and deciding how data will be collected, organised, managed, stored, preserved and shared during a research project and after the project is completed. These requirements can best be addressed by setting up so-called data management plans. This method is fairly new to most Humanities scholars, although it is a key element of good data management. Data management plans (DMPs) have the particular advantage of taking into account the fact that data have a longer lifespan than the research project that creates them. DMPs are conceived and applied in order to ensure that data will be preserved and useful both now and in the future, for both their creators and their reusers. In addition, in order to support open access to research data, several funders make data sharing mandatory, and applicants must therefore provide a data management plan.

Data Management Plans are not simply management tools at project level; they also allow a broader reflection on research data in the Humanities on a larger scale. Although they can apply to any data in any research field, we have chosen to make their benefits easier to grasp by addressing one specific use case, one that applies to a wide range of Humanities research projects. The focus here will be on the reuse, for research purposes, of data emanating from Cultural Heritage Institutions. In this specific situation, there is often no clear policy on interactions between institutions and scholars. Researchers therefore find it difficult to develop a clear data management policy for research projects that involve Cultural Heritage data.

The Cultural Heritage Data Reuse Charter, which we are currently developing in the context of DARIAH-EU and other research infrastructures (Humanities at Scale and IPERION-CH, with ongoing discussions with EUROPEANA and CLARIN), tackles this issue by offering an online environment dedicated to all actors taking part in the scholarly reuse of digital data generated by Cultural Heritage Institutions. The Charter online environment allows the main actors to declare general principles (common work ethics), and more broadly to express their position on all the relevant information needed to understand how a given dataset can be reused. Institutions can declare their collections, and researchers their research interests and existing publications, so that these can be connected. The Charter can also help document the knowledge generation process and, consequently, increase the quality of data and metadata accessible to research. Signing the Charter also implies making a statement about the technical quality of the data to be reused, or of the data derived from such reuse. More broadly, the Charter offers a concrete implementation framework for the FAIR principles (make the data findable, accessible, interoperable and reusable). Finally, clarifying the reuse conditions of cultural heritage data, and thereby also the relationships between scholars and GLAM institutions, widens the opportunities for cooperation.

Within the framework of the Data Reuse Charter, Cultural Heritage Institutions (CHIs) and scholars can make explicit their constraints concerning data reuse. The Charter will not only allow CHIs to clarify their policy on data reuse and give researchers a precise overview of their rights, it will also allow CHIs and researchers to easily handle the digital data they produce and therefore help them to define their data management strategy. In other words, the Charter connects strongly with data management planning, whose main goal is to state clearly the data policy of a research project, and it will be an essential asset for data management planning in research on Cultural Heritage.

Workshop provisional program

We expect the workshop to last about three hours. Detailed presentations will be accompanied by open discussion, where we would like to take advantage of the presence of DH researchers and representatives of Cultural Heritage Institutions to engage in a fruitful exchange.

The workshop will be divided into three sub-sessions:

Data management for researchers: overview and challenges (presentation: 45 min)

In this session, we will discuss the new model of data sharing that is currently emerging, as described above. Participants will also get an overview of research data management and data management planning. Data management can offer many advantages, such as higher quality data, increased visibility and a better citation rate. In this approach, research data is an asset and a resource that can be shared, with mutual benefits for the person who shares the data and the person who collects the data.

Sharing research data: hindrances and benefits (presentation and discussion with the audience: 1h)

Sharing their research data allows researchers to organise and retrieve them effectively, to ensure their security, to collaborate with fellow researchers within the same discipline or from other disciplines, to reduce costs by avoiding duplication of data collection, to make validation of results easier, and to increase the impact and visibility of their research outputs. Many are still reluctant to share their data but, fortunately, data sharing is gradually evolving towards greater openness with the movement for Open Science and the development of Open Access. However, researchers need to be aware of the benefits of sharing their research data, because the decision whether and how to share rests most of the time on the shoulders of the researchers themselves.

Framework for exchanging information about data reuse: the Charter online environment (presentation and discussion with the audience: 1h)

In this session, we will present the online environment offered by the Data Reuse Charter, and its connection with data management planning. By taking into account the longer lifespan of Cultural Heritage data, this tool offers many valuable elements (e.g. documentation, guidelines, list of services) that could be used to easily create data management plans:

Long-term and persistent access to metadata, texts, images;

Licensing of the content;

Formats and standards;

Dissemination of both CHI information and research (visibility of the work of all stakeholders);

Retro-provision (communicating enrichments based on CHI data to the CHI they originally emanate from);

Quality control at all levels according to appropriate standards.

Approximate number of participants: 20.

Organizers

Marie Puren and Charles Riondet, Ph.D., are junior researchers in Digital Humanities at the French Institute for Research in Computer Science and Automation (INRIA) in Paris. They currently work on the creation of a Data Management Plan for the PARTHENOS H2020 project. Marie Puren also contributes to the IPERION H2020 project, especially by upgrading its Data Management Plan. Charles Riondet is also involved in the H2020 EHRI project as a metadata and standards specialist.

Transkribus: Handwritten Text Recognition technology for historical documents

Register to: louise.seaward@ucl.ac.uk

Transkribus (https://transkribus.eu/Transkribus/) is a platform for the automated recognition, transcription and searching of handwritten historical documents. Transkribus is part of the EU-funded Recognition and Enrichment of Archival Documents (READ) (http://read.transkribus.eu/) project. The core mission of the READ project is to make archival material more accessible through the development and dissemination of Handwritten Text Recognition (HTR) and other cutting-edge technologies.

The workshop is aimed at researchers and students who are interested in the transcription, searching and publishing of historical documents. It will introduce participants to the technology behind the READ project and demonstrate the Transkribus transcription platform. Our team has already conducted over 20 similar workshops over the course of the past year, including several sessions with digital humanities scholars and students.

Transkribus can be freely downloaded from the Transkribus website. Participants will be instructed to create a Transkribus account and install Transkribus on their laptops in advance of the workshop. They will also be asked to upload a few images of historical documents to Transkribus prior to the session. They should bring their laptops along to the workshop.

The workshop will consist of four parts:

1. Introduction to Handwritten Text Recognition (HTR) technology (20 mins)

The introduction to this workshop will explain how new algorithms and technologies are making it possible for computer software to process handwritten text. Handwritten Text Recognition (HTR) technology works differently from Optical Character Recognition (OCR) for printed texts (Leifert et al., 2016). Rather than focusing on individual characters, HTR engines process the entire image of a word or line, scanning it in various directions and then putting this data into a sequence. This introduction will outline the workings of HTR technology and show examples of the successful automatic transcription and searching of historical documents. The latest experiments demonstrate that Transkribus can automatically generate transcripts with a Character Error Rate of 5%. This means that 95% of the characters in the transcript would be correct.
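As a rough illustration of how a Character Error Rate can be computed, the sketch below divides the character-level edit distance between a reference transcript and an automatically generated one by the length of the reference. This is a common definition of the metric, not necessarily the exact procedure used in Transkribus; the function names and the sample strings are hypothetical.

```python
def edit_distance(reference: str, hypothesis: str) -> int:
    """Levenshtein distance: the minimum number of single-character
    insertions, deletions and substitutions separating the two strings."""
    # previous[j] holds the distance between the reference prefix processed
    # so far and the first j characters of the hypothesis.
    previous = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            cost = 0 if ref_char == hyp_char else 1
            current.append(min(
                previous[j] + 1,         # reference character missing from hypothesis
                current[j - 1] + 1,      # extra character in hypothesis
                previous[j - 1] + cost,  # match or substitution
            ))
        previous = current
    return previous[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """Edit distance divided by the number of characters in the reference."""
    if not reference:
        raise ValueError("reference transcript must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


# Hypothetical example: one wrong character in a 20-character line.
reference_line = "handwritten document"
automatic_line = "handwritten docunent"
cer = character_error_rate(reference_line, automatic_line)
print(f"CER: {cer:.2%}")  # prints "CER: 5.00%"
```

Because inserted characters also count as errors, a Character Error Rate of 5% corresponds only approximately to 95% of the characters being correct.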

2. Overview of the READ project (20 mins)

This presentation will give an overview of the READ project and the specific tools it is creating. Computer scientists working on READ are developing HTR technology using thousands of manuscript pages with varying dates, styles, languages and layouts. Testing the technology on a large and diverse data set will make it possible for computers to automatically transcribe and search any kind of handwritten document, from the Middle Ages to the present day, from old Greek to modern English. This research has huge implications for the accessibility of the written records of human history. The READ project is making this technology available through the Transkribus platform but also developing other tools designed to make it easier for archivists, researchers and the public to work with historical documents. The workshop leaders will present prototypes of some of these tools. These include a system of automatic writer identification, an e-learning app to enable users to train themselves to read a particular style of writing, a mobile app to allow users to digitise and process documents in the archives and a crowdsourcing platform where volunteers can transcribe with the assistance of HTR technology. These tools will be open source and are designed to be used and adapted by other institutions and projects.

3. Introduction to Transkribus (20 mins)

HTR technology is made available through the Transkribus platform, which is programmed with Java and SWT (Mühlberger et al.). A transcription of a handwritten document can be undertaken in Transkribus for two main purposes. The first is a simple transcription – this allows users to train the HTR engine to automatically read historical papers. The second is an advanced transcription – this allows users to create a transcription of a document which may serve as the basis of a digital edition. This presentation will explain both uses of Transkribus.

HTR engines are based on machine learning algorithms. The technology needs to be trained by being shown examples of at least 30 pages of transcribed material. This helps it to understand the patterns which make up words and characters. This training material is known as ‘ground truth’ (Zagoris et al., 2012; Gatos et al., 2014). The workshop leaders will demonstrate how ‘ground truth’ training data can be prepared using Transkribus.

Transkribus can also be used simply for transcription. This presentation will explain how to create a rich transcription of a document in the platform, using structural mark-up, tagging, document metadata and an editorial declaration.

4. Working independently with Transkribus (2 hours)

In the last part of the workshop, the participants will be able to try out the functions of Transkribus on their own laptops. They will be supported by the workshop leaders who will explain the different elements of the platform and then give participants the chance to practice each function for themselves. The workshop leaders will circulate around the room to answer any questions.

The workshop leaders will demonstrate the following tasks. After each demonstration, participants will be given 10-15 minutes to practice what they have learned.

– Document management – how to upload, view, save, move and export documents in standard formats (PDF, TEI, docx, PAGE XML)

– User management – how to allow specific users to view and edit documents

– Layout analysis – how to segment your documents to create training data for the HTR engines

– Transcription – how to create a rich transcript with tags and mark-up

– HTR – how to apply HTR models to automatically generate transcripts, how to conduct a keyword search of your documents, how to assess the accuracy of automatically generated transcripts

The workshop will close with a Question and Answer session where participants can clarify anything they are unsure about. They will also have the opportunity to provide feedback on the Transkribus tool via our user survey.

Number of participants: 15
Participants will need to bring their own laptops and install Transkribus (https://transkribus.eu/Transkribus/) before attending the workshop.

Higher Education Programs in Digital Humanities: Challenges and Perspectives

Register to: koraljka.golub@lnu.se

Introduction
Different aspects of higher education programs in Digital Humanities (DH), including whether they should be organized, what they should cover and how, are currently being discussed at many higher education institutions in the Nordic countries and beyond. In recent years the establishment of new educational programs under the title of Digital Humanities, for example in the USA, UK and Germany, is an indication of a perceived need for developing such specific curricula. DARIAH-EU has a dedicated research and education centre under the title of Virtual Competency Centre (VCC) Research and Education Liaison (http://www.dariah.eu/activities/research-and-education.html). DARIAH-EU also runs a registry of Digital Humanities education in Europe (http://dh-registry.de.dariah.eu) which, as of 10 January 2017, lists 17 currently active Bachelor degrees, 38 Master degrees, and 8 individual courses. The University of Stuttgart and the University of Trier are just two examples of institutions that run programs under the actual title of Digital Humanities. Similarly, EADH (European Association for Digital Humanities) provides a list of education programs, courses and seminars in Europe (http://eadh.org/education) and names 7 undergraduate programs and courses, all with terms like Digital Humanities, Humanities Computing and related in the title; 20 postgraduate ones with a more mixed array of titles; and 4 PhD programs, all with the title of Digital Humanities or very similar (University College London, King’s College London, a cluster of 4 Irish universities, and University of Passau).

In the Nordic countries similar efforts are underway at the University of Gothenburg (http://lir.gu.se/english/education/masters-second-cycle/master-s-programme-in-digital-humanities), which is launching a Master in Digital Humanities in autumn 2017. The University of Helsinki (https://www.helsinki.fi/en/researchgroups/helsinki-digital-humanities) is also offering a set of courses in Digital Humanities. Linnaeus University (https://lnu.se/digihum/) aims to develop an international distance Master program in Digital Humanities, with a pilot program starting in the autumn of 2017. At the same time, at other universities, courses in digital methods and topics have been integrated into existing and new programs as specific compulsory and elective modules, or by including Digital Humanities related topics and perspectives as a part of other courses.

However, what a dedicated course, module or program in the field of Digital Humanities should cover is not always clear. There is considerable variation between different offerings, including diverse content and approaches. The vast range of disciplines, fields, areas and topics relevant to Digital Humanities presents a challenge as to what to include in a dedicated program, how to address the different challenges related to bringing together different disciplinary traditions and methods, and how to accommodate professional, infrastructural and academic requirements for such initiatives. Moreover, there are several challenges associated with what is expected from the outcomes of these new educational programs and efforts. Which job positions and tasks could a graduate Digital Humanist take on after completion of a Digital Humanities program? Is there a need for Digital Humanists as such or should education in all humanities subjects be more inclusive of digital technology-related, cross-disciplinary and cross-sectorial topics? If the latter is the case, do we need entire programs or could the alternative of focusing on dedicated modules or individual courses address existing and emerging needs of both the academic and the non-academic spheres? Furthermore, if both approaches were deemed to have their merits, how do they differ, overlap and complement each other in the context of educating future researchers and professionals for different sectors of society?

The aim of this proposed workshop at DHN 2017 is to bring together scholars, educators and others interested in different aspects of Digital Humanities education to explore the current potential, challenges and opportunities related to the teaching and learning of Digital Humanities. The workshop will provide an opportunity to share experiences and to discuss existing programs, modules and courses in Digital Humanities, research and development activities, evaluation approaches, lessons learned, and findings. A further objective is to systematically engage in discussions in common areas of interest with selected related communities and to investigate potential co-operation and concrete collaborative activities.

The workshop will allow major established programs and initiatives to report results and newcomers to interact with established people in the field, so that the entire community can critically discuss topical issues. The DHN venue encourages participation by Digital Humanities teachers, researchers and developers from different perspectives (reflecting the different conference threads). As the first workshop on education at DHN, it may set the path for future workshops at the annual DHN conferences in order to establish and provide a regular forum for discussions on education in Digital Humanities in the Nordic countries and beyond.

Workshop themes
The proposed workshop will have three themes as the main focus, together with topical presentations arising from the workshop CfP. The main themes are enumerated below:

  1. Existing programs, modules or individual courses in Digital Humanities: design, target student groups, content, job market, evaluation, experiences and lessons learned.
  2. Currently developed programs, modules or individual courses in Digital Humanities: approaches to the design, target student groups and related issues.
  3. Cross-disciplinary and cross-sectorial collaboration in Digital Humanities education.

Workshop structure
Indicative agenda structure, covering approximately 4 hours:

Session 1: Welcome, introduction, mutual presentations (30 min duration)
Session 2: Presentations on the main themes (90 min duration)
Session 3: Directed discussion emerging from the main session (30 min duration)
Session 4: Presentation and discussion of submitted papers on timely and related topics according to the CfP (60 min duration)
Session 5: Concluding discussion, including options for co-operation (30 min duration)

Audience
The intended audience includes: teachers and managers at existing and developing Digital Humanities programs; researchers working with topics in Digital Humanities education; professionals who are interested in taking a Digital Humanities program, modules, or courses.

Number of participants: 20

Organizers

Jenny Bergenmar
Department of Literature, History of Ideas and Religion, University of Gothenburg, Sweden
jenny.bergenmar@lir.gu.se

Koraljka Golub (primary contact)
Department of Cultural Sciences, Linnaeus University, Sweden
koraljka.golub@lnu.se

Isto Huvila
Department of Archival Science, Library & Information Science and Museum & Heritage Studies, Uppsala University, Sweden
isto.huvila@abm.uu.se

Marcelo Milrad
Department of Media Technology, Linnaeus University, Sweden
marcelo.milrad@lnu.se

Mikko Tolonen
Helsinki Collegium for Advanced Studies, University of Helsinki, Finland
mikko.tolonen@helsinki.fi