Ethical conditions of work with social data
There are numerous codes of ethics and sets of standards that apply to empirical social research, for example the following:
- ICC/ESOMAR International Code on Market and Social Research
- World Association for Public Opinion Research (WAPOR) Code of Ethics
The main ethical requirements of data management, beyond the general requirements of the quality of scientific work, can be summarised as follows:
- Respondents should be protected from the potential harmful effects of research even after the stage of field data collection has been concluded, in particular whenever the data are worked with, archived, made available, or made subject to secondary analysis. In general, information of an individual nature about survey participants and other personal data is confidential, and this confidentiality should be maintained. Special attention should be paid to sensitive information.
- Respondents must be treated with respect and have the right to know the purpose and methods of utilisation of the information they provide and to decide about the ways it can be utilised. Consequently, their decisions must be respected.
- Adequate utilisation of the information gathered, in line with the defined purpose, should always be ensured, not least so that the effort respondents made to participate in the research study is not wasted. Data gathered with public funding must be utilised as much as possible and, whenever the nature of the data allows, made available to the broader scientific community.
Personal data protection
The issues of personal data protection should be given adequate attention as early as the stage in which a research proposal is drafted. To underestimate them would not only constitute a violation of research ethics but might also restrict or completely prevent the researchers’ intentions from being fulfilled and in particular the data from being made available for secondary research. The following should be clear from the beginning:
- Is it necessary to obtain respondents’ informed consent for personal data operations?
- Will the data have to be anonymised?
A simple yes/no answer to these questions is insufficient; additional details are important. We need to exactly identify the phases of the research and the data life cycle in which the presence of personal information on respondents is unavoidable. Then we should plan our data management in such a way that it avoids any unnecessary operations with personal data and institutes adequate personal data protection measures where such operations cannot be avoided.
The following overview is based on Czech legislation:
- Act No. 101/2000 Coll., Consolidated version of the Personal Data Protection Act. Translation published on the website of the Office for Personal Data Protection
On the one hand, legal regulation in European countries is to some extent similar because it is based on a common directive of the European Union. On the other hand, there are significant differences. For example, the Czech Republic does not apply specific rules to personal data processing for scientific purposes and its laws in this respect are among the strictest.
- Directive 95/46/EC of the European Parliament and of the Council of 24 October 1995 on the protection of individuals with regard to the processing of personal data and on the free movement of such data
- Note that in 2012 the European Commission proposed a new EU General Data Protection Regulation intended to supersede the Data Protection Directive. In this proposal, the rules of personal data protection in research are considerably more stringent, but in some respects also closer to current Czech law. (CESSDA: Individual Privacy Rights Strengthened – Research Possibilities Restricted)
Personal, sensitive and anonymous data
A general rule for any research study based on the collection of information from respondents is to obtain their informed consent. The data subject must at least be informed properly and in advance about the purpose of the data processing, the scope of the personal data, the name of the processor, and the time period the consent is given for. When it comes to so-called sensitive data, in practice consent must be obtained in writing and should preferably also be signed by the respondent, so that the existence of consent can be demonstrated as required. The respondent is also entitled to request further information about the data processing, and if the reasons for which consent was obtained cease to exist, the data processor must stop processing, i.e. must liquidate the data.
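The elements of consent listed above can be recorded in a structured form so that each processing step can be checked against them. The following is a minimal sketch only, not a statement of what the law requires: the class `ConsentRecord`, its field names, and the `permits` check are hypothetical and introduced purely for illustration.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class ConsentRecord:
    """Hypothetical record of one respondent's informed consent."""
    respondent_key: str       # internal key, not the respondent's name
    purpose: str              # purpose stated to the respondent
    data_scope: tuple         # personal data items the consent covers
    processor: str            # name of the data processor
    valid_from: date          # start of the period the consent is given for
    valid_until: date         # end of that period
    in_writing: bool = False  # whether consent was given in writing
    still_needed: bool = True # False once the reasons for consent cease to exist

    def permits(self, purpose, fields, on, sensitive=False):
        """Return True if processing `fields` for `purpose` on date `on` is covered."""
        if not self.still_needed:
            return False
        if not (self.valid_from <= on <= self.valid_until):
            return False
        if purpose != self.purpose:
            return False
        if sensitive and not self.in_writing:
            return False
        return set(fields) <= set(self.data_scope)
```

A check such as `record.permits("panel survey", ["address"], date.today())` would then be run before any operation that touches personal data, and a negative result would mean the operation must not proceed.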
Processing of sensitive data
In order to process sensitive data, the data processor must also register their activity with the Office for Personal Data Protection. In other words, any institution planning to implement a research project that includes the processing of sensitive data must have a relevant reason for this kind of activity, have an adequate structure for securing the protection of such data, and must officially register in time, i.e. before the data processing begins.
The implementation of this kind of data management itself entails, of course, additional organisational requirements and expenses. For these reasons, it is necessary to consider carefully which exercises essentially depend on processing personal data, and to obtain informed consent and implement data protection measures for those exercises. However, even though social research tends to rely on the collection of individual data, it seeks to obtain aggregated information about society. Thus, personal data can often be omitted altogether or at least in some research stages. If this is the case, the data should be collected as anonymous or should be anonymised as soon as circumstances allow. Furthermore, there are also organisational reasons for doing so; informed consent is easier to obtain for a limited time period and a clearly defined purpose than for an extensive research exercise where respondents do not clearly understand the purpose or the consequences for them personally.
For example: While random sampling tends to identify specific addresses, the research study itself can make do without direct identification of households and respondents. Therefore, the dataset does not have to include such direct identifiers. If the database does not include so-called indirect identifiers either, then it is anonymous and no informed consent is required for analysing, archiving or sharing the data. Similarly, in a panel survey, we need to preserve the addresses in order to implement follow-up waves and survey the same units, but not for the analysis itself. Thus, addresses can be kept separately from the data collected. The database of addresses will be treated in line with the Personal Data Protection Act, while the dataset for analysis will remain anonymous.
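The separation of addresses from the collected data described above can be sketched as follows. This is an illustration under stated assumptions rather than a prescribed procedure: the function names `split_identifiers` and `drop_link`, the field names, and the use of a random linking key are all hypothetical. The sketch handles direct identifiers only and does not check for indirect identifiers.

```python
import secrets


def split_identifiers(records, direct_fields):
    """Separate direct identifiers from survey answers.

    Returns (contacts, analysis): two lists of dicts that are linked
    only by a random key carrying no information about the respondent.
    """
    contacts, analysis = [], []
    for record in records:
        link_key = secrets.token_hex(8)  # random 64-bit key
        contact = {"link_key": link_key}
        answers = {"link_key": link_key}
        for field, value in record.items():
            # Direct identifiers go to the contact table, everything else to the dataset.
            target = contact if field in direct_fields else answers
            target[field] = value
        contacts.append(contact)
        analysis.append(answers)
    return contacts, analysis


def drop_link(analysis):
    """Remove the linking key once no follow-up waves are planned."""
    return [{k: v for k, v in row.items() if k != "link_key"} for row in analysis]
```

In a panel survey, the `contacts` list would be stored separately under personal data protection measures and used only to reach the same units in later waves, while analysts would work with the `analysis` list alone.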
Direct and indirect identifiers
Another situation frequently arises: at a certain stage, often just for the purposes of data collection or building connections between databases, personal data has to be used, but in all the other stages of the research the personal data of respondents can be omitted – and discarded. The process of discarding identifiers is referred to as anonymisation (see Manage Data during the Research Process).
If the database contains direct or indirect identifiers and cannot be anonymised, we must obtain informed consent from the respondents and count on spending money on personal data protection. If the database is not anonymous, given the topics typical for social research, which often involve sensitive data, in the Czech Republic consent should be given in writing and data protection measures must be more thorough. At the same time, this poses an important barrier to data sharing. The purpose of data processing must be formulated in specific and time-limited terms in the informed consent request. In the Czech Republic, it is impossible to obtain consent to the processing of data for an unlimited time, for an unknown purpose, or for the purpose of sharing it with anybody. As a result, non-anonymous databases are usually not made available in data archives.
Copyright and Intellectual Property Rights (IPR)
Copyright and intellectual property protection is complicated, and a thorough treatment of these issues requires professional legal advice. In each research institution such legal advice should result in the creation of ground rules and standard practices for its employees to follow. Nevertheless, each researcher should be aware of at least the basic principles.
Copyright guides at other data organisations:
- Copyright guidelines at Create & Manage Data Web of the UK Data Archive
- Joint Information Systems Committee (JISC) and Teaching and Learning Technology Programme (TLTP) Copyright Guidelines
In the Czech Republic, intellectual property rights are governed in particular by the Copyright Act, i.e. Act No. 121/2000 Coll.:
- Consolidated version of Act No. 121/2000 on Copyright and Rights Related to Copyright and on Amendment to Certain Acts (the Copyright Act), as amended by Act No. 81/2005, Act No. 61/2006 and Act No. 216/2006. English translation of the Act published on the Web of the Ministry of Culture of the Czech Republic
Copyright covers any works which are the unique outcome of the creative activity of the author and are expressed in any objectively perceivable manner including electronic form, permanent or temporary, irrespective of their scope, purpose or significance.
‘The subject matter of copyright shall be a literary work or any other work of art or a scientific work, which is a unique outcome of the creative activity of the author and is expressed in any objectively perceivable manner including electronic form, permanent or temporary, irrespective of its scope, purpose or significance...’ Czech Republic, Act No. 121/2000 Coll., Article 2 (1)
‘...A database which by the way of the selection or arrangement of its content is the author’s own intellectual creation, and in which the individual parts are arranged in a systematic or methodical way and are individually accessible by electronic or other means, is a collection of works...’ Czech Republic, Act No. 121/2000 Coll., Article 2 (2)
>>> A database which, by way of the selection or arrangement of its content, is the author’s own intellectual creation, and in which the individual parts are arranged in a systematic or methodical way and are accessible by electronic or other means, is a collection of works, and as such it is covered by the Copyright Act. Copyright arises when the database is being created. The fact that a database does not bear a ‘copyright’ label does not exclude it from this legal framework.
>>> Copyright protection covers the authors’ work, not the individual facts stated in it. As far as databases are concerned, this means that copyright covers the selection and arrangement of data in a database etc., while its content may not be covered, depending on what exactly the content is. For example, for an in-depth interview, copyright to the recording is held by the researcher, while the rights to the individual statements remain with the informant.
>>> Copyright protects intellectual property from unauthorised distribution, given the potential loss of income and moral damage. The rightholder chooses the ways of disposal of his/her work and decides about its distribution. Nevertheless, copyright is not infringed by anybody who in his or her own work to a justified extent uses excerpts from the work of other authors, or small works in their entirety, for the purposes of the critique or review of such a work or for the purposes of scientific or technical work, or uses the work while teaching for illustrative purposes or in non-commercial scientific research. However, in doing so, it is always necessary to cite the name of the author, the title of the work, and the source.
>>> Through a written license agreement, the author can grant authorisation to use the work, either in specific ways or in all ways of use, and either to a limited or to an unlimited extent. A license can be either exclusive or non-exclusive. In the case of an exclusive license, the author must refrain from further distribution of the work and from exercising the rights of use for which he or she has granted the license.
>>> The copyright belongs to all the authors of the work, for example, the entire research team, and not only the team leader or the project’s principal investigator. The same applies to university research: the rights do not belong to the teacher only but also to the students who participated in organising the research study. However, a person who has contributed to the creation of the work merely by providing assistance or advice of a technical, administrative or expert nature or by providing documentation or technical material, or who merely gave the impulse to create the work is not considered to be a joint author.
>>> Databases are often created in the framework of an employment relationship. As a rule, the employer exercises the author’s economic rights to a work in his or her own name. Economic rights cover the different ways of using the work, e.g. reproduction, distribution, exhibition, lending, or making the work available. The author’s moral rights, e.g. the right to claim authorship, the right to the inviolability of a work (alterations), or the right of supervision over compliance with obligations, remain unaffected.
>>> Thus, authorisations for the secondary use of or access to a database in an archive are often granted by the employer, rather than the authors’ team. In this respect, it is worth mentioning that most students are not employees of their university, which means that economic rights to their works are not transferred to the university in their entirety. In some cases, too, academic institutions transfer economic rights to their employees, especially for the purposes of publication activity; sometimes the scope of these institutional rules includes other outcomes and activities as well, which may affect the regulation of rights to databases.
(The student-author-university relationship is more complex. Schools or school-related or educational establishments have the right to conclude, under the usual terms, a license agreement on the utilisation of a school work. Unless otherwise agreed, the author of a school work may use his work or may grant the license to any other party, unless this contravenes the legitimate interests of the school.)
>>> Databases can also be created and shared in an environment of wide-open collaboration based on free licenses such as Creative Commons. Then, users can not only utilise the database but can also contribute to it, expand it, update it, or make other alterations, subject to license conditions.