Data Privacy: What Still Needs Consideration in Online Application Systems?

This paper analyzes and explores matters that still need to be considered in relation to data privacy in online application systems. The research is a preliminary study. We investigate data privacy using a systematic literature review (SLR) approach. Following the SLR stages, we synthesized 44 publications from the Scopus online database that were released between 2015 and 2019. Based on this study, we found six points to consider in data privacy, namely security and data protection, user awareness, risk management, control setting, ethics, and transparency.


Introduction
Data is now regarded as a valuable and important asset (Baillie et al. 1994; Reinsel et al. 2018; Tapsell et al. 2018). This is because data can serve as a basis for strategic business decisions and can provide insight into business opportunities (Reinsel et al. 2018). Data assets here include personal data, that is, data that can be associated with or attached to a person (Lopes and Quaresma 2016; Tapsell et al. 2018).
Personal data is now more easily obtained by third parties with the rise of social media and online application systems such as marketplaces, online transportation, and online loan services (Klukovich et al. 2016; Mostafa et al. 2017). Personal data is therefore more prone to misuse (Schuppler et al. 2018; Shabtai et al. 2012). For this reason, emphasis has grown on data privacy, under which a person has the right to refuse to share and to withhold information attached to them (Korže and Čertanec 2017).
On the surface, the public seems to share personal data deliberately because they believe it is safe on social media and in the online application systems they use. In fact, personal data is often misused by parties who trade it to others, who use it, among other things, to study consumer behavior, influence political leanings, design political campaigns, and commit crimes such as credit card theft and extortion (Shabtai et al. 2012).
The importance of protecting personal data has been increasingly emphasized since the scandal in which Facebook user data was sold to Cambridge Analytica (Isaak and Hanna 2018; Srivastava and Geethakumari 2016). In Indonesia, data stored online is also widely misused, especially in online lending and other financial services cases (Majumdar et al. 2018). The case involving Cambridge Analytica and Facebook in 2018 shook the world. The case is documented in the film "The Great Hack". The documentary shows that the use of millions of users' personal data had been going on for years before it was revealed (Livemint 2019). Cases of misuse of users' personal data have reportedly also involved Google and Twitter (Curran 2018; TechSpot 2019).
Meanwhile, efforts to raise awareness of personal data have intensified, including Data Privacy Day, also called Data Protection Day, which is observed every January 28. This observance encourages awareness of the importance of data protection, including personal data, among both institutions and individuals (Vervier et al. 2017). Awareness of the importance of protecting personal data is becoming widespread. Choosing to share personal data, or refusing to share it, is a matter of privacy for everyone, including in the cyber world.
The important role of personal data is also recognized by the Indonesian government. Every year the public reports cases of personal data misuse. In 2019 there were many news reports about leaks of Civil Registry Office data in the form of residence identification numbers and ID numbers. This data was said to be traded, and some of it used for extortion, although the Civil Registry Office later dismissed the issue (Sekretariat Kabinet Republik Indonesia 2019). According to Legal Aid, there were three thousand reports of data misuse by online lending institutions in Indonesia (Katadata 2019). This does not include violations elsewhere or those that have not been reported. Personal data that is commonly misused includes telephone numbers, identity cards, ID numbers, and credit or bank card data (Hukum Online 2018).
Because we consider data protection to be important, in this study we explore one question using an explanatory method with a systematic literature approach: "What are the things that still need to be considered in protecting personal data in an online application system, in relation to data privacy?" The output of this research is the set of areas that need to be considered in relation to data privacy, especially with regard to personal data.

Literature Study
What is data privacy? According to Merriam-Webster, privacy concerns something whose use is intended to be limited to certain people or groups. With technological developments, the meaning and scope of privacy have broadened. Privacy now covers not only physical matters and actions but also information or data. Data privacy relates to privacy as defined by Westin (1967), namely the claim of individuals, groups, or institutions to determine for themselves how, when, and to what extent information about them is communicated to others. Under this definition, data privacy concerns access rights and control over information (Mai 2016).
The General Data Protection Regulation (GDPR), the European Union regulation on data protection and privacy that is often used as a reference for data privacy and personal data protection, defines data privacy as the freedom granted to individuals to make their own decisions about who can process their data and for what purpose (GDPR.EU 2019).
Data privacy is generally associated with personal information that can characterize an individual (Mai 2016). The type of data whose use is of concern in data privacy is personal data, so the two are closely related. Personal data is defined as any information that is connected to a person's identity or that can identify someone, either directly or indirectly (Klosek 2000). This definition is similar to that in the GDPR, under which personal data is any information relating to a person who can be identified directly or indirectly (European Parliament and of the Council 2016). In Indonesia, the Population Administration Law defines personal data as certain individual data that is stored, maintained, kept accurate, and protected as confidential (DPR 2006).
Based on its level of confidentiality and importance, personal data is divided into four categories, namely insensitive data, sensitive data, quasi-identifiers, and explicit identifiers (Nataraj Venkataramanan 2016). Insensitive data is data that is easily accessed, for example gender. Sensitive data is data that contains confidential information about the owner, for example health issues, financial status, and income. Quasi-identifiers are attributes that include demographic and geographical information, telephone numbers, and e-mail addresses. Explicit identifiers are attributes that are attached directly to a person, for example name, identity card, insurance ID, social security number, and driver's license.
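The four categories above can be illustrated with a small lookup table. This is only a sketch: the field names (such as `health_issue` or `city`) and the helper `group_by_category` are hypothetical, and the mapping simply mirrors the examples given in the text rather than any formal scheme from the cited work.

```python
from enum import Enum


class Category(Enum):
    INSENSITIVE = "insensitive data"
    SENSITIVE = "sensitive data"
    QUASI_IDENTIFIER = "quasi-identifier"
    EXPLICIT_IDENTIFIER = "explicit identifier"


# Hypothetical field names; the assignments follow the examples in the text.
ATTRIBUTE_CATEGORIES = {
    "gender": Category.INSENSITIVE,
    "health_issue": Category.SENSITIVE,
    "financial_status": Category.SENSITIVE,
    "income": Category.SENSITIVE,
    "city": Category.QUASI_IDENTIFIER,           # geographical information (assumed example)
    "telephone": Category.QUASI_IDENTIFIER,
    "email_address": Category.QUASI_IDENTIFIER,
    "name": Category.EXPLICIT_IDENTIFIER,
    "identity_card": Category.EXPLICIT_IDENTIFIER,
    "insurance_id": Category.EXPLICIT_IDENTIFIER,
    "social_security_number": Category.EXPLICIT_IDENTIFIER,
    "driver_license": Category.EXPLICIT_IDENTIFIER,
}


def group_by_category(record):
    """Group the field names of a record by category.

    Returns (groups, unclassified), where unclassified lists the fields
    that the lookup table does not know about.
    """
    groups = {category: [] for category in Category}
    unclassified = []
    for field_name in record:
        category = ATTRIBUTE_CATEGORIES.get(field_name)
        if category is None:
            unclassified.append(field_name)
        else:
            groups[category].append(field_name)
    return groups, unclassified
```

Reporting unknown fields separately, instead of silently treating them as insensitive, is a deliberately cautious design choice for this sketch.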
According to the GDPR, personal data includes names and email addresses, location information, ethnicity, gender, biometric data, religious beliefs, web cookies, and opinions. Pseudonymous data can also count as personal data if it is relatively easy to identify someone from it (European Parliament and of the Council 2016). In Indonesia, personal data protected under the Population Administration Law includes family card numbers; employment numbers; ID numbers; date, month, and year of birth; information on physical and/or mental disability; parent identification numbers; and some records of important events (DPR 2006).
The advent of the internet made data privacy an important issue, especially with the spread of internet-connected devices. The internet is described as a source of information about individuals (Klosek 2000). This data is collected over the internet through surveys, cookies, pages that require registration, and so on (Klosek 2000). Information is now also collected by various mobile applications installed on devices. For this reason, protection of personal data is important. The GDPR requires that personal data be processed lawfully, fairly, and transparently, and collected for specified, explicit, and legitimate purposes. Personal data may be stored for longer periods only for archiving in the public interest, research purposes, and statistical purposes (GDPR.EU 2019).
Research on personal data has grown rapidly over the last decade because data privacy has become important as technology advances. One paper discusses users' awareness of their personal data in online systems (Hossain and Zhang 2015). The authors distributed questionnaires to 377 users who were familiar with online social networks (OSN) such as Facebook and Twitter. In that study, 80 percent of respondents considered that OSN had not provided adequate privacy controls (Hossain and Zhang 2015). Other studies discuss children's online privacy (Minkus et al. 2015). Some parents voluntarily share their children's data, even if, for example, visibility is restricted to close friends. This is risky because crime against children is increasing, so parents also need to protect their children's personal data (Minkus et al. 2015).

Research Methodology
In this study the authors explore what still needs to be considered in relation to data privacy and the protection of personal data, especially in online application systems. To answer this question, the authors carried out a series of research stages, beginning with a systematic literature study of recent computer science research on personal data. Systematic literature review (SLR) is a widely used research method. It has generally been used in health and medicine, as well as in the sciences, but its use later expanded to other fields, including computer science. The SLR method is suitable for preliminary research to find trends in a particular discipline, to clarify preliminary research, and to identify and interpret the state of the art on a topic (Kitchenham and Brereton 2013).
SLR can also be described as a form of secondary study that carries out a series of activities, beginning with the identification, analysis, and interpretation of all available evidence related to a particular research question in an unbiased way (Kitchenham et al. 2009). The stages can be repeated. The SLR method is suitable for researchers who want to know the current issues raised in a particular discipline or field and who aim to synthesize them while avoiding subjectivity and bias (Kitchenham and Brereton 2013).
We chose the SLR approach to find current issues in data privacy and to identify the state of the art of research on the topic. We therefore used the search term 'data privacy'. We initially found 72,713 documents in the Scopus electronic database. Research on this topic began to climb in 2004 with 1,409 publications and continued to grow to 2,184 publications in 2007. The number then decreased before interest returned. The topic grew again in 2010 with 2,584 publications and continued to climb until it peaked at 9,007 in 2019. The trend can be seen in Figure 1.

Figure 1. Number of Publications on Data Privacy in Scopus by Year
We then carried out the screening process by applying several criteria and limits. We filtered based on the inclusion and exclusion criteria shown in Table 1. We chose recent publications, from 2015 to 2020. We also chose publications in the field of computer science that were written in English. We limited the results to conference papers or journal articles. We also restricted the results to papers that discuss data privacy in online systems. With these limits we found 195 publications.

Table 1. Inclusion and Exclusion Criteria
Inclusion Criteria | Exclusion Criteria
In the discipline area of Computer Science | Outside of Computer Science
Data privacy in online application systems | Not relevant, for example only discussing the meaning of data privacy
Published in the 2015-2020 range | Duplicated publications
Written in English | Written in a language other than English
Publication type is paper, journal, or book review | Publications that cannot be accessed online

We then read the publication titles one by one, because some lecture notes had still passed the filter. From the title screening we kept 141 publications, and then we conducted abstract screening, which yielded 96 suitable publications. Next, we carried out the final screening by reading the full papers. We found 44 papers, summarized each of them, and then synthesized and categorized them. The stages of this SLR can be seen in Figure 2. The topics discussed in these publications vary but share a common thread with online systems. The online technologies related to data privacy are mobile and web applications, applications that use cloud computing, online social networks, and big data. Some of these publications discuss online social networks (OSN) because OSN such as Facebook, Twitter, Instagram, and LinkedIn have many loyal users.
Besides online technologies, the other topics are risk and risk mitigation; data security and protection; access control policy; transparency of online service providers; user awareness; and the ethics of those who store and use personal data. One publication can discuss more than one issue, for example the level of user awareness regarding personal data and users' expectations of online service provider transparency. After sorting the 44 publications into the categories above, we synthesized them. The topic categories and summaries can be seen in Table 2.
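The screening stages reported in this section can be summarized as a simple funnel. The sketch below only restates the counts given in the text and derives how many publications each stage excluded; the stage labels and the helper `funnel` are illustrative names, not part of the SLR method itself.

```python
# Publication counts reported in the text after each stage.
STAGES = [
    ("Initial search for 'data privacy'", 72713),
    ("Inclusion and exclusion criteria", 195),
    ("Title screening", 141),
    ("Abstract screening", 96),
    ("Full-text screening", 44),
]


def funnel(stages):
    """Return (stage name, publications kept, publications excluded) per stage."""
    rows = []
    previous = None
    for name, kept in stages:
        excluded = 0 if previous is None else previous - kept
        rows.append((name, kept, excluded))
        previous = kept
    return rows


if __name__ == "__main__":
    for name, kept, excluded in funnel(STAGES):
        print(f"{name}: kept {kept}, excluded {excluded}")
```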

A. Data Privacy Related to Online System Technology
Online system technology continues to grow with technological advances and faster internet access. Various applications can now be accessed with mobile technology, and big data technology is increasingly used.

1) Online social network
Online social networks (OSN) are used by millions of users from various countries. Users range from teenagers to adults, and even children. Socializing on OSN is fun, but there are risks to users' personal data (Hossain and Zhang 2015). Users' worries increase when cases of data misuse by several OSN come to light (Shinjo et al. 2016; Umair et al. 2017). On the one hand, this raises users' awareness of the importance of reading the OSN privacy policy and controlling who can see their personal data (Albertini et al. 2017; Costa 2016; Hossain and Zhang 2015; Ilia et al. 2017; Klukovich et al. 2016; Minkus et al. 2015; Petkos et al. 2015; Tsirtsis et al. 2016; Van Der Valk et al. 2016). On the other hand, users hope that OSN will be law-abiding and transparent in their use of users' personal data (Hossain and Zhang 2015; Mostafa et al. 2017; Polakis et al. 2016).

2) Mobile and web-based application
Web-based applications can now generally also be accessed from a mobile device. These applications include games, education, shopping for children's games, health, health insurance, and so on (Hölzl et al. 2016; Thao et al. 2018; Yee 2017a; Zhang et al. 2015; Zhang et al. 2016). Some applications also have features for sharing to OSN. With the increasing variety of applications installed on devices, concerns have arisen that users' data remains stored and is then used to study user habits, sold to third parties, or intercepted to commit crimes (Aktypi et al. 2017; Alsalamah 2017; Hung et al. 2016; Leung et al. 2016; Yildirim and Varol 2019).
One application discussed is Fitbit, whose users start to worry as the membership of a community increases. They are wary that conversations or historical data in the application may be misused. Other research discusses the importance of protecting medical record data when using health insurance applications because it is sensitive data (Zhang et al. 2016). Children's personal data is also prone to misuse when they interact with smart toys or online games (Hung et al. 2016).

3) Cloud computing
The use of cloud computing is growing because of the convenience it offers. Many services use cloud technology, such as e-voting (Grewal et al. 2015; Sedky and Hamed 2015). This technology carries risks because the stored data can be accessed by unauthorized parties (Mijuskovic and Ferati 2016).

4) Big data
Many companies now use big data technology, whether in the public interest or for economic purposes. The use of personal data must follow the applicable guidelines and data privacy rules (Vervier et al. 2017). Recently there have been many cases of student data being used for educational data mining, even though such data is classified as sensitive (Barril and Tan 2017). Another sensitive big data issue is research that uses patient data and medical records (Purandhar and Saravana Kumar 2019).

B. Data Privacy Related to Risk
With the community's increasing dependence on the internet, personal data is vulnerable to exposure, whether intentional or unintentional (Aktypi et al. 2017; Burbach et al. 2018; Hossain and Zhang 2015; Kumar et al. 2017; Pirzada et al. 2019; Purandhar and Saravana Kumar 2019; Symeonidis et al. 2016). The risk of exposure arises from the buying and selling of personal data, fraud, ID and password theft, social engineering attacks, SQL injection attacks, XSS attacks, fake friend profiles, recommendation systems, and so on (Alsalamah 2017; Hölzl et al. 2016; Hung et al. 2016; Leung et al. 2016; Luma et al. 2019; Malloy et al. 2017; Nalinipriya and Asswini 2016; Nandhini and Das 2016; Pirzada et al. 2019; Tsirtsis et al. 2016; Yildirim and Varol 2019). For this reason, mitigation is important to minimize the risk to private or sensitive data (Yee 2017a).

C. Data Privacy Related to Data Protection
Protecting the personal data spread across online systems is important so that it is not misused (Yee 2017b). This personal data can take the form of names and associated attributes, including health records, web search records, location, conversation data, and sound cards. Various data protection technologies are now available. One data protection approach is to obscure the data. This can be done by encryption (Grewal et al. 2015; Klukovich et al. 2016; Kulal and Dhamdhere 2017; Sedky and Hamed 2015), data masking (Degadwala and Gaur 2017), or anonymization techniques (Srivastava and Geethakumari 2016; Thao et al. 2018; Zhang et al. 2016). When obscuring data with anonymization techniques, not only the data itself but also the nodes and attributes in an OSN graph need to be anonymized (Srivastava and Geethakumari 2016). Masking can also be applied to data attributes in the form of images (Degadwala and Gaur 2017).
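To make the idea of obscuring data more concrete, the following minimal sketch shows three simple transformations on a tabular record: masking, keyed pseudonymization, and generalization. It is not the method of any of the cited papers. The field names, the helper functions, and the use of HMAC-SHA256 as a pseudonymization step are assumptions chosen for illustration, and a production system would need proper key management and a formal anonymization model.

```python
import hashlib
import hmac


def mask(value, visible=4, fill="*"):
    """Hide all but the last `visible` characters; short values are hidden fully."""
    if len(value) <= visible:
        return fill * len(value)
    hidden = len(value) - visible
    return fill * hidden + value[hidden:]


def pseudonymize(value, secret_key):
    """Replace a value with a keyed hash so records stay linkable but unreadable."""
    digest = hmac.new(secret_key, value.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def generalize_age(age, width=10):
    """Replace an exact age with a range such as '30-39'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def protect_record(record, secret_key):
    """Return an obscured copy of a record with hypothetical field names."""
    return {
        "name": pseudonymize(record["name"], secret_key),
        "phone": mask(record["phone"]),
        "age": generalize_age(record["age"]),
        "diagnosis": record["diagnosis"],  # kept for analysis; identifiers are obscured
    }


if __name__ == "__main__":
    key = b"demo-key-not-for-production"
    original = {"name": "Jane Doe", "phone": "081234567890", "age": 37, "diagnosis": "flu"}
    print(protect_record(original, key))
```

The three helpers differ in what they preserve: masking keeps a recognizable suffix, the keyed hash keeps linkability across records, and generalization keeps coarse statistical value.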
Another data obscuring technique is data sanitization. This approach uses a substitution method to clear keywords. Because nouns and verbs carry the most information in a sentence, they are treated as keywords and the remaining words are treated as function words. Keywords are sanitized using Stanford natural language processing (Tambe and Vora 2017).
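The keyword substitution idea described above can be sketched as follows. This is an illustration only: a tiny hand-written part-of-speech lexicon (`TOY_LEXICON`, a hypothetical name) stands in for a real tagger such as the Stanford tools mentioned in the text, and words missing from the lexicon are left unchanged, which a real system could not afford.

```python
import re

# Tags whose words are treated as keywords, following the text.
KEYWORD_TAGS = {"NOUN", "VERB"}

# Toy stand-in for a part-of-speech tagger (assumption for illustration).
TOY_LEXICON = {
    "alice": "NOUN",
    "visited": "VERB",
    "the": "DET",
    "clinic": "NOUN",
    "yesterday": "ADV",
}


def sanitize(sentence, lexicon=TOY_LEXICON, placeholder="[REDACTED]"):
    """Substitute every word tagged as a noun or verb with a placeholder."""

    def replace(match):
        word = match.group(0)
        if lexicon.get(word.lower()) in KEYWORD_TAGS:
            return placeholder
        return word

    return re.sub(r"[A-Za-z']+", replace, sentence)


if __name__ == "__main__":
    print(sanitize("Alice visited the clinic yesterday."))
```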
Other data protection proposals use decentralized social networking services built on virtual private networks so that data stays under control and does not leave a group (Shinjo et al. 2016). Other proposals include periodic fraud assessment and detection, for example by regularly checking and verifying fake links and fake friend profiles (Nandhini and Das 2016; Tsirtsis et al. 2016), measuring user exposure through periodic privacy exposure metrics (Masoumzadeh and Cortese 2017), using a data security algorithm to secure personal data (Pirzada et al. 2019), or using the Logic Rule Generation algorithm to find and analyze the nature of user vulnerabilities (Revathi and Suriakala 2018).

D. Data Privacy Related to Access Control Regulation and Setting Control
Data privacy is related to control. Users may grant or refuse access to their personal data. For this reason, each OSN must provide control settings for who can see a user's personal data and what data can be seen (Albertini et al. 2017; Hossain and Zhang 2015; Klukovich et al. 2016). When this feature is available, users feel safer sharing information (Van Der Valk et al. 2016). One study proposes a collaborative multi-party access control model that allows all users associated with a resource to participate in specifying its access control policy (Ilia et al. 2017).
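One way to picture multi-party access control is a resource whose visibility depends on the policies of every user associated with it. The sketch below is an assumption-laden illustration and not the model of Ilia et al. (2017): the `Resource` class, its fields, and the rule that access requires consent from all associated users are choices made here for clarity.

```python
from dataclasses import dataclass, field


@dataclass
class Resource:
    """A shared item, such as a photo, with an owner and other associated users."""

    owner: str
    stakeholders: set = field(default_factory=set)  # other users associated with the item
    policies: dict = field(default_factory=dict)    # user -> set of viewers that user allows

    def parties(self):
        return {self.owner} | self.stakeholders

    def set_policy(self, user, allowed_viewers):
        """Only users associated with the resource may contribute a policy."""
        if user not in self.parties():
            raise PermissionError(f"{user} is not associated with this resource")
        self.policies[user] = set(allowed_viewers)

    def can_view(self, viewer):
        """Grant access only if every associated user allows the viewer.

        A party without a policy denies by default.
        """
        if viewer in self.parties():
            return True
        return all(viewer in self.policies.get(p, set()) for p in self.parties())


if __name__ == "__main__":
    photo = Resource(owner="alice", stakeholders={"bob"})
    photo.set_policy("alice", {"carol", "dave"})
    photo.set_policy("bob", {"carol"})
    print(photo.can_view("carol"), photo.can_view("dave"))
```

Requiring unanimous consent is the strictest possible combination rule; a majority vote or owner override would be equally easy to express by changing `can_view`.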

E. Data Privacy Related to Transparency
Regarding data privacy, some OSN users consider OSN not to be transparent in their data usage policies (Hossain and Zhang 2015). This mistrust is triggered by the many data breach cases in which user data was sold to third parties and by other cases that threaten users. For this reason, users expect terms and conditions that address data privacy and state whether data is used by the OSN provider (Mostafa et al. 2017).

F. Data Privacy Related to Ethics
The use of personal data under the GDPR is permitted if intended for public purposes and for statistical purposes required by the region or country. However, there are various other provisions relating to research ethics, such as maintaining data confidentiality, respecting privacy, not selling or sharing it with other parties, and so on (Polakis et al. 2016).

G. Data Privacy Related to User Awareness
With many cases of violations in the use of user data, users are increasingly aware of the risks they face when they disclose too much personal data to the public. Most users are becoming aware of which attributes of their data are sensitive (Hossain and Zhang 2015; Mijuskovic and Ferati 2016; Umair et al. 2017; Zhang et al. 2015). However, parents sometimes neglect the privacy of their children's data (Minkus et al. 2015; Tsirtsis et al. 2016). Even when visibility has been set to close contacts only, sharing a child's data remains risky (Minkus et al. 2015). Some users are aware of privacy issues but still share their data voluntarily in exchange for certain rewards (Vervier et al. 2017). Because user awareness is important, one study proposes a framework for measuring privacy awareness in three dimensions, namely visibility, level of control, and privacy score (Petkos et al. 2015).

Table 2. Topic Categories and Summaries

(Albertini et al. 2017; Hossain and Zhang 2015; Ilia et al. 2017; Klukovich et al. 2016; Kulal and Dhamdhere 2017; Kumar et al. 2017; Luma et al. 2019; Masoumzadeh and Cortese 2017; Minkus et al. 2015; Mostafa et al. 2017; Nalinipriya and Asswini 2016; Nandhini and Das 2016; Petkos et al. 2015; Polakis et al. 2016; Revathi and Suriakala 2018; Shinjo et al. 2016; Srivastava and Geethakumari 2016; Symeonidis et al. 2016; Tambe and Vora 2017; Tsirtsis et al. 2016; Umair et al. 2017; Van Der Valk et al. 2016)

• Mobile and Web Based Application
With the increasing variety of applications, including those installed on devices, there is a concern that users' data remains stored and is then used to study user habits, sold to third parties, or intercepted to commit crimes.

Ethics
The use of personal data is permitted if it is intended for public purposes and for statistical purposes required by the region or country. It is subject to research ethics, such as maintaining data confidentiality, respecting privacy, not selling or sharing data with other parties, and so on.
Ethics (Polakis et al. 2016)

User Awareness
With many cases of violations in the use of user data, users are increasingly aware of the risks they face when they disclose too much personal data to the public. Some users are aware of privacy issues but still share their data voluntarily in exchange for certain rewards.
User awareness (Hossain and Zhang 2015; Mijuskovic and Ferati 2016; Minkus et al. 2015; Petkos et al. 2015; Tsirtsis et al. 2016; Umair et al. 2017; Vervier et al. 2017; Zhang et al. 2015)

Based on the information in Table 2, several key points are reviewed repeatedly across topics. The widely reviewed key points are risk and mitigation, security and data protection, user awareness, control settings and access control, transparency, fraud detection, and ethics. Because several terms are similar and can be combined, we propose six key points that still need to be considered in maintaining data privacy when using online applications. The key points are security and data protection, user awareness, control settings, risk management, transparency, and ethics, as shown in Figure 3. Fraud assessment and detection can be part of both risk management and security and data protection, while access control of data is included in control settings.

Conclusion and Future Work
Many cases of personal data misuse in online systems still occur. Several cases that came to light increased user awareness of the importance of protecting personal data. Users also demand that service providers respect data privacy.
Based on a systematic review, we found 44 publications (2015-2019) that discuss data privacy. After categorizing and synthesizing them, we found six key points that must be considered in relation to data privacy when using an online application system. These six points are security and data protection, user awareness, control settings, risk management, transparency, and ethics.

Figure 3. Key points of data privacy (figure labels: User Awareness, Control Setting, Risk Management, Transparency, Ethics)