
MIGRATING DATA IN CLOUD WITH IMPROVING DATA INTEGRITY

 

Prepared by the researchers

Mokhtar Mohammed Mohammed Ali – Associate Professor of Computer Science – Elimam Elmahadi University – Kosti, Sudan

Dr. Mahala Elzain Beraima Ahmed – Assistant Professor of Computer Science – White Nile University – Kosti, Sudan

Democratic Arab Center

Journal of Afro-Asian Studies: Fourteenth Issue – August 2022

A Periodical International Journal published by the “Democratic Arab Center” Germany – Berlin

Nationales ISSN-Zentrum für Deutschland
ISSN  2628-6475
Journal of Afro-Asian Studies

To download the PDF version of the research papers, please visit the following link:

https://democraticac.de/wp-content/uploads/2022/08/Journal-of-Afro-Asian-Studies-Fourteenth-Issue-%E2%80%93-August-2022.pdf

Abstract

Cloud computing has become attractive for both companies and organizations. As organizations make the decision to embrace the technology and move their services and data to the cloud, there is a greater risk that data will be lost in the process of migrating from traditional physical computing to cloud computing. In this paper, we propose a system that detects and eliminates data loss, incomplete data transfer, and data modification during the migration from traditional physical computing to cloud computing. The system ensures that the migrated data on the cloud server is an exact replica of the data on the physical server from which it was migrated. This is achieved through a combination of algorithms that segment, encrypt, and compare the migrated data with the original data, together with a three-way handshake signaling protocol that has not previously been applied effectively in the cloud migration environment.

1. INTRODUCTION

Cloud computing has been a hot topic in recent years, and cloud storage [1], as one of the key cloud computing services, is likewise a hot topic in industry and academia. Cloud storage is a data outsourcing service: data owners can transfer data from their local computing systems to the cloud server so that the outsourced data can be managed and accessed anytime and anywhere. Due to its low price, cloud storage has attracted more and more companies and individuals, and several providers offer such services, such as Amazon's S3 and Microsoft's Azure. However, there have been frequent incidents of data leakage in cloud storage systems, which draws our attention to cloud storage security issues.

Owners must therefore ensure that their data is properly stored by the cloud service. Once owners hand their data over to the cloud, they can no longer observe how it is stored, which has led scholars to study cloud data integrity. Many current data auditing schemes introduce a TPA (Third Party Auditor) in place of the owner to verify the integrity of data in the cloud, which is known as public auditing. More and more companies have built their own cloud storage platforms using third-party interfaces.

If a company's competitors can perform file integrity checks, they can obtain useful information about its business data. For example, the growth rate of a file can serve as an index from which to guess the company's development. Thus, [5] proposed a new remote data auditing scheme using a homomorphic hash function, which was compared to Yang's scheme [6] and performed better. We propose an improvement to Yan's scheme. In Yan's scheme, the generation of the homomorphic key, the generation of data block tags, and the auditing calculations are all performed on the user's side, which increases the user's burden. Our scheme instead moves this computation to an introduced private cloud. We also develop a multi-user batch auditing scheme.

The remainder of the paper is organized as follows. In Section 2, we discuss the various approaches to data migration in the literature and the technologies they use. Section 3 deals with the security threats and challenges during the migration process. Section 4 presents the system model proposed for data migration between the physical server and the cloud environment. Finally, we draw conclusions and outline future work in Section 5.

2. RELATED WORK

[7] uses an RSA-based hash function to validate remote data integrity; its drawback is the need to recover the whole file, which is computationally expensive. [8] first defined the PDP protocol and proposed two PDP schemes (S-PDP, E-PDP) that use random sampling and the homomorphic properties of RSA signatures to verify integrity without recovering all the data. The tags of the challenged blocks are aggregated so that the communication overhead is O(1), but neither of these schemes supports dynamic operations on remote data. [9] proposed a scalable PDP scheme based on the PDP model to support dynamic operation, but it supports only partial dynamic operations and does not support full insertion. [10] proposed the DPDP (dynamic provable data possession) scheme to support fully dynamic data operations. To do so, they introduce a dynamic data structure based on the skip list, but the skip-list-based construction leads to a certification path that is too long.

Each authentication also requires a great deal of auxiliary information, and the computation and communication costs are high. [11] (Wang's scheme) proposed taking the hash value of each block as the value of a leaf node of an MHT (Merkle hash tree); the parent node value is computed by hashing the concatenation of its children's hash values. Their scheme has the disadvantage that every update operation must recalculate the hash value of the tree's root node, resulting in a large computational burden. [15] proposed a scheme that supports dynamic operations.

It uses an index table data structure to perform dynamic update operations: every time the data is updated, only the index table stored on the TPA needs to be updated, and Yang's scheme uses bilinear pairings to protect user data privacy. Subsequent work optimized the index table to improve the performance of index table operations on the basis of Yang's scheme and, at the same time, proposed its own homomorphic hash integrity-checking algorithm, which reduces the audit time. It is worth mentioning that the schemes of [16], [17] and Yan are all based on the homomorphic hash algorithm.

3. SECURITY CONCERN

Cloud computing has many advantages, including but not limited to cost, speed, global scalability, productivity, performance, and reliability. This has led to a massive migration from traditional computing to cloud computing. With this migration and the growing interest in cloud computing and its services, cloud security has become an emerging issue that companies and organizations must deal with. Data migration is a complex and messy process that could lead to complete data loss if it is not planned and safety measures are not taken into account.

In this paper, we focus on the security risk of data loss, which is mostly experienced during the migration from traditional computing to cloud computing, and on how to prevent it. Data loss is mainly of two types: data destruction and data corruption. Infrastructure failure and software errors are possible causes of both. Although cloud design has come a long way, there are still many concerns and many security issues. The two most relevant problems we need to deal with are infrastructure malfunction and software errors.

The cloud computing environment consists of several moving components, and these components (the structural design components that create a cloud environment) do not always fit together well. When dealing with cloud computing, we need to consider the following:

  • Cloud security: an ongoing problem that grows with the popularity and widespread use of cloud computing. With this increase in popularity, cloud computing becomes a target of malicious attacks. Every infrastructure should be governed by strong policies, as no environment is inherently safe.
  • Loss of data: cloud computing offers data storage as one of its features. Users can upload their data to the cloud remotely and access it later. However, data loss can occur, and in that case the user cannot access his or her data and other services. Healthcare is a good example of a domain in which data loss can be very expensive.

4. SYSTEM MODEL: STATE OF THE ART

This paper introduces a new model called CIST (cryptographic integration with data segmentation and reliable transmission). The model is used to reduce and eliminate the security threats of data loss, incomplete data transmission, and data interception by a third party (a man in the middle) during migration from traditional physical computing to cloud computing.

The CIST model performs a number of modular tasks to protect data from loss during transmission. The proposed system breaks the entire data-protection task into multiple modules (Figure 1), ensuring that the data to be migrated to the cloud is fully secure and that computing resources are not overused.

Fig 1: Flowchart of the system model

4.1. DATA SEGMENTATION

The data are segmented into smaller units prior to transmission from the local physical hosting server to the cloud hosting server. Segmentation is the process of dividing data into smaller chunks for network transmission. We break larger data units into smaller units for two main reasons. First, we divide larger data units to improve performance: the machine can process smaller data units more quickly than larger ones, which reduces the consumption of computing resources. Second, segmentation improves data security: transmitting the data in separate chunks makes it difficult for a third party to obtain the complete data even if it manages to intercept some segments.

Fig 2: Flow of segmentation

To segment the data, we use a top-down algorithm, also called the divide-and-conquer method, to perform the segmentation dynamically. The entire data unit is initially treated as the main segment. Considering all the possible initial divisions, we identify the best boundary point (breaking point) for dividing the original data unit into two segments: s1 on the left and s2 on the right.

The breaking point is chosen so that the difference between s1 and s2 is maximal. Each of the segments s1 and s2 is then tested to determine its approximation error. If the approximation error of a segment is below the defined threshold, the process stops and the segment is accepted by the system. If the approximation error is higher than the defined threshold, the segment is further divided into two sub-segments. The approximation-error check is repeated in the same way on each of the two sub-segments, using the parent segment's position and breaking point. The algorithm repeats until the defined stopping criteria on the number of segments and the approximation error are met and no further division is necessary.

In Figure 2, the system accepts the original large data unit on the local server as input and determines its breaking point. The data unit starts as the main segment, and, considering all the possible initial divisions, the best boundary point (breaking point) is identified for dividing it into two segments: s1 on the left and s2 on the right, chosen so that the difference between s1 and s2 is maximal.

The segmentation process is then initiated. The approximation error is calculated for the left and right segments. If the approximation error of a segment is below the threshold, the segment is accepted; if it exceeds the threshold, the segment is divided further and the whole process is repeated for the sub-segments. The segmented data is then tested to confirm that all the defined parameters have been met. The data segments are then ready, and the process ends.
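
The sketch below, in Python, illustrates one way to implement this top-down loop. The paper does not specify the approximation-error measure, the threshold, or the stopping criteria, so `approximation_error`, `THRESHOLD`, and `MIN_SEGMENT_BYTES` are illustrative placeholders rather than the authors' actual choices.

```python
# A sketch of the top-down (divide-and-conquer) segmentation described above.
# The error measure, threshold, and stopping criteria are assumed values.

from typing import List

THRESHOLD = 0.15           # assumed approximation-error threshold
MIN_SEGMENT_BYTES = 64     # assumed stopping criterion on segment size


def approximation_error(segment: bytes) -> float:
    """Toy error measure: spread of byte values, normalised to [0, 1]."""
    if len(segment) < 2:
        return 0.0
    return (max(segment) - min(segment)) / 255.0


def best_breakpoint(segment: bytes) -> int:
    """Pick the split point that maximises the difference between s1 and s2."""
    best_i, best_diff = len(segment) // 2, -1.0
    for i in range(1, len(segment)):
        diff = abs(approximation_error(segment[:i]) - approximation_error(segment[i:]))
        if diff > best_diff:
            best_i, best_diff = i, diff
    return best_i


def segment_top_down(data: bytes) -> List[bytes]:
    """Recursively split until every segment passes the error threshold."""
    if len(data) <= MIN_SEGMENT_BYTES or approximation_error(data) <= THRESHOLD:
        return [data]                       # segment accepted
    i = best_breakpoint(data)
    s1, s2 = data[:i], data[i:]             # left and right segments
    return segment_top_down(s1) + segment_top_down(s2)
```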

4.2. ENCRYPTION & DECRYPTION

Each data segment is then encrypted using an asymmetric cryptographic technique. This method involves the use of two keys, a public key and a private key, for encryption and decryption. We use the public key to encrypt each data segment before moving it to the cloud servers. This ensures that our information is secured during transmission and that anyone who intercepts the data cannot obtain meaningful information from it. We use the RSA (Rivest-Shamir-Adleman) algorithm to encrypt the data [14]. This public-key cryptosystem is used primarily to secure sensitive data transmitted over an insecure network.
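
As an illustration of the per-segment encryption step, the following sketch uses the third-party Python `cryptography` package with RSA-OAEP. The key size, padding choice, and helper names are assumptions rather than details from the paper; note also that RSA-OAEP can only encrypt plaintexts smaller than the key size minus the padding overhead (about 190 bytes with a 2048-bit key), so this assumes small segments, whereas a production system would more likely wrap a symmetric key per segment.

```python
# Sketch of per-segment RSA encryption with the `cryptography` package
# (pip install cryptography). Key size and padding are assumed choices.

from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import padding, rsa

# Key pair generated on the receiving (cloud) side; only the public key is
# shared with the local server that performs the encryption.
private_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
public_key = private_key.public_key()

OAEP = padding.OAEP(
    mgf=padding.MGF1(algorithm=hashes.SHA256()),
    algorithm=hashes.SHA256(),
    label=None,
)


def encrypt_segment(segment: bytes) -> bytes:
    """Encrypt one data segment with the receiver's public key."""
    return public_key.encrypt(segment, OAEP)


def decrypt_segment(ciphertext: bytes) -> bytes:
    """Decrypt one segment with the private key held on the cloud server."""
    return private_key.decrypt(ciphertext, OAEP)


if __name__ == "__main__":
    ciphertext = encrypt_segment(b"example segment")
    assert decrypt_segment(ciphertext) == b"example segment"
```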

4.3. TRANSMISSION

After the segmented data is encrypted with the RSA algorithm, the segments are transmitted in random order from the local physical servers to the cloud servers. The random transmission ensures that no specific pattern is involved in the process, so no one can determine the order in which the segments are sent [15]. We use a three-way handshake technique to establish reliable communication between the two servers before data is transmitted from the local server.

To eliminate the security risk of a third party acting as a man in the middle and fooling either the local or the cloud server, we implement an authentication method to ensure that the receiving server is actually responding to the intended server and vice versa. Our authentication method uses a unique ID that is shared only between the two servers. Requests and responses are processed only after the ID has been verified on each server. This ensures that the transferred data cannot be accessed by a third party [16].
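
The paper does not state how the shared ID is embedded in the handshake, so the sketch below is one plausible interpretation: each of the three handshake messages (SYN, SYN-ACK, ACK) carries an HMAC computed with the pre-shared ID, and a peer that cannot produce a valid tag is rejected. `SHARED_ID`, `send`, and `recv` are illustrative placeholders, not names from the paper.

```python
# Sketch of a three-way handshake authenticated with the servers' shared ID.
# `send` and `recv` stand in for real socket calls; SHARED_ID is assumed to
# have been exchanged out of band.

import hashlib
import hmac
import os

SHARED_ID = b"pre-shared-server-id"   # assumed out-of-band shared secret


def tag(message: bytes) -> bytes:
    """Authenticate a handshake message with the shared ID (HMAC-SHA256)."""
    return hmac.new(SHARED_ID, message, hashlib.sha256).digest()


def initiate_handshake(send, recv) -> bool:
    """Initiator side: SYN -> SYN-ACK -> ACK, each step verified via the tag."""
    nonce = os.urandom(16)
    send((b"SYN", nonce, tag(b"SYN" + nonce)))

    kind, peer_nonce, peer_tag = recv()
    # The responder's tag must bind its own nonce and ours, proving it knows
    # SHARED_ID and is replying to this handshake rather than replaying.
    if kind != b"SYN-ACK" or not hmac.compare_digest(peer_tag, tag(kind + peer_nonce + nonce)):
        return False

    send((b"ACK", nonce, tag(b"ACK" + peer_nonce)))
    return True   # connection is considered reliable and authenticated
```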

The encrypted segments are exchanged between the two servers according to a given window size [17]. The window size is the number of segments transmitted at once. The sending server starts with the smallest window of one segment; if the receiving server acknowledges this window, the next window is increased by one, until the window reaches a size the receiving server can no longer accept. If a window is not acknowledged correctly, the receiving server requests that it be transmitted again. The sliding window allows the window size to grow during transmission until the maximum size is reached, after which the window size starts over. The receiving server acknowledges every segment it receives, and any unacknowledged segment is retransmitted.
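
The sliding-window behaviour described above can be sketched as follows. `send_window` stands in for the real network call and is assumed to return the set of sequence numbers the receiver acknowledged; `MAX_WINDOW` is an assumed cap, not a value taken from the paper.

```python
# Sketch of the sliding-window transmission: the window starts at one segment,
# grows by one after each fully acknowledged window up to MAX_WINDOW, and
# anything not acknowledged stays pending and is retransmitted.

from typing import Callable, Dict, Set

MAX_WINDOW = 8   # assumed maximum window size


def transmit(segments: Dict[int, bytes],
             send_window: Callable[[Dict[int, bytes]], Set[int]]) -> None:
    pending = dict(sorted(segments.items()))      # keyed by sequence number
    window = 1
    while pending:
        batch_keys = list(pending)[:window]
        batch = {k: pending[k] for k in batch_keys}
        acked = send_window(batch)                # receiver acks what it got
        for k in acked:
            pending.pop(k, None)                  # acknowledged segments done
        if acked >= set(batch_keys):
            window = min(window + 1, MAX_WINDOW)  # grow on a fully acked window
        else:
            window = 1                            # fall back and retransmit
```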

4.4. COMPARISON

After the data segments have been successfully decrypted on the cloud server, each decrypted segment is compared with the corresponding unencrypted segment on the local physical hosting server. The comparison depends on the sequence number, the length, and the checksum. If two segments with the same sequence number do not match in length and checksum, the segment is considered corrupted and void, and the model requires it to be re-transmitted.

The comparison used in our model ensures that no alteration, incomplete transmission, or duplication of data passes this stage; a segment passes only after all requirements have been fulfilled. In Figure 3, we begin by selecting from the local physical server the left model, which represents the unencrypted segment. The right model, which represents the decrypted segment, is then selected from the cloud server. The sequence number and the length are checked and compared. If both match, the process passes and ends; if they do not match, the whole process restarts. A data segment that passes this stage has been transmitted without data loss, alteration, or corruption.

Fig 3: Method of data comparison
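
A minimal sketch of this comparison stage is given below. The paper does not name the checksum function, so SHA-256 is used purely as a stand-in; the acceptance rule, matching sequence number, length, and checksum, follows the description above, and the names are illustrative.

```python
# Sketch of the comparison stage: a decrypted segment on the cloud side is
# accepted only if its sequence number, length, and checksum all match the
# original unencrypted segment held on the local server.

import hashlib
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass
class Segment:
    seq: int      # sequence number
    data: bytes   # segment payload

    @property
    def checksum(self) -> str:
        return hashlib.sha256(self.data).hexdigest()


def segment_ok(original: Segment, received: Segment) -> bool:
    """True only if the received segment is an exact replica of the original."""
    return (original.seq == received.seq
            and len(original.data) == len(received.data)
            and original.checksum == received.checksum)


def segments_to_retransmit(originals: Iterable[Segment],
                           received: Dict[int, Segment]) -> List[int]:
    """Sequence numbers that are missing or failed the comparison."""
    return [s.seq for s in originals
            if s.seq not in received or not segment_ok(s, received[s.seq])]
```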

The system then waits until all remaining sequence numbers have arrived on the cloud server. Any missing sequence is requested again from the sending local physical hosting server. Data reassembly is the combination of several data segments into a larger unit; it brings all the segmented data back together into the original data. Vacancies, called holes, are created on the destination machine (the cloud server). When a new data segment arrives on the cloud server, it fills and occupies one of the vacant holes, which is then removed from the list. The hole-descriptor list is then checked to verify that the hole filled by the incoming segment has indeed been eliminated. The arrival of each data segment removes a vacant hole from the list. When all the vacant holes have been filled and cleared from the list, the data segments are arranged according to their sequence numbers.

The reassembly algorithm is as follows (Figure 4). When the first data segment arrives, an empty data buffer area is created and a single entry is inserted into its hole-descriptor list. This entry describes the data as entirely missing: the first vacant hole starts at 0 and the last extends to infinity, where infinity is implemented as a very large integer (greater than 1024). Each arriving data segment is then inserted into the buffer region using the following steps.

Fig 4: data reassembly process

1. Select the next hole descriptor from the hole-descriptor list. If there are no more entries, go to step 8.
2. If the first byte of the data segment is greater than the last byte of the vacant hole, go back to step 1.
3. If the last byte of the data segment is smaller than the first byte of the vacant hole, go back to step 1. (If step 2 or step 3 applies, the newly arrived segment does not overlap this hole at all; the hole is disregarded and the next one is selected.)
4. Remove the current entry from the hole-descriptor list. The new data segment interacts with this vacant hole, so the current descriptor is no longer valid and must be destroyed.
5. If the first byte of the data segment is greater than the first byte of the vacant hole, create a new hole descriptor whose first byte equals the first byte of the old hole and whose last byte equals the first byte of the data segment minus 1.
6. If the last byte of the data segment is less than the last byte of the vacant hole and more segments are still expected, create a new hole descriptor whose first byte equals the last byte of the data segment plus 1 and whose last byte equals the last byte of the old vacant hole. This is the mirror test of step 5, with one additional condition: at the start we never know how long the reassembled data unit will be, so we initially create a vacant hole from 0 to infinity.
7. Go back to step 1.
8. When the hole-descriptor list is finally empty, the data unit is complete.

A minimal sketch of this hole-descriptor procedure follows.
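
In the sketch below, offsets are byte positions, `INFINITY` plays the role of the "very large integer", and the `is_last` flag signals that no more segments are expected; the class and variable names are illustrative, not the authors'.

```python
# Sketch of hole-descriptor reassembly: the buffer starts as one hole from
# 0 to INFINITY, and each arriving segment removes or splits the holes it
# overlaps; an empty hole list means the data unit is complete.

INFINITY = 1 << 30


class Reassembler:
    def __init__(self):
        self.holes = [(0, INFINITY)]   # single initial hole covering everything
        self.parts = {}                # first offset -> segment bytes

    def add(self, first: int, data: bytes, is_last: bool) -> None:
        last = first + len(data) - 1
        self.parts[first] = data
        new_holes = []
        for h_first, h_last in self.holes:
            if first > h_last or last < h_first:
                new_holes.append((h_first, h_last))    # steps 2-3: no overlap
                continue
            # Step 4: the old descriptor is destroyed; steps 5-6 may create
            # smaller holes on either side of the newly placed segment.
            if first > h_first:
                new_holes.append((h_first, first - 1))
            if last < h_last and not is_last:
                new_holes.append((last + 1, h_last))
        self.holes = new_holes

    def complete(self) -> bool:
        return not self.holes          # step 8: empty list => data unit done

    def data(self) -> bytes:
        return b"".join(self.parts[k] for k in sorted(self.parts))
```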

5. RESULT AND ANALYSIS

To evaluate the actual performance of our model, we measured the entire process from segmentation to reassembly as a function of file size (0.05 MB to 1900 MB). We performed these experiments on an 8-core Intel Xeon Linux system running at 2.50 GHz with 8 GB of memory. Table 1 below shows the average execution times for all processes on the physical and cloud servers. Figure 5 shows that the execution time increases with the data size, with RSA accounting for a growing share of that time. We also found that the runtime on the cloud side is roughly twice that on the physical server side.

Table 1: Execution time summary

File size (MB)    Transfer time (ms)    Avg speed (KB/s)    Computation time (ms)
0.05              55.93                 5212                241.71
0.1               55.43                 8688.45             285.29
0.55              72.14                 15845.38            315.21
1                 209.29                24220.2             405
17                1556.57               36166.82            1451.29
142               5953.93               36158.09            4294.43
339               13802.5               36222.94            10578.07
750               27854.21              35463.08            19976.71
1100              42550.07              33970.44            28745.57
1500              55918.29              33808.14            38249
1900              71411.8               31441.37            50348.9

 Fig 5: Execution result

CONCLUSION

In this paper, we have introduced a new model called CIST, which addresses the risk of data loss during the cloud migration process. In this model, we have combined various techniques to achieve the targeted goal: we used data segmentation, RSA encryption, and reassembly to migrate data from the physical computing environment to the cloud computing platform. Our future work is to test different cryptographic algorithms, to control errors during data transmission, and to improve model performance through parallelization.

REFERENCES

[1] Mell P., Grance T. The NIST Definition of Cloud Computing. National Institute of Standards and Technology, Information Technology Laboratory, Version 15, 2009.
[2] Kaur and M. Mahajan. Integration of Heterogeneous Cloud Storages through an Intermediate WCF Service. International Journal of Information Engineering and Electronic Business, pp. 45-51, 2015.
[3] Teli, M. Thomas and K. Chandrasekaran. Big Data Migration between Data Centers in Online Cloud Environment. Procedia Technology, vol. 24, pp. 1558-1565, 2016.
[4] Chauhan and P. Bansal. Emphasizing on Various Security Issues in Cloud Forensic Framework. Indian Journal of Science and Technology, vol. 10, no. 18, pp. 1-7, 2017.
[5] Yan H., Li J., Han J., et al. A Novel Efficient Remote Data Possession Checking Protocol in Cloud Storage. IEEE Transactions on Information Forensics & Security, vol. 12, no. 1, pp. 78-88, 2017.
[6] Yang K., Jia X. An Efficient and Secure Dynamic Auditing Protocol for Data Storage in Cloud Computing. IEEE Transactions on Parallel & Distributed Systems, vol. 24, no. 9, pp. 1717-1726, 2013.
[7] Deswarte Y., Quisquater J., Saïdane A. Remote Integrity Checking. Integrity and Internal Control in Information Systems VI, Springer US, pp. 1-11, 2004.
[8] Ateniese G., Burns R., Curtmola R., et al. Provable Data Possession at Untrusted Stores. ACM Conference on Computer and Communications Security, ACM, pp. 598-609, 2007.
[9] Ateniese G., Pietro R. D., Mancini L. V., et al. Scalable and Efficient Provable Data Possession. Proceedings of the 4th International Conference on Security and Privacy in Communication Networks, ACM, pp. 1-10, 2008.
[10] Erway C., Küpçü A., Papamanthou C., et al. Dynamic Provable Data Possession. Proceedings of the 16th ACM Conference on Computer and Communications Security, ACM, pp. 213-222, 2009.
[11] Wang Q., Wang C., Li J., et al. Enabling Public Verifiability and Data Dynamics for Storage Security in Cloud Computing. European Conference on Research in Computer Security, Springer-Verlag, pp. 355-370, 2009.
[12] Liu, M. Becker, M. Behnam and T. Nolte. Using Segmentation to Improve Schedulability of Real-Time Traffic over RRA-Based NoCs. ACM SIGBED Review, vol. 13, no. 4, pp. 20-24, 2016.
[13] Anmin Fu, Shui Yu, Yuqing Zhang, Huaqun Wang and Chanying Huang. NPP: A New Privacy-Aware Public Auditing Scheme for Cloud Data Sharing with Group Users. IEEE, vol. PP, no. 99, pp. 1-1, May 2017.
[14] Wang C., Wang Q., Ren K., et al. Privacy-Preserving Public Auditing for Data Storage Security in Cloud Computing. INFOCOM 2010 Proceedings, IEEE, pp. 1-9, 2010.
[15] Wang C., Chow S. S. M., Wang Q., et al. Privacy-Preserving Public Auditing for Secure Cloud Storage. IEEE Transactions on Computers, vol. 62, no. 2, pp. 362-375, 2013.
[16] Chen L., Zhou S., Huang X., et al. Data Dynamics for Remote Data Possession Checking in Cloud Storage. Computers & Electrical Engineering, vol. 39, no. 7, pp. 2413-2424, 2013.
[17] Yu Y., Ni J., Man H. A., et al. Improved Security of a Dynamic Remote Data Possession Checking Protocol for Cloud Storage. Expert Systems with Applications, vol. 41, no. 17, pp. 7789-7796, 2014.
[18] Mazhar Ali, Revathi Dhamotharan, Eraj Khan, Samee U. Khan, Athanasios V. Vasilakos, Keqin Li, Albert Y. Zomaya. SeDaSC: Secure Data Sharing in Clouds. IEEE Systems Journal, vol. 11, no. 2, pp. 395-404, Jun 2017.
[19] Fu Jing-yi, Huang Qin-long, Ma Zhao-feng, Yang Yi-xian. Secure Personal Data Sharing in Cloud Computing Using Attribute-Based Broadcast Encryption. The Journal of China Universities of Posts and Telecommunications, vol. 21, no. 6, pp. 45-51, 77, Dec 2014.
[20] Gasti and Y. Chen. Breaking and Fixing the Self Encryption Scheme for Data Security in Mobile Devices. 18th Euromicro Conference on Parallel, Distributed and Network-based Processing, 2010.
[21] Zou. Cloud Data Management Based on Cloud Computing Technology. Applied Mechanics and Materials, vol. 543-547, pp. 3573-3576, 2014.
[22] Radhika Patwari, Sarita Choudhary. Security Issues and Cryptographic Techniques in Cloud Computing. International Journal of Innovative Research in Computer Science and Engineering, vol. 2, no. 4, pp. 1-6, Sep-Oct 2015.
[23] Muthi Reddy P., Manjula S. H., Venugopal K. R. Secure Data Sharing in Cloud Computing: A Comprehensive Review. International Journal of Computer, vol. 25, no. 1, 2017.
[24] Antonis Michalas, Noam Weingarten. HealthShare: Using Attribute-Based Encryption for Secure Data Sharing between Multiple Clouds. IEEE International Symposium on Computer-Based Medical Systems, Apr 2017.
