Storing files in a distributed file system using blockchain technology – update

Date: 27 July 2021
Reading time: 14 min
Author: Łukasz Korba

The principle of limited trust not only changes our general approach to everyday life but also forces various branches of business to take additional measures and introduce procedures ensuring that the subject matter of a transaction is well defined and cannot be modified without the other party's knowledge.

This applies in particular to cases where there is reasonable suspicion or doubt about the correctness of the data being exchanged. Examples include downloaded files, sent documents such as agreements, and regulations.

WHO IS AFFECTED (POLISH SCENE) 

In Poland, the Consumer Rights Act, which came into force on 25 December 2014, imposes an information obligation on entrepreneurs in contracts concluded at a distance or off the company's premises. The Act specifies various ways of communicating important information to the consumer. Such information can be provided on paper (or another durable medium) or in a manner appropriate to the means of communication used. The Act also introduces a new term, the “durable medium”. Simply put, it is any material or tool that allows a consumer or an entrepreneur to store information addressed personally to them, so that it can be accessed in unchanged form for as long as needed (a period set by law). In practice, the following can be considered durable media:

  • paper,
  • CD/DVD, 
  • pen drive, 
  • memory card, 
  • any other hard drive. 

Certain services can qualify as well, such as:

  • email, 
  • SMS. 

Some time ago, PKO BP introduced and deployed a customer service based on blockchain technology, in which the nodes are run by the bank itself and by KIR (the National Clearing House). Such a solution has many merits stemming from the underlying technology, and it can presumably also help reduce the costs of printing documents on paper or burning CDs/DVDs, including the costs of production, implementation and maintenance. Alior Bank is planning a similar approach. These are not the only examples of tackling the problem of storing files and making sure they remain unchanged, but they indicate the challenges that entrepreneurs, institutions, parties, state offices or even governments may face. Below is a list of domains that may require some form of support for authorized files:

  • online bookstores selling digitally signed electronic versions of books, 
  • authorship, 
  • notarial acts or ownerships, 
  • marketing consents, 
  • privacy policy. 

PROPOSED SOLUTION 

Before you jump in at the deep end and dive into the possibilities of blockchain, it is good to think things through. Blockchain technology guarantees the immutability of stored data; however, it also has drawbacks that should be carefully weighed against business needs. When treating the blockchain as a durable medium, we should be aware that the best approach is to store only basic information about the files, not the files themselves. This information can include names, timestamps, extensions, authors or any other properties that a file system stores and that help distinguish a file from others. This brings us to the key point: the best way to distinguish a file, not only from other files but also from other versions of the same file, is its hash. A hash also adds a level of security, as it hides the file's content behind a fixed-size sequence of letters and digits. Any modification of the file, regardless of intent, meaning or size, changes its hash completely, so comparing the hash of a presented file with the stored one proves whether or not the two files are identical. Such proof can serve as an argument supporting one's case.
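To make the hashing argument concrete, here is a minimal TypeScript sketch using Node's built-in crypto module and SHA-256, the algorithm used in the implementation described in this article. The helpers sha256Hex and isUnchanged are hypothetical names introduced for illustration; the article does not show how DocuHash itself computes the hash.

```typescript
import { createHash } from "crypto";

// Returns the SHA-256 digest of the given bytes as a 64-character hexadecimal string.
function sha256Hex(data: Uint8Array): string {
  return createHash("sha256").update(data).digest("hex");
}

// Checks a presented file against the hash recorded earlier (e.g. on the blockchain).
function isUnchanged(file: Uint8Array, recordedHash: string): boolean {
  return sha256Hex(file) === recordedHash.toLowerCase();
}

const original = Buffer.from("Terms and conditions, version 1");
const recorded = sha256Hex(original);
const modified = Buffer.from("Terms and conditions, version 2");

console.log(recorded.length);                 // 64
console.log(isUnchanged(original, recorded)); // true
console.log(isUnchanged(modified, recorded)); // false
```

A single changed character is enough to produce an unrelated digest, which is why only the hash, and not the file, needs to be kept on the durable medium.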

Once we know what to store in the blockchain, another question arises: where should the files themselves be kept in practice? As decentralization gains popularity, it is tempting to use a distributed file system. Such systems are designed to consolidate information and facilitate file sharing while making remote files almost as accessible as local ones. Their main characteristics are high availability, reliability, data integrity, scalability and heterogeneity (the distributed system looks the same to every device, and each device can also become part of the system). An example of a distributed file system that has been in use for some time is the InterPlanetary File System (IPFS). It synthesizes successful ideas from earlier peer-to-peer systems, such as BitTorrent and Git, and provides a high-throughput, content-addressed block storage model with content-addressed hyperlinks. Its structure, built upon a Merkle DAG (Directed Acyclic Graph), makes it possible to create a versioned file system or even a permanent web. IPFS combines a distributed hash table, an incentivized block exchange and a self-certifying namespace.
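The idea of content addressing can be illustrated with a toy in-memory store. This is a simplified model, not IPFS: real CIDs also encode the hash function and data format, and IPFS chunks and links data differently. ContentStore, putFile and the 4-byte chunk size are made up for the example.

```typescript
import { createHash } from "crypto";

// Toy model: addresses are plain SHA-256 hex digests, not real IPFS CIDs.
function sha256Hex(data: Uint8Array): string {
  return createHash("sha256").update(data).digest("hex");
}

class ContentStore {
  private readonly blocks = new Map<string, Uint8Array>();

  // Stores a copy of the block under the hash of its own content and returns that address.
  put(data: Uint8Array): string {
    const address = sha256Hex(data);
    this.blocks.set(address, new Uint8Array(data));
    return address;
  }

  // Self-certifying read: the bytes are re-hashed and must match the requested address.
  get(address: string): Uint8Array | undefined {
    const data = this.blocks.get(address);
    if (data !== undefined && sha256Hex(data) !== address) {
      throw new Error(`block ${address} is corrupted`);
    }
    return data;
  }

  // Splits a file into chunks and stores a root node that links to the chunk addresses,
  // so the root address depends on every chunk (a one-level Merkle DAG).
  putFile(file: Uint8Array, chunkSize = 4): string {
    const links: string[] = [];
    for (let offset = 0; offset < file.length; offset += chunkSize) {
      links.push(this.put(file.subarray(offset, offset + chunkSize)));
    }
    return this.put(Buffer.from(JSON.stringify({ links })));
  }
}

const store = new ContentStore();
const first = store.putFile(Buffer.from("hello world"));
const again = store.putFile(Buffer.from("hello world"));
const edited = store.putFile(Buffer.from("hello World"));
console.log(first === again);  // true: identical content, identical address
console.log(first === edited); // false: one changed chunk changes the root address
```

Because the root node lists the addresses of its chunks, any change in a chunk propagates up to the root, and any reader can verify the bytes it receives against the address it asked for.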

Like any system component, IPFS has drawbacks that require additional consideration, e.g. the need for additional data backups and for ensuring that files are replicated across the network. The latter is especially important because IPFS does not replicate data automatically. Nodes store and/or distribute only the content they are explicitly asked to store and/or distribute; simply put, devices running IPFS nodes do not host files they were not meant to host. On the other hand, a node that requests a file keeps a local copy of it, which other nodes can then retrieve from it. For more details, please visit the IPFS homepage and read the IPFS whitepaper.
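The replication behaviour described above can be illustrated with a toy model in which each node holds only what it added itself or explicitly requested. ToyNode and its methods are made up for illustration and do not correspond to the IPFS API; pinning and garbage collection are ignored.

```typescript
// Toy model of the behaviour described above, not the IPFS API: a node holds only the
// content it added itself or explicitly requested; nothing is pushed to other nodes.
class ToyNode {
  readonly blocks = new Map<string, string>();
  readonly peers: ToyNode[] = [];
  online = true;

  add(cid: string, content: string): void {
    this.blocks.set(cid, content);
  }

  // Looks locally first, then asks online peers; a successful fetch leaves a local copy.
  request(cid: string): string | undefined {
    const local = this.blocks.get(cid);
    if (local !== undefined) return local;
    for (const peer of this.peers) {
      const remote = peer.online ? peer.blocks.get(cid) : undefined;
      if (remote !== undefined) {
        this.blocks.set(cid, remote);
        return remote;
      }
    }
    return undefined;
  }
}

const publisher = new ToyNode();
const reader = new ToyNode();
const latecomer = new ToyNode();
reader.peers.push(publisher);
latecomer.peers.push(publisher, reader);

publisher.add("cid-1", "agreement.pdf");
console.log(reader.blocks.has("cid-1")); // false: no automatic replication
reader.request("cid-1");                 // reader now keeps a local copy
publisher.online = false;
console.log(latecomer.request("cid-1")); // agreement.pdf (served by reader)
```

If the publisher had gone offline before anyone requested the file, the content would have been unavailable, which is why backups and deliberate replication matter.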

With this introduction in place, we can move on to the technical details of a solution that addresses the problems described above using the technologies mentioned.

TECHNICAL ASPECTS 

This section describes a simple implementation that stores a file's SHA-256 hash in a private blockchain built on the Ethereum protocol, while keeping the file itself in a distributed file system, IPFS.

Technology stack: 

  • Truffle (development framework for dapps based on the Ethereum blockchain), 
  • Solidity (contract-oriented programming language for writing smart contracts), 
  • Web3.js (Ethereum JavaScript API), 
  • React (JavaScript library for building user interfaces). 

ENVIRONMENT PREPARATION AND INSTALLATION TOOLS 

Disclaimer: these instructions are based on Windows; some steps may require an OS-specific approach, such as using sudo on Linux-based systems.

START WITH DOCKER (RECOMMENDED) 

  1. Install Docker and ensure that the docker command is available in the terminal window: 
    > docker -v 
    Docker version 20.10.6, build 370c289 
  2. Clone the DocuHash repository and navigate to its root directory inside the terminal. 
  3. Run the docker compose up command. Execution may take a few minutes. Once the output shown below appears in the terminal, the application is available in the browser at localhost:3000. 

> docker compose up 
... 
application_1  | Compiled successfully! 
application_1  | 
application_1  | You can now view docuhash in the browser. 
application_1  | 
application_1  |   Local:            http://localhost:3000 
application_1  |   On Your Network:  http://172.22.0.4:3000 
application_1  | 
application_1  | Note that the development build is not optimized. 
application_1  | To create a production build, use yarn build. 
... 

START WITHOUT DOCKER 

  1. Install the following tools: 
    a) Node.js (latest LTS version is recommended), 
    b) Truffle (version 5.*), 
    c) IPFS
  2. Ensure that the tools are available from the command line:

> node -v 
v14.15.3 
 
> truffle version 
Truffle v5.3.8 (core: 5.3.8) 
Solidity v0.5.16 (solc-js) 
Node v14.15.3 
Web3.js v1.3.6 
 
> ipfs version 
ipfs version 0.8.0

  3. Clone the DocuHash repository and navigate to the eth directory. 
  4. In the terminal, run truffle develop.
  5. In the interactive Truffle console, run compile, followed by migrate --reset. Keep the terminal window open. 
  6. In a new terminal window, navigate to the directory containing the IPFS executable. 
  7. Run in sequence: 

> ipfs config --json API.HTTPHeaders.Access-Control-Allow-Origin '[\"*\"]' 
> ipfs bootstrap rm --all 
> ipfs daemon 

Keep the terminal window open. Note that, for the purpose of this PoC, all bootstrap peers are removed (ipfs bootstrap rm --all). In a real-world scenario, where a complete configuration is required, it would be better to add further nodes running on different machines to the IPFS network using the command: ipfs bootstrap add /ip4/<node_ip>/tcp/4001/ipfs/<node_hash> 

  8. In a new terminal window, navigate to the repository’s root directory. 
  9. Run yarn followed by yarn start. Keep the terminal window open. A new browser window/tab should automatically open and the application should shortly load within it. 

DETAILED DESCRIPTION OF THE MAJOR ELEMENTS 

  1. Smart contract HashStorage in Solidity (/eth/contracts/HashStorage.sol): 
    A) Smart contract contains a structure consisting of the following file information: 
    -> string ipfsHash – file’s hash (CID) in the decentralized file system, 
    -> string fileName – name of the file, 
    -> string fileType – file type (e.g. image/jpeg), 
    -> uint dateAdded – the time the file was added, as a Unix timestamp (the number of seconds that have elapsed since midnight UTC on 1 January 1970), 
    -> bool exist – information regarding whether or not a given file hash has been added to the blockchain. 
    B) The smart contract also contains a mapping, which can be seen as a hash table keyed by the file hash. Mappings are virtually initialized so that every possible key exists: a value can be read for any key, even one that has never been used, and for such a key the mapping returns a value whose byte representation is all zeros, i.e. the type's default value. This is why the struct carries the exist flag: for a file hash that has never been added, exist is false. 
    C) In this smart contract we can find two functions: 
    -> get – receives one parameter, the file hash. This function checks whether a specific file has been registered (and thus stored in IPFS) and, if so, returns its IPFS hash and basic metadata (name, type, timestamp); for an unknown hash, exist is returned as false. 
    -> add – receives five parameters: IPFS hash, file hash, file name, file type and date added. Its execution is restricted to the owner, which means that no other party can successfully invoke this method. This is ensured by the Ownable.sol contract from OpenZeppelin (a library for secure smart contract development). The owner is set in the constructor, which is executed when the contract is deployed to the network. The owner can be changed later, but that is beyond the scope of this article. Getting back to the function: after an initial validation (the file hash must not have been added before), it adds a newly created object to the mapping mentioned above. Finally, it emits an event as a clear signal that everything has gone right.

pragma solidity >=0.4.21 <0.7.0; 

import "./Ownable.sol"; 

contract HashStorage is Ownable { 
    // File hash (SHA-256) => metadata of the stored file
    mapping (string => DocInfo) collection; 

    struct DocInfo { 
        string ipfsHash;   // CID of the file in IPFS
        string fileName; 
        string fileType;   // MIME type, e.g. image/jpeg
        uint dateAdded;    // Unix timestamp (seconds)
        bool exist;        // false for hashes that were never added
    } 

    event HashAdded(string ipfsHash, string fileHash, uint dateAdded); 

    constructor () public { 
        owner = msg.sender; 
    } 

    // Registers a new file; only the owner may call it, and each file hash can be added only once.
    function add(string memory _ipfsHash, string memory _fileHash, string memory _fileName, string memory _fileType, uint _dateAdded) public onlyOwner { 
        require(!collection[_fileHash].exist, "[E1] This hash already exists in contract."); 
        DocInfo memory docInfo = DocInfo(_ipfsHash, _fileName, _fileType, _dateAdded, true); 
        collection[_fileHash] = docInfo; 

        emit HashAdded(_ipfsHash, _fileHash, _dateAdded); 
    } 

    // Returns the stored metadata; for an unknown hash all fields hold default values and exist is false.
    function get(string memory _fileHash) public view returns (string memory, string memory, string memory, string memory, uint, bool) { 
        return ( 
            _fileHash, 
            collection[_fileHash].ipfsHash, 
            collection[_fileHash].fileName, 
            collection[_fileHash].fileType, 
            collection[_fileHash].dateAdded, 
            collection[_fileHash].exist 
        ); 
    } 
}
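To reason about the contract's behaviour without a running chain (for example in client-side unit tests), its rules can be mirrored in a small in-memory TypeScript model. This is an illustrative sketch, not part of DocuHash: HashStorageModel, the sender parameter standing in for msg.sender, the placeholder address "0xOwner" and the owner-check error text are assumptions, and unlike a real transaction the model emits no events and consumes no gas.

```typescript
// Mirrors the DocInfo struct from HashStorage.sol.
interface DocInfo {
  ipfsHash: string;
  fileName: string;
  fileType: string;
  dateAdded: number; // Unix timestamp in seconds
  exist: boolean;
}

// A Solidity mapping returns zero values for keys that were never written.
const EMPTY_DOC: DocInfo = { ipfsHash: "", fileName: "", fileType: "", dateAdded: 0, exist: false };

class HashStorageModel {
  private readonly collection = new Map<string, DocInfo>();
  private readonly owner: string;

  constructor(deployer: string) {
    this.owner = deployer; // like `owner = msg.sender` in the contract's constructor
  }

  add(sender: string, ipfsHash: string, fileHash: string, fileName: string, fileType: string, dateAdded: number): void {
    // onlyOwner modifier; the message text is made up for this model
    if (sender !== this.owner) throw new Error("caller is not the owner");
    if (this.collection.get(fileHash)?.exist) throw new Error("[E1] This hash already exists in contract.");
    this.collection.set(fileHash, { ipfsHash, fileName, fileType, dateAdded, exist: true });
    // the real contract emits HashAdded(ipfsHash, fileHash, dateAdded) at this point
  }

  // Same six values, in the same order, as HashStorage.get.
  get(fileHash: string): [string, string, string, string, number, boolean] {
    const doc = this.collection.get(fileHash) ?? EMPTY_DOC;
    return [fileHash, doc.ipfsHash, doc.fileName, doc.fileType, doc.dateAdded, doc.exist];
  }
}

const storage = new HashStorageModel("0xOwner");
storage.add("0xOwner", "QmExampleCid", "ab12", "contract.pdf", "application/pdf", 1627344000);
console.log(storage.get("ab12")[5]); // true
const missing = storage.get("ff00");  // ["ff00", "", "", "", 0, false]
```

The `EMPTY_DOC` fallback is the TypeScript counterpart of the mapping's default values, which is why callers must look at the `exist` flag rather than at empty strings.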

2. Web App – React 

  1. UploadSection (/src/components/UploadSection/index.tsx) – component responsible for uploading files selected by the user. Indirectly calls both the ETH network and IPFS to store files. 
  2. HashListSection (/src/components/HashListSection/index.tsx) – component responsible for displaying hashes of uploaded files (using browser’s local storage as a data persistence layer). 
  3. SearchSection (/src/components/SearchSection/index.tsx) – component allowing the user to search uploaded files by hash. If a given file is found, the component displays its basic metadata fetched from the blockchain and exposes a functionality to either browse or download the file. 
  4. Web3Provider (/src/components/Web3Provider/index.tsx) – higher order component used to encapsulate the application with a web3 provider context. It allows the components to access the web3 object and subsequently use it to interact with the ETH network. 
  5. config.ts (/src/config.ts) – file defining URLs used for communication between the client application and storage networks (ETH and IPFS). 
  6. utils.ts (/src/utils/index.ts) – file containing definitions of functions used to communicate with ETH and IPFS. These functions primarily allow for storing and downloading files. 
  7. hooks.ts (/src/hooks/index.ts) – file containing React hook definitions used to get HashStorage contract’s object representation and an accounts list from the ETH network.

3. truffle-config.js (/eth/truffle-config.js) – file defining available ETH networks. For the purpose of this PoC, three different networks have been specified:

  1. develop – network used when running the application locally without Docker. 
  2. demo – network used when running the application locally with Docker. 
  3. production – network used when running a deployed application. Unlike the other networks, its connection parameters (host URL, port) are not hard-coded but are passed through environment variables. 
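Since truffle-config.js itself is plain JavaScript, the following TypeScript function only sketches the shape of a networks entry (host, port, network_id) and the idea of reading the production host and port from the environment. The variable names ETH_HOST and ETH_PORT, the Docker host name and the demo port are assumptions; only 127.0.0.1:9545 is known to be the default endpoint of truffle develop.

```typescript
// Shape of one entry in truffle-config.js's `networks` object (host, port, network_id).
interface NetworkSettings {
  host: string;
  port: number;
  network_id: string; // "*" accepts any network id
}

// Hypothetical sketch: develop and demo are hard-coded, production comes from the environment.
function networkFor(name: string, env: Record<string, string | undefined>): NetworkSettings {
  switch (name) {
    case "develop":
      return { host: "127.0.0.1", port: 9545, network_id: "*" }; // default `truffle develop` endpoint
    case "demo":
      return { host: "ganache", port: 8545, network_id: "*" }; // assumed Docker service name and port
    case "production": {
      const host = env.ETH_HOST; // assumed variable names
      const port = Number(env.ETH_PORT);
      if (!host || !Number.isInteger(port) || port <= 0) {
        throw new Error("ETH_HOST and ETH_PORT must be set for the production network");
      }
      return { host, port, network_id: "*" };
    }
    default:
      throw new Error(`Unknown network: ${name}`);
  }
}

const production = networkFor("production", { ETH_HOST: "eth.example.com", ETH_PORT: "8545" });
console.log(production.host, production.port); // eth.example.com 8545
```

In an actual truffle-config.js the same values would sit under `module.exports.networks`, with the production entry reading `process.env`; failing early on missing variables avoids silently connecting to a wrong node.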

DESIGN STRUCTURE 

The system consists of three main elements: a client application, a private ETH network and an IPFS node. The client application communicates with ETH and IPFS via HTTP requests. When the application is run locally, the IPFS web UI is also available at 127.0.0.1:5001/webui and can be used to configure the node and add or browse files directly. A graphic representation of the application structure is shown in Fig. 1. 

USER INTERFACE 

The application consists of a single screen divided into three main sections, as shown in Fig. 2. 

UPLOAD SECTION 

This section contains a drop zone where users can drag and drop the files they wish to upload. A file picker dialog can also be opened by clicking the area marked by the dotted line. Once a file is dropped or chosen in the dialog, the upload starts, which is indicated by a spinner at the top of the screen. When the upload is finished, the new file hash appears in the section below. 
A given file may already exist in the system (e.g. a copy may have been uploaded by someone else). In such a case, a browser alert appears, as shown in Fig. 3, and the file hash then appears in the stored file hashes section. 
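The upload behaviour described above (hash the file, detect an existing record, otherwise store the file in IPFS and register it in the contract) can be sketched as follows. This is one plausible ordering, not the actual UploadSection or utils.ts code: the Ledger and FileStore interfaces are hypothetical stand-ins for the web3.js contract calls and the IPFS client, and the hash is computed with Node's crypto module, whereas browser code would need a different hashing API.

```typescript
import { createHash } from "crypto";

// Hypothetical stand-ins for the contract calls and the IPFS client used by the real app.
interface Ledger {
  exists(fileHash: string): Promise<boolean>;
  add(ipfsHash: string, fileHash: string, fileName: string, fileType: string, dateAdded: number): Promise<void>;
}
interface FileStore {
  add(content: Uint8Array): Promise<string>; // resolves to the file's CID
}

interface UploadResult {
  fileHash: string;
  alreadyExisted: boolean; // true -> the UI shows the "already exists" alert
}

async function uploadFile(
  file: { name: string; type: string; content: Uint8Array },
  ledger: Ledger,
  store: FileStore,
  now: () => number = Date.now,
): Promise<UploadResult> {
  const fileHash = createHash("sha256").update(file.content).digest("hex");
  if (await ledger.exists(fileHash)) {
    // The same bytes were registered before, possibly by someone else.
    return { fileHash, alreadyExisted: true };
  }
  const cid = await store.add(file.content);
  const dateAdded = Math.floor(now() / 1000); // Unix timestamp in seconds, as the contract expects
  await ledger.add(cid, fileHash, file.name, file.type, dateAdded);
  return { fileHash, alreadyExisted: false };
}
```

Checking the contract first avoids storing a duplicate in IPFS; the `[E1]` require in the contract's add function would reject such a transaction anyway.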

STORED FILE HASHES SECTION 

This section contains a list of uploaded file hashes. If no files have been uploaded, the section displays the text “No files have been uploaded yet”, as shown in Fig. 2. Right after a new file is uploaded, its hash is briefly highlighted in yellow on the list, as shown in Fig. 4. 
Right after the application starts, a synchronization step checks whether the file hashes stored in the browser's local storage have corresponding entries on the blockchain (differences may occur e.g. after a new version of the system has been deployed). While this operation is in progress, the section displays the text “Syncing files, please wait…”, as shown in Fig. 5. 
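The synchronization step could look roughly like this, assuming that hashes without an on-chain record are simply dropped from the list (the article does not say what the application does with them). syncStoredHashes is a hypothetical name; existsOnChain would wrap a call to the contract's get function and read its exist flag, and reading from and writing back to local storage is left out so that the sketch runs outside a browser.

```typescript
// Hypothetical sketch: keep only the locally remembered hashes that still have a record in the
// contract; existsOnChain would wrap a call to HashStorage.get and return its `exist` flag.
async function syncStoredHashes(
  stored: readonly string[],
  existsOnChain: (fileHash: string) => Promise<boolean>,
): Promise<string[]> {
  // Query all hashes in parallel; Promise.all keeps the results in the order of `stored`.
  const checks = await Promise.all(stored.map((fileHash) => existsOnChain(fileHash)));
  return stored.filter((_, index) => checks[index]);
}

const recordsOnChain = new Set(["h1", "h3"]);
void syncStoredHashes(["h1", "h2", "h3"], async (h) => recordsOnChain.has(h)).then((kept) => {
  console.log(kept); // kept: ["h1", "h3"]
});
```

Since get is a view function, these checks are free read-only calls, so querying every stored hash at start-up does not cost any gas.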

SEARCH SECTION 

This section allows users to search for files by hash. A common use case is to copy a hash listed in the stored file hashes section, paste it into the text input and click the “Search” button. This initiates calls to both the ETH network and IPFS and returns the file's metadata fetched from the blockchain, which is rendered under the search input. Additionally, two buttons appear: “Browse the file”, which opens the file in a new browser tab (provided that the file can be viewed, e.g. an image or a PDF), and “Download the file”, which saves the file to the hard drive. An example view of the section after a file has been found is shown in Fig. 6. 
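Turning the result of the contract's get function into what the search section renders might look like the following sketch. It assumes that the six returned values have already been collected into an array and that files are browsed through an IPFS HTTP gateway; 127.0.0.1:8080 is the default gateway address of a local node and is used only as an example. GetResult, SearchResult and toSearchResult are hypothetical names.

```typescript
// The six values of HashStorage.get, in order: fileHash, ipfsHash, fileName, fileType, dateAdded, exist.
// web3.js 1.x returns uint values as decimal strings, hence `number | string`.
type GetResult = [string, string, string, string, number | string, boolean];

interface SearchResult {
  fileName: string;
  fileType: string;
  addedAt: string; // ISO 8601, UTC
  browseUrl: string;
}

// Assumption: files are served by an IPFS HTTP gateway at the given base URL.
function toSearchResult(result: GetResult, gateway = "http://127.0.0.1:8080"): SearchResult | null {
  const [, ipfsHash, fileName, fileType, dateAdded, exist] = result;
  if (!exist) return null; // unknown hash: the mapping returned default values
  return {
    fileName,
    fileType,
    addedAt: new Date(Number(dateAdded) * 1000).toISOString(), // seconds -> milliseconds
    browseUrl: `${gateway}/ipfs/${ipfsHash}`,
  };
}

const found = toSearchResult(["ab12", "QmExampleCid", "contract.pdf", "application/pdf", "1627344000", true]);
console.log(found?.addedAt);   // 2021-07-27T00:00:00.000Z
console.log(found?.browseUrl); // http://127.0.0.1:8080/ipfs/QmExampleCid
```

Returning `null` for a missing record keeps the "not found" decision in one place, based on the exist flag rather than on empty metadata fields.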

SUMMARY 

The proposed solution is not a comprehensive one. To take it further, one should consider adding more nodes to the distributed file system, migrating the application to a public network, encrypting files before sending them and, finally, evaluating the level of trust towards the server and deciding who should pay the network fees. There is probably much more to consider, but this is some food for thought. This article shows some of the possibilities and strengths of applying blockchain technology to the described problem. If you have any questions or want to expand this solution, feel free to contact us or leave a comment. The code is available on GitHub. You can also take a look at a live demo.

SOURCES 

  1. https://www.studiamba.wsb.pl/baza-wiedzy/trwaly-nosnik-co-oznacza-dla-przedsiebiorcy-w-transakcjach-z-konsumentem#trwaly-nosnik  
  2. https://fintech.pkobp.pl/blockchain-w-banku  
  3. https://ipfs.io/  
