IPFS, a key point in IoT devices and the new internet.
Internet use and the number of connected IoT devices are growing rapidly. However, the current internet infrastructure, including its applications and machine-to-machine connections, stores and processes data centrally, using HTTP under a client-server model that has certain drawbacks. The advent of Web 3.0 and decentralized systems has drawn attention to technologies such as IPFS (InterPlanetary File System) for more secure data storage and distribution. AirTrace currently uses IPFS in ADOS, and further implementations are planned.
HTTP vs IPFS
In the client-server model, with HTTP (Hypertext Transfer Protocol) as the basis for communication, data is stored on centralized servers and clients access it through location-based addressing, i.e. via a URL. Centralized control means that the data can be altered, deleted or viewed by anyone who has access to those servers, whether a legal authority or a malicious hacker. In addition, this model is inefficient for transferring large amounts of data: bandwidth costs are high, files are duplicated across servers, and if a server goes down, the data on it cannot be accessed.
IPFS (InterPlanetary File System) is a peer-to-peer (P2P) network created by Protocol Labs that allows files to be stored and distributed in a decentralized way all over the world. The system is based on key-value data storage and, as we will see below, rests on three key concepts. It is estimated that streaming video over IPFS would save 60% of the bandwidth cost. IPFS identifies stored files by cryptographic hashes, which lets it remove duplicate files from the network and keep track of version history. Because content can be served by any node that holds it, it remains available as long as at least one such node is online.
IoT systems with MQTT
MQTT is one of the most widely used protocols for machine-to-machine data transmission. It distributes messages through a central server (called a broker or router) using the publish/subscribe (pub/sub) pattern: a subscriber tells the broker which type of message it wants to receive, and another agent (the publisher) publishes messages to the broker, which then distributes them to the subscribers.
The architecture has a central broker that receives all data. Once the broker has collected the data, it passes it to a 'message service' infrastructure, where messages are filtered according to some criteria and immediately distributed to the connected subscribers. MQTT is a great solution; however, scaling it to billions of IoT devices would require a huge infrastructure and cloud computing resources, at significant cost.
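To make the pub/sub flow concrete, here is a minimal in-memory sketch of a central broker. It is not an MQTT implementation (there is no network layer, no wildcard topics and no quality-of-service levels), and the names `Broker`, `subscribe` and `publish`, as well as the topic strings, are assumptions chosen for illustration.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List

# A handler receives the topic and the message payload.
Handler = Callable[[str, str], None]


class Broker:
    """Toy central broker (hypothetical): keeps a list of subscribers per topic."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        # A subscriber tells the broker which type of message it wants.
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, payload: str) -> int:
        # The broker filters by exact topic match and forwards the message.
        handlers = self._subscribers.get(topic, [])
        for handler in handlers:
            handler(topic, payload)
        return len(handlers)  # number of subscribers that received it


broker = Broker()
received: List[str] = []
broker.subscribe("sensors/temperature", lambda t, p: received.append(f"{t}={p}"))

delivered = broker.publish("sensors/temperature", "21.5")  # one subscriber
ignored = broker.publish("sensors/humidity", "40")         # nobody subscribed
```

Every message passes through the single `Broker` object, which is exactly the component that has to grow as the number of devices grows.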
IPFS for IoT
IPFS has pub/sub events similar to those of MQTT. The main difference is that no centralized broker (server) is required, because IPFS provides the equivalent of such a broker in a decentralized and distributed manner. Each subscriber interested in an event also helps relay that event to other interested subscribers, and everything is cryptographically secured. With IPFS, a number of IoT devices can be networked together and made to act as a shared file system with a series of events on a distributed platform.
IPFS is not a blockchain, but it shares the same decentralization: files, identified by their cryptographic hashes, are stored on multiple nodes in the network. When a node decides to store the content behind a hash, it becomes a host node for that data. Nodes store only the content they are interested in, plus an index of who is storing what. If a node storing data from a certain file is disconnected, the data remains available through the other nodes that hold it, as if nothing had happened.
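The availability argument can be illustrated with a small simulation. This is only a sketch under simplifying assumptions: the classes `Node` and `Network` are hypothetical, the "who is storing what" index is a single dictionary rather than a distributed structure, and a plain SHA-256 hex digest stands in for a real content identifier.

```python
import hashlib
from typing import Dict, Optional, Set


def content_id(data: bytes) -> str:
    # Stand-in for a real CID: a plain SHA-256 hex digest.
    return hashlib.sha256(data).hexdigest()


class Node:
    """A host that keeps the content it is interested in (hypothetical)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.online = True
        self.blocks: Dict[str, bytes] = {}


class Network:
    """Keeps the index of which node stores which content."""

    def __init__(self) -> None:
        self.nodes: Dict[str, Node] = {}
        self.providers: Dict[str, Set[str]] = {}

    def add(self, node: Node, data: bytes) -> str:
        cid = content_id(data)
        node.blocks[cid] = data
        self.nodes[node.name] = node
        self.providers.setdefault(cid, set()).add(node.name)
        return cid

    def fetch(self, cid: str) -> Optional[bytes]:
        # Try every known host; skip the ones that are offline.
        for name in sorted(self.providers.get(cid, ())):
            node = self.nodes[name]
            if node.online:
                data = node.blocks[cid]
                if content_id(data) == cid:  # verify before trusting the host
                    return data
        return None


network = Network()
node_a, node_b = Node("a"), Node("b")
reading = b'{"sensor": "cam-1", "value": 42}'
cid = network.add(node_a, reading)
network.add(node_b, reading)            # a second node hosts the same content

node_a.online = False                   # one host disconnects
still_available = network.fetch(cid)    # served by node_b

node_b.online = False                   # now no host is left
after_all_offline = network.fetch(cid)  # nothing can be returned
```

The second fetch shows the limit of the guarantee: content stays reachable only while at least one node that holds it is online.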
Key concepts of IPFS
To understand IPFS in more detail, there are three fundamental concepts that make the system work.
Content addressing
This concept is quite simple and concerns the way in which data is accessed. Instead of referring to data (photos, videos, articles) by location (e.g. by URL), that is, by the server on which it is stored, IPFS refers to it by a content identifier (CID) derived from a cryptographic hash of the content itself (SHA-256 by default). The content is hashed rather than encrypted, and in practice two different pieces of content will never share the same CID. To retrieve data, IPFS therefore only has to ask the network which nodes hold that particular hash, and a node that has it returns the corresponding data.
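A minimal sketch of content addressing, assuming a plain SHA-256 hex digest as the identifier (a real CID wraps the hash in additional encoding) and a hypothetical `ContentStore` class:

```python
import hashlib
from typing import Dict


class ContentStore:
    """Toy store (hypothetical) whose keys are derived from the content itself."""

    def __init__(self) -> None:
        self._blocks: Dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        key = hashlib.sha256(data).hexdigest()
        # Identical content always produces the same key, so storing it
        # twice does not create a duplicate entry.
        self._blocks[key] = data
        return key

    def get(self, key: str) -> bytes:
        data = self._blocks[key]
        # The key doubles as an integrity check on whatever comes back.
        if hashlib.sha256(data).hexdigest() != key:
            raise ValueError("content does not match its identifier")
        return data

    def __len__(self) -> int:
        return len(self._blocks)


store = ContentStore()
first = store.put(b"temperature=21.5")
second = store.put(b"temperature=21.5")  # same content, same identifier
other = store.put(b"temperature=21.6")   # different content, different identifier
```

Unlike a URL, the identifier says nothing about where the data lives, only what the data is, which is why any node holding it can answer the request.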
Directed Acyclic Graphs (DAGs)
A DAG is a type of graph in which data is represented by nodes (vertices) and the relationships between them by directed edges, with no path that leads from a node back to itself. IPFS uses Merkle DAGs, where each node has a unique identifier that is a hash of the node's contents. Let's say you have a file identified by its CID, and that file is in a folder with other files, which also have CIDs. The CID of that folder is then a hash of the CIDs of the files below it. In turn, those files are made up of blocks, and each of those blocks has its own CID.
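The folder, file and block relationship can be sketched as follows. This is a simplified illustration, not the encoding IPFS actually uses: the tiny `CHUNK_SIZE`, the newline-joined lists and the function names `file_id` and `folder_id` are all assumptions made for readability.

```python
import hashlib
from typing import Dict

CHUNK_SIZE = 4  # deliberately tiny so that the example files span several blocks


def digest(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def file_id(data: bytes) -> str:
    # Split the file into blocks, hash every block, then hash the list
    # of block identifiers to obtain the identifier of the file.
    blocks = [data[i:i + CHUNK_SIZE] for i in range(0, len(data), CHUNK_SIZE)]
    block_ids = [digest(block) for block in blocks]
    return digest("\n".join(block_ids).encode("utf-8"))


def folder_id(files: Dict[str, bytes]) -> str:
    # The folder identifier is a hash over the names and identifiers of
    # the files below it, in a fixed (sorted) order.
    entries = [f"{name}:{file_id(data)}" for name, data in sorted(files.items())]
    return digest("\n".join(entries).encode("utf-8"))


folder_v1 = {"a.txt": b"hello world", "b.txt": b"sensor data"}
folder_v2 = {"a.txt": b"hello world", "b.txt": b"sensor data!"}

same_file = file_id(folder_v1["a.txt"]) == file_id(folder_v2["a.txt"])
same_folder = folder_id(folder_v1) == folder_id(folder_v2)
```

Changing one byte in `b.txt` changes the identifier of that file and of the folder above it, while `a.txt` keeps its identifier and does not need to be stored again.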
Distributed Hash Tables (DHTs)
This is the piece needed to find and move content. To know which nodes are storing the content you are looking for, IPFS uses a distributed hash table (DHT). The concept of a hash table is not new: it is a data structure of key-value pairs. A distributed hash table is a hash table whose data is spread across a series of nodes in a network, and the value associated with a key can be looked up through a routing system that efficiently finds the node holding the information.
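As a rough sketch of the idea, the toy table below assigns each key to the node whose identifier is closest to the hash of the key, using XOR as the distance, in the style of Kademlia-based DHTs. The class `ToyDHT` and the node names are hypothetical, and the sketch assumes a global view of all nodes, whereas in a real DHT each node knows only some of its peers and lookups hop from node to node.

```python
import hashlib
from typing import Dict, List, Optional


def key_of(text: str) -> int:
    # Map a node name or a content key into the same numeric key space.
    return int.from_bytes(hashlib.sha256(text.encode("utf-8")).digest(), "big")


class ToyDHT:
    """Key-value pairs spread over several nodes (hypothetical sketch)."""

    def __init__(self, node_names: List[str]) -> None:
        self.node_ids: Dict[str, int] = {name: key_of(name) for name in node_names}
        self.tables: Dict[str, Dict[str, List[str]]] = {name: {} for name in node_names}

    def responsible_node(self, key: str) -> str:
        # The node whose identifier has the smallest XOR distance to the key.
        target = key_of(key)
        return min(self.node_ids, key=lambda name: self.node_ids[name] ^ target)

    def put(self, key: str, value: List[str]) -> str:
        node = self.responsible_node(key)
        self.tables[node][key] = value
        return node

    def get(self, key: str) -> Optional[List[str]]:
        # The same rule that chose where to store tells us where to look.
        return self.tables[self.responsible_node(key)].get(key)


dht = ToyDHT(["node-a", "node-b", "node-c", "node-d"])
holder = dht.put("example-cid", ["node-b"])  # value: who stores this content
found = dht.get("example-cid")
missing = dht.get("unknown-cid")
```

No node holds the whole table: each stores only the entries whose keys fall closest to its own identifier, and a lookup never has to ask every node.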
IPFS on AirTrace
At AirTrace we are aware of the importance of IPFS for IoT devices. Although it is currently used only in the ADOS system (AirTrace Decentralized Oracle System), our vision for the future includes storing unstructured data, such as images and video, through IPFS. One use case could be capturing data from cameras and storing it in a distributed manner. Currently, AirTrace uses S3 services for ADOS, where it stores both the data and the hash of the data. The model data for anomaly detection (the weights and metadata), however, is hashed so that each model has a unique hash, which is then entered into IPFS to distinguish between the different models and to know which one is being referred to each time ADOS is run.
Although it is still early days, IPFS is expected to become the replacement for HTTP, enabling the decentralization of the Internet, opening the way to the massive use of Web 3.0 and distributing the power over information management that is currently concentrated in a few actors. Thanks to systems such as IPFS, no one and nothing could block access to information, data sharing would become more secure, DDoS attacks would become almost obsolete, as they mostly target centralized systems, and connection speeds could improve. IPFS and blockchain are a combination that will be talked about for years to come.
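How a model could be given a unique hash from its weights and metadata can be sketched as below. This is purely illustrative: the actual hashing scheme used in ADOS is not described here, so the function `model_fingerprint`, the metadata fields and the choice of SHA-256 over canonical JSON are all assumptions.

```python
import hashlib
import json
from typing import Any, Dict


def model_fingerprint(weights: bytes, metadata: Dict[str, Any]) -> str:
    """Hypothetical helper: one hash covering both weights and metadata."""
    # Canonical JSON (sorted keys, fixed separators) so that the same
    # metadata always serializes to the same bytes.
    meta_bytes = json.dumps(
        metadata, sort_keys=True, separators=(",", ":")
    ).encode("utf-8")

    hasher = hashlib.sha256()
    # A length prefix keeps the boundary between the two parts unambiguous.
    hasher.update(len(meta_bytes).to_bytes(8, "big"))
    hasher.update(meta_bytes)
    hasher.update(weights)
    return hasher.hexdigest()


# Made-up example values; real weights would be the serialized model file.
weights_v1 = bytes([1, 2, 3, 4])
weights_v2 = bytes([1, 2, 3, 5])
meta = {"name": "anomaly-detector", "version": 1}

id_v1 = model_fingerprint(weights_v1, meta)
id_v2 = model_fingerprint(weights_v2, meta)
id_v1_reordered = model_fingerprint(weights_v1, {"version": 1, "name": "anomaly-detector"})
```

Any change to the weights or the metadata yields a different identifier, while the order in which the metadata fields are written does not matter.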