CQ9 Beat

Scalable Ethereum Reads

Published March 1, 2022

In this post, intern Wang Hao presents his main project, which yielded order-of-magnitude speed improvements in Ethereum data ingestion on our DeFi research platform.

Introduction

Before I started as a DeFi intern at CQ9 this January, my experience trading NFTs and using various DeFi protocols had taught me that coding against Ethereum is slow. I never worried too much, as 10 reads per second would certainly satisfy my personal needs.

However, if we want to perform research based on on-chain data and activity, that will not do. In this blog post, we propose a simple way to make read speed scale more easily.

How People Read Ethereum

Currently, a developer who wants to read on-chain data would likely use a package such as web3.py or ethers.js to send HTTP requests to an endpoint. The endpoint could be a locally hosted node or a remote service provided by Infura or Alchemy, which processes the HTTP requests and returns the data we want.

CQ9 runs its own nodes (Go-Ethereum or Erigon), so CQ9ers can send requests to our own nodes to read on-chain data.
from web3 import Web3

# ENDPOINT_URL is the HTTP RPC endpoint of the node
w3 = Web3(Web3.HTTPProvider(ENDPOINT_URL))
w3.eth.get_block("latest")

On the one hand, this means we don't need to worry about parsing the data stored in the database, and web3.py will create the corresponding JSON-RPC calls for us.

So, what is the problem?

Reading from Ethereum Is Slow

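# Query the same ERC-20 balance 1,000 times, one HTTP round trip per call
contract = w3.eth.contract(address=USDC_ADDRESS, abi=ERC20_ABI)
for i in range(0, 1000):
    contract.functions.balanceOf(USER_ADDRESS).call()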

Making 1,000 queries takes about a minute. There are many different contracts and on-chain activities we need to pay attention to. When I first started my internship, I often heard other interns complaining that it would take an entire weekend to collect the data they wanted.

A natural way to increase read speed is simply to increase the number of clients making queries. If we had 500 clients splitting the query requests and performing the job independently, would we increase read speed 500-fold? Unfortunately not. Simply adding more clients does not solve the problem completely, because the clients still need to make requests to the server, which handles their HTTP requests and fetches the data from the database, so the server becomes overloaded.
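The bottleneck argument can be made concrete with a toy model. The sketch below assumes each client issues about 1,000 reads per minute (the figure quoted above) and that the single server saturates at a fixed rate; `SERVER_CAPACITY` is a made-up number for illustration, not a measurement of any real node.

```python
def aggregate_reads_per_second(clients: int,
                               per_client_rate: float,
                               server_capacity: float) -> float:
    """Toy model: total throughput is capped by the single shared server."""
    return min(clients * per_client_rate, server_capacity)


PER_CLIENT_RATE = 1000 / 60   # ~16.7 reads/s, from "1,000 queries in about a minute"
SERVER_CAPACITY = 2000.0      # hypothetical saturation point of one node, in reads/s

for clients in (1, 10, 100, 500):
    total = aggregate_reads_per_second(clients, PER_CLIENT_RATE, SERVER_CAPACITY)
    speedup = total / PER_CLIENT_RATE
    print(f"{clients:>3} clients -> {total:8.1f} reads/s ({speedup:.0f}x)")
```

Under these assumed numbers, throughput grows linearly only until the server saturates; past that point, extra clients add load but no speed.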

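So why not increase the number of servers? If we had more servers as well as more clients, no single server would become congested when the clients make requests.

But then we would need a load-balancing algorithm, and we would essentially be storing multiple copies of the same database. There must be a more elegant way to solve this problem.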

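Read-Only Ethereum

If we could query Ethereum data in read-only mode, with every client reading from the same database, we would not need to run many nodes. If we freeze the state of the database, we don't even need one node.

If we need to increase the number of reads per second, we can simply add clients, which is more scalable and maintainable than spinning up more server nodes.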

First Attempt

I started with Erigon, because it runs the RPC daemon and the backend in separate processes and shares a fair amount of code with Geth. My first attempt was to move the RPC daemon into the client, so I could reduce the server's work and familiarize myself with the Erigon code base.

After digging into its code base and talking with my mentor, I realized that I needed to simulate the part of the code that sends the message to the remote server. While navigating the code and figuring out the connections among functions in different files, debug.PrintStack() became very useful, because it let me identify the top-level function that handles the API requests. Eventually I found handleMsg().

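func (h *handler) handleMsg(msg *jsonrpcMessage) {
	// Abridged excerpt: each call is processed inside startCallProc.
	h.startCallProc(func(cp *callProc) {
		stream := jsoniter.NewStream(jsoniter.ConfigDefault, nil, 4096)
		answer := h.handleCallMsg(cp, msg, stream)
		// ... (remainder elided)
	})
}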

If we could start the RPC daemon on the client side and make requests directly through the handleMsg function to the backend storing the database, we would have moved the RPC daemon from the server end to the client end.

Python Integration

Web3.py creates the JSON message and provides many other useful features, so we need to integrate our client-side RPC daemon with Web3.py.

Because Erigon and Geth are both implemented in Go, if we want to call Go functions from a Python script, we need to compile the Go code into a C shared library.

go build -buildmode c-shared -o .so 

On the Python side, there is a package called CFFI (C Foreign Function Interface for Python). Through this package, we can load a C shared library and call the functions implemented in Go.

To integrate with Web3.py, we also need to write a custom provider. A provider defines the protocol and the RPC endpoint that the web3 client interacts with.

class HTTPProvider(JSONBaseProvider):
    def make_request(self, method: RPCEndpoint, params: Any) -> RPCResponse:
        # Encode the call as JSON-RPC, POST it to the endpoint, decode the reply
        request_data = self.encode_rpc_request(method, params)
        raw_response = make_post_request(
            self.endpoint_uri,
            request_data,
            **self.get_request_kwargs()
        )
        response = self.decode_rpc_response(raw_response)
        return response

This is the code for HTTPProvider. When we try to get the latest block, the provider helps us create a JSON message like

{"jsonrpc":"2.0","method":"eth_getBlockByNumber","params":["latest", false],"id":1}

It also sends a POST request to the corresponding endpoint.
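To make the message format concrete, here is a minimal sketch that builds the same JSON-RPC 2.0 body using only the standard library. The standalone `encode_rpc_request` function and the `_request_ids` counter are simplified stand-ins written for this example; they are not web3.py's implementation.

```python
import itertools
import json

# JSON-RPC ids only need to be unique per connection; a counter is enough here.
_request_ids = itertools.count(1)


def encode_rpc_request(method: str, params: list) -> bytes:
    """Build a JSON-RPC 2.0 request body for the given method and parameters."""
    payload = {
        "jsonrpc": "2.0",
        "method": method,
        "params": params,
        "id": next(_request_ids),
    }
    return json.dumps(payload).encode("utf-8")


# Python's False is serialized as the JSON literal false.
body = encode_rpc_request("eth_getBlockByNumber", ["latest", False])
print(body.decode())
```

The resulting bytes are what an HTTP provider would place in the body of its POST request.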

Therefore, we can override the make_request function to create our own custom provider.

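def make_request(self, method: RPCEndpoint, params: Any) -> RPCResponse:
    request_data = self.encode_rpc_request(method, params)
    # Hand the request to the Go code in-process instead of sending it over HTTP
    raw_response = self.call_go_package(request_data)
    response = self.decode_rpc_response(raw_response)
    return response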
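The override can be tried out without web3.py or a compiled Go library. The sketch below is a provider-shaped class that follows the same encode, call, decode flow; `InProcessProvider` and `fake_go_handler` are hypothetical names invented for this example, and the fake handler merely stands in for the Go entry point that would be reached through the shared library.

```python
import json
from typing import Any, Callable, Dict, List


class InProcessProvider:
    """Provider-shaped class that hands requests to a local callable instead of HTTP.

    `handler` takes raw JSON request bytes and returns raw JSON response bytes.
    """

    def __init__(self, handler: Callable[[bytes], bytes]) -> None:
        self._handler = handler
        self._next_id = 1

    def encode_rpc_request(self, method: str, params: List[Any]) -> bytes:
        request = {"jsonrpc": "2.0", "method": method,
                   "params": params, "id": self._next_id}
        self._next_id += 1
        return json.dumps(request).encode("utf-8")

    def decode_rpc_response(self, raw_response: bytes) -> Dict[str, Any]:
        return json.loads(raw_response.decode("utf-8"))

    def make_request(self, method: str, params: List[Any]) -> Dict[str, Any]:
        request_data = self.encode_rpc_request(method, params)
        raw_response = self._handler(request_data)  # no network round trip
        return self.decode_rpc_response(raw_response)


def fake_go_handler(request_data: bytes) -> bytes:
    """Hypothetical handler for this demo only; always reports block 0x10."""
    request = json.loads(request_data)
    result = {"number": "0x10"} if request["method"] == "eth_getBlockByNumber" else None
    reply = {"jsonrpc": "2.0", "id": request["id"], "result": result}
    return json.dumps(reply).encode("utf-8")


provider = InProcessProvider(fake_go_handler)
response = provider.make_request("eth_getBlockByNumber", ["latest", False])
print(response["result"]["number"])
```

Injecting the handler as a plain callable keeps the provider testable: the same class works whether the callable wraps a shared library or, as here, a stub.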

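Switching to Geth

After isolating the RPC daemon from Erigon's storage backend, I tried to make the backend read-only as well. It is essential for Erigon to be able to read data from a remote file system, such as NFS; otherwise we would still need to store a copy of the entire database on each client device.

However, I ran into a problem.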

EROR [01-27|13:20:01.359] Erigon startup                           err="mdbx_env_open: block device required, label: chaindata

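Based on this GNU manual, I figured out that the error code behind this message is ENOTBLK. After searching the Erigon code base for that error code, I found that MDBX, the database Erigon relies on, does not support remote file systems and even treats a remote file system as an error.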

MDBX_EREMOTE = ENOTBLK

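Because MDBX does not work well with remote file systems, I decided to look into Geth, which uses LevelDB and should allow reads from a remote file system.

With the database directory set to read-only, any write attempt from Geth results in a permission error, which I could use to find every piece of code that tries to modify the files.

There are several types of write attempts: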

  1. Direct writes to the database (new blocks from P2P)
  2. Creation of temporary files (key files, database logs, etc.)
  3. File locking

By disabling the modules that attempt to write, I was able to modify Geth so that it could read from remote file systems, such as NFS. Even better, because I removed the use of file locking, I could now run multiple instances of Geth reading from the same directory.
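The same pattern (a frozen database, lock-free readers, and write attempts that fail loudly) can be illustrated with SQLite from the Python standard library. This is only an analogy: SQLite stands in for LevelDB here, the table and values are made up, and none of this is the modified Geth code.

```python
import os
import sqlite3
import tempfile
from concurrent.futures import ThreadPoolExecutor

# Build a tiny "frozen" database once; the schema and rows are invented.
db_path = os.path.join(tempfile.mkdtemp(), "snapshot.db")
writer = sqlite3.connect(db_path)
writer.execute("CREATE TABLE balances (address TEXT PRIMARY KEY, amount INTEGER)")
writer.executemany("INSERT INTO balances VALUES (?, ?)",
                   [("0xaaa", 100), ("0xbbb", 250)])
writer.commit()
writer.close()


def open_read_only(path: str) -> sqlite3.Connection:
    # mode=ro rejects writes; immutable=1 tells SQLite the file never changes,
    # so it skips file locking altogether.
    return sqlite3.connect(f"file:{path}?mode=ro&immutable=1", uri=True)


def read_balance(address: str) -> int:
    # Each reader opens its own connection, like an independent client.
    conn = open_read_only(db_path)
    try:
        row = conn.execute("SELECT amount FROM balances WHERE address = ?",
                           (address,)).fetchone()
        return row[0]
    finally:
        conn.close()


# Eight concurrent readers share the same file with no server in between.
with ThreadPoolExecutor(max_workers=8) as pool:
    results = list(pool.map(read_balance, ["0xaaa", "0xbbb"] * 4))
print(results)

# A write attempt raises an error, which is how stray writers can be tracked down.
conn = open_read_only(db_path)
try:
    conn.execute("INSERT INTO balances VALUES ('0xccc', 1)")
    write_error = None
except sqlite3.OperationalError as exc:
    write_error = str(exc)
finally:
    conn.close()
print(write_error)
```

Scaling reads in this setup means starting more readers against the same files, with no additional server to provision.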

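Conclusion

Read-only Geth allows us to elastically scale read-only Ethereum workloads. Getting rid of the server-side component enables parallelization without the maintenance burden of running live nodes.

Because a number of other blockchains are direct forks of go-ethereum, this solution can be ported to them, for example BSC.

We hope to merge this code into the go-ethereum code base so that everyone can benefit from these new features. You can view our pull request here.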
