How to Build Accounts Service for Compound Finance

Carbon Spectral Lines

Why?

How to Build a Compound Liquidation Bot details the economic and technical considerations. The most important factor is the latency of detecting unhealthy accounts. The Account Service provided by Compound Finance has a considerable block delay and is insufficient for a liquidation bot.

What?

Project Carbon implements an Accounts Service that reduces the latency of account data by querying the blockchain directly via the Compound smart contracts and storing the results off-chain.

How to Fail at Building an Accounts Bot

Use IndexedDB to store account data.

Use Comlink and a Web Worker to run the bot in the background. This requires overriding the create-react-app webpack configuration (via react-app-rewired) to install worker-plugin.

Web3 is not supported in a web worker. This is too much of a pain in the ass. Use Go and Geth instead.

How to Build an Accounts Bot using Go, Geth, and Infura

Learn Go here and here.

While the local Geth client is still downloading the blockchain, ethclient calls will return bad data. Use https://mainnet.infura.io until the blockchain is synced.

Generate Go contract bindings with abigen.

Use FilterQuery to query for past events. After generating the Go contract bindings, the FilterBorrow function is also available.

https://github.com/Azure-Samples/azure-cosmos-db-mongodb-golang-getting-started

https://compound.finance/developers/comptroller#account-liquidity

The Comptroller is implemented as an upgradeable proxy. The Unitroller proxies all logic to the Comptroller implementation, but storage values are set on the Unitroller. To call Comptroller functions, use the Comptroller ABI on the Unitroller address.

https://compound.finance/developers/comptroller#architecture

Copy the ABI from the Comptroller implementation contract, then execute that ABI against the proxy (Unitroller) contract address.

account liquidity is defined as the total estimated ether value of an account’s collateral

https://compound.finance/developers/comptroller#account-liquidity

CSAI FilterBorrow calls using Infura keep failing. Using FilterOpts to limit the block range helps reduce failures. Use gopkg.in/cenkalti/backoff.v2 for exponential retry.
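Limiting the block range can be done by splitting the full scan into fixed-size windows and issuing one FilterBorrow call per window. A minimal sketch; the helper name, chunk size, and block numbers are illustrative, not from the post (each returned range would feed a bind.FilterOpts Start/End pair):

```go
package main

import "fmt"

// blockRanges splits [from, to] into inclusive sub-ranges of at most
// size blocks, so each filter call stays within Infura's limits.
func blockRanges(from, to, size uint64) [][2]uint64 {
	var ranges [][2]uint64
	for start := from; start <= to; start += size {
		end := start + size - 1
		if end > to {
			end = to
		}
		ranges = append(ranges, [2]uint64{start, end})
	}
	return ranges
}

func main() {
	// e.g. scan 25,000 blocks in windows of 10,000
	for _, r := range blockRanges(7710000, 7734999, 10000) {
		fmt.Printf("FilterBorrow from %d to %d\n", r[0], r[1])
	}
}
```

Each failed window is then retried independently with exponential backoff rather than restarting the whole scan.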

There are 3219 distinct accounts with Borrow transactions across all the Compound markets as of 2020-01-28.

Use goroutines to run the Accounts Bot in the background. See https://blog.golang.org/pipelines for cancellation patterns and https://blog.golang.org/context for more robust cancellation support.

Store data in Cosmos DB. Learn about data modeling.

The worst case is each account having a thread that monitors the account’s health by calling Comptroller.getAccountLiquidity. Using CUSDC as an example, there are ~1K accounts; across all contracts there are more than 3K accounts. As the market grows, the number of accounts will grow. An order of magnitude more accounts would imply 30K threads, each calling getAccountLiquidity. This is expensive in CPU, memory, and network resources, eventually impacting margins.
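A fixed-size worker pool avoids the thread-per-account worst case: resource usage is bounded by the worker count no matter how many accounts exist. A sketch under assumptions; checkLiquidity is a stand-in for the real Comptroller.getAccountLiquidity call:

```go
package main

import (
	"fmt"
	"sync"
)

// checkLiquidity stands in for Comptroller.getAccountLiquidity.
func checkLiquidity(account string) string {
	return "checked " + account
}

// runPool services all accounts with a fixed number of workers,
// bounding goroutines and connections regardless of account count.
func runPool(accounts []string, workers int) []string {
	jobs := make(chan string)
	results := make(chan string, len(accounts))
	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for a := range jobs {
				results <- checkLiquidity(a)
			}
		}()
	}
	for _, a := range accounts {
		jobs <- a
	}
	close(jobs)
	wg.Wait()
	close(results)
	var out []string
	for r := range results {
		out = append(out, r)
	}
	return out
}

func main() {
	accounts := []string{"0xaaa", "0xbbb", "0xccc", "0xddd"}
	// 2 workers service 4 accounts; 30K accounts would need no more goroutines
	fmt.Println(len(runPool(accounts, 2)), "accounts checked")
}
```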

maximum concurrent requests allowed (defined by maxConcurrentRequestsPerCpu) are: 7,500 per small VM, 15,000 per medium VM (7,500 x 2 cores), and 75,000 per large VM (18,750 x 4 cores).

maximum IP connections are per instance and depend on the instance size: 1,920 per B1/S1/P1V2 instance, 3,968 per B2/S2/P2V2 instance, 8,064 per B3/S3/P3V2 instance.

https://docs.microsoft.com/en-us/azure/azure-resource-manager/management/azure-subscription-service-limits

the Azure networking stack supports 250K total network flows with good performance for VMs with greater than 8 CPU cores and 100k total flows with good performance for VMs with fewer than 8 CPU cores. Past this limit network performance degrades gracefully for additional flows up to a hard limit of 500K total flows, 250K inbound and 250K outbound, after which additional flows are dropped.

https://docs.microsoft.com/en-us/azure/virtual-network/virtual-machine-network-throughput

Writing thousands of accounts to Cosmos DB will get throttled. Retry using exponential backoff.

Bucket accounts by liquidity. Accounts with low liquidity should be queried at a higher frequency because small changes in market volatility can make these accounts unhealthy.

Unit tests should be added to manage complexity as the project grows. Use Dependency Injection to enable testing.

Using the local Geth client to retrieve all Borrow filter events is really slow; use Infura instead. Retrieving events from a recent block checkpoint is quicker, so the local Geth client can be used for that.

What is the benchmark latency between using local Geth client and Infura?

Minutes using the local Geth client versus seconds using Infura’s off-chain data.

Why is local Geth client slower than Infura at querying for all filter borrow events?

The main reason is that the Ethereum clients have always been built for single-user use, and the way logs and events are accessed inside geth speaks to that. I won’t get into the specifics of geth’s use of bloom filters, but the key point is that even though eth_getLogs can filter across many dimensions, at the lowest levels inside geth, logs are really only accessible by block number. If a user queries “give me all of the events for my contract from the beginning of Ethereum until now” a couple of things would happen:

  • The geth node would compare the bloom filter of every block with the log filter.
  • For every block that is a potential match (and bloom filters often include false positives), the transaction receipts for that block would be loaded.
  • Finally, the logs generated by said receipts would be compared against the filter one by one.

Even on an otherwise unloaded Ethereum node, a big query like this can take anywhere from hundreds of milliseconds to a couple of seconds to complete.

https://blog.infura.io/faster-logs-and-events-e43e2fa13773/

Luckily, geth has an in-memory caching system, so adding a cache for transaction receipts helps to alleviate some of that pressure, but too many queries for different blocks still lead to cache contention.

To avoid cache contention, our next step was to segment traffic into two groups. Since most log requests are for the most recent blocks, and all those blocks share the same cache, we segmented traffic into two “buckets”:

  • If your eth_getLogs request covered a small number of recent blocks, it went to the “near head” segment.
  • Otherwise, your request went to the general eth_getLogs pool.

By grouping logs requests near head to the same set of Ethereum nodes, cache contention was greatly reduced. This helps a lot with overall response times (average response times dropped from over a second to under 100 milliseconds), but still does not address the “long tail” of requests languishing in the general pool. Something else had to be done.

https://blog.infura.io/faster-logs-and-events-e43e2fa13773/

Logs and Events Caching

Today we’re happy to announce the general availability of that “something else”: real-time logs and events caching. Now, when you send an eth_getLogs request to Infura, the RPC is actually handled by an off-chain index of Ethereum logs and events rather than directly by an Ethereum node. This index is backed by a traditional database, which allows us to index and query on more data, without the added overhead of false positives experienced with a bloom filter.

Because these databases are tuned for real-world eth_getLogs queries, we’ve been able to reduce the infrastructure footprint for servicing eth_getLogs by over 90%, and we can continue to offer access to this RPC to all users.

Furthermore, this new architecture addresses a major issue w/ load-balanced responses: inconsistency between different Ethereum nodes. We are excited to finally resolve that issue for eth_getLogs. This new caching layer will provide a consistent view into log data and reacts as necessary to chain re-org events in realtime.

https://blog.infura.io/faster-logs-and-events-e43e2fa13773/

Syncing the local Geth client took ~2 weeks on an Azure VM with a 1TB premium SSD. It required ~300GB of storage as of 2020-02-08. The I/O speed of the SSD is the limiting factor in syncing.

https://twitter.com/lopp/status/1223617231164018689

Why does Cosmos DB fail to find bot state sporadically, even with strong consistency?

https://stackoverflow.com/questions/47828433/cannot-find-on-cosmosdb-via-mongoapi-nor-through-azure-portal-data-explorer

Try using the SQL core of Cosmos DB.

https://github.com/Azure-Samples/azure-sdk-for-go-samples/tree/master/cosmosdb

It is more consistent than using the MongoDB API version.

How to set up Geth with a liquidator account.
