Ethereum ETL Architecture, Part 2

Evgeny Medvedev

Here I include the sections I cut out from the Google Cloud Blog article for brevity.

Handling Ethereum’s 32-byte Numbers

You may notice that the type of token_transfers.value is STRING. The reason is that ERC20 token transfer values in Ethereum are 32-byte integers, and there is no numeric data type in BigQuery (or any other data store) that can handle numbers that big. There are two workarounds:

  • Cast to FLOAT64 type. This type is 8 bytes, so there will be loss of precision in many cases, but for most analytic purposes it’s fine (a small illustration of the precision loss follows this list):
#standardSQL
-- sum of transfers of CoinFi token to a particular address
select sum(cast(value as float64)) as total_received
from `bigquery-public-data.ethereum_blockchain.token_transfers`
where to_address = '0x689c56aef474df92d44a1b70850f808488f9769c'
and token_address = '0x3136ef851592acf49ca4c825131e364170fa32b3'
  • Cast to NUMERIC type. This type is 16 bytes and there is no loss of precision, but the cast will fail if the value overflows. That’s why you need to use the SAFE_CAST function, which returns NULL instead of failing:
#standardSQL
-- token holders of EOS with balances
with transfer_statements as (
select token_address, to_address as address,
safe_cast(value as numeric) as value
from `bigquery-public-data.ethereum_blockchain.token_transfers`
union all
select token_address, from_address as address,
-safe_cast(value as numeric) as value
from `bigquery-public-data.ethereum_blockchain.token_transfers`
)
select address, sum(value) as balance
from transfer_statements
where token_address = '0x86fa049857e0209aa7d9e616f7eb3b3b78ecfdb0'
group by address
having balance > 0
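
To make the FLOAT64 trade-off concrete, here is a small illustration (the 25-digit value below is made up for the example, not taken from the dataset): FLOAT64 keeps only about 15–17 significant digits, while NUMERIC preserves the value exactly.

#standardSQL
-- illustration only: a made-up 25-digit wei value
select
cast('1000000000000000000000001' as float64) as lossy_float64, -- rounded to ~1e+24
cast('1000000000000000000000001' as numeric) as exact_numeric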

Note that for transactions.value the type is NUMERIC, not STRING. That’s because Ether, unlike ERC20 tokens, has a fixed number of decimals (18) and its total supply is limited, so all transfer values fit into 16 bytes. NUMERIC can even handle the total Ether volume:

#standardSQL
-- total Ether volume
select sum(value)
from `bigquery-public-data.ethereum_blockchain.transactions`
> 5929681643499329302468021371
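
As a quick sanity check (using the total from the query above as a literal), this sum of roughly 5.9×10^27 wei is well below NUMERIC’s maximum of about 10^29, so the aggregation cannot overflow:

#standardSQL
-- the total above fits comfortably within NUMERIC's 29 integer digits
select cast('5929681643499329302468021371' as numeric) as total_wei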

Filter by Partition to Save Costs

Most tables in the dataset are partitioned by day. This allows BigQuery to scan less data when a query filters on the partition column. The following query, for example, returns all token transfers from the past day and scans only about 60 MB of data, costing a fraction of a cent:

#standardSQL
select
token_address, from_address, to_address, value
from
`bigquery-public-data.ethereum_blockchain.token_transfers`
where date(block_timestamp) >= date_add(current_date(), interval -1 day)
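
The same idea extends to longer windows; here is a sketch (assuming the same schema) that aggregates per day over the last week, so BigQuery still scans only about seven daily partitions:

#standardSQL
-- daily transfer counts for the last 7 days; the block_timestamp filter
-- limits the scan to roughly a week's worth of partitions
select
date(block_timestamp) as day,
count(*) as transfers
from
`bigquery-public-data.ethereum_blockchain.token_transfers`
where date(block_timestamp) >= date_add(current_date(), interval -7 day)
group by day
order by day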

JavaScript UDFs

You can disassemble all contracts in the dataset and see which opcodes are most frequent. The query below demonstrates this, as well as BigQuery’s support for arrays and structs:

#standardSQL
create temp function disassemble_bytecode(bytecode string)
returns array<struct<name string, fee int64, pushData string>>
language js
options (
library="gs://ethereum-etl-bigquery/disassemble_bytecode.js"
)
as """
return parseCode(bytecode);
""";
with opcodes as (
select disassemble_bytecode(bytecode) as op
from `bigquery-public-data.ethereum_blockchain.contracts`
)
select flattened_opcodes.name, count(*) as count
from opcodes
cross join unnest(opcodes.op) as flattened_opcodes
group by flattened_opcodes.name
order by count desc
limit 10

Here is the result: a table of the most frequent opcodes.

Run it in BigQuery
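
If you want to try the same array/struct UDF pattern without depending on the external library, here is a minimal self-contained sketch; first_bytes is a made-up helper that just tallies the leading bytes of each contract’s bytecode, not part of the dataset or of Ethereum ETL:

#standardSQL
-- minimal sketch: a JS UDF returning an array of structs, flattened with UNNEST
create temp function first_bytes(bytecode string)
returns array<struct<byte_index float64, byte_hex string>>
language js as """
var result = [];
if (bytecode === null || bytecode === undefined) return result;
var hex = bytecode.substr(0, 2) === '0x' ? bytecode.substr(2) : bytecode;
for (var i = 0; i + 2 <= Math.min(hex.length, 20); i += 2) {
  result.push({byte_index: i / 2, byte_hex: hex.substr(i, 2)});
}
return result;
""";
select b.byte_hex, count(*) as count
from `bigquery-public-data.ethereum_blockchain.contracts`
cross join unnest(first_bytes(bytecode)) as b
group by b.byte_hex
order by count desc
limit 10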

Follow us on Twitter for project updates: https://twitter.com/EthereumETL.
