Here I include the sections I cut out from the Google Cloud Blog article for brevity.
Handling Ethereum’s 32-byte Numbers
You can notice that the type of
STRING. The reason for this is value transfers in Ethereum are 32 bytes and there is no numeric data type in BigQuery (or any other data store) that can handle numbers that big. There are 2 workarounds here:
- Cast to
FLOAT64type. This type is 8 bytes so there will be loss of precision in many cases, but for most analytic purposes it’s fine.
-- sum of transfers of CoinFi token to a particular address
select sum(cast(value as float64)) as total_received
where to_address = '0x689c56aef474df92d44a1b70850f808488f9769c'
and token_address = '0x3136ef851592acf49ca4c825131e364170fa32b3'
- Cast to
NUMERICtype. This type is 16 bytes and there is no loss of precision, but it will fail in case the value overflows. That’s why you need to use
-- token holders of EOS with balances
with transfer_statements as (
select token_address, to_address as address,
safe_cast(value as numeric) as value
select token_address, from_address as address,
-cast(value as numeric) as value
select address, sum(value) as balance
where token_address = '0x86fa049857e0209aa7d9e616f7eb3b3b78ecfdb0'
group by address
having balance > 0
Note that for
transactions.value the type is
STRING.That’s because Ether, unlike ERC20 tokens has a fixed number of decimals (18) and the total supply is limited, so all the transfers fit into 16 bytes.
NUMERIC can even handle the total Ether volume:
-- total Ether volume
Filter by Partition to Save Costs
Most tables in the dataset are partitioned by day. This allows BigQuery to scan less data when a query has the filter by the partition column. This query, for example, returns all token transfers for the past day and scans only 60MB of data, costing you a fraction of a cent:
select token_address, from_address, to_address, value
where date(block_timestamp) >= date_add(current_date(), interval -1 day)
You can disassemble all contracts in the dataset and see which opcodes are most frequent. The below query demonstrates this, as well as the ability to work with arrays and structs in BigQuery:
create temp function disassemble_bytecode(bytecode string)
returns array<struct<name string, fee int64, pushData string>>
language js as """
with opcodes as (
select disassemble_bytecode(bytecode) as op
select flattened_opcodes.name, count(*) as count
cross join unnest(opcodes.op) as flattened_opcodes
group by flattened_opcodes.name
order by count desc
Here is the result:
Run it in BigQuery
Follow us on Twtitter for project updates https://twitter.com/EthereumETL.