Ethereum Source Code Analysis: Blocks, Transactions, Contracts, and the Virtual Machine

Ethereum stands as one of the most influential blockchain platforms, not only enabling decentralized digital currency but also serving as a foundation for smart contracts and decentralized applications (dApps). For developers and enthusiasts seeking deeper technical understanding, diving into the Ethereum source code—specifically the Go implementation (go-ethereum)—offers invaluable insights into its architecture and protocol design.

This article explores core components of Ethereum’s system, including blocks, transactions, contracts, and the Ethereum Virtual Machine (EVM), focusing on how these elements interact to form a secure, scalable, and programmable blockchain environment. All references are based on the open-source go-ethereum project hosted on GitHub.

Core Concepts in Ethereum

SHA-3 Hashing and RLP Encoding

At the heart of Ethereum’s data integrity and identification system lies cryptography and data serialization. Two fundamental technologies power these functions: SHA-3 hashing and Recursive Length Prefix (RLP) encoding.

SHA-3: Secure Hash Algorithm

Ethereum uses Keccak-256 as its primary cryptographic hash function. The source code and older documentation often call it SHA-3, but it is the original Keccak submission and differs from the finalized NIST SHA-3 standard in its padding rule, so the two produce different digests for the same input. Unlike SHA-1 or SHA-2, Keccak is built on the sponge construction. Block headers, transactions, and state roots are each identified by a 32-byte (256-bit) Keccak-256 hash.

Hashing is one-way: it is computationally infeasible to reconstruct the input from its hash, which makes hashes well suited to verifying data integrity without exposing the underlying content.

RLP: Recursive Length Prefix Encoding

While SHA-3 provides data fingerprinting, RLP encoding enables efficient storage and transmission of complex, nested data structures. RLP serializes byte arrays, and arbitrarily nested lists of byte arrays, into flat binary sequences. This makes it well suited to encoding Ethereum objects like transactions, blocks, and account states before they are hashed or stored.

Importantly, RLP is reversible, allowing encoded data to be accurately decoded back into its original structure. In practice, Ethereum computes an object's hash by applying SHA-3 to its RLP-encoded form, commonly referred to as the RLP hash.

This combination ensures that:

Data remains compact and consistent.
Objects can be uniquely identified via their hash.
Storage systems efficiently retrieve values using [key, value] pairs where the key is the RLP hash.

Key Data Types: Hashes and Addresses

Ethereum defines several custom types to standardize critical identifiers across the network:

// common/types.go
const (
    HashLength  = 32      // 256 bits
    AddressLength = 20    // 160 bits
)
type Hash    [HashLength]byte
type Address [AddressLength]byte

common.Hash: A 32-byte identifier used for blocks, transactions, and state roots.
common.Address: A 20-byte identifier representing user accounts or contract addresses.

Additionally, Ethereum leverages Go’s big.Int type to handle large integers such as balances, gas limits, and prices—ensuring precision even with extremely high values.

Understanding Gas and Ether

Two essential economic units govern operations within Ethereum:

Gas: The Unit of Computation

Every action on Ethereum—transferring funds, deploying a contract, or executing logic—consumes computational resources. To prevent abuse and ensure fair usage, Ethereum introduces Gas, a unit measuring resource consumption.

Each operation has a predefined gas cost:

Simple arithmetic: low gas
Storage writes: high gas
Memory expansion: variable gas

Transactions specify:

GasLimit: Maximum gas the sender is willing to consume.
GasPrice: Amount of Ether paid per unit of gas.

This mechanism protects the network from infinite loops and denial-of-service attacks by capping execution costs.

Ether: The Native Cryptocurrency

Ether (ETH) is Ethereum’s native currency. It serves as the medium through which users pay for gas. When a transaction executes, the total fee is calculated as:

Transaction Fee = GasUsed × GasPrice

This fee compensates miners (or validators in proof-of-stake) for securing the network.

Thus, Ether becomes a functional currency, enabling economic incentives while maintaining system stability.

Blocks: The Foundation of Blockchain

A blockchain is essentially a linked list of blocks. Each Block contains:

// core/types/block.go
type Block struct {
    header       *Header
    transactions Transactions // []*Transaction
    ...
}

The Header stores metadata such as:

ParentHash: Link to the previous block (forming the chain).
Number: Position in the chain (starting from 0 at genesis).
Other consensus-related fields.

Blocks are immutable once added. The first block—the Genesis Block—is hardcoded and has no parent.

Each block contains a list of transactions (transactions []*Transaction). These represent actions initiated by users or smart contracts.
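
The parent-hash link can be illustrated with a short sketch. It rests on loud assumptions: the Header type is a three-field stand-in for core/types.Header, and SHA-256 over a fixed byte layout replaces the real Keccak-256 over the RLP encoding, since Keccak-256 is not in the Go standard library. HashHeader and VerifyChain are hypothetical names.

```go
package main

import (
	"crypto/sha256"
	"encoding/binary"
	"fmt"
)

type Hash [32]byte

// Header is a simplified stand-in for core/types.Header.
type Header struct {
	ParentHash Hash
	Number     uint64
	Time       uint64
}

// HashHeader fingerprints a header. ASSUMPTION: SHA-256 over a fixed
// layout replaces Keccak-256 over the RLP encoding.
func HashHeader(h Header) Hash {
	buf := make([]byte, 48)
	copy(buf, h.ParentHash[:])
	binary.BigEndian.PutUint64(buf[32:], h.Number)
	binary.BigEndian.PutUint64(buf[40:], h.Time)
	return sha256.Sum256(buf)
}

// VerifyChain checks that each header follows its predecessor in both
// block number and parent hash.
func VerifyChain(headers []Header) error {
	for i := 1; i < len(headers); i++ {
		prev, cur := headers[i-1], headers[i]
		if cur.Number != prev.Number+1 {
			return fmt.Errorf("position %d: unexpected block number %d", i, cur.Number)
		}
		if cur.ParentHash != HashHeader(prev) {
			return fmt.Errorf("block %d: parent hash mismatch", cur.Number)
		}
	}
	return nil
}

func main() {
	genesis := Header{Number: 0}
	b1 := Header{ParentHash: HashHeader(genesis), Number: 1}
	b2 := Header{ParentHash: HashHeader(b1), Number: 2}
	fmt.Println(VerifyChain([]Header{genesis, b1, b2})) // <nil>

	b1.Time = 99 // tampering with an ancestor breaks every later link
	fmt.Println(VerifyChain([]Header{genesis, b1, b2}))
}
```

Changing any field of an earlier block changes its hash, so every descendant's ParentHash stops matching. This is what makes rewriting history detectable.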

Transaction Structure

A transaction encapsulates a user-initiated action:

// core/types/transaction.go
type txdata struct {
    AccountNonce uint64
    Price        *big.Int
    GasLimit     *big.Int
    Recipient    *common.Address
    Amount       *big.Int
    Payload      []byte
    V, R, S      *big.Int // Signature values
}

Key fields include:

Recipient: Destination address (nil for contract creation).
Amount: Value transferred in Wei.
Payload: Either initialization code for new contracts or input data for function calls.
V, R, S: Components of the digital signature.

Notably, the sender address is not explicitly stored—it's derived from the signature during validation.

How Transactions Are Executed

Transaction execution occurs in two layers: outside and inside the EVM.

Layer 1: Outside the EVM

Execution begins in StateProcessor.Process(), which iterates over all transactions in a block:

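// core/state_processor.go (abridged: the real function also returns the collected logs and the total gas used)
func (p *StateProcessor) Process(block *Block, statedb *StateDB, cfg vm.Config) (Receipts, error) {
    var receipts Receipts
    for _, tx := range block.Transactions() {
        receipt, _, err := ApplyTransaction(...)
        if err != nil {
            return nil, err // a failing transaction invalidates the whole block
        }
        receipts = append(receipts, receipt)
    }
    return receipts, nil
}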

For each transaction:

A Message object is created via tx.AsMessage(), which recovers the sender using ECDSA.
The ApplyTransaction() function processes the message.
A Receipt is generated post-execution.

What Is a Receipt?

A receipt records:

PostState: Root hash of the state after execution.
Logs: Events emitted during execution (e.g., token transfers).
Bloom: A probabilistic filter for quickly checking log presence.

These receipts help light clients verify events without downloading full state data.
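
As a rough illustration of the Bloom field, the sketch below sets three bits in a 2048-bit filter for each logged item and then tests membership. It assumes SHA-256 from the Go standard library as a stand-in for Keccak-256. The Bloom type, its method names, and the bit layout inside the array are simplifications for this example rather than go-ethereum's own.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

const bloomBits = 2048

// Bloom is a 2048-bit filter stored as 256 bytes.
type Bloom [bloomBits / 8]byte

// bitIndexes derives three bit positions from the first six bytes of
// the hash. ASSUMPTION: SHA-256 stands in for Keccak-256.
func bitIndexes(data []byte) [3]uint {
	h := sha256.Sum256(data)
	var idx [3]uint
	for i := 0; i < 3; i++ {
		idx[i] = (uint(h[2*i])<<8 | uint(h[2*i+1])) & (bloomBits - 1)
	}
	return idx
}

// Add sets the three bits that correspond to data.
func (b *Bloom) Add(data []byte) {
	for _, i := range bitIndexes(data) {
		b[i/8] |= 1 << (i % 8)
	}
}

// MayContain returns false only if data was definitely never added.
// A true result can be a false positive.
func (b *Bloom) MayContain(data []byte) bool {
	for _, i := range bitIndexes(data) {
		if b[i/8]&(1<<(i%8)) == 0 {
			return false
		}
	}
	return true
}

func main() {
	var b Bloom
	b.Add([]byte("Transfer"))
	fmt.Println(b.MayContain([]byte("Transfer"))) // true: added items always match
}
```

A client can discard any block whose filter returns false for the event it cares about, and only inspect the receipts of blocks where the filter matches.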

Gas Mechanics During Execution

The transaction lifecycle involves precise gas accounting:

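Buy Gas: Deduct GasLimit × GasPrice from the sender's balance up front.
Intrinsic Gas Cost: Charge a base cost that depends on the payload size and on whether the transaction creates a contract.
Execute in EVM: Run the logic; gas is consumed as each opcode executes.
Refund Unused Gas: Return the Ether for any gas left over to the sender.
Apply Refunds: Credit the sender with gas from the refund counter (earned by clearing storage), capped at half of the gas used.
Reward Miner: Pay the miner (GasUsed − Refund) × GasPrice.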

This system incentivizes efficient code while rewarding validators fairly.
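
The accounting above can be sketched in a few lines of Go. The constants are assumed to match the protocol parameters of the go-ethereum era described here (later forks changed some of them), and IntrinsicGas, Settle, and Settlement are hypothetical names. The sketch works in units of gas and assumes the intrinsic cost does not exceed the gas limit, since such a transaction would be rejected before execution.

```go
package main

import "fmt"

// Assumed protocol parameters for the era this article describes.
const (
	txGas                 uint64 = 21000 // base cost of a plain transaction
	txGasContractCreation uint64 = 53000 // base cost when Recipient is nil
	txDataZeroGas         uint64 = 4     // per zero byte of payload
	txDataNonZeroGas      uint64 = 68    // per non-zero byte of payload
)

// IntrinsicGas is the gas charged before any EVM code runs.
func IntrinsicGas(payload []byte, contractCreation bool) uint64 {
	gas := txGas
	if contractCreation {
		gas = txGasContractCreation
	}
	for _, b := range payload {
		if b == 0 {
			gas += txDataZeroGas
		} else {
			gas += txDataNonZeroGas
		}
	}
	return gas
}

// Settlement records where the prepaid gas ends up.
type Settlement struct {
	ReturnedToSender uint64 // unused gas plus the capped refund
	PaidToMiner      uint64 // gas the miner is actually paid for
}

// Settle applies the refund rules after execution.
func Settle(gasLimit, intrinsic, execGas, refundCounter uint64) Settlement {
	used := intrinsic + execGas
	if used > gasLimit {
		// Out of gas: the whole limit is consumed and no refund applies.
		return Settlement{PaidToMiner: gasLimit}
	}
	refund := refundCounter
	if refund > used/2 {
		refund = used / 2 // the refund is capped at half of the gas used
	}
	return Settlement{
		ReturnedToSender: gasLimit - used + refund,
		PaidToMiner:      used - refund,
	}
}

func main() {
	intrinsic := IntrinsicGas([]byte{0x00, 0x01, 0x02}, false)
	fmt.Println(intrinsic) // 21140

	s := Settle(100000, intrinsic, 30000, 15000)
	fmt.Println(s.ReturnedToSender, s.PaidToMiner) // 63860 36140
}
```

Multiplying either figure by GasPrice gives the amount in wei. The two figures always sum to GasLimit, which is the amount bought in the first step.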

Digital Signatures: Securing Transactions

Every transaction must be cryptographically signed using ECDSA (Elliptic Curve Digital Signature Algorithm) over the secp256k1 curve.

The signature ([R, S, V]) allows recovery of the sender’s public key—and thus their address—without revealing private keys.

Go-ethereum uses a Signer interface:

type Signer interface {
    Sender(tx *Transaction) (Address, error)
    Hash(tx *Transaction) Hash
}

This abstraction lets different signing rules coexist behind one interface, for example the original scheme and the replay-protected scheme introduced by EIP-155.
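
EIP-155 protects against replay by folding the chain ID into the signature's V value. The arithmetic can be sketched as below. EIP155V and DecodeV are hypothetical helpers, and uint64 is used in place of the big.Int values that appear in the transaction structure, purely to keep the example short.

```go
package main

import "fmt"

// EIP155V computes V for a replay-protected signature.
// recID is the ECDSA recovery id (0 or 1).
func EIP155V(recID byte, chainID uint64) uint64 {
	return uint64(recID) + 35 + 2*chainID
}

// DecodeV splits V back into the recovery id and the chain ID.
// Legacy signatures use V = 27 or 28 and carry no chain ID, which is
// reported here as chain ID 0.
func DecodeV(v uint64) (recID byte, chainID uint64, err error) {
	switch {
	case v == 27 || v == 28:
		return byte(v - 27), 0, nil
	case v >= 35:
		return byte((v - 35) % 2), (v - 35) / 2, nil
	default:
		return 0, 0, fmt.Errorf("invalid V value %d", v)
	}
}

func main() {
	// On a chain with ID 1, V is either 37 or 38.
	fmt.Println(EIP155V(0, 1), EIP155V(1, 1)) // 37 38
	fmt.Println(DecodeV(38))                  // 1 1 <nil>
}
```

A signature produced for one chain ID decodes to that ID only, so a node on another chain can reject the transaction before recovering the sender.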

Inside the Ethereum Virtual Machine (EVM)

The EVM executes transaction logic in isolation. It operates on:

Stack: For arithmetic and control flow.
Memory: Volatile runtime data.
Storage: Persistent contract data (backed by Merkle Patricia Trie).

Context and State Management

The EVM receives context from:

Message: Contains sender, recipient, value, etc.
StateDB: Interface to account states.
Context: Holds block info and transfer functions.

Transfers are implemented as:

func Transfer(db StateDB, sender, recipient Address, amount *big.Int) {
    db.SubBalance(sender, amount)
    db.AddBalance(recipient, amount)
}

Changes are not immediately committed—they’re cached in StateDB until the entire block is finalized.

Contract Creation and Execution

Contracts are executable objects within the EVM:

type Contract struct {
    CallerAddress Address
    self          ContractRef
    Code          []byte
    Input         []byte
    Gas           uint64
}

Two main execution paths:

Call(): Invokes existing contract code (tx.Recipient != nil).
Create(): Deploys new contract using tx.Payload as initialization code.

When a contract is created:

A new address is derived from the hash of the RLP-encoded sender address and nonce.
Initialization code runs and returns runtime bytecode.
Runtime code is saved under the new address in StateDB.

Precompiled Contracts

Certain cryptographic operations are implemented natively for performance:

type PrecompiledContract interface {
    RequiredGas(input []byte) uint64
    Run(input []byte) ([]byte, error)
}

Examples include:

SHA256 hashing
RIPEMD160
Elliptic curve operations (e.g., ecrecover)

These bypass interpreter overhead, reducing gas costs significantly.

Interpreter: Running Smart Contract Code

Non-precompiled contracts are executed by the Interpreter, which processes opcodes sequentially:

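// Simplified: the real loop also charges gas and manages the stack and memory.
func (in *Interpreter) Run(contract *Contract, input []byte) ([]byte, error) {
    pc := uint64(0) // program counter
    for pc < uint64(len(contract.Code)) {
        op := contract.Code[pc]
        operation := jumpTable[op]
        ret, err := operation.execute(&pc, in, contract)
        if err != nil || operation.halts {
            return ret, err
        }
        if !operation.jumps {
            pc++ // jump opcodes set pc themselves
        }
    }
    return nil, nil
}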

Over 140 opcodes support:

Arithmetic (ADD, MUL)
Logic (AND, OR)
Blockchain interaction (BALANCE, CALL, LOG0–LOG4)

The LOG opcodes emit events that are stored in transaction receipts, which lets dApps monitor contract activity off-chain.
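
The jump-table dispatch can be tried out with a toy interpreter. This is a sketch under stated assumptions: it supports only four opcodes, uses uint64 words in place of the EVM's 256-bit words, and the Machine, operation, and Run names are hypothetical. The opcode byte values and gas costs are assumed to match the real ones for STOP, ADD, MUL, and PUSH1.

```go
package main

import (
	"errors"
	"fmt"
)

type OpCode byte

const (
	STOP  OpCode = 0x00
	ADD   OpCode = 0x01
	MUL   OpCode = 0x02
	PUSH1 OpCode = 0x60
)

// Machine holds the operand stack. ASSUMPTION: uint64 words stand in
// for the EVM's 256-bit words.
type Machine struct{ stack []uint64 }

func (m *Machine) push(v uint64) { m.stack = append(m.stack, v) }

type execFunc func(pc *uint64, m *Machine, code []byte) error

// operation is one jump-table entry: handler, gas cost, halt flag.
type operation struct {
	execute execFunc
	gas     uint64
	halts   bool
}

// binaryOp builds a handler that replaces the top two words with f(a, b).
func binaryOp(f func(a, b uint64) uint64) execFunc {
	return func(_ *uint64, m *Machine, _ []byte) error {
		n := len(m.stack)
		if n < 2 {
			return errors.New("stack underflow")
		}
		m.stack = append(m.stack[:n-2], f(m.stack[n-1], m.stack[n-2]))
		return nil
	}
}

// opPush1 pushes the byte that follows the opcode.
func opPush1(pc *uint64, m *Machine, code []byte) error {
	*pc++ // step onto the one-byte operand
	if *pc >= uint64(len(code)) {
		return errors.New("truncated PUSH1") // simplification for this sketch
	}
	m.push(uint64(code[*pc]))
	return nil
}

var jumpTable = map[OpCode]operation{
	STOP:  {execute: func(*uint64, *Machine, []byte) error { return nil }, halts: true},
	ADD:   {execute: binaryOp(func(a, b uint64) uint64 { return a + b }), gas: 3},
	MUL:   {execute: binaryOp(func(a, b uint64) uint64 { return a * b }), gas: 5},
	PUSH1: {execute: opPush1, gas: 3},
}

// Run executes code until STOP or the end of code, charging gas per opcode.
func Run(code []byte, gas uint64) ([]uint64, uint64, error) {
	m := &Machine{}
	for pc := uint64(0); pc < uint64(len(code)); pc++ {
		op, ok := jumpTable[OpCode(code[pc])]
		if !ok {
			return nil, 0, fmt.Errorf("invalid opcode 0x%02x", code[pc])
		}
		if gas < op.gas {
			return nil, 0, errors.New("out of gas")
		}
		gas -= op.gas
		if err := op.execute(&pc, m, code); err != nil {
			return nil, 0, err
		}
		if op.halts {
			break
		}
	}
	return m.stack, gas, nil
}

func main() {
	// PUSH1 2, PUSH1 3, ADD, PUSH1 4, MUL, STOP computes (2+3)*4.
	code := []byte{0x60, 2, 0x60, 3, 0x01, 0x60, 4, 0x02, 0x00}
	fmt.Println(Run(code, 100)) // [20] 83 <nil>
}
```

Running the same program with only 10 gas fails at the third PUSH1, which is the behavior that keeps runaway code from executing indefinitely.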

Summary: Ethereum’s Architectural Strengths

Ethereum’s design combines security, flexibility, and economic incentives:

Concept	Role
Gas	Prevents spam; measures computational effort
Blocks	Immutable containers of ordered transactions
Transactions	Signed messages triggering state changes
EVM	Sandboxed runtime for deterministic execution
Contracts	Self-executing logic with persistent storage

By integrating cryptographic primitives (SHA-3, ECDSA), efficient encoding (RLP), and a robust virtual machine model, Ethereum enables a trustless platform for innovation.

Frequently Asked Questions (FAQ)

Q: Why does Ethereum use SHA-3 instead of SHA-2?
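A: Ethereum adopted Keccak-256 before NIST finalized the SHA-3 standard, which is why the code calls it SHA-3 even though its digests differ from standard SHA3-256. SHA-2 remains secure; the sponge construction simply gives Keccak a different internal structure, so a cryptanalytic advance against one family need not affect the other.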

Q: How is the sender address recovered if it’s not stored in the transaction?
A: Using ECDSA’s mathematical properties, the public key—and thus the address—is derived from the signature (R, S, V) during validation.

Q: What happens if a transaction runs out of gas?
A: Execution halts immediately and every state change made by the transaction is reverted. The sender is still charged for the full GasLimit, because the network performed the work; this is what prevents abuse.

Q: Can smart contracts modify their own code after deployment?
A: No. Contract bytecode is immutable. However, upgrade patterns (like proxy contracts) allow logic replacement via delegation.

Q: Why use RLP instead of JSON or Protocol Buffers?
A: RLP is simpler, deterministic, and supports nested structures without schemas—ideal for consensus-critical systems where consistency is paramount.

Q: How do logs improve blockchain usability?
A: Logs enable efficient indexing of events (e.g., token transfers). Applications can listen for specific topics without parsing every transaction.
