Intro to Postgres Query Plans

Agenda

  • What is a query plan?
  • How do you read an execution plan?

What is a query plan?

SQL is a declarative language. This means:

  • The textual SQL langague we write is transformed into a different language for evaluation
  • A given SQL statement has many equivalent executable ASTs

What is a query plan?

An executable query plan consists of Nodes, which represent a specific type of processing to do. Processing may consist of:

  • Relation scans (sequential, index, bitmap)
  • Joins (loop, merge, hash)
  • Sort
  • Aggregation
  • Unions

Anatomy of a Node

Nodes consist of an input, output, and qualifiers. They contain everything the executor needs to evaluate them.

  • Input may be either a relation to scan or another node(s) (depending on node type)
  • Ouptut is the result to return. Think of a select list
  • Qualifiers are the where clauses

Node Metadata

In addition to execution information, nodes also contain cost estimates for the following:

  • Row width. An average used to estimate memory used
  • Total cost. This is what the optimizer minimizes
  • Constant cost (paid at startup due to pre-processing). Important for limit nodes
  • Output rowcount. Used to estimate cost for aggregate operations

Choosing the execution plan

Initially, nodes are not fully populated with the information required to execute them. At this stage they're called Path nodes.

The optimizer generates all possible Paths, then conducts either an exhaustive or semi-random search (using a genetic algorithm) through the plans for the lowest cost.

The lowest-cost result is translated into a Plan node.

Reading an Execution Plan

Trellis, everyone's favorite.

Reading an Execution Plan


select *
from events
	join participants
		on events.id = participants.event_id
		and participants.cvid like 'auth0%'
where type in ('Paired', 'Unpaired')
order by events.service_created_at asc
limit 500;
                    
Explain Analyze

Reading an Execution Plan

Limit  (cost=32050.37..32051.62 rows=500 width=544)
       (actual time=234.955..235.330 rows=500 loops=1)
  ->  Sort  (cost=32050.37..32060.06 rows=3876 width=544)
            (actual time=234.953..235.115 rows=500 loops=1)
        Sort Key: events.service_created_at
        Sort Method: top-N heapsort  Memory: 532kB
        ->  Hash Join  (cost=12751.95..31857.23 rows=3876 width=544)
                       (actual time=11.284..231.595 rows=4488 loops=1)
              Hash Cond: (participants.event_id = events.id)
              ->  Seq Scan on participants  (cost=0.00..18058.12 rows=268904 width=43)
                                            (actual time=0.033..143.878 rows=274711 loops=1)
                    Filter: (cvid ~~ 'auth0%'::text)
                    Rows Removed by Filter: 560825
              ->  Hash  (cost=12673.38..12673.38 rows=6286 width=493)
                        (actual time=11.087..11.087 rows=6808 loops=1)
                    Buckets: 8192  Batches: 1  Memory Usage: 3646kB
                    ->  Index Scan using event_type on events
                        (cost=0.42..12673.38 rows=6286 width=493)
                        (actual time=0.025..5.912 rows=6808 loops=1)
                          Index Cond: (type = ANY ('{Paired,Unpaired}'::text[]))
Planning time: 0.294 ms
Execution time: 235.525 ms
                    

Reading an Execution Plan

                        ...
              ->  Hash  (cost=12673.38..12673.38 rows=6286 width=493)
                        (actual time=11.087..11.087 rows=6808 loops=1)
                    Buckets: 8192  Batches: 1  Memory Usage: 3646kB
                    ->  Index Scan using event_type on events
                        (cost=0.42..12673.38 rows=6286 width=493)
                        (actual time=0.025..5.912 rows=6808 loops=1)
                          Index Cond: (type = ANY ('{Paired,Unpaired}'::text[]))
                        ...
                        

cost=cost to first element..cost for all elements

can be tuned using postgres confiuration parameters. I.e. Is it a magnetic disk or ssd (not a real param)?

<