🇧🇷🍹🤗💜

👋🏻 Hi, I'm Rosa!

👩🏻‍💻 github.com/rosa

🌐 rosa.codes

Principal programmer at 37signals

The curious incident of the incomplete executor 🕵🏻‍♀️

in the night time*

* in some time zones
** yes, in mine 🌙🥱☕

Through this story:

We'll be using Puma as our web server

and nginx on top of it

The code shown is from Rails 7.0.x and Rack 2.2.x

(but the relevant parts are the same)

It all started with a 404 error in our HEY email app...

A 304 (correct) response looked like this:


        Started GET "/boxes/210.json" for ...
        Processing by BoxesController#show as JSON
          Parameters: {"id"=>"210"}
            Oauth::Client Load (0.7ms)  SELECT `oauth_clients`.* FROM `oauth_clients` ..
            Session Load (0.7ms)  SELECT `sessions`.* FROM `sessions`
              WHERE `sessions`.`token` = '...' AND `sessions`.`identity_id` = 4 LIMIT 1
          Authenticated Identity#4 from OAuth access token
            Identity Load (0.6ms)  SELECT `identities`.* FROM `identities` WHERE `identities`.`id` = 4 LIMIT 1
            Box Load (0.7ms)  SELECT `boxes`.* FROM `boxes` ...
            Posting Load (0.7ms)  SELECT `postings`.* FROM `postings` ...
        Completed 304 Not Modified in 13ms

But the 404 response had no authentication and a strange identity_id


        Started GET "/boxes/210.json" for ...
        Processing by BoxesController#show as JSON
          Parameters: {"id"=>"210"}
          CACHE Box Load (0.0ms)  SELECT `boxes`.* FROM `boxes`
            WHERE `boxes`.`identity_id` = 82
            AND `boxes`.`id` = 210 ORDER BY `boxes`.`id` ASC LIMIT 1
          CACHE Contact Load (0.0ms)  SELECT `contacts`.* FROM `contacts` INNER JOIN `users` ON `contacts`.`contactable_id` = `users`.`id`
            WHERE `users`.`identity_id` = 82
            AND `contacts`.`contactable_type` = 'User' ORDER BY `contacts`.`id` ASC LIMIT 1
        Completed 404 Not Found in 3ms

And some other strange features

😟

After that, our adventure quickly turned into a very scary scenario

😱

Well, at least it was only us using HEY internally 😌

But still 😟😰🔍

Where did the authenticated identity come from?

app/models/current.rb


        class Current < ActiveSupport::CurrentAttributes
          attribute :session

          delegate :identity, to: :session, allow_nil: true
        end

💡 Current Attributes

Abstract super class that provides a thread-isolated attributes singleton, which resets automatically before and after each request. Ruby on Rails API docs

💡 Current Attributes

Abstract super class that provides a thread-isolated attributes singleton, which resets automatically before and after each request. Ruby on Rails API docs

Where did the authenticated identity come from?

app/controllers/concerns/authenticate.rb


        module Authenticate
          included do
            before_action :require_session
          end

          def require_session
            resume_session || request_session
          end

          def resume_session
            Current.session ||= resume_session_by_cookie || resume_session_by_token
          end

We pushed a quick hotfix

app/controllers/concerns/authenticate.rb


          def resume_session
            Current.session ||= resume_session_by_cookie || resume_session_by_token
          end

We pushed a quick hotfix

app/controllers/concerns/authenticate.rb


          def resume_session
            Current.session = resume_session_by_cookie || resume_session_by_token
          end

Current Attributes were leaking! But how?

Something was not resetting things properly after each request...

Jeremy 🔮: "This points to the Rails executor!"

Hm... The Rails Executor... 🤔

ActiveSupport::Railtie - executor callbacks

rails/activesupport/lib/active_support/railtie.rb


        initializer "active_support.reset_execution_context" do |app|
          app.executor.to_run              { ActiveSupport::ExecutionContext.clear }
          app.executor.to_complete         { ActiveSupport::ExecutionContext.clear }
        end

        initializer "active_support.reset_all_current_attributes_instances" do |app|
          app.executor.to_run              { ActiveSupport::CurrentAttributes.reset_all }
          app.executor.to_complete         { ActiveSupport::CurrentAttributes.reset_all }

        # ...

The Rails Executor 🦸🏽

Separates application code and framework code
Wraps application code inside an executor context, effectively bracketing every execution of application code with setup and teardown logic
Defines to_run and to_complete callbacks
Provides run!/complete! or #wrap(&block) to do the actual wrapping

The Rails Executor 🦸🏽

Tracks which threads are in a safe state for autoloading and reloading
Enables and disables Active Record's query cache and makes sure it's cleared after each request
Ensures database connections checked out from Active Record's pool during a request/job are returned to the pool when the work is done
Prevents any per-request cached data from leaking from one request to the next
It's very relevant when writing gems that invoke application code (background jobs, instrumentation, non-HTTP handlers...)
And a great example of Rails magic that makes our lives easier, and that's so much fun to know in detail 🪄

Wrapping actions: Executor middleware


        $ bin/rails middleware
        use ActionDispatch::HostAuthorization
        use Rack::Sendfile
        use ActionDispatch::Static
        use ActionDispatch::Executor
        use ActionDispatch::ServerTiming
        use Rack::Runtime
        use Rack::MethodOverride
        use ActionDispatch::RequestId
        use ActionDispatch::RemoteIp
        ...

Executor middleware

rails/actionpack/lib/action_dispatch/middleware/executor.rb


        module ActionDispatch
          class Executor
            def initialize(app, executor)
              @app, @executor = app, executor
            end

            def call(env)
              state = @executor.run!
              begin
                response = @app.call(env)
                returned = response << ::Rack::BodyProxy.new(response.pop) { state.complete! }
              rescue => error
               @executor.error_reporter.report(error, handled: false)
               raise
              ensure
                state.complete! unless returned
              end
            end
          end
        end

We need to understand what these do:

@app, @executor = app, executor
state = @executor.run!
state.complete!

`@app, @executor = app, executor`

Rails's default middleware stack

rails/railties/lib/rails/application/default_middleware_stack.rb


        middleware.use ::ActionDispatch::Executor, app.executor # ::ActionDispatch::Executor.new(_app, app.executor)

Rails::Application

rails/railties/lib/rails/application.rb


        attr_reader :reloaders, :reloader, :executor, :autoloaders

        # ...

        def initialize(initial_variable_values = {}, &block)
          # ...
          @executor          = Class.new(ActiveSupport::Executor)
          @reloader          = Class.new(ActiveSupport::Reloader)
          @reloader.executor = @executor
          # ...
        end

ActiveSupport::Executor

rails/activesupport/lib/active_support/executor.rb


        module ActiveSupport
          class Executor < ExecutionWrapper
          end
        end

Main executor code

rails/activesupport/lib/active_support/execution_wrapper.rb


        module ActiveSupport
          class ExecutionWrapper
            include ActiveSupport::Callbacks

            Null = Object.new
            def Null.complete!
            end

            define_callbacks :run
            define_callbacks :complete

            def self.to_run(*args, &block) # app.executor.to_run { ActiveSupport::CurrentAttributes.reset_all }
              set_callback(:run, *args, &block)
            end

            def self.to_complete(*args, &block) # app.executor.to_complete { ActiveSupport::CurrentAttributes.reset_all }
              set_callback(:complete, *args, &block)
            end

            # Run this execution.
            #
            # Returns an instance, whose +complete!+ method *must* be invoked
            # after the work has been performed.
            def self.run!
              if active?
                Null
              else
                new.tap do |instance|
                  success = nil
                  begin
                    instance.run!
                    success = true
                  ensure
                    instance.complete! unless success
                  end
                end
              end
            end

            def self.active_key # app.executor.object_id, app.executor is Class.new(ActiveSupport::Executor)
              @active_key ||= :"active_execution_wrapper_#{object_id}"
            end

            def self.active?
              IsolatedExecutionState.key?(active_key)
            end

            def run!
              IsolatedExecutionState[self.class.active_key] = self
              run
            end

            def run # :nodoc:
              run_callbacks(:run)
            end

            # Complete this in-flight execution. This method *must* be called
            # exactly once on the result of any call to +run!+.
            def complete!
              complete
            ensure
              IsolatedExecutionState.delete(self.class.active_key)
            end

            def complete # :nodoc:
              run_callbacks(:complete)
            end

Main executor code - summary

rails/activesupport/lib/active_support/execution_wrapper.rb

`state = @executor.run!`

If active, returns an object that does nothing on complete!
Else, it creates and returns an instance that:
- stores executor as active for the current thread's (or fiber's) execution context
- and runs to_run callbacks
If an error happens on run, it calls complete! in the instance

Main executor code - summary

rails/activesupport/lib/active_support/execution_wrapper.rb

`state.complete!`

Runs to_complete callbacks
No matter what, removes its class (the executor) from the current thread's (or fiber's) execution context

Executor middleware

rails/actionpack/lib/action_dispatch/middleware/executor.rb


        module ActionDispatch
          class Executor
            def initialize(app, executor)
              @app, @executor = app, executor
            end

            def call(env)
              state = @executor.run!
              begin
                response = @app.call(env)
                returned = response << ::Rack::BodyProxy.new(response.pop) { state.complete! }
              rescue => error
               @executor.error_reporter.report(error, handled: false)
               raise
              ensure
                state.complete! unless returned
              end
            end
          end
        end

Executor middleware

rails/actionpack/lib/action_dispatch/middleware/executor.rb


        module ActionDispatch
          class Executor
            def initialize(app, executor)
              @app, @executor = app, executor
            end

            def call(env)
              state = @executor.run!
              begin
                response = @app.call(env) # response = [ status, header, body ]
                returned = response << ::Rack::BodyProxy.new(response.pop) { state.complete! }
              rescue => error
               @executor.error_reporter.report(error, handled: false)
               raise
              ensure
                state.complete! unless returned
              end
            end
          end
        end

_{returned = response << ::Rack::BodyProxy.new(response.pop) { state.complete! }}

_{returned = response << ::Rack::BodyProxy.new(response.pop) { state.complete! }}
_{response = [ status, header, body ]}
_{→ body = response.pop}
_{→ response = [ status, header ]}

_{returned = response << ::Rack::BodyProxy.new(response.pop) { state.complete! }}
_{response = [ status, header, body ]}
_{body = response.pop}
_{response = [ status, header ]}

_{returned = response << ::Rack::BodyProxy.new(response.pop) { state.complete! }}
_{response = [ status, header, body ]}
_{body = response.pop}
_{response = [ status, header ]}
_{→ proxied_body = ::Rack::BodyProxy.new(body, &block)}

_{returned = response << ::Rack::BodyProxy.new(response.pop) { state.complete! }}
_{response = [ status, header, body ]}
_{body = response.pop}
_{response = [ status, header ]}
_{proxied_body = ::Rack::BodyProxy.new(body, &block)}
_{→ response << proxied_body}
_{→ response = [ status, header, proxied_body ]}

_{returned = response << ::Rack::BodyProxy.new(response.pop) { state.complete! }}
_{response = [ status, header, body ]}
_{body = response.pop}
_{response = [ status, header ]}
_{proxied_body = ::Rack::BodyProxy.new(body, &block)}
_{response << proxied_body}
_{response = [ status, header, proxied_body ]}
_{→ returned = response}

`::Rack::BodyProxy.new(body) { state.complete! }`

rack/lib/rack/body_proxy.rb


          # Proxy for response bodies allowing calling a block when
          # the response body is closed (after the response has been fully
          # sent to the client).

          class Rack::BodyProxy
            def initialize(body, &block)
              @body = body
              @block = block
              @closed = false
            end

            def close
              return if @closed
              @closed = true
              begin
                @body.close if @body.respond_to?(:close)
              ensure
                @block.call
              end
            end
          end

Executor middleware

rails/actionpack/lib/action_dispatch/middleware/executor.rb


        def call(env)
          state = @executor.run!
          begin
            response = @app.call(env) # response = [ status, header, body ]
            returned = response << ::Rack::BodyProxy.new(response.pop) { state.complete! }
          rescue => error
           @executor.error_reporter.report(error, handled: false)
           raise
          ensure
            state.complete! unless returned
          end
        end

Executor middleware - summary

rails/actionpack/lib/action_dispatch/middleware/executor.rb

Receives the app's executor for the current thread when initialized (@executor)
Calls state = @executor.run! before invoking the next middleware
Calls the next middleware, takes the response body, and wraps it with ::Rack::BodyProxy
Passes state.complete! as a block that will run when the server calls #close in the response body
Handles any errors on the above, reporting and calling state.complete! if something went wrong

Remeber the callbacks defined by Rails


        initializer "active_support.reset_execution_context" do |app|
          app.executor.to_run              { ActiveSupport::ExecutionContext.clear }
          app.executor.to_complete         { ActiveSupport::ExecutionContext.clear }
        end

        initializer "active_support.reset_all_current_attributes_instances" do |app|
          app.executor.to_run              { ActiveSupport::CurrentAttributes.reset_all }
          app.executor.to_complete         { ActiveSupport::CurrentAttributes.reset_all }

        # ...

❗ If `run!` returns an active (not complete) executor, `to_run` callback wouldn't run

And calling `complete!` over the return value would do nothing

❗ And if `complete!` is not called over the current executor, the `to_complete` callback wouldn't run

First request


        def call(env)
          state = @executor.run!
          begin
            response = @app.call(env)
            returned = response << ::Rack::BodyProxy.new(response.pop) { state.complete! }
          rescue => error
           @executor.error_reporter.report(error, handled: false)
           raise
          ensure
            state.complete! unless returned
          end
        end

First request


        def call(env)
          → state = @executor.run! # Class object_id: 16536, current attributes reset
          begin
            response = @app.call(env)
            returned = response << ::Rack::BodyProxy.new(response.pop) { state.complete! }
          rescue => error
           @executor.error_reporter.report(error, handled: false)
           raise
          ensure
            state.complete! unless returned
          end
        end

First request


        def call(env)
          state = @executor.run!
          begin
            response = @app.call(env)
            → returned = response << ::Rack::BodyProxy.new(response.pop) { state.complete! }
          rescue => error
           @executor.error_reporter.report(error, handled: false)
           raise
          ensure
            state.complete! unless returned
          end
        end

First request


        def call(env)
          state = @executor.run!
          begin
            response = @app.call(env)
            returned = response << ::Rack::BodyProxy.new(response.pop) { state.complete! }
          rescue => error
           @executor.error_reporter.report(error, handled: false)
           raise
          ensure
            → state.complete! unless returned
          end
        end

First request


        def call(env)
          state = @executor.run!
          begin
            response = @app.call(env) # response = [ status, header, body ]
            returned = response << ::Rack::BodyProxy.new(response.pop) { state.complete! }
          rescue => error
           @executor.error_reporter.report(error, handled: false)
           raise
          ensure
            → state.complete! unless returned
            # returned = [ status, headers, proxied_body ]
          end
        end

Subsequent requests


        def call(env)
          → state = @executor.run! # ActiveSupport::ExecutionWrapper::Null, no reset
          begin
            response = @app.call(env) # response = [ status, header, body ]
            returned = response << ::Rack::BodyProxy.new(response.pop) { state.complete! }
          rescue => error
           @executor.error_reporter.report(error, handled: false)
           raise
          ensure
            state.complete! unless returned
          end
        end

Subsequent requests


        def call(env)
          state = @executor.run! # ActiveSupport::ExecutionWrapper::Null, no reset
          begin
            response = @app.call(env) # response = [ status, header, body ]
            → returned = response << ::Rack::BodyProxy.new(response.pop) { state.complete! }
          rescue => error
           @executor.error_reporter.report(error, handled: false)
           raise
          ensure
            state.complete! unless returned
          end
        end

But how/why could this happen? 🔍

Chasing the bug with good old exceptions

app/models/concerns/current_request_pinning.rb


        module CurrentRequestPinning
          extend ActiveSupport::Concern

          prepended do
            attribute :request_id
          end

          def request_id=(current_request_id)
            if request_id
              raise "Current.request_id is already pinned. Pinned: #{request_id} New: #{current_request_id}"
            else
              super
            end
          end
        end

Chasing the bug with good old exceptions

app/models/current.rb


        class Current < ActiveSupport::CurrentAttributes
          prepend CurrentRequestPinning

          attribute :session

Chasing the bug with good old exceptions

config/initializers/guard_against_current_leakage.rb


        ActiveSupport.on_load :action_controller do
          before_action do
            Current.request_id = request.request_id # ActionDispatch::RequestId, X-Request-Id header
          end
        end

⏳ And then, we waited...

... until an exception happened! 💥


        RuntimeError (Current.request_id is already pinned.
          Pinned: 6be33aff-d56d-429c-a934-2cb076c29979 New: f558cd87-9d8b-4d91-9313-892aef106ce7)

        app/models/concerns/current_request_pinning.rb:22:in 'CurrentRequestPinning#guard_request_id_reassignment'
        app/models/concerns/current_request_pinning.rb:10:in 'CurrentRequestPinning#request_id='
        config/initializers/set_current_request_details.rb:5:in 'block (2 levels)'

In our logs, we saw:

A 200 response from the Rails app


          Started GET "/topics/1020541119"
          Processing by TopicsController#show as HTML
            Parameters: {"id" => "1020541119"}
            ...
          Completed 200 OK in 197ms

But a 500 error logged by Nginx!


          "GET /topics/1020541119 HTTP/1.1" 500

Puma's low level error handler

config/puma.rb


        lowlevel_error_handler do |exception, env|
          Sentry.with_scope do |scope|
            scope.set_transaction_name "puma"
            scope.set_contexts puma: { env: env }
            Sentry.capture_exception exception, level: :warning
          end

          [ 500, { "Content-Type" => "text/html" }, Rails.root.join("public/500.html").open ]
        end

The error pointed to a mysterious Critter::UnicornQueueTiming middleware that wasn't being listed in our middleware stack 😕

config.ru


        # This file is used by Rack-based servers to start the application.
        require_relative 'config/environment'

        # Not Unicorn-specific. Tracks delay since the X-Queue-Start header set by
        # the earliest load balancer that accepts HTTP requests.
        use Critter::UnicornQueueTiming if defined? Critter

        run Rails.application


        --- Puma ---
        use Critter::UnicornQueueTiming 💥
        use Rack::Sendfile
        use ActionDispatch::Static
        use ActionDispatch::Executor ⬅️
        ...

The mysterious Critter::UnicornQueueTiming middleware wasn't thread-safe and was crashing from time to time 🤦🏻‍♀️

The immediate fix was easy

config.ru


        # Not Unicorn-specific. Tracks delay since the X-Queue-Start header set by
        # the earliest load balancer that accepts HTTP requests.
        use Critter::UnicornQueueTiming if defined? Critter

        run Rails.application

The immediate fix was easy

config.ru


        # Not Unicorn-specific. Tracks delay since the X-Queue-Start header set by
        # the earliest load balancer that accepts HTTP requests.
        #
        # *cough* Well, it does assume a single-threaded Rack adapter. Disabled
        # pending thread-safe implementation.
        # use Critter::UnicornQueueTiming if defined? Critter

        run Rails.application

Narrator: they never did the thread-safe implementation 😆

A resilient executor

config/initializers/guard_against_current_leakage.rb


        class GuardedExecutor < ActionDispatch::Executor
          class ExecutorStateViolation < RuntimeError; end

          def call(env)
            guard(env) { super }
          end

          private
            def guard(env)
              check! :before

              response = yield

              # Knotty syntax. Match superclass implementation.
              returned = response << ::Rack::BodyProxy.new(response.pop) { check! :body_close }
            rescue Exception
              check! :rescued
              raise
            end

            def check!(whence)
              if @executor.active?
                raise ExecutorStateViolation, whence
              end
            end
        end

        Rails.application.configure do
          config.middleware.swap ActionDispatch::Executor, GuardedExecutor, executor
        end

🥳

The End?

Well, not quite...

We still had to upstream all this to Rails

... but life had other plans

Some time later, in Puma...

🫣

A change in Puma::Request#handle_request inadvertently made a call to res_body.close to not always happen.

Yes, that res_body:


        returned = response << ::Rack::BodyProxy.new(response.pop) { state.complete! }

[CVE-2022-23633]

And finally, the fix in Rails

ActiveSupport::Executor#run!

rails/activesupport/lib/active_support/execution_wrapper.rb


        def self.run!(reset: false)
          if reset
            lost_instance = IsolatedExecutionState.delete(active_key)
            lost_instance&.complete!
          else
            return Null if active?
          end

          new.tap do |instance|
            success = nil
            begin
              instance.run!
              success = true
            ensure
              instance.complete! unless success
            end
          end
        end

And finally, the fix in Rails

Executor middleware

rails/actionpack/lib/action_dispatch/middleware/executor.rb


        module ActionDispatch
          class Executor
            def initialize(app, executor)
              @app, @executor = app, executor
            end

            def call(env)
              state = @executor.run!(reset: true)
              begin
                response = @app.call(env)
                returned = response << ::Rack::BodyProxy.new(response.pop) { state.complete! }
              rescue => error
               @executor.error_reporter.report(error, handled: false)
               raise
              ensure
                state.complete! unless returned
              end
            end
          end
        end

🇧🇷🍹🤗💜

👋🏻 Hi, I'm Rosa!

The curious incident of the incomplete executor 🕵🏻‍♀️

in the night time*

Through this story:

It all started with a 404 error in our HEY email app...

A 304 (correct) response looked like this:

But the 404 response had no authentication and a strange identity_id

😟

After that, our adventure quickly turned into a very scary scenario

After that, our adventure quickly turned into a very scary scenario

After that, our adventure quickly turned into a very scary scenario

😱

Well, at least it was only us using HEY internally 😌

But still 😟😰🔍

Where did the authenticated identity come from?

💡 Current Attributes

💡 Current Attributes

Where did the authenticated identity come from?

We pushed a quick hotfix

We pushed a quick hotfix

Current Attributes were leaking! But how?

Something was not resetting things properly after each request...

Jeremy 🔮: "This points to the Rails executor!"

Hm... The Rails Executor... 🤔

ActiveSupport::Railtie - executor callbacks

The Rails Executor 🦸🏽

The Rails Executor 🦸🏽

Wrapping actions: Executor middleware

Executor middleware

We need to understand what these do:

@app, @executor = app, executor

Rails's default middleware stack

Rails::Application

ActiveSupport::Executor

Main executor code

Main executor code - summary

state = @executor.run!

Main executor code - summary

state.complete!

Executor middleware

Executor middleware

::Rack::BodyProxy.new(body) { state.complete! }

Executor middleware

Executor middleware - summary

Remeber the callbacks defined by Rails

❗ If run! returns an active (not complete) executor, to_run callback wouldn't run

And calling complete! over the return value would do nothing

❗ And if complete! is not called over the current executor, the to_complete callback wouldn't run

First request

First request

First request

First request

First request

Subsequent requests

Subsequent requests

But how/why could this happen? 🔍

Chasing the bug with good old exceptions

Chasing the bug with good old exceptions

Chasing the bug with good old exceptions

⏳ And then, we waited...

... until an exception happened! 💥

In our logs, we saw:

Puma's low level error handler

A resilient executor

🥳

The End?

Well, not quite...

Some time later, in Puma...

Some time later, in Puma...

Some time later, in Puma...

Some time later, in Puma...

🫣

[CVE-2022-23633]

[CVE-2022-23633]

And finally, the fix in Rails

ActiveSupport::Executor#run!

And finally, the fix in Rails

Executor middleware

🥳

`@app, @executor = app, executor`

`state = @executor.run!`

`state.complete!`

`::Rack::BodyProxy.new(body) { state.complete! }`

❗ If `run!` returns an active (not complete) executor, `to_run` callback wouldn't run

And calling `complete!` over the return value would do nothing

❗ And if `complete!` is not called over the current executor, the `to_complete` callback wouldn't run