🇧🇷🍹🤗💜

👋🏻 Hi, I'm Rosa!


👩🏻‍💻 github.com/rosa

🌐 rosa.codes

Principal programmer at 37signals

The curious incident of the incomplete executor 🕵🏻‍♀️

in the night time*



* in some time zones
** yes, in mine 🌙🥱☕

Through this story:

We'll be using Puma as our web server

and nginx on top of it

The code shown is from Rails 7.0.x and Rack 2.2.x

(but the relevant parts are the same)

It all started with a 404 error in our HEY email app...

A 304 (correct) response looked like this:


        Started GET "/boxes/210.json" for ...
        Processing by BoxesController#show as JSON
          Parameters: {"id"=>"210"}
            Oauth::Client Load (0.7ms)  SELECT `oauth_clients`.* FROM `oauth_clients` ..
            Session Load (0.7ms)  SELECT `sessions`.* FROM `sessions`
              WHERE `sessions`.`token` = '...' AND `sessions`.`identity_id` = 4 LIMIT 1
          Authenticated Identity#4 from OAuth access token
            Identity Load (0.6ms)  SELECT `identities`.* FROM `identities` WHERE `identities`.`id` = 4 LIMIT 1
            Box Load (0.7ms)  SELECT `boxes`.* FROM `boxes` ...
            Posting Load (0.7ms)  SELECT `postings`.* FROM `postings` ...
        Completed 304 Not Modified in 13ms
      

But the 404 response had no authentication and a strange identity_id


        Started GET "/boxes/210.json" for ...
        Processing by BoxesController#show as JSON
          Parameters: {"id"=>"210"}
          CACHE Box Load (0.0ms)  SELECT `boxes`.* FROM `boxes`
            WHERE `boxes`.`identity_id` = 82
            AND `boxes`.`id` = 210 ORDER BY `boxes`.`id` ASC LIMIT 1
          CACHE Contact Load (0.0ms)  SELECT `contacts`.* FROM `contacts` INNER JOIN `users` ON `contacts`.`contactable_id` = `users`.`id`
            WHERE `users`.`identity_id` = 82
            AND `contacts`.`contactable_type` = 'User' ORDER BY `contacts`.`id` ASC LIMIT 1
        Completed 404 Not Found in 3ms
      
And some other strange features

😟

After that, our adventure quickly turned into a very scary scenario

After that, our adventure quickly turned into a very scary scenario

After that, our adventure quickly turned into a very scary scenario

😱

Well, at least it was only us using HEY internally 😌

But still 😟😰🔍

Where did the authenticated identity come from?

app/models/current.rb

        class Current < ActiveSupport::CurrentAttributes
          attribute :session

          delegate :identity, to: :session, allow_nil: true
        end
      

💡 Current Attributes

Abstract super class that provides a thread-isolated attributes singleton, which resets automatically before and after each request. Ruby on Rails API docs

💡 Current Attributes

Abstract super class that provides a thread-isolated attributes singleton, which resets automatically before and after each request. Ruby on Rails API docs

Where did the authenticated identity come from?

app/controllers/concerns/authenticate.rb

        module Authenticate
          included do
            before_action :require_session
          end

          def require_session
            resume_session || request_session
          end

          def resume_session
            Current.session ||= resume_session_by_cookie || resume_session_by_token
          end
      

We pushed a quick hotfix

app/controllers/concerns/authenticate.rb

          def resume_session
            Current.session ||= resume_session_by_cookie || resume_session_by_token
          end
      

We pushed a quick hotfix

app/controllers/concerns/authenticate.rb

          def resume_session
            Current.session = resume_session_by_cookie || resume_session_by_token
          end
      

Current Attributes were leaking! But how?

Something was not resetting things properly after each request...

Jeremy 🔮: "This points to the Rails executor!"


Hm... The Rails Executor... 🤔
ActiveSupport::Railtie - executor callbacks
rails/activesupport/lib/active_support/railtie.rb

        initializer "active_support.reset_execution_context" do |app|
          app.executor.to_run              { ActiveSupport::ExecutionContext.clear }
          app.executor.to_complete         { ActiveSupport::ExecutionContext.clear }
        end

        initializer "active_support.reset_all_current_attributes_instances" do |app|
          app.executor.to_run              { ActiveSupport::CurrentAttributes.reset_all }
          app.executor.to_complete         { ActiveSupport::CurrentAttributes.reset_all }

        # ...
      

The Rails Executor 🦸🏽

  • Separates application code and framework code
  • Wraps application code inside an executor context, effectively bracketing every execution of application code with setup and teardown logic
  • Defines to_run and to_complete callbacks
  • Provides run!/complete! or #wrap(&block) to do the actual wrapping

The Rails Executor 🦸🏽

  • Tracks which threads are in a safe state for autoloading and reloading
  • Enables and disables Active Record's query cache and makes sure it's cleared after each request
  • Ensures database connections checked out from Active Record's pool during a request/job are returned to the pool when the work is done
  • Prevents any per-request cached data from leaking from one request to the next
  • It's very relevant when writing gems that invoke application code (background jobs, instrumentation, non-HTTP handlers...)
  • And a great example of Rails magic that makes our lives easier, and that's so much fun to know in detail 🪄
Wrapping actions: Executor middleware

        $ bin/rails middleware
        use ActionDispatch::HostAuthorization
        use Rack::Sendfile
        use ActionDispatch::Static
        use ActionDispatch::Executor
        use ActionDispatch::ServerTiming
        use Rack::Runtime
        use Rack::MethodOverride
        use ActionDispatch::RequestId
        use ActionDispatch::RemoteIp
        ...
      
Executor middleware
rails/actionpack/lib/action_dispatch/middleware/executor.rb

        module ActionDispatch
          class Executor
            def initialize(app, executor)
              @app, @executor = app, executor
            end

            def call(env)
              state = @executor.run!
              begin
                response = @app.call(env)
                returned = response << ::Rack::BodyProxy.new(response.pop) { state.complete! }
              rescue => error
               @executor.error_reporter.report(error, handled: false)
               raise
              ensure
                state.complete! unless returned
              end
            end
          end
        end
      

We need to understand what these do:

  • @app, @executor = app, executor
  • state = @executor.run!
  • state.complete!

@app, @executor = app, executor


Rails's default middleware stack
rails/railties/lib/rails/application/default_middleware_stack.rb

        middleware.use ::ActionDispatch::Executor, app.executor # ::ActionDispatch::Executor.new(_app, app.executor)
      
Rails::Application
rails/railties/lib/rails/application.rb

        attr_reader :reloaders, :reloader, :executor, :autoloaders

        # ...

        def initialize(initial_variable_values = {}, &block)
          # ...
          @executor          = Class.new(ActiveSupport::Executor)
          @reloader          = Class.new(ActiveSupport::Reloader)
          @reloader.executor = @executor
          # ...
        end
      
ActiveSupport::Executor
rails/activesupport/lib/active_support/executor.rb

        module ActiveSupport
          class Executor < ExecutionWrapper
          end
        end
      
Main executor code
rails/activesupport/lib/active_support/execution_wrapper.rb

        module ActiveSupport
          class ExecutionWrapper
            include ActiveSupport::Callbacks

            Null = Object.new
            def Null.complete!
            end

            define_callbacks :run
            define_callbacks :complete

            def self.to_run(*args, &block) # app.executor.to_run { ActiveSupport::CurrentAttributes.reset_all }
              set_callback(:run, *args, &block)
            end

            def self.to_complete(*args, &block) # app.executor.to_complete { ActiveSupport::CurrentAttributes.reset_all }
              set_callback(:complete, *args, &block)
            end

            # Run this execution.
            #
            # Returns an instance, whose +complete!+ method *must* be invoked
            # after the work has been performed.
            def self.run!
              if active?
                Null
              else
                new.tap do |instance|
                  success = nil
                  begin
                    instance.run!
                    success = true
                  ensure
                    instance.complete! unless success
                  end
                end
              end
            end

            def self.active_key # app.executor.object_id, app.executor is Class.new(ActiveSupport::Executor)
              @active_key ||= :"active_execution_wrapper_#{object_id}"
            end

            def self.active?
              IsolatedExecutionState.key?(active_key)
            end

            def run!
              IsolatedExecutionState[self.class.active_key] = self
              run
            end

            def run # :nodoc:
              run_callbacks(:run)
            end

            # Complete this in-flight execution. This method *must* be called
            # exactly once on the result of any call to +run!+.
            def complete!
              complete
            ensure
              IsolatedExecutionState.delete(self.class.active_key)
            end

            def complete # :nodoc:
              run_callbacks(:complete)
            end
      
Main executor code - summary
rails/activesupport/lib/active_support/execution_wrapper.rb
state = @executor.run!
  • If active, returns an object that does nothing on complete!
  • Else, it creates and returns an instance that:
    • stores executor as active for the current thread's (or fiber's) execution context
    • and runs to_run callbacks
  • If an error happens on run, it calls complete! in the instance
Main executor code - summary
rails/activesupport/lib/active_support/execution_wrapper.rb
state.complete!
  • Runs to_complete callbacks
  • No matter what, removes its class (the executor) from the current thread's (or fiber's) execution context
Executor middleware
rails/actionpack/lib/action_dispatch/middleware/executor.rb

        module ActionDispatch
          class Executor
            def initialize(app, executor)
              @app, @executor = app, executor
            end

            def call(env)
              state = @executor.run!
              begin
                response = @app.call(env)
                returned = response << ::Rack::BodyProxy.new(response.pop) { state.complete! }
              rescue => error
               @executor.error_reporter.report(error, handled: false)
               raise
              ensure
                state.complete! unless returned
              end
            end
          end
        end
      
Executor middleware
rails/actionpack/lib/action_dispatch/middleware/executor.rb

        module ActionDispatch
          class Executor
            def initialize(app, executor)
              @app, @executor = app, executor
            end

            def call(env)
              state = @executor.run!
              begin
                response = @app.call(env) # response = [ status, header, body ]
                returned = response << ::Rack::BodyProxy.new(response.pop) { state.complete! }
              rescue => error
               @executor.error_reporter.report(error, handled: false)
               raise
              ensure
                state.complete! unless returned
              end
            end
          end
        end
      
returned = response << ::Rack::BodyProxy.new(response.pop) { state.complete! }
returned = response << ::Rack::BodyProxy.new(response.pop) { state.complete! }
returned = response << ::Rack::BodyProxy.new(response.pop) { state.complete! }
response = [ status, header, body ]
body = response.pop
response = [ status, header ]
returned = response << ::Rack::BodyProxy.new(response.pop) { state.complete! }
response = [ status, header, body ]
body = response.pop
response = [ status, header ]
returned = response << ::Rack::BodyProxy.new(response.pop) { state.complete! }
response = [ status, header, body ]
body = response.pop
response = [ status, header ]
proxied_body = ::Rack::BodyProxy.new(body, &block)
returned = response << ::Rack::BodyProxy.new(response.pop) { state.complete! }
response = [ status, header, body ]
body = response.pop
response = [ status, header ]
proxied_body = ::Rack::BodyProxy.new(body, &block)
returned = response << ::Rack::BodyProxy.new(response.pop) { state.complete! }
response = [ status, header, body ]
body = response.pop
response = [ status, header ]
proxied_body = ::Rack::BodyProxy.new(body, &block)
returned = response << ::Rack::BodyProxy.new(response.pop) { state.complete! }
response = [ status, header, body ]
body = response.pop
response = [ status, header ]
proxied_body = ::Rack::BodyProxy.new(body, &block)
response << proxied_body
response = [ status, header, proxied_body ]
returned = response << ::Rack::BodyProxy.new(response.pop) { state.complete! }
response = [ status, header, body ]
body = response.pop
response = [ status, header ]
proxied_body = ::Rack::BodyProxy.new(body, &block)
response << proxied_body
response = [ status, header, proxied_body ]
returned = response

::Rack::BodyProxy.new(body) { state.complete! }


rack/lib/rack/body_proxy.rb

          # Proxy for response bodies allowing calling a block when
          # the response body is closed (after the response has been fully
          # sent to the client).

          class Rack::BodyProxy
            def initialize(body, &block)
              @body = body
              @block = block
              @closed = false
            end

            def close
              return if @closed
              @closed = true
              begin
                @body.close if @body.respond_to?(:close)
              ensure
                @block.call
              end
            end
          end
        
Executor middleware
rails/actionpack/lib/action_dispatch/middleware/executor.rb

        def call(env)
          state = @executor.run!
          begin
            response = @app.call(env) # response = [ status, header, body ]
            returned = response << ::Rack::BodyProxy.new(response.pop) { state.complete! }
          rescue => error
           @executor.error_reporter.report(error, handled: false)
           raise
          ensure
            state.complete! unless returned
          end
        end
      
Executor middleware - summary
rails/actionpack/lib/action_dispatch/middleware/executor.rb
  • Receives the app's executor for the current thread when initialized (@executor)
  • Calls state = @executor.run! before invoking the next middleware
  • Calls the next middleware, takes the response body, and wraps it with ::Rack::BodyProxy
  • Passes state.complete! as a block that will run when the server calls #close in the response body
  • Handles any errors on the above, reporting and calling state.complete! if something went wrong
Remeber the callbacks defined by Rails

        initializer "active_support.reset_execution_context" do |app|
          app.executor.to_run              { ActiveSupport::ExecutionContext.clear }
          app.executor.to_complete         { ActiveSupport::ExecutionContext.clear }
        end

        initializer "active_support.reset_all_current_attributes_instances" do |app|
          app.executor.to_run              { ActiveSupport::CurrentAttributes.reset_all }
          app.executor.to_complete         { ActiveSupport::CurrentAttributes.reset_all }

        # ...
      

❗ If run! returns an active (not complete) executor, to_run callback wouldn't run


And calling complete! over the return value would do nothing

❗ And if complete! is not called over the current executor, the to_complete callback wouldn't run

First request


        def call(env)
          state = @executor.run!
          begin
            response = @app.call(env)
            returned = response << ::Rack::BodyProxy.new(response.pop) { state.complete! }
          rescue => error
           @executor.error_reporter.report(error, handled: false)
           raise
          ensure
            state.complete! unless returned
          end
        end
      

First request


        def call(env)
          → state = @executor.run! # Class object_id: 16536, current attributes reset
          begin
            response = @app.call(env)
            returned = response << ::Rack::BodyProxy.new(response.pop) { state.complete! }
          rescue => error
           @executor.error_reporter.report(error, handled: false)
           raise
          ensure
            state.complete! unless returned
          end
        end
      

First request


        def call(env)
          state = @executor.run!
          begin
            response = @app.call(env)
            → returned = response << ::Rack::BodyProxy.new(response.pop) { state.complete! }
          rescue => error
           @executor.error_reporter.report(error, handled: false)
           raise
          ensure
            state.complete! unless returned
          end
        end
      

First request


        def call(env)
          state = @executor.run!
          begin
            response = @app.call(env)
            returned = response << ::Rack::BodyProxy.new(response.pop) { state.complete! }
          rescue => error
           @executor.error_reporter.report(error, handled: false)
           raise
          ensure
            → state.complete! unless returned
          end
        end
      

First request


        def call(env)
          state = @executor.run!
          begin
            response = @app.call(env) # response = [ status, header, body ]
            returned = response << ::Rack::BodyProxy.new(response.pop) { state.complete! }
          rescue => error
           @executor.error_reporter.report(error, handled: false)
           raise
          ensure
            → state.complete! unless returned
            # returned = [ status, headers, proxied_body ]
          end
        end
      

Subsequent requests


        def call(env)
          → state = @executor.run! # ActiveSupport::ExecutionWrapper::Null, no reset
          begin
            response = @app.call(env) # response = [ status, header, body ]
            returned = response << ::Rack::BodyProxy.new(response.pop) { state.complete! }
          rescue => error
           @executor.error_reporter.report(error, handled: false)
           raise
          ensure
            state.complete! unless returned
          end
        end
      

Subsequent requests


        def call(env)
          state = @executor.run! # ActiveSupport::ExecutionWrapper::Null, no reset
          begin
            response = @app.call(env) # response = [ status, header, body ]
            → returned = response << ::Rack::BodyProxy.new(response.pop) { state.complete! }
          rescue => error
           @executor.error_reporter.report(error, handled: false)
           raise
          ensure
            state.complete! unless returned
          end
        end
      

But how/why could this happen? 🔍

Chasing the bug with good old exceptions

app/models/concerns/current_request_pinning.rb

        module CurrentRequestPinning
          extend ActiveSupport::Concern

          prepended do
            attribute :request_id
          end

          def request_id=(current_request_id)
            if request_id
              raise "Current.request_id is already pinned. Pinned: #{request_id} New: #{current_request_id}"
            else
              super
            end
          end
        end
      

Chasing the bug with good old exceptions

app/models/current.rb

        class Current < ActiveSupport::CurrentAttributes
          prepend CurrentRequestPinning

          attribute :session
      

Chasing the bug with good old exceptions

config/initializers/guard_against_current_leakage.rb

        ActiveSupport.on_load :action_controller do
          before_action do
            Current.request_id = request.request_id # ActionDispatch::RequestId, X-Request-Id header
          end
        end
      

⏳ And then, we waited...

... until an exception happened! 💥


        RuntimeError (Current.request_id is already pinned.
          Pinned: 6be33aff-d56d-429c-a934-2cb076c29979 New: f558cd87-9d8b-4d91-9313-892aef106ce7)

        app/models/concerns/current_request_pinning.rb:22:in 'CurrentRequestPinning#guard_request_id_reassignment'
        app/models/concerns/current_request_pinning.rb:10:in 'CurrentRequestPinning#request_id='
        config/initializers/set_current_request_details.rb:5:in 'block (2 levels)'
      

In our logs, we saw:

A 200 response from the Rails app


          Started GET "/topics/1020541119"
          Processing by TopicsController#show as HTML
            Parameters: {"id" => "1020541119"}
            ...
          Completed 200 OK in 197ms
        

But a 500 error logged by Nginx!


          "GET /topics/1020541119 HTTP/1.1" 500
        
Puma's low level error handler
config/puma.rb

        lowlevel_error_handler do |exception, env|
          Sentry.with_scope do |scope|
            scope.set_transaction_name "puma"
            scope.set_contexts puma: { env: env }
            Sentry.capture_exception exception, level: :warning
          end

          [ 500, { "Content-Type" => "text/html" }, Rails.root.join("public/500.html").open ]
        end
      

The error pointed to a mysterious Critter::UnicornQueueTiming middleware that wasn't being listed in our middleware stack 😕

config.ru

        # This file is used by Rack-based servers to start the application.
        require_relative 'config/environment'

        # Not Unicorn-specific. Tracks delay since the X-Queue-Start header set by
        # the earliest load balancer that accepts HTTP requests.
        use Critter::UnicornQueueTiming if defined? Critter

        run Rails.application
      

        --- Puma ---
        use Critter::UnicornQueueTiming 💥
        use Rack::Sendfile
        use ActionDispatch::Static
        use ActionDispatch::Executor ⬅️
        ...
      

The mysterious Critter::UnicornQueueTiming middleware wasn't thread-safe and was crashing from time to time 🤦🏻‍♀️

The immediate fix was easy

config.ru

        # Not Unicorn-specific. Tracks delay since the X-Queue-Start header set by
        # the earliest load balancer that accepts HTTP requests.
        use Critter::UnicornQueueTiming if defined? Critter

        run Rails.application
      

The immediate fix was easy

config.ru

        # Not Unicorn-specific. Tracks delay since the X-Queue-Start header set by
        # the earliest load balancer that accepts HTTP requests.
        #
        # *cough* Well, it does assume a single-threaded Rack adapter. Disabled
        # pending thread-safe implementation.
        # use Critter::UnicornQueueTiming if defined? Critter

        run Rails.application
      
Narrator: they never did the thread-safe implementation 😆
A resilient executor
config/initializers/guard_against_current_leakage.rb

        class GuardedExecutor < ActionDispatch::Executor
          class ExecutorStateViolation < RuntimeError; end

          def call(env)
            guard(env) { super }
          end

          private
            def guard(env)
              check! :before

              response = yield

              # Knotty syntax. Match superclass implementation.
              returned = response << ::Rack::BodyProxy.new(response.pop) { check! :body_close }
            rescue Exception
              check! :rescued
              raise
            end

            def check!(whence)
              if @executor.active?
                raise ExecutorStateViolation, whence
              end
            end
        end

        Rails.application.configure do
          config.middleware.swap ActionDispatch::Executor, GuardedExecutor, executor
        end
      

🥳

The End?



Well, not quite...

We still had to upstream all this to Rails

... but life had other plans

Some time later, in Puma...

Some time later, in Puma...

Some time later, in Puma...

Some time later, in Puma...

🫣

A change in Puma::Request#handle_request inadvertently made a call to res_body.close to not always happen.

Yes, that res_body:


        returned = response << ::Rack::BodyProxy.new(response.pop) { state.complete! }
      

[CVE-2022-23633]

[CVE-2022-23633]

And finally, the fix in Rails


ActiveSupport::Executor#run!
rails/activesupport/lib/active_support/execution_wrapper.rb

        def self.run!(reset: false)
          if reset
            lost_instance = IsolatedExecutionState.delete(active_key)
            lost_instance&.complete!
          else
            return Null if active?
          end

          new.tap do |instance|
            success = nil
            begin
              instance.run!
              success = true
            ensure
              instance.complete! unless success
            end
          end
        end
      

And finally, the fix in Rails


Executor middleware
rails/actionpack/lib/action_dispatch/middleware/executor.rb

        module ActionDispatch
          class Executor
            def initialize(app, executor)
              @app, @executor = app, executor
            end

            def call(env)
              state = @executor.run!(reset: true)
              begin
                response = @app.call(env)
                returned = response << ::Rack::BodyProxy.new(response.pop) { state.complete! }
              rescue => error
               @executor.error_reporter.report(error, handled: false)
               raise
              ensure
                state.complete! unless returned
              end
            end
          end
        end
      

🥳

Now yes, The End?

Almost!

Rails Security Team


❤️🙇🏻‍♀️ Huge thanks


And Rails Core and Puma teams 🙏🏻

Now yes: The End!

Muito obrigada! 🫶🏻