AWS Lambda Best Practices for Production

Practical tips for running AWS Lambda in production - cold starts, connection pooling, error handling, and cost optimization.

Lambda is deceptively simple. Here’s what I’ve learned running it in production.

Cold Start Optimization #

Cold starts happen when Lambda creates a new execution environment. They can add 100ms to several seconds of latency.
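Before optimizing, measure how often cold starts actually hit you. Because package-level state survives between invocations in the same execution environment, a simple flag distinguishes the first (cold) invocation from warm reuse. A minimal sketch - the `handler` signature and event ID are illustrative, not the real Lambda entrypoint:

```go
package main

import "fmt"

// coldStart is true only until the first invocation runs in this
// execution environment; every reuse of the environment sees false.
var coldStart = true

func handler(eventID string) string {
	wasCold := coldStart
	coldStart = false // later invocations in this environment are warm
	return fmt.Sprintf("event=%s cold=%v", eventID, wasCold)
}

func main() {
	// Simulate two invocations landing on the same environment.
	fmt.Println(handler("a")) // event=a cold=true
	fmt.Println(handler("b")) // event=b cold=false
}
```

Log that flag on every invocation and a CloudWatch Insights query over it tells you your real cold-start rate before you spend money on provisioned concurrency.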

Keep Functions Warm #

For latency-sensitive functions:

// Use provisioned concurrency for consistent performance
// Or implement a warming mechanism

func handler(ctx context.Context, event Event) error {
    if event.IsWarmup {
        return nil // Quick return for warming invocations
    }
    // Actual logic
    return processEvent(ctx, event)
}

Minimize Package Size #

Smaller deployment packages = faster cold starts.

# For Go, compile with optimizations
GOOS=linux GOARCH=amd64 go build -ldflags="-s -w" -o bootstrap main.go

# Optional: UPX shrinks the binary further, but it decompresses at startup,
# which can offset cold-start gains - measure before adopting
upx --best bootstrap

Initialize Outside Handler #

Resources initialized outside the handler persist across invocations:

var (
    db     *sql.DB
    client *http.Client
)

func init() {
    // This runs once per cold start
    db = initDatabase()
    client = &http.Client{
        Timeout: 10 * time.Second,
    }
}

func handler(ctx context.Context, event Event) error {
    // db and client are reused
    return processEvent(ctx, db, event)
}

Connection Management #

Database Connections #

Don’t open a new connection per invocation:

var pool *pgxpool.Pool

func init() {
    config, err := pgxpool.ParseConfig(os.Getenv("DATABASE_URL"))
    if err != nil {
        log.Fatal(err)
    }
    config.MaxConns = 5 // Keep it low - Lambda scales horizontally
    if pool, err = pgxpool.NewWithConfig(context.Background(), config); err != nil {
        log.Fatal(err)
    }
}

Use RDS Proxy #

For high-concurrency scenarios, RDS Proxy prevents connection exhaustion:

# serverless.yml
functions:
  api:
    handler: bootstrap
    vpc:
      securityGroupIds:
        - !Ref LambdaSecurityGroup
      subnetIds:
        - !Ref PrivateSubnet1
    environment:
      DATABASE_URL: !Sub "postgres://user:pass@${RDSProxy.Endpoint}:5432/mydb"

Error Handling #

Structured Errors #

Return meaningful errors for debugging:

type LambdaError struct {
    StatusCode int    `json:"statusCode"`
    Message    string `json:"message"`
    RequestID  string `json:"requestId"`
}

func (e *LambdaError) Error() string {
    return e.Message
}

func handler(ctx context.Context, event Event) (Response, error) {
    result, err := processEvent(event)
    if err != nil {
        reqID := ""
        if lc, ok := lambdacontext.FromContext(ctx); ok {
            reqID = lc.AwsRequestID
        }
        return Response{}, &LambdaError{
            StatusCode: 500,
            Message:    err.Error(),
            RequestID:  reqID,
        }
    }
    return result, nil
}

Dead Letter Queues #

Failed events shouldn't vanish silently. For SQS event sources, attach a redrive policy to the source queue; Lambda's onError setting (which the Serverless Framework points at an SNS topic) applies only to async invocations such as SNS or EventBridge:

functions:
  processor:
    handler: bootstrap
    events:
      - sqs:
          arn: !GetAtt Queue.Arn

resources:
  Resources:
    Queue:
      Type: AWS::SQS::Queue
      Properties:
        RedrivePolicy:
          deadLetterTargetArn: !GetAtt DeadLetterQueue.Arn
          maxReceiveCount: 3

Observability #

Structured Logging #

Use structured logs for easier querying in CloudWatch Insights:

import "go.uber.org/zap"

var logger *zap.Logger

func init() {
    logger, _ = zap.NewProduction()
}

func handler(ctx context.Context, event Event) error {
    logger.Info("processing event",
        zap.String("eventId", event.ID),
        zap.String("type", event.Type),
    )
    // ...
}

Query in CloudWatch Insights:

fields @timestamp, @message
| filter eventId = "abc123"
| sort @timestamp desc

X-Ray Tracing #

Enable tracing for performance insights:

import "github.com/aws/aws-xray-sdk-go/xray"

func init() {
    // SQLContext wraps database/sql so queries appear as traced subsegments
    var err error
    db, err = xray.SQLContext("postgres", connectionString)
    if err != nil {
        log.Fatal(err)
    }
}

func handler(ctx context.Context, event Event) error {
    ctx, seg := xray.BeginSubsegment(ctx, "process-event")
    defer seg.Close(nil)
    // ...
}

Cost Optimization #

Right-Size Memory #

Lambda allocates CPU in proportion to memory, so a higher memory setting can run faster and sometimes cost less. Profile to find the optimum:

# Use AWS Lambda Power Tuning
# https://github.com/alexcasalboni/aws-lambda-power-tuning
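The cost math shows why more memory isn't automatically more expensive: you pay per GB-second, so a CPU-bound function that finishes much faster at a higher setting can come out cheaper. A back-of-the-envelope sketch - the $0.0000166667 per GB-second figure is the x86 on-demand price as an assumption; check current pricing for your region:

```go
package main

import "fmt"

// cost returns the compute cost in USD for one invocation,
// assuming $0.0000166667 per GB-second (x86 on-demand price).
func cost(memoryMB int, durationMS float64) float64 {
	const pricePerGBSecond = 0.0000166667
	gbSeconds := float64(memoryMB) / 1024 * durationMS / 1000
	return gbSeconds * pricePerGBSecond
}

func main() {
	// A CPU-bound function that speeds up with more memory
	// can cost the same - or less - at the higher setting.
	fmt.Printf("128MB x 800ms: $%.10f\n", cost(128, 800))
	fmt.Printf("512MB x 180ms: $%.10f\n", cost(512, 180))
}
```

In this hypothetical, the 512MB run is actually cheaper than the 128MB one - which is exactly the kind of result Power Tuning finds empirically.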

Avoid Idle Waiting #

Don’t pay for time spent waiting on I/O. Run independent calls in parallel:

// Bad: Sequential calls
user, _ := getUser(ctx, userID)
orders, _ := getOrders(ctx, userID)

// Good: Parallel calls
var user User
var orders []Order
var wg sync.WaitGroup
wg.Add(2)
go func() { user, _ = getUser(ctx, userID); wg.Done() }()
go func() { orders, _ = getOrders(ctx, userID); wg.Done() }()
wg.Wait()

Key Takeaways #

  1. Initialize resources outside the handler
  2. Keep deployment packages small
  3. Use connection pooling with low limits
  4. Implement structured logging
  5. Right-size memory based on profiling

Lambda’s simplicity is powerful, but production use requires understanding its execution model.