AWS Lambda Best Practices for Production
Practical tips for running AWS Lambda in production - cold starts, connection pooling, error handling, and cost optimization.
Lambda is deceptively simple. Here’s what I’ve learned running it in production.
## Cold Start Optimization

Cold starts happen when Lambda creates a new execution environment. They can add anywhere from ~100 ms to several seconds of latency.

### Keep Functions Warm

For latency-sensitive functions:
```go
// Use provisioned concurrency for consistent performance,
// or implement a warming mechanism.
func handler(ctx context.Context, event Event) error {
	if event.IsWarmup {
		return nil // Quick return for warming invocations
	}
	// Actual logic
	return processEvent(ctx, event)
}
```
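Either approach can be declared in serverless.yml; a sketch, where the function name, concurrency value, and schedule rate are illustrative (and the `isWarmup` input assumes your `Event` type unmarshals that field into `IsWarmup`):

```yaml
functions:
  api:
    handler: bootstrap
    # Option 1: pre-initialized execution environments
    provisionedConcurrency: 5
    # Option 2: a scheduled warming ping
    events:
      - schedule:
          rate: rate(5 minutes)
          input:
            isWarmup: true
```

Provisioned concurrency costs money while idle, so the scheduled ping is the cheaper (but less reliable) option.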
### Minimize Package Size

Smaller deployment packages mean faster cold starts.

```bash
# For Go, compile with debug info stripped
GOOS=linux GOARCH=amd64 go build -ldflags="-s -w" -o bootstrap main.go

# Use UPX for further compression (optional; adds a small decompression cost at startup)
upx --best bootstrap
```
### Initialize Outside the Handler

Resources initialized outside the handler persist across invocations:

```go
var (
	db     *sql.DB
	client *http.Client
)

func init() {
	// This runs once per cold start
	db = initDatabase()
	client = &http.Client{
		Timeout: 10 * time.Second,
	}
}

func handler(ctx context.Context, event Event) error {
	// db and client are reused across warm invocations
	return processEvent(ctx, db, event)
}
```
## Connection Management

### Database Connections

Don’t open a new connection per invocation:

```go
var pool *pgxpool.Pool

func init() {
	config, err := pgxpool.ParseConfig(os.Getenv("DATABASE_URL"))
	if err != nil {
		log.Fatal(err)
	}
	config.MaxConns = 5 // Keep it low - Lambda scales horizontally
	pool, err = pgxpool.NewWithConfig(context.Background(), config)
	if err != nil {
		log.Fatal(err)
	}
}
```
### Use RDS Proxy

For high-concurrency scenarios, RDS Proxy pools connections and prevents connection exhaustion:

```yaml
# serverless.yml
functions:
  api:
    handler: bootstrap
    vpc:
      securityGroupIds:
        - !Ref LambdaSecurityGroup
      subnetIds:
        - !Ref PrivateSubnet1
    environment:
      DATABASE_URL: !Sub "postgres://user:pass@${RDSProxy.Endpoint}:5432/mydb"
```
## Error Handling

### Structured Errors

Return meaningful errors for debugging:

```go
type LambdaError struct {
	StatusCode int    `json:"statusCode"`
	Message    string `json:"message"`
	RequestID  string `json:"requestId"`
}

func (e *LambdaError) Error() string {
	return e.Message
}

func handler(ctx context.Context, event Event) (Response, error) {
	result, err := processEvent(event)
	if err != nil {
		reqID := ""
		if lc, ok := lambdacontext.FromContext(ctx); ok {
			reqID = lc.AwsRequestID // the actual request ID, not the log stream name
		}
		return Response{}, &LambdaError{
			StatusCode: 500,
			Message:    err.Error(),
			RequestID:  reqID,
		}
	}
	return result, nil
}
```
### Dead Letter Queues

Don’t lose failed events. Note that SQS event sources are polled synchronously, so the DLQ belongs on the queue itself as a redrive policy (Lambda’s `onError` destination only applies to async invocations such as SNS or EventBridge):

```yaml
functions:
  processor:
    handler: bootstrap
    events:
      - sqs:
          arn: !GetAtt Queue.Arn
resources:
  Resources:
    Queue:
      Type: AWS::SQS::Queue
      Properties:
        RedrivePolicy:
          deadLetterTargetArn: !GetAtt DeadLetterQueue.Arn
          maxReceiveCount: 3
```
## Observability

### Structured Logging

Use structured logs for easier querying in CloudWatch Logs Insights:

```go
import "go.uber.org/zap"

var logger *zap.Logger

func init() {
	logger, _ = zap.NewProduction()
}

func handler(ctx context.Context, event Event) error {
	logger.Info("processing event",
		zap.String("eventId", event.ID),
		zap.String("type", event.Type),
	)
	// ...
	return nil
}
```
Query it in CloudWatch Logs Insights:

```
fields @timestamp, @message
| filter eventId = "abc123"
| sort @timestamp desc
```
### X-Ray Tracing

Enable tracing for performance insights:

```go
import "github.com/aws/aws-xray-sdk-go/xray"

func init() {
	var err error
	// xray.SQLContext wraps the driver so queries emit subsegments
	db, err = xray.SQLContext("postgres", connectionString)
	if err != nil {
		log.Fatal(err)
	}
}

func handler(ctx context.Context, event Event) error {
	ctx, seg := xray.BeginSubsegment(ctx, "process-event")
	defer seg.Close(nil)
	// ...
	return nil
}
```
## Cost Optimization

### Right-Size Memory

Memory allocation also controls CPU allocation, so more memory can mean faster (and sometimes even cheaper) executions. Profile to find the optimum:

```bash
# Use AWS Lambda Power Tuning to profile cost vs. duration
# https://github.com/alexcasalboni/aws-lambda-power-tuning
```
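Once profiling gives you a number, applying it is a one-line setting; 1024 MB and 10 s below are just example values:

```yaml
functions:
  api:
    handler: bootstrap
    memorySize: 1024  # MB; CPU scales proportionally with memory
    timeout: 10
```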
### Avoid Idle Waiting

Don’t pay for time spent blocked on I/O. Parallelize independent calls:

```go
// Bad: sequential calls
user, _ := getUser(ctx, userID)
orders, _ := getOrders(ctx, userID)

// Good: parallel calls
var user User
var orders []Order
var wg sync.WaitGroup

wg.Add(2)
go func() { defer wg.Done(); user, _ = getUser(ctx, userID) }()
go func() { defer wg.Done(); orders, _ = getOrders(ctx, userID) }()
wg.Wait()
```
## Key Takeaways

- Initialize resources outside the handler
- Keep deployment packages small
- Use connection pooling with low limits
- Implement structured logging
- Right-size memory based on profiling
Lambda’s simplicity is powerful, but production use requires understanding its execution model.