How to Create a Reinforcement Learning Trading System for Crypto
Reinforcement learning (RL) represents the frontier of AI crypto trading: systems that learn optimal trading behavior through trial and error, discovering strategies humans might never conceive. Unlike supervised learning, which requires labeled examples, RL agents learn by doing, improving through millions of simulated trades.
This comprehensive guide walks through building a reinforcement learning trading system for crypto markets. Whether you're developing an AI crypto trading bot or exploring how AI trading algorithms work, understanding RL fundamentals separates practitioners from theorists.
Building effective RL trading systems requires combining machine learning expertise with deep market knowledge. The potential payoff, autonomous agents that adapt to changing markets, justifies the significant development effort.
What Is Reinforcement Learning for Trading?
Reinforcement learning is a machine learning paradigm where an agent learns by interacting with an environment, receiving rewards for good actions and penalties for bad ones.
The RL Framework:
Agent observes State → Agent takes Action → Environment returns Reward + New State
      ↑                                                                      │
      └──────────────────────────────────────────────────────────────────────┘
Applied to Trading:
| RL Concept | Trading Equivalent |
|---|---|
| Agent | Trading algorithm |
| Environment | Crypto market |
| State | Market conditions (prices, indicators, position) |
| Action | Buy, sell, hold, position size |
| Reward | Profit, risk-adjusted return, Sharpe ratio |
| Policy | Trading strategy |
| Episode | Trading period (day, week, month) |
Why RL for Crypto Trading
Advantages:
- Discovers strategies without human bias
- Adapts to market regime changes
- Optimizes complex objectives (risk-adjusted returns)
- Handles sequential decision-making naturally
- Can incorporate many data sources
Challenges:
- Requires massive training data
- Prone to overfitting
- Difficult to interpret
- Non-stationary markets confound learning
- Reward engineering is critical
RL vs Other ML Approaches:
| Approach | Training Signal | Best For |
|---|---|---|
| Supervised | Labeled examples | Price prediction, classification |
| Unsupervised | Pattern discovery | Regime detection, clustering |
| Reinforcement | Rewards from actions | Strategy optimization, execution |
Core RL Components for Crypto Markets
Building an RL trading system requires carefully designing each component for the specific challenges of crypto markets.
Component Overview:
┌──────────────────────────────────────────────────────────────┐
│                      RL Trading System                       │
├──────────────────────────────────────────────────────────────┤
│   ┌───────────┐       ┌───────────┐       ┌───────────┐      │
│   │   State   │   →   │  Policy   │   →   │  Action   │      │
│   │  Encoder  │       │  Network  │       │  Decoder  │      │
│   └───────────┘       └───────────┘       └───────────┘      │
│         ↑                                       │            │
│         │             ┌───────────┐             │            │
│         │             │  Reward   │             ↓            │
│         └─────────────│ Function  │←────────────┘            │
│                       └───────────┘                          │
│                                                               │
│  ┌──────────────────────────────────────────────────────────┐│
│  │             Trading Environment (Simulator)              ││
│  │  Market Data ─→ Order Execution ─→ Position Management   ││
│  └──────────────────────────────────────────────────────────┘│
└──────────────────────────────────────────────────────────────┘
Key Design Decisions:
| Component | Options | Trade-offs |
|---|---|---|
| State | Raw prices vs features | Complexity vs information |
| Actions | Discrete vs continuous | Simplicity vs flexibility |
| Reward | P&L vs Sharpe vs custom | Optimization target |
| Network | MLP vs LSTM vs Transformer | Capacity vs training |
| Algorithm | PPO vs A2C vs SAC | Stability vs sample efficiency |
State Representation: What the Agent Sees
The state representation determines what information the agent can use for decisions. Poor state design limits what the agent can learn.
State Components for Crypto Trading:
1. Price Information:
- Recent returns (1h, 4h, 24h, 7d)
- Price relative to moving averages
- Distance to recent high/low
- OHLC ratios (open/close, high/low)
2. Technical Indicators:
- RSI, MACD, Bollinger Bands
- Volume indicators (OBV, VWAP)
- Volatility measures (ATR)
- Trend strength (ADX)
3. Market Microstructure:
- Order book imbalance
- Spread
- Trading volume relative to average
- Funding rate
4. Position Information:
- Current position size
- Entry price
- Unrealized P&L
- Time in position
5. Account State:
- Available capital
- Margin utilization
- Recent trade history
State Normalization: Raw values vary wildly in scale. Normalize for neural network efficiency:
| Feature Type | Normalization Method |
|---|---|
| Prices | Convert to returns |
| Indicators | Z-score or min-max |
| Volume | Ratio to 20-period average |
| Position | Fraction of max allowed |
| Account | Percentage of starting capital |
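A minimal sketch of these normalizations in NumPy; the helper names, window lengths, and epsilon guards are illustrative choices, not fixed conventions:

```python
import numpy as np

def zscore_latest(series, window=100):
    """Z-score of the most recent value against a trailing window."""
    recent = np.asarray(series[-window:], dtype=float)
    return (recent[-1] - recent.mean()) / (recent.std() + 1e-9)

def volume_ratio(volume, window=20):
    """Latest volume relative to its trailing average."""
    recent = np.asarray(volume[-window:], dtype=float)
    return recent[-1] / (recent.mean() + 1e-9)

def pct_return(prices, lag):
    """Simple return over `lag` periods."""
    prices = np.asarray(prices, dtype=float)
    return prices[-1] / prices[-1 - lag] - 1.0
```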
State Vector Example:
state = [
    # Price features (normalized returns)
    return_1h,         # -0.02 to 0.02 typical
    return_4h,
    return_24h,
    return_7d,
    price_vs_sma20,    # 0.95 to 1.05 typical
    price_vs_sma50,
    # Technical indicators (z-scored)
    rsi_zscore,        # -2 to 2
    macd_zscore,
    bbwidth_zscore,
    # Market structure
    volume_ratio,      # 0.5 to 2.0
    funding_rate,      # -0.001 to 0.001
    oi_change,         # -0.1 to 0.1
    # Position info
    position_pct,      # -1 to 1
    unrealized_pnl,    # -0.1 to 0.1
    time_in_position,  # 0 to 1
    # Account
    capital_pct,       # 0.8 to 1.2
]
# Total: ~17 features
Observation Window: For temporal patterns, include historical states (see the sketch after this list):
- Stacked observations: Last 24 hours of hourly states
- LSTM encoding: Let network learn temporal relationships
- Attention mechanisms: Weight important historical moments
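A stacked-observation sketch, assuming a precomputed point-in-time feature matrix (one row of features per hour, no future data); `stacked_observation` is an illustrative helper, not part of any library:

```python
import numpy as np

def stacked_observation(feature_matrix, t, window=24):
    """Concatenate the last `window` state vectors into one flat observation.

    feature_matrix: array of shape (T, n_features), built point-in-time
    (one row per hour, no future rows).
    """
    window_rows = feature_matrix[t - window + 1 : t + 1]
    return window_rows.astype(np.float32).flatten()
```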
Action Spaces: What the Agent Can Do
The action space defines what decisions the agent can make. Simpler spaces learn faster but may limit strategy complexity.
Discrete Action Spaces: The simplest approach, where the agent chooses from fixed options:
# Option 1: Basic three actions
actions = ['HOLD', 'BUY', 'SELL']

# Option 2: Position-based
actions = ['FLAT', 'LONG', 'SHORT']

# Option 3: Position sizes
actions = [
    'FLAT',          # 0%
    'SMALL_LONG',    # 25%
    'MEDIUM_LONG',   # 50%
    'LARGE_LONG',    # 100%
    'SMALL_SHORT',   # -25%
    'MEDIUM_SHORT',  # -50%
    'LARGE_SHORT',   # -100%
]
Continuous Action Spaces: The agent outputs an exact position size:
# Action is a continuous value
action = model.predict(state)  # Returns value in [-1, 1]

# Convert to position
position_size = action * max_position
Discrete vs Continuous Trade-offs:
| Factor | Discrete | Continuous |
|---|---|---|
| Learning speed | Faster | Slower |
| Strategy flexibility | Limited | Full |
| Exploration | Easy (random choice) | Requires noise |
| Implementation | Simpler | Complex |
| Algorithms | DQN, PPO | SAC, TD3, PPO |
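For concreteness, this is roughly how the two choices look with a Gymnasium-style action space (assuming the environment follows the Gym API); the seven position levels mirror Option 3 above:

```python
import numpy as np
import gymnasium as gym

# Discrete: seven position buckets the agent picks from
discrete_space = gym.spaces.Discrete(7)
position_levels = np.array([0.0, 0.25, 0.5, 1.0, -0.25, -0.5, -1.0])
target_position = position_levels[discrete_space.sample()]

# Continuous: the agent outputs a target position directly in [-1, 1]
continuous_space = gym.spaces.Box(low=-1.0, high=1.0, shape=(1,), dtype=np.float32)
target_position = float(continuous_space.sample()[0])
```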
Action Masking: Prevent invalid actions:
def mask_actions(state, actions):
    """Mask out actions that are impossible in the current state."""
    mask = np.ones(len(actions))
    if state['position'] == 0:
        mask[actions.index('CLOSE')] = 0  # Can't close a position you don't have
    if state['capital'] < min_order:
        mask[actions.index('BUY')] = 0    # Can't buy without capital
    return mask
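Applying the mask at action-selection time might look like the following, assuming `policy_probs` is the softmax output aligned with the `actions` list:

```python
# Zero out invalid actions, renormalize, then sample
masked = policy_probs * mask_actions(state, actions)
masked = masked / masked.sum()
action = np.random.choice(actions, p=masked)
```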
Recommended Starting Point: For most crypto RL projects, start with discrete actions (7-11 choices covering position sizes). Graduate to continuous actions only after the discrete version is validated.
Reward Function Design: What Success Means
The reward function is arguably the most critical component: it defines what the agent optimizes for.
Common Reward Approaches:
1. Simple P&L Reward:
reward = pnl_this_step
- Pro: Simple, aligned with profit
- Con: Ignores risk, encourages gambling
2. Risk-Adjusted Reward:
reward = pnl_this_step - risk_penalty * volatility
- Pro: Discourages excessive risk
- Con: May be too conservative
3. Sharpe-Based Reward:
# Calculated over a rolling window
sharpe = mean(returns) / std(returns)
reward = sharpe_improvement
- Pro: Industry-standard metric
- Con: Sensitive to window length, can be unstable
4. Differential Sharpe Ratio (see the sketch after this list):
# From Moody & Saffell (2001)
dsr = (delta_mean - 0.5 * current_sharpe * delta_variance) / std
reward = dsr
- Pro: Smooth, online computation
- Con: Complex implementation
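A minimal online sketch of a differential-Sharpe-style reward, following the exponential moving moment estimates in Moody & Saffell; the decay rate `eta` and the `eps` guard are illustrative choices:

```python
class DifferentialSharpe:
    """Online differential Sharpe reward from exponential moving moment estimates."""

    def __init__(self, eta=0.01, eps=1e-8):
        self.eta = eta   # decay rate of the moving estimates
        self.eps = eps   # guard against a zero denominator
        self.A = 0.0     # moving estimate of mean return
        self.B = 0.0     # moving estimate of mean squared return

    def step(self, ret):
        """Reward contribution of this period's portfolio return `ret`."""
        delta_A = ret - self.A
        delta_B = ret ** 2 - self.B
        denom = (self.B - self.A ** 2) ** 1.5
        dsr = 0.0 if denom < self.eps else (self.B * delta_A - 0.5 * self.A * delta_B) / denom
        # Update the estimates only after computing this step's reward
        self.A += self.eta * delta_A
        self.B += self.eta * delta_B
        return dsr
```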
Reward Engineering Considerations:
| Consideration | Approach |
|---|---|
| Transaction costs | Subtract from reward |
| Holding costs | Penalize long position durations |
| Drawdown | Penalty for underwater periods |
| Action frequency | Penalize excessive trading |
| Market impact | Estimate and penalize large orders |
Sample Reward Function:
def calculate_reward(state, action, next_state, done):
    # Base reward: P&L
    pnl = next_state['portfolio_value'] - state['portfolio_value']
    # Transaction cost penalty
    if action != 'HOLD':
        pnl -= transaction_cost
    # Risk penalty (scaled by volatility)
    volatility = state['recent_volatility']
    risk_penalty = 0.1 * abs(state['position']) * volatility
    # Drawdown penalty
    if next_state['portfolio_value'] < state['peak_value']:
        drawdown = (state['peak_value'] - next_state['portfolio_value']) / state['peak_value']
        drawdown_penalty = 0.5 * drawdown
    else:
        drawdown_penalty = 0
    # Combine
    reward = pnl - risk_penalty - drawdown_penalty
    return reward
Reward Shaping Pitfalls:
- Too sparse rewards (only at episode end) → slow learning
- Too dense rewards → agent games intermediate metrics
- Imbalanced components → agent optimizes only strongest signal
- Inconsistent with actual goals → agent does wrong thing well
Policy Networks and Architectures
The policy network maps states to actions. Architecture choice significantly impacts learning ability.
Basic MLP Architecture:
import torch
import torch.nn as nn

class PolicyNetwork(nn.Module):
    def __init__(self, state_dim, action_dim):
        super().__init__()
        self.network = nn.Sequential(
            nn.Linear(state_dim, 256),
            nn.ReLU(),
            nn.Linear(256, 128),
            nn.ReLU(),
            nn.Linear(128, 64),
            nn.ReLU(),
            nn.Linear(64, action_dim),
            nn.Softmax(dim=-1)  # For discrete actions
        )

    def forward(self, state):
        return self.network(state)
LSTM for Sequential States:
class LSTMPolicy(nn.Module):
    def __init__(self, state_dim, action_dim, seq_len=24):
        super().__init__()
        self.lstm = nn.LSTM(state_dim, 128, num_layers=2, batch_first=True)
        self.fc = nn.Sequential(
            nn.Linear(128, 64),
            nn.ReLU(),
            nn.Linear(64, action_dim),
            nn.Softmax(dim=-1)
        )

    def forward(self, state_sequence):
        lstm_out, _ = self.lstm(state_sequence)
        last_output = lstm_out[:, -1, :]  # Use final hidden state
        return self.fc(last_output)
Actor-Critic Architecture:
Most modern RL algorithms use separate actor (policy) and critic (value) networks:
class ActorCritic(nn.Module):
    def __init__(self, state_dim, action_dim):
        super().__init__()
        # Shared feature extraction
        self.shared = nn.Sequential(
            nn.Linear(state_dim, 256),
            nn.ReLU(),
            nn.Linear(256, 128),
            nn.ReLU()
        )
        # Actor head (policy)
        self.actor = nn.Sequential(
            nn.Linear(128, 64),
            nn.ReLU(),
            nn.Linear(64, action_dim),
            nn.Softmax(dim=-1)
        )
        # Critic head (value function)
        self.critic = nn.Sequential(
            nn.Linear(128, 64),
            nn.ReLU(),
            nn.Linear(64, 1)
        )

    def forward(self, state):
        features = self.shared(state)
        action_probs = self.actor(features)
        value = self.critic(features)
        return action_probs, value
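A short usage sketch for the actor-critic above: sample a discrete action from the actor's output distribution and keep the log-probability and value estimate that policy-gradient updates need. The 17-feature state and 7-action space match the earlier examples:

```python
import torch
from torch.distributions import Categorical

model = ActorCritic(state_dim=17, action_dim=7)
state = torch.randn(1, 17)            # stand-in for a real normalized state vector

action_probs, value = model(state)
dist = Categorical(probs=action_probs)
action = dist.sample()                # index into the discrete action list
log_prob = dist.log_prob(action)      # needed for policy-gradient updates
```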
Architecture Recommendations:
| Data Type | Recommended Architecture |
|---|---|
| Fixed-length features | MLP (start here) |
| Sequential observations | LSTM or Transformer |
| Multiple timeframes | Multi-input MLP |
| Image-like data (charts) | CNN + MLP |
| Complex dependencies | Attention mechanisms |
Hyperparameter Guidelines:
| Hyperparameter | Suggested Range |
|---|---|
| Hidden layers | 2-4 |
| Hidden units | 64-512 |
| Learning rate | 1e-5 to 1e-3 |
| Batch size | 64-512 |
| Discount factor (γ) | 0.95-0.99 |
| Entropy coefficient | 0.001-0.01 |
Training Environments and Simulation
RL agents need environments to interact with. For trading, this means realistic market simulation.
Environment Interface (Gym-style):
class CryptoTradingEnv:
    def __init__(self, data, initial_capital=10000, window_size=24):
        self.data = data
        self.initial_capital = initial_capital
        self.window_size = window_size  # observation lookback used by reset()
        self.reset()

    def reset(self):
        """Start new episode"""
        self.step_idx = self.window_size
        self.capital = self.initial_capital
        self.position = 0
        self.entry_price = 0
        return self._get_state()

    def step(self, action):
        """Execute action, return new state, reward, done"""
        # Execute trade
        self._execute_action(action)
        # Move to next timestep
        self.step_idx += 1
        # Calculate reward
        reward = self._calculate_reward()
        # Check if episode done
        done = self.step_idx >= len(self.data) - 1
        return self._get_state(), reward, done, {}

    def _get_state(self):
        """Construct state from current market data"""
        # Implementation details...
        pass

    def _execute_action(self, action):
        """Execute buy/sell/hold"""
        # Implementation with transaction costs...
        pass

    def _calculate_reward(self):
        """Calculate step reward"""
        # Implementation...
        pass
Environment Realism Considerations:
| Factor | Naive Implementation | Realistic Implementation |
|---|---|---|
| Transaction costs | Ignored | Maker/taker fees, spread |
| Slippage | Ignored | Size-dependent slippage model |
| Market impact | Ignored | Temporary and permanent impact |
| Fill probability | Always fills | Partial fills, rejections |
| Latency | Instant | Realistic delays |
| Data | Perfect hindsight | Point-in-time only |
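To make the slippage and fee rows concrete, a toy execution-cost model might look like this; the fee and impact coefficients are placeholders, not calibrated values:

```python
def fill_price(mid_price, order_size, avg_volume, side,
               taker_fee=0.0005, impact_coef=0.1):
    """Toy fill model: taker fee plus slippage that grows with participation rate."""
    participation = abs(order_size) / max(avg_volume, 1e-9)
    slippage = impact_coef * participation           # fraction of price
    direction = 1 if side == 'BUY' else -1
    price = mid_price * (1 + direction * slippage)   # pay up buying, receive less selling
    fee = abs(order_size) * price * taker_fee
    return price, fee
```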
Data Management:
# Train/validation/test split for RL
total_data = load_historical_data()  # 5 years
train_data = total_data[:int(0.7*len(total_data))]                          # e.g., 2020-2023
val_data = total_data[int(0.7*len(total_data)):int(0.85*len(total_data))]   # e.g., 2024 H1
test_data = total_data[int(0.85*len(total_data)):]                          # e.g., 2024 H2

# Create environments
train_env = CryptoTradingEnv(train_data)
val_env = CryptoTradingEnv(val_data)
test_env = CryptoTradingEnv(test_data)
Episode Design:
| Approach | Pros | Cons |
|---|---|---|
| Fixed length (1 week) | Consistent, many episodes | May miss long-term patterns |
| Variable length (until drawdown) | Realistic failure mode | Episode length varies |
| Full dataset (1 episode) | Captures everything | Sparse rewards, slow training |
| Random start points | More episode variety | May overlap train/val |
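One possible implementation of the random-start row, sketched against the CryptoTradingEnv above; the `episode_start`/`episode_end` attributes are hypothetical additions, and `step()` would also need to treat reaching `episode_end` as done:

```python
import numpy as np

def reset_with_random_start(env, episode_length=24 * 7):
    """Start an episode of fixed length at a random point in the data."""
    max_start = len(env.data) - episode_length - 1
    env.episode_start = np.random.randint(env.window_size, max_start)
    env.episode_end = env.episode_start + episode_length  # step() should also end here
    env.step_idx = env.episode_start
    env.capital = env.initial_capital
    env.position = 0
    return env._get_state()
```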
Common RL Algorithms for Trading
Algorithm Comparison:
| Algorithm | Type | Sample Efficiency | Stability | Best For |
|---|---|---|---|---|
| DQN | Value-based | Moderate | Good | Discrete actions |
| PPO | Policy gradient | Good | Excellent | General purpose |
| A2C/A3C | Policy gradient | Moderate | Good | Parallel training |
| SAC | Actor-critic | Excellent | Good | Continuous actions |
| TD3 | Actor-critic | Excellent | Very good | Continuous actions |
Recommended: PPO (Proximal Policy Optimization)
PPO is the workhorse of modern RL, offering good sample efficiency and stable training:
from stable_baselines3 import PPO

# Create environment
env = CryptoTradingEnv(train_data)

# Initialize PPO agent
model = PPO(
    "MlpPolicy",
    env,
    learning_rate=3e-4,
    n_steps=2048,
    batch_size=64,
    n_epochs=10,
    gamma=0.99,
    gae_lambda=0.95,
    clip_range=0.2,
    ent_coef=0.01,
    verbose=1
)

# Train
model.learn(total_timesteps=1_000_000)
DQN for Discrete Actions:
from stable_baselines3 import DQN

model = DQN(
    "MlpPolicy",
    env,
    learning_rate=1e-4,
    buffer_size=100000,
    learning_starts=10000,
    batch_size=32,
    tau=0.005,
    gamma=0.99,
    exploration_fraction=0.1,
    exploration_final_eps=0.02,
    verbose=1
)
SAC for Continuous Actions:
from stable_baselines3 import SAC

model = SAC(
    "MlpPolicy",
    env,  # Must support continuous actions
    learning_rate=3e-4,
    buffer_size=100000,
    learning_starts=10000,
    batch_size=256,
    tau=0.005,
    gamma=0.99,
    ent_coef='auto',
    verbose=1
)
Training Tips:
- Start with PPO: Most stable, good baseline
- Use curriculum learning: Start with simpler markets (trending), progress to harder (ranging, volatile)
- Reward normalization: Normalize rewards during training for stability (see the sketch after this list)
- Gradient clipping: Prevent exploding gradients
- Logging everything: Track rewards, actions, losses for debugging
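A minimal sketch of the reward-normalization and gradient-clipping tips using Stable Baselines3, assuming CryptoTradingEnv conforms to the Gym API; the timestep count and file name are arbitrary:

```python
from stable_baselines3 import PPO
from stable_baselines3.common.vec_env import DummyVecEnv, VecNormalize

# Normalize observations and rewards online; clip extreme rewards for stability
venv = DummyVecEnv([lambda: CryptoTradingEnv(train_data)])
venv = VecNormalize(venv, norm_obs=True, norm_reward=True, clip_reward=10.0)

model = PPO(
    "MlpPolicy",
    venv,
    max_grad_norm=0.5,   # gradient clipping
    verbose=1,
)
model.learn(total_timesteps=200_000)

# Persist normalization statistics so evaluation uses the same scaling
venv.save("vecnormalize.pkl")
```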
Practical Implementation Guide
Step-by-Step Development Process:
Phase 1: Environment Development (2-4 weeks)
- Collect and clean historical data
- Implement basic environment (state, action, reward)
- Add realistic transaction costs
- Validate environment logic with a random agent (see the sketch after this list)
- Implement visualization for debugging
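A quick way to run that random-agent check, assuming the CryptoTradingEnv interface sketched earlier and a basic discrete action list; expect roughly flat-to-negative total reward once transaction costs are modeled:

```python
import numpy as np

env = CryptoTradingEnv(train_data)
actions = ['HOLD', 'BUY', 'SELL']

state = env.reset()
total_reward, steps, done = 0.0, 0, False
while not done:
    action = np.random.choice(actions)           # no intelligence, by design
    state, reward, done, info = env.step(action)
    total_reward += reward
    steps += 1

print(f"Random agent: {steps} steps, total reward {total_reward:.4f}")
```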
Phase 2: Initial Training (2-3 weeks)
- Start with simple MLP policy
- Train PPO on subset of data
- Analyze learning curves
- Debug reward function if agent doesn't learn
- Iterate on state representation
Phase 3: Refinement (3-4 weeks)
- Add more sophisticated features
- Experiment with architectures (LSTM, attention)
- Tune hyperparameters
- Implement regime-specific training
- Add validation monitoring
Phase 4: Evaluation (2-3 weeks)
- Evaluate on held-out test data
- Compare to baselines (buy-hold, simple rules)
- Analyze failure modes
- Stress test on different market conditions
- Monte Carlo analysis
Phase 5: Production (Ongoing)
- Paper trading validation
- Gradual capital deployment
- Continuous monitoring
- Periodic retraining
Code Structure:
rl_trading/
├── data/
│ ├── loader.py # Data fetching
│ ├── preprocessing.py # Feature engineering
│ └── storage.py # Data caching
├── env/
│ ├── trading_env.py # Main environment
│ ├── rewards.py # Reward functions
│ └── utils.py # Helper functions
├── agents/
│ ├── networks.py # Neural network architectures
│ ├── ppo_agent.py # PPO implementation
│ └── utils.py # Training utilities
├── evaluation/
│ ├── metrics.py # Performance metrics
│ ├── visualization.py # Plotting
│ └── backtest.py # Backtesting
├── config/
│ └── config.yaml # Hyperparameters
├── train.py # Training script
├── evaluate.py # Evaluation script
└── paper_trade.py # Paper trading
Common Debugging Issues:
| Issue | Symptom | Solution |
|---|---|---|
| No learning | Flat rewards | Check reward function, simplify problem |
| Instability | Erratic rewards | Reduce learning rate, increase batch size |
| Overfitting | Good train, bad val | Add regularization, reduce model size |
| Always same action | Low variance | Increase entropy bonus |
| Excessive trading | High frequency | Add transaction cost penalty |
Evaluation and Production Deployment
Evaluation Metrics:
| Metric | Target | Calculation |
|---|---|---|
| Total Return | >0 | (Final - Initial) / Initial |
| Sharpe Ratio | >1.0 | Mean(returns) / Std(returns) * √252 (daily returns, annualized) |
| Max Drawdown | <30% | Max(peak - trough) / peak |
| Win Rate | >45% | Profitable trades / Total trades |
| Profit Factor | >1.5 | Gross profit / Gross loss |
| Calmar Ratio | >1.0 | Annual return / Max drawdown |
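A compact sketch of how these metrics can be computed from an equity curve; `evaluation_metrics` is an illustrative helper that assumes one equity value per trading day:

```python
import numpy as np

def evaluation_metrics(equity_curve, periods_per_year=252):
    """Core metrics from a daily equity curve."""
    equity = np.asarray(equity_curve, dtype=float)
    returns = np.diff(equity) / equity[:-1]

    total_return = equity[-1] / equity[0] - 1
    sharpe = returns.mean() / (returns.std() + 1e-12) * np.sqrt(periods_per_year)

    running_peak = np.maximum.accumulate(equity)
    max_drawdown = ((running_peak - equity) / running_peak).max()

    annual_return = (1 + total_return) ** (periods_per_year / len(returns)) - 1
    calmar = annual_return / max(max_drawdown, 1e-12)

    return {"total_return": total_return, "sharpe": sharpe,
            "max_drawdown": max_drawdown, "calmar": calmar}
```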
Baseline Comparisons: Always compare the RL agent to simple baselines:
- Buy and Hold: Passive investment
- Random Agent: Sanity check
- Simple Rules: MA crossover, RSI strategy
- Market Returns: Benchmark index
Production Deployment Checklist:
- Out-of-sample performance validates training
- Paper trading matches backtest
- Risk limits implemented (max position, max drawdown)
- Monitoring dashboards active
- Automatic shutdown on excessive losses
- Model versioning and rollback capability
- Real-time inference latency acceptable
- Data pipeline robust and monitored
Monitoring in Production:
class ProductionMonitor:
    def __init__(self, alert_threshold):
        self.trades = []
        self.daily_pnl = []
        self.alert_threshold = alert_threshold

    def log_trade(self, trade):
        self.trades.append(trade)
        self.check_alerts()

    def check_alerts(self):
        # Helpers such as current_drawdown, calculate_recent_winrate,
        # send_alert, and pause_trading are implementation-specific.
        # Check drawdown
        if self.current_drawdown > self.alert_threshold:
            self.send_alert("Drawdown threshold exceeded")
            self.pause_trading()
        # Check win rate degradation
        recent_wr = self.calculate_recent_winrate()
        if recent_wr < 0.3:
            self.send_alert("Win rate below threshold")
        # Check for unusual behavior
        if self.trades_today > self.normal_trades * 3:
            self.send_alert("Unusual trading frequency")
Challenges and Limitations
Technical Challenges:
1. Non-Stationarity: Markets change. Patterns that worked in 2023 may fail in 2025.
- Mitigation: Continuous retraining, regime detection, shorter lookback windows
2. Sample Efficiency: RL typically needs millions of samples, but market data is limited.
- Mitigation: Data augmentation, transfer learning, model-based RL
3. Reward Hacking: The agent finds unintended ways to maximize reward.
- Mitigation: Careful reward design, constraint-based RL
4. Sim-to-Real Gap: The simulated environment differs from the live market.
- Mitigation: Realistic simulation, domain randomization
Practical Challenges:
| Challenge | Impact | Mitigation |
|---|---|---|
| Data quality | Garbage in, garbage out | Validate all data sources |
| Overfitting | Works in backtest, fails live | Rigorous validation |
| Latency | Missed opportunities | Infrastructure investment |
| Transaction costs | Eat profits | Accurate cost modeling |
| Market impact | Can't execute at desired prices | Size limits, impact models |
When RL Trading Fails: Most RL trading projects fail. Common reasons:
- Insufficient domain knowledge: Building RL without understanding trading
- Poor reward function: Agent optimizes wrong objective
- Data issues: Lookahead bias, survivorship bias, bad data
- Overfitting: Agent memorizes history instead of learning patterns
- Unrealistic simulation: Doesn't account for real-world friction
- No monitoring: Agent degrades without detection
Realistic Expectations:
| Expectation | Reality |
|---|---|
| "Print money automatically" | Requires constant maintenance |
| "Beat the market easily" | Modest edge at best |
| "Set and forget" | Needs monitoring and retraining |
| "Works in all conditions" | Different regimes need different approaches |
FAQs
Is reinforcement learning better than supervised learning for trading?
Not necessarily "better": they are different tools for different problems. Supervised learning excels at prediction (will the price go up?). RL excels at decision-making (what position size, given prediction uncertainty?). The best systems often combine both.
How much data do I need to train an RL trading agent?
Minimum 2-3 years of hourly data for basic validation, ideally 5+ years covering multiple market regimes. Sample efficiency varies by algorithm; off-policy methods like SAC typically need less data than PPO.
Can I use RL for high-frequency trading?
Theoretically yes, but practical challenges are severe: latency requirements (microseconds), data volume, and competition against well-funded HFT firms. RL is more practical for medium-frequency (minutes to hours) trading.
How do I know if my RL agent is overfitting?
If training performance greatly exceeds validation performance (>50% gap), if performance depends on specific historical sequences, or if the agent fails on simple perturbations of the data.
Should I use a pre-built RL library or build from scratch?
Use pre-built libraries (Stable Baselines3, RLlib) unless you have specific research needs. They're well-tested and save months of debugging. Custom environments, however, usually need to be built from scratch.
How long does it take to train an effective RL trading agent?
Development: 3-6 months for a working prototype. Training: Hours to days depending on complexity. Validation: 1-3 months of paper trading. Total: 6-12 months from start to live deployment with real money.
Summary: Building RL Trading Systems
Reinforcement learning for crypto trading offers powerful capabilities but requires significant expertise and effort. The key components for success include:
State Design - Create informative, normalized representations that capture market conditions without lookahead bias.
Action Space - Start with discrete actions (7-11 choices) before attempting continuous control.
Reward Engineering - Design rewards that truly capture your trading objectives, including risk-adjustment and transaction costs.
Architecture Selection - Begin with MLP policies, graduate to LSTM/Transformer for sequential patterns.
Realistic Simulation - Model transaction costs, slippage, and market impact accurately.
Algorithm Choice - PPO for stability and general use, SAC for continuous actions and sample efficiency.
Rigorous Evaluation - Compare against baselines, validate out-of-sample, paper trade before live deployment.
Continuous Monitoring - Track performance, detect degradation, retrain as markets evolve.
RL trading systems require significant investment but can discover strategies beyond human conception. The technology is maturing, and the tools are increasingly accessible.
Accelerate Your AI Trading with Thrive
Building RL systems takes months. Get AI-powered trading insights today with Thrive:
✅ AI Signal Generation - Machine learning-optimized entry and exit signals
✅ Regime Detection - Know when market conditions favor your strategies
✅ Risk Management - AI-powered position sizing and stop recommendations
✅ Performance Analytics - Track and improve your trading decisions
✅ No Coding Required - Access advanced AI without building infrastructure
✅ Continuous Improvement - Models updated as markets evolve
From AI research to trading edge, instantly.

