A game developer uses a multi-armed bandit (MAB) model to recommend in-game offers to players. The model tracks which offers players respond to and adjusts future recommendations accordingly. In MAB terms, what is the purpose of the "policy" in this context?
A financial services firm is using a neural network with multiple hidden layers to predict the likelihood of loan default (binary classification). They find that the model isn’t learning well with the logistic (sigmoid) activation function in the hidden layers. Which activation function should they consider switching to in order to improve learning?
A tech firm is developing a model to predict credit risk and wants to reduce testing bias while still evaluating performance. Which approach should it use?
An investment firm is setting up its model risk management framework. Who is primarily responsible for approving or rejecting models based on validation results?
A trading firm uses reinforcement learning to determine the best time to buy or sell an asset. They have partial information about the state transitions and aim to maximize long-term profits by identifying the best action in each state. Which reinforcement learning approach is most suitable for this scenario?
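As an illustration of the trading scenario above, here is a minimal sketch of one model-free method, tabular Q-learning, which learns action values from sampled experience without being given the transition probabilities. Everything in it is a hypothetical toy: the two price regimes ("low", "high"), the two actions ("buy", "sell"), the rewards of +1 and -1, the 30% regime-switch probability, and the hyperparameters are assumptions made for illustration, not part of the question.

```python
import random

# Hypothetical toy setup: two price regimes and two actions.
STATES = ("low", "high")
ACTIONS = ("buy", "sell")


def step(state, action, rng):
    """Toy environment. The agent never reads these dynamics directly;
    it only observes the sampled next state and reward."""
    # Assumed reward: buying low or selling high pays +1, otherwise -1.
    good = (state == 0 and action == 0) or (state == 1 and action == 1)
    reward = 1.0 if good else -1.0
    # Assumed dynamics: the regime flips with probability 0.3.
    next_state = 1 - state if rng.random() < 0.3 else state
    return next_state, reward


def q_learning(steps=20000, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    q = [[0.0 for _ in ACTIONS] for _ in STATES]  # Q-table, all zeros
    state = 0
    for _ in range(steps):
        # Epsilon-greedy behaviour: mostly exploit, sometimes explore.
        if rng.random() < epsilon:
            action = rng.randrange(len(ACTIONS))
        else:
            action = max(range(len(ACTIONS)), key=lambda a: q[state][a])
        next_state, reward = step(state, action, rng)
        # Q-learning update: bootstrap from the best next-state value.
        target = reward + gamma * max(q[next_state])
        q[state][action] += alpha * (target - q[state][action])
        state = next_state
    return q


q_table = q_learning()
# Greedy policy: the highest-valued action in each state.
greedy_policy = [ACTIONS[row.index(max(row))] for row in q_table]
print(greedy_policy)
```

The update rule uses only observed (state, action, reward, next state) samples, which is why this family of methods is often considered when the transition model is not fully known.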