AI Skills Assessment for Product Managers

Section 1: AI Fundamentals

Understanding how AI systems work at a conceptual level

Multiple Choice - 1pt

1. What does GPT stand for in models like GPT-4?

A) General Purpose Technology

B) Generative Pre-trained Transformer

C) Graphical Processing Tool

D) Global Prediction Technique

E) I don't know

Multiple Choice - 2pts

2. Your AI customer service bot is hallucinating incorrect policy information. Which approach MOST effectively reduces hallucinations?

A) Increase the model's temperature parameter from 0.3 to 0.8 to generate more diverse and creative responses

B) Use retrieval-augmented generation (RAG) to ground responses in your verified help documentation, with explicit instructions not to guess when uncertain

C) Switch from GPT-3.5 to a larger model like GPT-4, since bigger models hallucinate less

D) Shorten prompts to under 100 words to reduce information that might confuse the model

E) I don't know

Scenario - 3pts

You're building a legal research assistant that answers questions about case law. Your database contains 50,000 court documents that are updated weekly with new rulings.

3. What's the MOST appropriate technical architecture?

A) Fine-tune GPT-4 weekly on all 50,000 documents to keep the model's knowledge current with new rulings

B) Use vector embeddings to semantically search for relevant case excerpts, then pass those excerpts to an LLM for synthesis and natural language answer generation

C) Include all 50,000 court documents in the context window of each user query so the model has complete information

D) Train a custom legal transformer model from scratch on court documents plus general legal text

E) I don't know

Multiple Choice - 2pts

4. What's the PRIMARY advantage of few-shot prompting compared to fine-tuning a model?

A) Few-shot prompting always achieves higher accuracy than fine-tuned models on specialized tasks

B) Few-shot prompting requires no training data collection, compute resources, or waiting time—just include examples directly in your prompt

C) Few-shot prompting works better for highly technical or specialized domains with unique terminology

D) Few-shot prompting produces faster inference times than fine-tuned models during production use

E) I don't know

Multiple Choice - 3pts

5. Your AI feature makes 1,000 API calls per day. Each call uses 2,000 input tokens and generates 500 output tokens. Your API charges $0.01 per 1,000 input tokens and $0.03 per 1,000 output tokens. What's your daily cost?

A) $25

B) $35

C) $50

D) $65

E) I don't know
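The arithmetic behind question 5 can be sketched as a small Python helper. The function and parameter names are illustrative assumptions, not part of any provider's API; prices are per 1,000 tokens, as in the question.

```python
def daily_api_cost(calls_per_day, input_tokens, output_tokens,
                   input_price_per_1k, output_price_per_1k):
    """Return the daily cost in dollars for a token-priced API."""
    # Cost of one call = input tokens priced per 1K + output tokens priced per 1K
    input_cost = input_tokens / 1000 * input_price_per_1k
    output_cost = output_tokens / 1000 * output_price_per_1k
    return calls_per_day * (input_cost + output_cost)


# Figures from question 5
cost = daily_api_cost(1000, 2000, 500, 0.01, 0.03)
print(f"${cost:.2f} per day")
```

Input and output tokens are priced separately, so both terms have to be computed per call before multiplying by call volume.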

True/False - 1pt

6. Chain-of-thought prompting improves LLM performance on complex reasoning tasks by instructing the model to show its intermediate reasoning steps.

A) TRUE

B) FALSE

C) I don't know

Multiple Choice - 2pts

7. You're deciding between using an LLM API versus fine-tuning your own model. Which factor MOST favors using a third-party API?

A) You need the absolute highest possible accuracy for your highly specialized domain with unique terminology

B) Your use case requires general capabilities like summarization, Q&A, and content generation that foundation models already handle well

C) You have 100,000+ labeled training examples specific to your domain and user feedback patterns

D) You need complete control over model behavior, updates, and data residency for compliance reasons

E) I don't know

Section 2: AI Strategy

Making strategic decisions about when and how to use AI

Scenario - 3pts

Your B2B SaaS product has 500 enterprise customers. The sales team wants an AI sales forecasting feature using machine learning, but engineering says it needs 50,000 historical deals to train effectively.

8. What's your BEST response as PM?

A) Explain that without sufficient data, any model would be unreliable and could damage sales credibility—recommend waiting until you have 10,000+ deals

B) Explore whether combining your 500 deals with external industry benchmark data, transfer learning from similar domains, or simpler statistical models can provide actionable insights

C) Build a rule-based heuristic system instead of ML since you lack training data, using the sales team's expertise to define forecasting logic

D) Partner with a data provider to purchase synthetic training data that matches your deal patterns and customer profile

E) I don't know

Multiple Choice - 2pts

9. Your competitor launches "AI-powered insights" generating massive PR buzz. Your analytics show your existing rule-based insights have 95% user satisfaction. What's your move?

A) Immediately allocate engineering resources to rebuild your insights with LLMs to maintain competitive parity and marketing messaging

B) Conduct user research and competitive analysis to understand if their AI solves problems your rules don't, and whether customers are actually switching or requesting similar capabilities

C) Maintain focus on your existing approach since 95% satisfaction validates your solution—competitor's AI is likely just marketing hype without substance

D) Launch a counter-marketing campaign highlighting your proven track record and reliability compared to their unproven AI experiments

E) I don't know

Multiple Choice - 2pts

10. You're prioritizing AI investments across four validated opportunities with similar technical feasibility. Which is MOST strategically valuable?

A) Automate internal data processing that currently takes 10 engineers 15 hours/week, freeing them for higher-value work

B) Create a customer-facing feature that solves a top reason that prospects choose competitors, potentially increasing win rate 20%

C) Build an AI feature that improves an existing capability from 'good' to 'great' based on strong customer interest in surveys

D) Implement AI to replicate a competitor feature that 30% of sales prospects ask about during demos

E) I don't know

Scenario - 3pts

Your AI recommendation engine drives 30% of revenue. Data scientists report model performance degrading from 85% to 78% accuracy over 6 months. Engineering wants 8 weeks to retrain and redeploy.

11. What's your FIRST priority as PM?

A) Immediately approve the 8-week retraining, since a 7-point accuracy drop is significant and will worsen without intervention

B) Analyze whether the 78% accuracy is actually impacting key business metrics (revenue per user, engagement, retention, click-through rates) before committing to retraining

C) Investigate root causes of degradation—is it data drift, seasonal patterns, or product changes—to determine if retraining will solve the underlying issue

D) A/B test the 78% model against the original 85% model to measure real user impact before deciding on retraining investment

E) I don't know

True/False - 1pt

12. If your AI feature achieves a 90% success rate across all users, it's ready to ship even if it fails for the remaining 10%.

A) TRUE

B) FALSE

C) I don't know

Multiple Choice - 3pts

You have 50,000 posts/day that must be moderated. Your AI content moderation tool catches 60% of policy violations (recall=60%) with 98% precision. Your community team can manually review 1,000 items daily.

13. What's your strategic approach to ensure effective content moderation?

A) Disable AI since 60% recall means 40% of violations get through—focus manual review on random sampling of the 50K posts for consistent coverage

B) Use AI to flag high-risk content and route it to human reviewers, allowing your 1,000 daily reviews to cover AI-identified risks rather than random sampling

C) Invest in improving AI recall to 95%+ before deploying to ensure most violations are caught automatically

D) Rely entirely on the AI system since 98% precision means it rarely makes mistakes, and accept the 40% missed violations as a necessary trade-off

E) I don't know
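To see how recall, precision, and review capacity interact in question 13, a rough sketch can help. The question does not state what share of posts violate policy, so the 2% violation rate below is a purely hypothetical assumption, as are the function and variable names.

```python
def flagged_per_day(posts_per_day, violation_rate, recall, precision):
    """Estimate how many posts the classifier flags each day."""
    violations = posts_per_day * violation_rate
    true_positives = violations * recall   # violations the AI actually catches
    # precision = true positives / all flags, so all flags = true positives / precision
    return true_positives / precision


ASSUMED_VIOLATION_RATE = 0.02  # hypothetical; not given in the question
REVIEW_CAPACITY = 1_000        # manual reviews per day, from the question

flags = flagged_per_day(50_000, ASSUMED_VIOLATION_RATE, recall=0.60, precision=0.98)
print(f"{flags:.0f} flags/day against a review capacity of {REVIEW_CAPACITY}")
```

Under a different violation rate the flag volume changes proportionally, so the comparison against review capacity should be rerun with real data.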

Section 3: Hands-On Building

Practical experience with AI tools and prototyping

Multiple Choice - 2pts

14. You're testing a new RAG system that answers questions about your company's product documentation. During testing, you notice it sometimes cites information from Page 47 when the correct answer is actually on Page 12. What's the MOST likely cause?

A) The chunk size is too large, causing the retrieval to miss relevant sections

B) The embedding model isn't capturing the semantic meaning of your domain-specific terminology

C) The system is retrieving semantically similar but contextually wrong content, and needs better retrieval relevance scoring

D) The temperature parameter is too high, causing the model to hallucinate page numbers

E) I don't know

Scenario - 3pts

Your AI-powered email drafting tool uses GPT-4. In production, you're seeing high costs ($8,000/month) and slow response times (3-4 seconds). Analysis shows 70% of requests are simple formatting fixes like "make this more professional" or "add a greeting," while 30% require complex rewrites.

15. What's your BEST optimization strategy?

A) Reduce max_tokens from 1000 to 500 for all requests to cut costs in half

B) Implement a router that sends simple formatting requests to Claude Haiku and complex rewrites to GPT-4, with request classification happening first

C) Switch entirely to GPT-3.5 Turbo since it's 10x cheaper and the quality difference won't be noticeable

D) Batch all requests to process every 30 seconds instead of real-time to reduce API call overhead

E) I don't know

Multiple Choice - 2pts

16. You're writing a prompt for an AI assistant that summarizes customer support tickets. Which prompt structure is MOST effective?

A) "Summarize this ticket" followed by the ticket text

B) "You are a customer support expert. Summarize this ticket in 2-3 sentences focusing on: issue type, customer impact, and urgency level. Format: [Issue] | [Impact] | [Urgency]" with examples, followed by ticket text

C) "Please carefully read this ticket and provide a thoughtful, detailed summary that captures all the important information"

D) A 500-word essay explaining the importance of good ticket summaries, then asking for a summary

E) I don't know

True/False - 1pt

17. When using few-shot examples in your prompts, you should always provide at least 10 examples to ensure the model learns the pattern effectively.

A) TRUE

B) FALSE

C) I don't know

Multiple Choice - 3pts

18. You're building a content moderation system that flags policy violations in user posts. During evaluation, you find the system has 85% accuracy with this distribution: 90% of safe posts correctly marked safe, but only 60% of violating posts correctly flagged. What should you prioritize?

A) Increase the model's temperature to generate more varied classifications

B) Focus on improving recall (catching more violations) even if it increases false positives, since missing violations is riskier than over-flagging

C) Optimize for the 85% accuracy metric since it's already performing well

D) Reduce the training data to only clear-cut examples to improve precision

E) I don't know
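The three figures in question 18 are tied together by a weighted average: overall accuracy is the safe-post rate and the violation-catch rate weighted by how common each class is. The sketch below, with illustrative function names, backs out the share of violating posts those figures imply.

```python
def overall_accuracy(prevalence, specificity, recall):
    """Accuracy as a prevalence-weighted mix of the two per-class rates."""
    return (1 - prevalence) * specificity + prevalence * recall


def implied_prevalence(accuracy, specificity, recall):
    """Solve accuracy = (1 - p) * specificity + p * recall for p."""
    # Undefined when specificity == recall (accuracy no longer depends on p)
    return (specificity - accuracy) / (specificity - recall)


p = implied_prevalence(accuracy=0.85, specificity=0.90, recall=0.60)
print(f"Implied share of violating posts: {p:.1%}")
print(f"Check: accuracy = {overall_accuracy(p, 0.90, 0.60):.2f}")
```

Because the majority class dominates the weighted average, a headline accuracy figure can look healthy while recall on the minority class stays low.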

Multiple Choice - 3pts

19. Your production RAG system suddenly starts giving poor answers after you upgraded from GPT-4 to GPT-4 Turbo. The retrieval is working correctly and returning relevant documents, but answers are often off-topic or outdated. What's the MOST likely root cause?

A) GPT-4 Turbo has worse reasoning capabilities than GPT-4

B) Your prompt doesn't explicitly instruct the model to only use the provided context, so it's defaulting to its training data

C) The embedding model is incompatible with GPT-4 Turbo

D) The chunk size needs to be adjusted for GPT-4 Turbo's different context window

E) I don't know

Section 4: Data & Privacy

Managing data responsibly and understanding privacy implications

Multiple Choice - 1pt

20. Your AI chatbot logs all user conversations for quality improvement. Under most data privacy regulations, what is the minimum you must do?

A) Store the logs securely with encryption

B) Inform users in your privacy policy that conversations are logged and explain the purpose

C) Delete logs after 90 days automatically

D) Get explicit consent before each conversation starts

E) I don't know

Scenario - 3pts

Your AI-powered HR tool uses employee performance reviews, salary data, and manager feedback to predict promotion readiness. A data scientist proposes including employee demographic data (age, gender, ethnicity) to "improve model accuracy" by accounting for historical patterns.

21. What's your BEST response as PM?

A) Include the demographic data since it will improve accuracy, but add a disclaimer that the model uses this information

B) Reject using demographic data as model inputs—it risks encoding historical bias and could violate discrimination laws, even if it improves technical accuracy

C) Use the demographic data but apply "fairness constraints" in the model to ensure equal outcomes across groups

D) Conduct an A/B test to see if including demographic data improves business outcomes before deciding

E) I don't know

Multiple Choice - 2pts

22. Your customer support chatbot uses OpenAI's API to answer user questions. Users sometimes share sensitive information like order numbers, email addresses, and account details in their messages. What's the MOST important consideration?

A) OpenAI automatically redacts all sensitive information before processing, so no action is needed

B) You need to understand OpenAI's data retention policy and whether conversation data is used for model training—you may need to opt out or use zero-retention APIs

C) Sensitive information is safe because API calls are encrypted in transit

D) As long as you have a privacy policy, users have consented to their data being processed by third parties

E) I don't know

True/False - 2pts

23. If you anonymize personal data by removing names and email addresses before training an AI model, the data is no longer considered "personal data" under regulations like GDPR.

A) TRUE

B) FALSE

C) I don't know

Scenario - 3pts

Your medical diagnosis AI app stores patient symptom data and AI-generated health recommendations. You're using AWS to host the application and OpenAI's API for the AI functionality. An EU resident patient requests deletion of all their data under GDPR's "right to be forgotten."

24. What must you do to fully comply?

A) Delete the patient's data from your AWS database—that's sufficient since you own that data

B) Delete from your database and request OpenAI delete any API logs containing the patient's data, but you cannot control what OpenAI does

C) Delete from your database, verify deletion from AWS backups, and confirm OpenAI's retention policy—you're responsible for ensuring complete deletion across all processors

D) Inform the patient that AI training data cannot be "unlearned," so complete deletion is technically impossible

E) I don't know

Multiple Choice - 3pts

25. Your AI product processes customer data and is based in the US. You have 50 customers in California, 30 in New York, and 20 in the EU. Your legal team says you need to conduct a Data Protection Impact Assessment (DPIA). What triggers this requirement?

A) Having any US customers requires a DPIA under CCPA

B) The EU customers trigger GDPR's DPIA requirement for high-risk AI processing, regardless of where your company is based

C) Only if you store data in EU data centers; storing data in the US avoids DPIA requirements

D) DPIAs are optional best practice and only required if you want GDPR certification

E) I don't know

Section 5: Product Development

Managing AI product development effectively

Multiple Choice - 1pt

26. You're planning an MVP for an AI-powered feature. Which approach BEST represents an AI product MVP strategy?

A) Build the full ML pipeline with model training, deployment, and monitoring before getting any user feedback

B) Start with a simple rule-based system or human-in-the-loop approach to validate the use case, then add ML when you have data and validated demand

C) Use the most powerful AI model available (like GPT-4) to ensure the best possible user experience from day one

D) Wait until you have 10,000+ data points before building anything to ensure quality

E) I don't know

Scenario - 3pts

You're building an AI feature that recommends personalized workout plans. Your data science team says they need 6 months and 50,000 user workout histories to train an effective machine learning model. You have 500 beta users and pressure to launch in 2 months.

27. What's your BEST product strategy?

A) Launch with a rule-based recommendation engine using fitness expert heuristics, then gradually introduce ML as you collect data

B) Wait the full 6 months to collect data—launching a poor AI experience will damage your brand

C) Use synthetic data generation to create 50,000 fake workout histories and train the ML model immediately

D) Partner with another fitness app to purchase their workout data and train your model

E) I don't know

Multiple Choice - 2pts

28. Your AI-powered customer service bot is performing well in testing but users complain it feels "robotic" and "unhelpful" in production. What's the MOST likely issue?

A) The model's temperature setting is too low, making responses too deterministic

B) Your test data doesn't reflect the diversity and edge cases of real user queries

C) The model is too small and needs to be upgraded to a larger version

D) Users need more time to adjust to interacting with AI instead of humans

E) I don't know

True/False - 2pts

29. When building an AI product feature, you should always prioritize achieving the highest possible model accuracy before launch, even if it delays release by several months.

A) TRUE

B) FALSE

C) I don't know

Scenario - 3pts

Your AI writing assistant has two potential features: (1) Grammar correction with 95% accuracy requiring 2 months of work, or (2) Tone adjustment (professional/casual/friendly) with 75% accuracy also requiring 2 months. User research shows equal interest in both features.

30. How should you prioritize?

A) Build grammar correction first since 95% accuracy is more impressive than 75%

B) Build tone adjustment first because it's more differentiated—competitors already offer grammar correction

C) Evaluate which feature has better unit economics (engagement impact, retention, willingness-to-pay) and whether 75% accuracy is "good enough" for user value

D) Build both simultaneously by splitting your engineering team

E) I don't know

Multiple Choice - 3pts

31. You're launching an AI feature that personalizes content recommendations. After launch, engagement increases 15% but you notice certain user segments (older users, non-English speakers) have 40% lower engagement than others. What's your BEST approach?

A) This is expected variation—15% overall increase is a successful launch

B) Investigate whether the AI model is performing poorly for underrepresented segments in training data, and consider segment-specific models or data collection

C) These segments are probably less tech-savvy and will naturally have lower engagement with AI features

D) Run an A/B test turning off the AI feature for these segments to see if engagement improves

E) I don't know

Section 6: Economics

Understanding financial implications of AI products

Multiple Choice - 1pt

32. Your AI feature costs $0.02 per API call. If 10,000 users each make an average of 5 calls per month, what are your monthly API costs?

A) $100

B) $1,000

C) $10,000

D) $100,000

E) I don't know

Scenario - 3pts

Your SaaS product charges $50/user/month. You're considering adding an AI feature that costs $0.10 per user per day in API costs. Marketing estimates it will increase conversion rate from 2% to 3% and reduce churn from 5% to 3% monthly.

33. What's the MOST important factor in determining if this AI feature is economically viable?

A) The $3/user/month AI cost is only 6% of revenue, so it's clearly profitable

B) Calculate the Lifetime Value (LTV) impact of both higher conversion and lower churn, minus the added AI costs, to understand true economic value

C) The conversion rate increase alone (50% improvement) justifies the investment

D) Compare your AI costs to competitors' costs to ensure you're not overpaying

E) I don't know
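One simple way to frame the lifetime-value comparison in question 33 is sketched below. It assumes a deliberately simplified model that the scenario does not specify: constant monthly churn, no discounting, a 30-day month, and no costs other than the AI API. All names are illustrative.

```python
def simple_ltv(monthly_revenue, monthly_cost, monthly_churn):
    """Lifetime value = monthly margin x expected lifetime (1 / churn) in months."""
    return (monthly_revenue - monthly_cost) / monthly_churn


ai_cost_per_month = 0.10 * 30  # assumes a 30-day month

ltv_before = simple_ltv(50, 0, 0.05)
ltv_after = simple_ltv(50, ai_cost_per_month, 0.03)

# Expected value per prospect = conversion rate x LTV of a converted user
value_before = 0.02 * ltv_before
value_after = 0.03 * ltv_after

print(f"LTV per user: ${ltv_before:,.0f} -> ${ltv_after:,.0f}")
print(f"Value per prospect: ${value_before:,.2f} -> ${value_after:,.2f}")
```

The conversion and churn figures are marketing estimates in the scenario, so the output is only as reliable as those inputs.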

Multiple Choice - 2pts

34. Your AI product has two cost components: $5,000/month for model hosting infrastructure and $0.05 per prediction. Which statement is TRUE?

A) Total costs scale linearly with usage because of the per-prediction cost

B) You have both fixed costs ($5,000/month) and variable costs ($0.05/prediction), creating different unit economics at different scales

C) The $5,000 fixed cost is negligible and shouldn't impact pricing decisions

D) You should only charge users for the variable per-prediction cost

E) I don't know
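The cost structure in question 34 can be illustrated with a short sketch showing how the cost of a single prediction changes with volume. The constants come from the question; the function name and the sample volumes are assumptions chosen for illustration.

```python
FIXED_MONTHLY = 5_000.0         # hosting infrastructure, from the question
VARIABLE_PER_PREDICTION = 0.05  # per-prediction cost, from the question


def cost_per_prediction(predictions_per_month):
    """Average cost of one prediction once fixed costs are spread over volume."""
    if predictions_per_month <= 0:
        raise ValueError("predictions_per_month must be positive")
    return FIXED_MONTHLY / predictions_per_month + VARIABLE_PER_PREDICTION


# Hypothetical monthly volumes
for volume in (10_000, 100_000, 1_000_000):
    print(f"{volume:>9,} predictions: ${cost_per_prediction(volume):.3f} each")
```

The fixed component shrinks per unit as volume grows, while the variable component stays constant.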

True/False - 2pts

35. Switching from GPT-4 to GPT-3.5 will always reduce your AI costs by approximately 10x because GPT-3.5 is 10x cheaper per token.

A) TRUE

B) FALSE

C) I don't know

Scenario - 3pts

You're deciding between two AI approaches for document analysis: (1) Using Claude API at $3/1M input tokens with 98% accuracy, or (2) Fine-tuning an open-source model for $15,000 upfront + $500/month hosting with 95% accuracy. You process 50M tokens monthly.

36. How should you evaluate this decision?

A) Choose Claude API because 98% accuracy is better than 95%

B) Calculate total cost over 12-24 months: API = $3 × 50 = $150/month ($1,800-$3,600 total) vs. fine-tuned = $15,000 + ($500 × 12-24) = $21,000-$27,000, then weigh those totals together with the 3-point accuracy difference

C) Choose fine-tuning because it gives you more control over the model

D) Always choose API solutions to avoid upfront investment

E) I don't know
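The cumulative-cost comparison for question 36 can be sketched as follows. Only input-token pricing is given in the scenario, so output-token charges are left out here; that omission and the function names are assumptions of this sketch.

```python
def api_total(months, tokens_millions_per_month=50, price_per_million=3.0):
    """Cumulative API spend: pay-per-token, no upfront cost."""
    return months * tokens_millions_per_month * price_per_million


def fine_tuned_total(months, upfront=15_000.0, hosting_per_month=500.0):
    """Cumulative spend for the fine-tuned model: upfront cost plus hosting."""
    return upfront + months * hosting_per_month


for months in (12, 24):
    print(f"{months} months: API ${api_total(months):,.0f} "
          f"vs fine-tuned ${fine_tuned_total(months):,.0f}")
```

At the stated volume the monthly hosting fee alone exceeds the monthly API bill, so the comparison would only shift if token volume grew substantially.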

Multiple Choice - 3pts

37. Your AI-powered search feature costs $20,000/month but you're unsure if it drives revenue. Which approach BEST measures the economic value of this feature?

A) Survey users asking if they value the AI search feature

B) Run an A/B test with AI search on/off, measuring impact on conversion, retention, and revenue, then calculate if incremental revenue exceeds $20,000/month

C) Measure how many users click the AI search button daily

D) Compare your search quality metrics to competitors' search features

E) I don't know

Section 7: AI Use Cases Across Industries

Applying AI across different domains and use cases

Scenario - 2pts

Your fintech company's AI fraud detection system flags 3% of transactions for manual review and catches 92% of fraud. The fraud team requests you increase the flagging rate to catch more fraud. Engineering says they can adjust the model threshold to flag 8% of transactions and catch 97% of fraud.

38. What's the key trade-off you need to evaluate before making this change?

A) Whether your fraud team has capacity to review roughly 2.7x as many transactions (from 3% to 8% of volume), and whether the operational cost of reviewing the additional 5% of transactions, most of them legitimate, exceeds the value of catching 5 more percentage points of fraud

B) Whether the 97% catch rate meets industry standards for fraud detection

C) Whether customers will accept longer transaction processing times

D) Whether the model can technically achieve 97% accuracy

E) I don't know

Multiple Choice - 3pts

39. Your healthcare AI suggests treatment options based on patient symptoms and medical history. Doctors report it works well for common conditions but performs poorly for patients with multiple chronic conditions. What does this indicate about your AI strategy?

A) The model needs more training data overall

B) Your training data likely underrepresents complex multi-condition cases, and your product strategy should either: restrict the AI to simple cases only, or specifically collect multi-condition training data and validate performance separately for this segment

C) The model architecture needs to be upgraded

D) Doctors need more training on how to use the AI system

E) I don't know

Scenario - 2pts

Your legal tech AI reviews M&A contracts and flags risky clauses. It achieves 94% accuracy on standard contracts but only 78% on contracts involving international jurisdictions. Your sales team wants to market it as "M&A contract review AI" without restrictions.

40. What's your responsibility as PM?

A) Market it broadly—94% overall accuracy is strong and customers will figure out the limitations

B) Clearly communicate that performance varies by contract type, consider restricting to domestic contracts only until international performance improves, or position it as first-pass review requiring expert verification, especially for international deals

C) Don't launch until you achieve 94% accuracy on all contract types

D) Add a disclaimer in the fine print about accuracy variations

E) I don't know

Multiple Choice - 3pts

41. Your retail AI predicts demand for 50,000 SKUs across 800 stores. Finance calculated it could reduce excess inventory costs by $12M annually. Six months post-launch, you've only achieved $2M in savings. What's the most likely strategic oversight?

A) The AI model accuracy wasn't high enough

B) You didn't account for the "last mile" problem—even with perfect predictions, buyers must trust and act on AI recommendations, supply chain must execute faster replenishment, and stores must follow new stocking protocols. AI predictions alone don't change outcomes without operational change management

C) You needed more historical training data

D) The model needs real-time data feeds instead of daily updates

E) I don't know

True/False - 1pt

42. When implementing AI for customer support, you should expect to fully automate 80-90% of support volume within the first 6 months if the AI is well-designed.

A) TRUE

B) FALSE

C) I don't know

Scenario - 3pts

Your e-commerce AI recommends products based on browsing/purchase history. After 6 months, you notice it's highly effective for fashion (35% click-through) but barely works for electronics (8% click-through). Both categories have similar data volume.

43. What does this performance gap likely indicate about your product strategy?

A) Electronics customers are less interested in personalization

B) Fashion purchases are more pattern-driven and impulse-based (style preferences, seasonal trends), while electronics purchases are more research-driven and specification-based (need specific features, compare detailed specs). You may need different recommendation strategies: collaborative filtering for fashion, specification-based matching for electronics

C) You need more electronics training data

D) The AI model is biased toward fashion

E) I don't know

Multiple Choice - 2pts

44. Your marketing team's AI generates social media content that performs 20% better in engagement than human-written posts. However, it occasionally produces content with subtle brand voice inconsistencies that your brand manager catches in review. What's the right product approach?

A) Replace the brand manager's review since the AI performs better statistically

B) Maintain human review as essential quality control—engagement metrics don't capture brand consistency, tone, or potential PR risks that require human judgment

C) Increase AI training data to eliminate brand voice inconsistencies

D) A/B test posts with and without brand manager review to see if review impacts performance

E) I don't know

Scenario - 2pts

Your HR AI screens resumes and ranks candidates. You've ensured diverse training data and regular bias audits. However, hiring managers report they disagree with AI rankings 40% of the time and end up interviewing candidates the AI ranked lower.

45. What does this indicate about your AI strategy?

A) The AI model needs improvement since hiring managers disagree 40% of the time

B) This might be working as intended—AI surfaces candidates who might be overlooked (different backgrounds, non-traditional paths), and hiring manager disagreement indicates AI is broadening the candidate pool beyond traditional patterns. Track whether these "disagreement" candidates perform well if hired

C) You should increase AI's weighting to override hiring manager preferences

D) The AI is failing and should be shut down

E) I don't know

Multiple Choice - 1pt

46. You're considering AI for three projects: (A) Generating financial compliance reports from transaction data, (B) Personalizing learning content for students, (C) Routing customer support tickets to appropriate teams. Which generally requires the MOST human oversight?

A) Project C—routing decisions are high stakes

B) Project A—compliance reports have regulatory and legal accuracy requirements where errors create liability

C) Project B—education affects student outcomes

D) All three require equal oversight

E) I don't know


💼 Connect with the Builder

This assessment was built to prototype a solution to a genuine workforce upskilling problem. If you are building products at the intersection of AI, Learning, and Workforce Strategy, I'm seeking my next full-time challenge in this space. Let's connect and discuss how to solve your next big problem.

Connect with the Builder: Lisa Lee-Prioly
