
fix: Ensure problem response reports include all descendant problems regardless of nesting or randomization

efortish opened this issue 7 months ago · 13 comments

Summary

This PR ensures that the instructor "Problem Responses" report in Open edX includes all student responses to all problems under any selected block, including those that are nested or randomized (such as those from legacy library_content blocks). Previously, the report could miss responses to problems that were not directly visible to the instructor or admin user generating the report, especially in courses using randomized content blocks or deep nesting.

In courses that use randomized content (e.g., legacy library_content blocks) or have deeply nested structures, the instructor dashboard’s problem response report was incomplete. It only included responses to problems visible in the block tree for the user generating the report (typically the admin or instructor). As a result, responses to problems served randomly to students, or problems nested in containers, were omitted from the CSV export. This led to inaccurate reporting and made it difficult for instructors to audit all student answers.


Technical Approach

  • Recursive Expansion:
    The backend now recursively expands any block selected for reporting (not just library_content blocks) to collect all descendant blocks of type problem. This is done regardless of the nesting level or block type.
  • Static/Class Method:
    The logic is encapsulated in a static method (resolve_problem_descendants) within the ProblemResponses class, ensuring clear code organization.
  • Report Generation:
    When generating the report, the backend uses this method to build the list of all relevant problem usage_keys, guaranteeing that all student responses are included in the export, even for randomized or deeply nested problems.
  • Display Name Fallback:
    The code also improves how problem titles are resolved, falling back to the modulestore if the display name is not available in the course block structure.
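The recursive expansion described above can be sketched as follows. This is a minimal, self-contained illustration, not the actual implementation: the real method walks modulestore blocks, and everything here except the method name `resolve_problem_descendants` is a hypothetical stand-in.

```python
# Hedged sketch of the recursive-expansion idea behind
# ProblemResponses.resolve_problem_descendants. "Block" is a
# stand-in for an XBlock node; the real code walks the modulestore.
from dataclasses import dataclass, field

@dataclass
class Block:
    usage_key: str
    category: str
    children: list = field(default_factory=list)

def resolve_problem_descendants(block):
    """Return the usage keys of every descendant 'problem' block,
    regardless of nesting depth or intermediate block type."""
    if block.category == "problem":
        return [block.usage_key]
    keys = []
    for child in block.children:
        keys.extend(resolve_problem_descendants(child))
    return keys

# Example: a unit containing a randomized library_content block.
tree = Block("unit1", "vertical", [
    Block("lib1", "library_content", [
        Block("p1", "problem"),
        Block("p2", "problem"),
    ]),
    Block("p3", "problem"),
])
print(resolve_problem_descendants(tree))  # ['p1', 'p2', 'p3']
```

The key point is that the expansion does not stop at `library_content`: any container, at any depth, is descended into until concrete `problem` blocks are reached.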

Impact

  • Instructor Reports:
    Reports now accurately reflect all student responses, regardless of how problems are served or structured in the course.
  • No Student-Facing Changes:
    The change only affects backend report generation; there is no impact on the student experience, grading, or other LMS features.
  • Performance:
    In courses with very large or deeply nested structures, report generation may take slightly longer, but this is necessary to ensure completeness.

How to reproduce:

  • Import a library with multiple questions (using legacy content libraries).
  • Use the content library in a unit.
  • In the Instructor tab --> Data Download:

image

  • Select the block that you want to use to generate the report.

  • For this scenario I created 99 users to take the exam; each user answers 5 questions, so the CSV output should contain 495 + 1 (header row) = 496 rows.

  • You will receive far fewer than 496 rows, because the report only includes the responses visible to the user generating it:

image

  • In this case I received 298 rows, meaning 39.92% of the data is missing.
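The missing-data percentage can be checked directly from the two row counts:

```python
# Verify the reported gap: 496 expected rows vs. 298 received.
expected_rows = 496
received_rows = 298
missing_pct = (expected_rows - received_rows) / expected_rows * 100
print(round(missing_pct, 2))  # 39.92
```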

How to test:

  1. I created 100 basic users with the following script; place it inside your edx-platform mount folder and run it inside the LMS container:
from django.contrib.auth.models import User
from common.djangoapps.student.models import UserProfile
from common.djangoapps.student.models import CourseEnrollment
from opaque_keys.edx.keys import CourseKey

course_key = CourseKey.from_string("course-v1:nau+12+2025") # Change the course key to match yours
password = "test123"
cantidad = 100

for i in range(cantidad):
    username = f"user{i}"
    email = f"user{i}@example.com"
    user = User.objects.filter(username=username).first()

    if not user:
        user = User.objects.create_user(username=username, email=email, password=password)
        print(f"User created successfully: {username}")
    else:
        print(f"User already exists: {username}")

    # Create profile
    try:
        user_profile = user.profile
    except User.profile.RelatedObjectDoesNotExist:
        user_profile = UserProfile.objects.create(user=user, name=username)
        print(f"Profile created for: {username}")

    # Enroll
    if not CourseEnrollment.objects.filter(user=user, course_id=course_key).exists():
        CourseEnrollment.enroll(user, course_key)
        print(f"{username} is now enrolled")
    else:
        print(f"{username} is already enrolled")
  1. Once your users are ready, it's time to prepare the exam. Import the course and the content libraries, which randomize the exam's questions. Resources:

    course.m5st8pv9.tar.gz library.xbwedlvv.tar.gz library.388z7lwl.tar.gz

Here is a short video showing how to set up the libraries in the exam:

https://github.com/user-attachments/assets/5a0ac0ad-ee7a-46aa-b36c-8dc9a584d7a6

  1. Everything is set; now run this Playwright script to fill out the exam with each user. The script can be run directly with python3 simulateexam.py from any path; please install Playwright in your venv first.
from playwright.sync_api import sync_playwright
import time

# Config
BASE_LOGIN_URL = "http://apps.local.edly.io:1999/authn/login"
DASHBOARD_URL = "http://apps.local.edly.io:1996/learner-dashboard/"
MFE_COURSE_URL = (
    "http://apps.local.edly.io:2000/learning/course/course-v1:nau+12+2025/block-v1:nau+12+2025+type@sequential+block@c4743947cc6748579aeff9af52846721/block-v1:nau+12+2025+type@vertical+block@90f305e9aeca4ff39bea54344986b95f") # USE HERE YOUR EXAMS URL
PASSWORD = "test123"

def simulate_exam(username, password):
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        context = browser.new_context()
        page = context.new_page()

        try:
            print(f"{username}: navigating to the login page...")
            page.goto(BASE_LOGIN_URL)
            page.fill('input[name="emailOrUsername"]', username)
            page.fill('input[name="password"]', password)
            page.click('button[type="submit"]')
            page.wait_for_url(DASHBOARD_URL, timeout=10000)
            print(f"{username}: login success.")

            # Go to the unit
            print(f"{username}: navigating to the unit...")
            page.goto(MFE_COURSE_URL)
            page.wait_for_load_state("networkidle")
            page.evaluate("window.scrollBy(0, document.body.scrollHeight)")
            time.sleep(3)

            # Find iframe
            try:
                page.wait_for_selector("#unit-iframe", timeout=20000)
                frame = page.frame_locator("#unit-iframe")
                print(f"{username}: iframe found.")
            except Exception as e:
                print(f"{username}: iframe not found - {e}")
                page.screenshot(path=f"{username}_no_iframe.png", full_page=True)
                return

            # Find radios in the iframe
            radios = frame.locator('input[type="radio"][value="choice_0"]')
            count = radios.count()
            print(f"{username}: found {count} radios in the iframe")
            for i in range(count):
                try:
                    radios.nth(i).click()
                except Exception as e:
                    print(f"{username}: error clicking radio {i} - {e}")

            # Find and press submit in the iframe
            submit_buttons = frame.locator('button:has-text("Submit")')
            submit_count = submit_buttons.count()
            print(f"{username}: found {submit_count} submit buttons")
            for i in range(submit_count):
                try:
                    submit_buttons.nth(i).click()
                    time.sleep(0.5)
                except Exception as e:
                    print(f"{username}: error clicking submit {i} - {e}")

            page.screenshot(path=f"{username}_success.png", full_page=True)
            print(f"{username}: exam completed successfully.")

        except Exception as e:
            print(f"{username}: general error - {e}")
            page.screenshot(path=f"{username}_error.png", full_page=True)
        finally:
            browser.close()

def simulate_batch_users():
    for i in range(1, 100):
        username = f"user{i}@example.com"
        simulate_exam(username, PASSWORD)
        print(f"{username}: waiting 3 sec to pass to the next user...\n")
        time.sleep(3)

simulate_batch_users()

This process will take some time.

  1. Generate the CSV:

https://github.com/user-attachments/assets/bc80a0e1-2c57-441a-85d2-d15fec55c128


Testing

After applying the changes and repeating the process from the "How to test" section, I received:

image

While the data is now accurate, showing 496 of 496 expected rows, the "title" column (B) incorrectly displays "problem" in every row. This happens because the title remains hidden when the question is not visible to the user generating the report.

That is why I propose the fallback in _build_problem_list: it lets the CSV task fetch the problem title from the modulestore, and the report then looks like:

image
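The fallback idea can be sketched like this. It is a hedged illustration only: the block structure is modeled as a plain dict, and `fetch_from_modulestore` is a hypothetical stand-in for a modulestore lookup such as `modulestore().get_item(...)`; only `_build_problem_list` is a name from the actual code.

```python
# Hedged sketch of the display-name fallback proposed for
# _build_problem_list. block_titles models the course block structure
# visible to the report-generating user; fetch_from_modulestore is a
# hypothetical stand-in for a modulestore lookup.
def resolve_display_name(usage_key, block_titles, fetch_from_modulestore):
    title = block_titles.get(usage_key)
    if title:
        return title  # display name available in the block structure
    # Hidden from the requesting user (e.g. randomized library
    # content): fall back to the modulestore.
    return fetch_from_modulestore(usage_key) or "problem"

# Example with a problem hidden from the instructor's view:
titles = {"p1": "Question about loops"}
store = {"p2": "Question about recursion"}.get
print(resolve_display_name("p1", titles, store))  # Question about loops
print(resolve_display_name("p2", titles, store))  # Question about recursion
print(resolve_display_name("p3", titles, store))  # problem
```

The last line shows the residual behavior: when neither source has a display name, the generic "problem" label is kept rather than failing the report.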

So:

  • Verified that reports generated from the instructor dashboard now include all expected problem responses.
  • Confirmed that randomized problems are present in the CSV export.
  • Checked that the report titles are correctly populated for all problems.

efortish · May 07 '25 22:05