
Paging: A way forward.

Blog Post created by md0049252 on Aug 16, 2019

Paging is a bit tricky with the Blackboard API, and it has caused a few bugs in some of our code.  I'd like to take some time to explain why it's tricky, and to offer a way to avoid the problems I ran into.  One possibly important note: we are still on 3400, and I'm not sure how much has changed since that version.

 

Background

Paging is a good thing: the endpoint returns a portion of the content, along with a pointer to where the rest can be fetched.  The problems usually come from not knowing whether paging will kick in, and from trying to pull data quickly.  Here are a few specific problems.

 

Paging Limits

How many items does an endpoint return before it starts paging? As it turns out, different endpoints answer this question differently. The default page sizes are:

  • Get Users : 100
  • Get Courses : 100
  • Get CourseMemberships : 200
  • Get Groups : 20
  • Get Group Memberships : 20
  • Get Column Grades : 100
  • Get Grade Columns : 200
  • Get Content Children : 200

 

So, what if you just want to get more per call? Most of the time, you can't.

  • Get Users : Paging limit cannot exceed 100.
  • Get Courses : Paging limit cannot exceed 100.
  • Get CourseMemberships : Paging limit cannot exceed 200.
  • Get Groups : Paging limit seems unbounded (tried 1000).
  • Get Group Memberships : Paging limit seems unbounded (tried 1000).
  • Get Column Grades : Paging limit cannot exceed 100.
  • Get Grade Columns : Paging limit cannot exceed 200.
  • Get Content Children : Paging limit cannot exceed 200.
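
Because the caps differ per endpoint, it can help to work out up front how many calls a request will need. Here is a minimal sketch using the caps listed above. The `MAX_PAGE_SIZE` table and the `page_plan` helper are hypothetical names for illustration only, and the sketch assumes offset-based paging where each call takes an offset and a limit:

```python
# Per-call caps as listed above (observed on 3400). Endpoints whose limit
# seemed unbounded are left out. These names are illustrative only and are
# not part of any library.
MAX_PAGE_SIZE = {
    "users": 100,
    "courses": 100,
    "course_memberships": 200,
    "column_grades": 100,
    "grade_columns": 200,
    "content_children": 200,
}

def page_plan(endpoint, wanted):
    """Return one (offset, limit) pair per call needed to fetch `wanted` items."""
    cap = MAX_PAGE_SIZE[endpoint]
    return [(offset, min(cap, wanted - offset)) for offset in range(0, wanted, cap)]

print(page_plan("users", 250))  # [(0, 100), (100, 100), (200, 50)]
```

The length of the returned list is the number of API calls the request will cost.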

 

Slowness

So, when fetching large numbers of courses, users, or memberships, you can't just bump up the limit. You have to get results, check for the next page, and go grab that, which can take a long time.  One way to speed this up is to make the calls asynchronously: instead of waiting for each call to return before sending the next, you send the calls at the same time, wait for them all to return, and combine the results.  This is typically harder to code, but the time savings are fairly drastic.

 

Pulling 1000 users (10 calls, based on paging) takes 9 seconds synchronously and 2 seconds asynchronously (4.5x faster).

Pulling 2000 users (20 calls) takes 25 seconds synchronously and 7 seconds asynchronously (3.6x faster).

Pulling 14000 courses (140 calls) takes 4 minutes 13 seconds synchronously and 20 seconds asynchronously (12.6x faster).

 

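Your mileage may vary, but if you can do it async, you usually should.  Writing asynchronous scripts can be challenging, though, and the approach relies on being able to 'guess' what comes next: you have to know the URL of each later page before the earlier pages have returned.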
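
To illustrate the difference between the two approaches, here is a small self-contained sketch. It does not call Blackboard at all: `fetch_page` is a hypothetical stand-in that simulates network latency with `asyncio.sleep`, and `PAGE_SIZE` and `LATENCY` are made-up values, so the timings it prints are not comparable to the numbers above. What it does show is the pattern: because the offsets are known in advance, all the calls can be sent at once with `asyncio.gather`.

```python
import asyncio
import time

PAGE_SIZE = 100   # assumed page cap, matching the users endpoint above
LATENCY = 0.05    # pretend each call takes 50 ms

async def fetch_page(offset):
    """Stand-in for one paged API call; returns PAGE_SIZE fake records."""
    await asyncio.sleep(LATENCY)
    return [f"user_{i}" for i in range(offset, offset + PAGE_SIZE)]

async def fetch_sequential(total):
    """Wait for each page before requesting the next one."""
    results = []
    for offset in range(0, total, PAGE_SIZE):
        results.extend(await fetch_page(offset))
    return results

async def fetch_concurrent(total):
    """Request every page at once, then combine the pages in order."""
    offsets = range(0, total, PAGE_SIZE)
    pages = await asyncio.gather(*(fetch_page(o) for o in offsets))
    return [item for page in pages for item in page]

def timed(coro):
    """Run a coroutine to completion and report how long it took."""
    start = time.perf_counter()
    result = asyncio.run(coro)
    return result, time.perf_counter() - start

seq, seq_time = timed(fetch_sequential(1000))
con, con_time = timed(fetch_concurrent(1000))
print(len(seq), len(con), seq == con)  # 1000 1000 True
print(f"sequential {seq_time:.2f}s, concurrent {con_time:.2f}s")
```

`asyncio.gather` returns results in the order the calls were passed in, so the combined list matches the sequential one.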

 

Making paging easier

I wanted to make paging easier for those choosing to script in Python.  I've been working through how to do this, both technically, and from a usage standpoint, and came up with the following:

 

  • A single argument : limit : that lets you say exactly how many you want. 
  • A sensible default of 100 (to deal with users and courses, in particular).
  • An option to get these asynchronously or synchronously.  
  • Returned Response containing json with results key, and a paging key IF there are more to get.
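
As a rough sketch of the behavior described in the list above (not the library's actual implementation), the function below collects up to `limit` items by following pages one at a time, and only includes a paging key when more data remains. `get_up_to`, `fake_fetch`, `FAKE_USERS`, and the `/users?offset=...` URL shape are all hypothetical and exist only so the example runs without a server:

```python
FAKE_USERS = [{"userName": f"user{i}"} for i in range(230)]

def fake_fetch(offset, limit):
    """Simulates one endpoint call: a slice of data plus a paging key if more remains."""
    page = {"results": FAKE_USERS[offset:offset + limit]}
    if offset + limit < len(FAKE_USERS):
        page["paging"] = {"nextPage": f"/users?offset={offset + limit}"}
    return page

def get_up_to(fetch, limit, page_cap=100):
    """Collect up to `limit` items, never asking for more than `page_cap` per call."""
    results, paging, offset = [], None, 0
    while len(results) < limit:
        page = fetch(offset, min(page_cap, limit - len(results)))
        results.extend(page["results"])
        offset += len(page["results"])
        paging = page.get("paging")
        if paging is None or not page["results"]:
            break  # the server has nothing more to give
    response = {"results": results}
    if paging is not None:
        response["paging"] = paging  # present only when more results exist
    return response

r = get_up_to(fake_fetch, limit=150)
print(len(r["results"]), r.get("paging"))  # 150 {'nextPage': '/users?offset=150'}

r = get_up_to(fake_fetch, limit=500)
print(len(r["results"]), "paging" in r)    # 230 False
```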

 

Here's what this looks like in code: 

bb = BbRest(url, key, secret)

r = bb.GetUsers(limit = 500) #will get 500 users, with the next page included in response

r = bb.GetCourses(params={'courseId':'MATH'}, limit = 1000) #will return 1000 courses with Math in courseId.

r = bb.GetCourseMemberships(courseId='MATH-20', limit=800) #will return 800 memberships in this course.

r = bb.GetGroupMemberships(courseId='MATH-20', groupId='testGroup', limit=200) #will return 200 memberships in this group.

 

If there are ever more results than the limit, then the URL for the next set of results will be included in the response. All of the above are synchronous.  Here's where things get a little crazy: to make them asynchronous, you just add sync=False, and then await the response.

 

r = await bb.GetUsers(limit = 500, sync=False)

r = await bb.GetCourses(params={'courseId':'MATH'}, limit = 1000, sync=False) 

r = await bb.GetCourseMemberships(courseId='MATH-20', limit=800, sync=False) 

r = await bb.GetGroupMemberships(courseId='MATH-20', groupId='testGroup', limit=200, sync=False)
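
One practical note on the `await` lines above: a bare `await` works at the top level of a Jupyter notebook or the asyncio REPL, but in a plain script it has to live inside a coroutine that is started with `asyncio.run`. A minimal sketch of that wrapping follows; `get_users` is a hypothetical stand-in for an awaitable call, used here only so the example runs without a Blackboard server.

```python
import asyncio

async def get_users(limit, sync=False):
    """Hypothetical stand-in for an awaitable call that returns paged JSON."""
    await asyncio.sleep(0)  # yield control, as a real network call would
    return {"results": [{"userName": f"user{i}"} for i in range(limit)]}

async def main():
    # Inside a coroutine, `await` is used exactly as in the lines above.
    r = await get_users(limit=500, sync=False)
    return len(r["results"])

count = asyncio.run(main())
print(count)  # 500
```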

 

The JSON returned is exactly the same.  You may be asking, "Why would I ever want the synchronous version, then?" For now, there's no good reason.  If Bb ever moves to cursor-based paging, though, synchronous may be the only way to page, since each page's cursor would only be known once the previous page has returned.  Hopefully that change would come with more flexibility on the limits.

 

Drawbacks

All calls like this (ending in s or Children, specifically) have a default limit of 100 in the library.  That means the following will only return 100 results, even though the endpoint itself returns 200 by default.  Pass limit explicitly if you want more.

 

r = bb.GetContentChildren(courseId='MATH-20',contentId='_62839_1')

 

Also, this does not reduce the number of API calls.  To get 10000 users at 100 per call, you still need to make 100 calls. There's no way around that currently.  This library and this solution just make the logic of it easier.

 

One final drawback : I am flawed.  There's probably something broken.  I've tested, and I will give a special shout out to Brett Stephens, who broke a lot of things and told me about them before you had to see it.  Try it out, let me know, and I'll try to fix it. 

 

Hope this helps make paging just a little easier for everyone.

-Matt
