Science Gossip Talk

Filter pages

  • kosmala

    I was wondering if you've already filtered pages for likely images. It took me 15 images of only text before I got one that had an image. Seems like many of these pages could have been automatically removed with computer vision algorithms rather than showing them to volunteers. Just wondering.


  • VVH (scientist, admin)

    Great question, Kosmala. Quia, who answered your question about Latin terms, also flagged this up to us during the beta test. We (Zooniverse, ConSciCom and the BHL) considered using an algorithm to filter this first tranche of data, but then thought that volunteers might enjoy going through periodicals in order. That said, after we've processed this first batch of data, we're going to run some algorithms on the next batch before we upload it. The BHL has an algorithm that they use to identify images, and Quia also has one which she has kindly offered to run for/with us (and which she's already tested!). We've decided to use this first batch of data that people are working on now as a test to see how much better the crowd performs against the algorithms.

    Rest assured that blank images are retired relatively quickly after people say they are blank, so we should be processing those out over time.

    We'll keep you guys posted on the outcome of the algorithm testing. No doubt the next batch of data will contain a high proportion of images!


  • rrpbgeek

    Hardwicke's has quite a few blanks. Just classified 10 of them, all #blank.


  • geoffrey.belknap (scientist)

    Hardwicke's has a fair number of blanks; they are typically the pages at the start of a new volume. However, the more classifications we get done, the fewer blanks we should come across as we weed them out. Thanks for all your hard work on this!
