Discussion of how to pare down/reorganize our genre list using existing taxonomies.

We've had a lot of discussion about how cumbersome our genre list is becoming, so I wanted to get some input on adopting a taxonomy from an outside source (e.g. the Library of Congress or a similar institution), implementing subgenres, and/or using tag clouds.

Thanks for beginning the discussion!

My preference is for a tag cloud but I have reservations because it could be abused by pranksters who purposely input tags with the goal of offending or making a joke.

Then again, a LoC system could be abused too, by users simply applying the wrong category, but at least they wouldn't be able to enter non-categories altogether.

I'm also in favor of either a tag cloud or adopting an existing taxonomy. I think either one will work.

Neither of them will require a request, or waiting for the approval of the request, which I think is a plus. I think existing taxonomies are usually pretty complete, so I'm not sure if they have any real downside.

I think any system is going to be open for abuse, one way or another. Tag cloud abuse can be limited by only showing the most popular tags. The rest of the tags can be either completely hidden or viewable by clicking a button. So if a single person decides to add some funny tags, it will have very little effect.
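
To make the "only show the most popular tags" idea concrete, here is a rough Python sketch. It is only an illustration: the `(user, tag)` record format, the `visible_tags` name, and the `min_users` and `limit` thresholds are all assumptions, not anything the site actually has.

```python
from collections import defaultdict


def visible_tags(tag_events, min_users=2, limit=20):
    """Return the tags to show by default, most widely used first.

    tag_events is an iterable of (user, tag) pairs (a hypothetical format).
    A tag counts once per distinct user, so one person repeating a joke tag
    cannot push it into the visible part of the cloud.
    """
    users_by_tag = defaultdict(set)
    for user, tag in tag_events:
        users_by_tag[tag].add(user)

    counts = {tag: len(users) for tag, users in users_by_tag.items()}
    popular = [tag for tag, n in counts.items() if n >= min_users]
    # Highest count first; alphabetical order breaks ties.
    popular.sort(key=lambda tag: (-counts[tag], tag))
    return popular[:limit]


events = [
    ("ann", "detective fiction"),
    ("bob", "detective fiction"),
    ("cat", "detective fiction"),
    ("ann", "pulp fiction"),
    ("bob", "pulp fiction"),
    ("dan", "shelf in bedroom closet"),
    ("dan", "shelf in bedroom closet"),
]
shown = visible_tags(events)
```

The hidden tags are not deleted here, so they could still sit behind a "show all tags" button as described above.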

The only downside of a tag cloud, though, is that if people are free to enter whatever they like without any kind of guidance, a book can end up having several similar tags, e.g. "Children's", "Children's literature", "Children's books", and so on. Of course, we could have a list of recommended terms that anyone could edit. It would require some community effort to keep things clean.

Similar tags may cover similar, even overlapping, concepts without necessarily being synonyms. A book on children's literature is not necessarily a children's book. For example, I have a collection of Hawthorne's stories. It includes his retelling of Greek myths for children, Tanglewood Tales. Therefore, it could be properly tagged Children's Literature. However, since it contains mostly stories for adults, it is not a children's book. In a similar vein, I've seen a collection of Oscar Wilde's fairy tales (children's literature) with an introduction discussing his sexuality (not a children's book).

I don't see any problem with overlapping tags.

How about using a tag cloud in conjunction with a stripped-down list of genres? For example, this book:
https://www.biblio.gs/book/66469-Crime-Stories-And-Other-Writings

could be entered with:
Genre: Fiction, Short Stories, Mystery

and tagged with:
pulp fiction, hard-boiled fiction, detective fiction, crime fiction, Continental Op, American mystery fiction, Black Mask

(Most of the stories are about the Continental Op, and were published in Black Mask magazine)

Hi, I haven't checked on the site for a few months so I apologize if I'm a bit behind on how it's progressed recently. I'm glad to hear this is being discussed.

One advantage of associating books with genres is that someone can browse or search by genre. Would an entirely freeform taxonomy lead to this type of problem (raised in a past thread)? One person tags one book "Horror" and a different person tags a different book "Terror." The two books are essentially in the same genre. However now if someone searches in the "Horror" genre, they will miss those tagged "Terror."

So although a set list of genres might be relatively inflexible, it forces people to use the same terms to group books together that really should be together.

I will admit to not being too familiar with how a tag cloud works so take that all with a grain of salt.

One person tags one book "Horror" and a different person tags a different book "Terror." The two books are essentially in the same genre.

Actually, horror and terror are different. Terror is the fear of the unknown; horror is the reaction when the monster is revealed. In Dawn of the Dead, there is a scene where the dying cop tells his partner that he'll try not to come back as a zombie. After he dies, the partner sits back and waits; that's terror. When the zombies are eating people, that's horror.

I don't think there is any need for both horror and terror as genres, but there is definitely room for both as tags.

So although a set list of genres might be relatively inflexible
Also totally unmanageable.

Keep in mind multiple people may tag the same book. So likely many books will be tagged both horror and terror, not one or the other.

You kind of missed my point there and maybe it was because I used an example that I remembered from an older thread. I'm not familiar with the difference between terror and horror so ignore that.

mirva probably addressed it better than I did anyway. One person's "Children's" is another person's "Children's Books." Or, say, "Rock 'n' Roll" vs. "Rock & Roll." It's not so much about having multiple tags on one book, but having books that should be grouped together ending up being grouped separately because we're using a freeform system. It would be like having unique entities for TS Eliot, T.S. Eliot, T. S. Eliot, etc. There needs to be just one so things can be properly grouped together.

For this reason, I like the idea of a more rigid list, and I think the way to manage it better (not perfectly) is to break it down into subgenres. I think it would take longer to implement but I think in the long run it's better than a completely freeform system. Just my 2 cents.

Maybe I can explain it this way:

A freeform system can describe a particular book with more precision.

A more rigid system groups multiple books more effectively.

What is the purpose of the genre system? I'm looking at it the second way, that's why I like the idea of having a firm list from which you have to select the best options. I understand it's not the only way to look at it though.

Are they mutually exclusive? A limited set of genres groups books effectively; a tag cloud uses terms to describe books precisely. Why can't we do both, as I suggested above?

A freeform system can describe a particular book with more precision.
It also groups books more creatively. I may link two books with a common tag using terms you would have never thought of.

Are they mutually exclusive?

If you're using a freeform tag cloud, then you get the best of the description function. If you're implementing a set list of options, you're getting the best of the grouping function only to the degree your list covers. If it's only a limited set of genres, you can only group effectively to a limited degree.

In order to get the best of both worlds, you'd have to have both a freeform tagging system and a complete (not limited) list system.

Using both has its advantages, though, and if the list system is limited and you don't get the best of both worlds, at least it draws on the strengths of both and it's admittedly more feasible / manageable. I just don't like the idea that someone's going to use, say "American mystery fiction" and someone's going to use "US mystery fiction" and then books under one won't be found when searching / browsing the other.

One of the main reasons I push for a tag cloud in combination with a small list of genres is that I am concerned that a taxonomy based on The Library of Congress system is likely to have a steep learning curve. It is likely to scare away users who aren't trained librarians. A tag cloud is easy to use by an untrained person.

I am concerned that a taxonomy based on The Library of Congress system is likely to have a steep learning curve.

The basic outline is pretty clear, and they also have a search tool:
https://www.loc.gov/catdir/cpso/lcco/
http://id.loc.gov/authorities/classification.html

There are other systems too, the LoC one isn't the only one.

But of course even the classification systems don't have absolutely everything. So if you would love to see a lot of specific keywords, then a tag cloud is pretty much the only option, as any extensive list of genres and subgenres would be pretty much the same as a library classification system.

I just don't like the idea that someone's going to use, say "American mystery fiction" and someone's going to use "US mystery fiction"

Yeah, and usually upper and lower case make a difference too. Not to mention typos. So without any maintenance, things can get messy. It would definitely take some extra work to keep things neat. But like I said, we could have a list of recommended terms and tags, and of course a browsable list of all tags, which would help with spotting duplicates.
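
As a rough illustration of how a browsable tag list could help with spotting duplicates, here is a small Python sketch. The function names and the normalization rule (ignore case, spaces, and punctuation) are my own assumptions, and it only flags candidates for a person to look at; it does not merge anything.

```python
import re
from collections import defaultdict


def tag_key(tag):
    """Comparison key that ignores case, spacing, and punctuation."""
    return re.sub(r"[\W_]+", "", tag.casefold())


def possible_duplicates(tags):
    """Group tags whose keys collide; return only groups with 2+ variants."""
    groups = defaultdict(set)
    for tag in tags:
        groups[tag_key(tag)].add(tag)
    return {key: sorted(variants)
            for key, variants in groups.items()
            if len(variants) > 1}


tags = [
    "Power-pop", "Power Pop", "Powerpop",
    "hard-boiled fiction", "hardboiled fiction",
    "Horror", "Terror",
]
dupes = possible_duplicates(tags)
```

A check like this catches spelling and capitalization variants only. It will not catch typos or true synonyms, and tags with different meanings (such as "Horror" and "Terror") are left alone.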

Bump.

I'm going to ask the Devs if having both the Genre taxonomy and a tag cloud is feasible.

In the meantime, does anyone have a strong opinion against using the Library of Congress taxonomy if we use a taxonomy? Or suggestions for alternate taxonomies?

LoC works for me at this time since tag cloud might be too hard to launch at this early stage.

LoC works for me at this time since tag cloud might be too hard to launch at this early stage.
Given that things have to be tagged to create a tag cloud, it's probably a lot easier to implement a tag cloud early, rather than allow the number of items in the database to increase substantially before implementing the tag cloud.

I have nothing against using the LoC taxonomy.

I also do agree that if we decide to implement a tag cloud, it's good to implement it as early as possible.

As far as already-designed taxonomies, I have no preference if we were going to use one.

Already-designed taxonomies have the advantage of, well, already being built and ready to go. But would they allow for the degree of specificity we'd ideally like to have if we're aiming to build "the biggest and most comprehensive book database"? Would we be able to add to / modify the one we'd potentially use? If not, I don't think something like that would be ideal on its own.

Overall, my only real strong feeling here is that we shouldn't use something entirely freeform. mirva suggested maintenance and "a list of recommended terms and tags" but if we're going to try and impose some kind of order, then I think we should go all the way and have a firm list from which people have to choose. On Discogs, even though there are specific rules for the FTF, most people simply don't read the guidelines. I see things like "Green Vinyl" or "Black" (vinyl) in there all the time. Imagine if Discogs had a freeform genre field. (e.g. "Power-pop" "Power Pop" and "Powerpop" would be three different, unlinked styles.)

I'm coming around to something tompowers64 suggested above:

How about using a tag cloud in conjunction with a stripped-down list of genres?

My only concern here is not allowing freeform tags (again, "hard-boiled fiction" vs. "hardboiled fiction" vs. "hard-boiled" etc.). Could we draw from an already-existing taxonomy to come up with a relatively stripped-down list of genres, and then figure out some way, at least in the beginning, to vet and add tags to use within each genre?

On Discogs, even though there are specific rules for the FTF, most people simply don't read the guidelines.
Why should we use Discogs as a model here? In Discogs, we have Genres like Folk, World, and Country which management WILL NOT break up into manageable units.
figure out some way, at least in the beginning, to vet and add tags to use within each genre
If the community has to vet each tag before someone is allowed to apply a tag, the tag cloud model will collapse under its own weight. If we have to debate the merits of "hard-boiled fiction" vs. "hardboiled fiction" vs. "hard-boiled," or Christmas vs. Xmas, and on and on, nothing will ever get tagged.

Why should we use Discogs as a model here?

Yeah. We've seen how well the system in Discogs works... Considering how long Discogs has been around and that the genre/style is still very much incomplete, it's obviously not a good system.

An existing taxonomy is already built, it just needs to be implemented and that's it. The tag cloud would be created and managed by users. Both require less involvement from both the management and the users than the Discogs system.

I wish we could try the tag cloud out and see if it works or not... but it's probably not an option. :)

tompowers64, you seem to be more invested in shooting down everyone's criticism of a tag cloud than addressing its downsides. There's a problem with a freeform system that we need to resolve. Let's try and look at constructive solutions here.

Why should we use Discogs as a model here?

The current discussion is exclusively among longtime Discogs users. I have absolutely no idea why we should ignore our experience with building one database when we're beginning to build another.

In Discogs, we have Genres like Folk, World, and Country which management WILL NOT break up into manageable units.

I'm not sure how this is relevant.

If the community has to vet each tag before someone is allowed to apply a tag, the tag cloud model will collapse under its own weight. If we have to debate the merits of "hard-boiled fiction" vs. "hardboiled fiction" vs. "hard-boiled," or Christmas vs. Xmas, and on and on, nothing will ever get tagged.

I'm not suggesting "the community has to vet each tag before someone is allowed to apply a tag" so please don't put words in my mouth. I'm suggesting the complete opposite — figure out a way to vet tags that DOESN'T involve community consensus. There's no debate that needs to happen about whether to use Christmas or Xmas, somebody just needs to pick one so people aren't using both (not to mention X-mas or typos like Chrismas). If seburns doesn't have time to evaluate tags regularly, then we can delegate that responsibility to one or several of us. I think you in particular would be good at this, Adambassador and mirva as well.
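
One way "somebody just needs to pick one" could work in practice is a small alias table kept by whoever is delegated to do it. The Python sketch below is purely hypothetical: the `ALIASES` table, its entries, and the `canonical` function are assumptions for illustration, not an existing feature.

```python
# Hypothetical alias table: lower-cased variant -> the spelling that was picked.
ALIASES = {
    "xmas": "Christmas",
    "x-mas": "Christmas",
    "chrismas": "Christmas",  # common typo
}


def canonical(tag, aliases=ALIASES):
    """Return the chosen spelling for a known variant, else the tag as typed."""
    cleaned = tag.strip()
    return aliases.get(cleaned.casefold(), cleaned)


results = [canonical(t) for t in ["Xmas", "X-mas", " Chrismas ", "Hardboiled"]]
```

Tags that are not in the table pass through untouched, so nothing is rewritten unless someone has deliberately listed it as a variant.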

The tag cloud would be created and managed by users.

I suppose one alternative to a firm list idea that I am pushing for would be a freeform field that draws from existing data. Similar to the search field on Discogs, or the Artist / LCCN / Credit fields when you're editing a release. So for example, you'd start typing "hardbo.." and it would drop down a box that would include "hard-boiled fiction"

You'd still be able to enter what you wanted (including "hardboiled" which would essentially create a dupe) but it would cut down on dupe tags.
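
Here is a minimal Python sketch of that kind of suggestion box, assuming the existing tags are available as a plain list. The `suggest` and `tag_key` names are made up for the example, and matching on a prefix that ignores case and punctuation is just one possible rule.

```python
import re


def tag_key(tag):
    """Comparison key that ignores case, spacing, and punctuation."""
    return re.sub(r"[\W_]+", "", tag.casefold())


def suggest(typed, existing_tags, limit=5):
    """Return existing tags whose key starts with what the user has typed."""
    prefix = tag_key(typed)
    if not prefix:
        return []
    matches = [tag for tag in existing_tags if tag_key(tag).startswith(prefix)]
    return sorted(matches)[:limit]


existing = ["hard-boiled fiction", "detective fiction", "crime fiction", "Hardy Boys"]
suggestions = suggest("hardbo", existing)
```

Because the hyphen is ignored when matching, typing "hardbo" still brings up "hard-boiled fiction". The user remains free to ignore the suggestion and enter something new.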

I suppose one alternative to a firm list idea that I am pushing for would be a freeform field that draws from existing data.

That's actually a really good idea. I'm not a tag cloud expert, so I hope that's possible to do. :)

You'd still be able to enter what you wanted (including "hardboiled" which would essentially create a dupe)...
Would it? Are "hardboiled fiction" and "hardboiled" duplicates? I would argue the terms are different. My concern is that, in efforts to avoid duplicates, we will lose terms which are similar, but have slightly different meanings. Consider Hardboiled and Hardboiled Fiction. They have different scope. Hardboiled Fiction generally refers to the type of fiction which came out of magazines like Black Mask and Dime Detective.

Hardboiled refers to a tough, cynical attitude toward violence (whether real or a facade), and has been applied more broadly than "hardboiled fiction" to cover a type of literary style, including, for example, Hemingway's style. Also, there is nonfiction written in a hardboiled style. Example: Beggars of Life: A Hobo Autobiography, by Jim Tully.

If seburns doesn't have time to evaluate tags regularly, then we can delegate that responsibility to one or several of us.
Removing or editing tags should be done very carefully to avoid losing shades of meaning between similar tags.

Would it? Are "hardboiled fiction" and "hardboiled" duplicates? I would argue the terms are different.

Good lord. I don't know or care if "Hardboiled" is a shade different than "Hardboiled Fiction." It's entirely irrelevant to the discussion here. We're trying to discuss implementing a genre taxonomy here, please stop picking every example apart. Focus on the bigger picture. If it helps, imagine that we're talking about "Hardboiled" vs. "Hard-boiled."

My concern is that, in efforts to avoid duplicates, we will lose terms which are similar, but have slightly different meanings.

Given that "You'd still be able to enter what you wanted" how would a term be lost, or a user prevented from entering it?

You were just talking about the need to "evaluate tags regularly" so that, for example, people don't use both Christmas and Xmas. (Actually, why not encourage people to use multiple synonyms: Christmas, Xmas? The more tags the better.)

Given that "You'd still be able to enter what you wanted" how would a term be lost, or a user prevented from entering it?
If someone's going in and evaluating tags, does that involve editing tags that have been introduced by users? If so, editing should be done carefully to avoid removing terms with particular shades of meaning (I used hardboiled as an example because it had already been introduced). If it doesn't involve editing or removing tags, then I don't understand; what does the evaluation process involve?

If someone's going in and evaluating tags, does that involve editing tags that have been introduced by users?

I think so, or at least pointing out possible duplicates and opening a discussion if needed. People are going to enter duplicates, alternate spellings, or completely incorrect terms. Those are the ones we would want to eliminate in an evaluation process, not valid tags.

Actually, why not encourage people to use multiple synonyms

The tag cloud would be huge. Why have redundant duplicate terms? Where's the benefit in that? I think if the tags should be browsable, then we shouldn't have multiple tags with exactly the same meaning.

If it doesn't involve editing or removing tags, then I don't understand; what does the evaluation process involve?

When I wrote "evaluating tags regularly" I was referring to the process of adding tags to a firm list. In this system, you would ideally avoid having to edit or remove tags because they wouldn't be entered incorrectly in the first place, assuming the people allowed to add them were vetting them properly. This didn't seem to be gaining traction and as an alternative to a system along those lines, I suggested:

a freeform field that draws from existing data ... You'd still be able to enter what you wanted ... but it would cut down on dupe tags.

This would be a freeform list of genres, like artists on Discogs where you have to type something exactly the same in order to not create a duplicate entity (e.g. Gang Of Four vs. Gang Of 4). In a system like this, everybody would be able to add genres to the database instantly.

Furthermore, if each book only had a single set of genres (like on Discogs), then everybody would be able to remove genres in the same way any user could potentially remove a PAN by editing every instance of it in the database.

However, if the freeform system was a tag cloud then I'm not sure how dupe tags would be edited or removed. I think it would fall to staff (which I'm sure would be a major headache) unless it was delegated to certain regular users. If we implemented a tag cloud, this would have to be figured out eventually.

we shouldn't have multiple tags with exactly the same meaning

Definitely not. It's a database design no-no.

Let me try and sum up the ideas so far. Maybe it will help keep this discussion moving forward.

• Adopt the Library of Congress taxonomy
• Adopt the Library of Congress taxonomy and allow for amendments
• Continue with the current firm list system but add a hierarchy for improved organization and usability
• Implement a freeform tag system with one genre list per book
• Implement a freeform tag cloud system
• Implement a firm, stripped-down list of genres in conjunction with a tag cloud system

Did I miss anything?

Implement a firm, stripped-down list of genres in conjunction with a tag cloud system

Or the Library of Congress taxonomy in conjunction with a tag cloud system.
The advantage of a defined taxonomy (or stripped-down genre list) is that it defines a particular genre (or set of genres) for a particular book. The advantage of a tag cloud is that it allows links to be drawn between genres.

For example, a biography of Saladin that I've been meaning to submit might go under: Nonfiction, History, Middle East and Nonfiction, History, Crusades.

This book:
https://www.biblio.gs/book/68010-Sword-Woman-and-Other-Historical-Adventures
might go under Fiction, Historical Fiction, American Historical Fiction and Fiction, Historical Fiction, Historical Fiction--Crusades. So they might be in entirely different areas. (I'm guessing: I don't really know the LOC system). But they could be linked by tags like Crusades or Crusader States.
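
To show how a taxonomy and tags could sit side by side, here is a rough Python sketch using the two books above. The `Book` structure and function names are assumptions, the genre paths are the guessed ones from this post rather than real LoC headings, and the Saladin title is a placeholder.

```python
from dataclasses import dataclass, field


@dataclass
class Book:
    title: str
    genre_paths: list  # each path is a tuple, broadest genre first
    tags: set = field(default_factory=set)


def in_genre(books, prefix):
    """Titles with at least one genre path that starts with the given prefix."""
    n = len(prefix)
    return [b.title for b in books
            if any(path[:n] == prefix for path in b.genre_paths)]


def linked_by_tag(books, tag):
    """Titles that carry the given tag, whatever genre they sit in."""
    return [b.title for b in books if tag in b.tags]


books = [
    Book("Saladin biography (placeholder title)",
         [("Nonfiction", "History", "Middle East"),
          ("Nonfiction", "History", "Crusades")],
         {"Crusades", "Crusader States"}),
    Book("Sword Woman and Other Historical Adventures",
         [("Fiction", "Historical Fiction", "American Historical Fiction"),
          ("Fiction", "Historical Fiction", "Historical Fiction--Crusades")],
         {"Crusades"}),
]

fiction_only = in_genre(books, ("Fiction",))
crusades = linked_by_tag(books, "Crusades")
```

Browsing by genre keeps the two books apart, while the shared tag pulls them together.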

When I wrote "evaluating tags regularly" I was referring to the process of adding tags to a firm list.
That wouldn't work without a "Please Add this Tag" thread. By the time the tags were added, people who originally asked for them would have moved on.
I think so, or at least pointing out possible duplicates and opening a discussion if needed.
I still think that removing tags added by users has to be done with care. Even if a bad tag is added, it's in the nature of a tag cloud that it will fade over time. Some fool will enter something like "Shelf in bedroom closet." Even if we don't remove that, no one else will use it and it will eventually disappear from the cloud display.

My favorites are:
• Adopt the Library of Congress taxonomy (or any other taxonomy)
• Implement a freeform tag cloud system
• Adopt the Library of Congress taxonomy (or any other taxonomy) in conjunction with a tag cloud system

I think they all have their merits, and I think any of the three options would work.

When I wrote "evaluating tags regularly" I was referring to the process of adding tags to a firm list.

And I think the list could be automated. When someone adds a tag, it would be automatically added to a list. Then that list could be used for various purposes.

I still think that removing tags added by users has to be done with care.

Of course, and no one is suggesting the opposite.

But I think those kinds of details can be ironed out if the management decides to implement a tag cloud, and when we know what kind of tag cloud it will be, and how it will work. There's no reason to worry about it before we know the details.

I'm bumping this in hopes for more opinions, especially from the folks from the genre request thread. ;-)

Can we start with a very basic question; what is the purpose of the genre field and do we really need it at all?

To state the obvious, this is a database compiled from multiple sources. It is not the catalogue of any physical library, bookshop etc. so we don’t need a classification system to organise books on shelves.

There are genre and style fields on Discogs, but as far as I can see, they serve mainly to fuel arguments amongst users. You can’t search or browse by genre or style. The most you get is a top seven for “Most Collected” and “Most Sold This Month” and a couple of bar charts for “Releases By Year” (actually releases per decade) and “Top (five) Submitters” e.g.
https://www.discogs.com/style/doom+metal

Essentially the fields produce broad statistics, with specific details for only a handful of entries.

It has been observed that the current genre list is becoming cumbersome. But we should also bear in mind that the project is in its infancy. In theory at least, the ultimate aim must be to record every edition of every book ever published. So even if there is scope to rationalise/reduce existing genres, overall they will continue to increase. And of course the fewer genres, the more entries will be in each. Once numbers start running into thousands, searching or browsing by genre becomes impractical.

Use of tag clouds has been suggested. As I understand it, these indicate the frequency with which particular terms are used. As has been observed, freeform tags will reflect preferences of individual users and may not be particularly useful for others. We can see this in genres that have been entered so far e.g. drawing fine distinctions between certain types of fiction and lumping other works together under catch-all terms like reference or non-fiction. User-added tags could eventually run into hundreds of thousands or even millions and it could prove impractical to monitor them and maintain a recommended list. So if we go down the road of cloud tagging, what are we going to do with it?

The Library of Congress system has been suggested, but as noted could be off-putting for those who are not librarians. There would indeed be a steep learning curve with plenty to trap the unwary. Seemingly related subjects can be classified in very different places e.g. railroad transportation under social sciences at HE1001-5600 and railroad engineering and operations under Technology at TF. Even if all users were professional librarians, some differences in interpretation could be expected. Are you familiar with this site? http://classify.oclc.org/classify2/
For any given title, it shows Dewey Decimal Classification (DDC), Library of Congress Classification (LCC), or National Library of Medicine (NLM) numbers based on holdings in libraries worldwide. It shows the most frequent classifications, but also highlights where there are differences.

The LCC classification schedules are freely available to download, but in themselves they don’t comprise a ready-made dropdown list.

On balance, I’m inclined to say drop the genre field. It will be a lot of work, it will sometimes prove contentious and I’m not clear what the benefits will be.

You can’t search or browse by genre or style.

Actually you can. :)
https://www.discogs.com/search/?style_exact=Doom+Metal

I personally use the genres/styles in Discogs mainly for two purposes: 1) to check and vote releases on genres/styles that I'm familiar with, and 2) to search for new artists/releases within a genre/style I'm interested in. I don't really use the marketplace, but they have their purpose there as well as you can probably imagine.
