FAForever Forums

    In game chat dump from 631 711 replays and 23 929 players

    General Discussion
    20 Posts 12 Posters 1.5k Views 3 Watching
    • Blackheart

      2 questions from an amateur:

      1. Could you upload a bigger dataset as well?
      2. Is there some webservice or database or something where I can translate userid to username easily?

      Ban Anime

      • Tagada Balance Team

        The chat logs are already published in the vault so I don't see how this is any different tbh

        • Brutus5000 FAF Server Admin

          You don't see a difference between browsing through chat logs manually and mass-profiling single users and publishing the results?

          "Nerds have a really complicated relationship with change: Change is awesome when WE'RE the ones doing it. As soon as change is coming from outside of us it becomes untrustworthy and it threatens what we think of is the familiar."
          – Benno Rice

          • Tagada Balance Team

            I mean, it's just presenting public information in a more organized and readable way. It's the same as having a tool that shows each user's rating changes day by day instead of looking up the rating change from each game in the replay vault, or other similar tools, e.g. Kazbek's tool that lets you see which maps a specific user plays. Yes, in theory it can be used in a harmful way, idk, shaming someone for trash-talking people or swearing at them or something.

            • Tagada Balance Team

              Not to mention commands like top words that achieve the same thing but on a smaller scale.

              • FtXCommando

                How would FAF not be liable for some information issue here?

                They:

                1. have made replay information public for everyone
                2. have made a parser to allow you to, uh, parse this information and even included instructions on how to use it

                No idea about Europe but in the US there is a liability doctrine that doesn't let you just give a person tools, say "don't do that bad thing with the tools" and then wash your hands when you put zero effort into making it difficult to actually do said bad thing.

                The only thing FAF hasn't done is give you step-by-step instructions on how to download replays from the vault to then use the tool.

                I mean I don't get the issue in the first place, do people have legal ownership over the words they write in game or something? Wouldn't this already make the replay vault a "legal liability" unless you requested consent before publishing any replay?

                Also, "You are not allowed to analyze single person behaviors or do a social rating and publish this (or basically do anything that relates back to a single user)": isn't this essentially what moderation does? Don't report results get reported back to the person who made the report? That's a publication of the analysis of a single person's behavior.

                • Nooby

                  I didn't mean to cause any legal trouble here, I was just interested in some data analysis. I should probably have started with things other than text chat and gauged the response first, but text chat was the easiest for me to parse and make sense of.

                  From my perspective the information is already highly available publicly in the replay vault.

                  I do understand that open source data can become sensitive when massed together.

                  Perhaps we need a disclaimer that replays and all the information they contain are publicly available, along with name history, rating history, etc.? It has always been very obvious to me that they are, but to others it may not be.

                  • Fremy_Speeddraw

                    Legality aside, there are clear morality concerns. A LOT of miscellaneous personal information is public on the internet if you try hard enough to search for it, but collecting and publishing it on public forums is not really ok. The argument that "it's already public" only stands up if you delve into technicalities. And if we had some certain moderator still active, giving them this kind of idea would likely result in a mass mega ban or a big drama fest.

                    ♿ https://www.twitch.tv/petricpwnz ♿

                    Scientifically proving that Blackheart is a weeb - https://imgur.com/a/J436c | https://clips.twitch.tv/AssiduousAverageOxMikeHogu

                    • Brutus5000 FAF Server Admin

                      Everything is open assuming good will:
                      a) don't misuse the data
                      b) don't cause performance issues on the server

                      As long as everybody behaves we're good. If I see misuse I'll shut it down / make it unavailable to the public.
                      So far no lines were crossed, but I hope I made my point clear where the red lines are.

                      So @Nooby you did not cause any trouble yet. I just tried to proactively step in before things go in the wrong direction.

                      • QuietJoy @Brutus5000

                        APOLOGIES THIS POST IS IN CODE FORMAT - it was the only way I could show my post while keeping the tab alignment of the word & count tables.. 🙂

                        Currently learning python and decided to play with and analyse this dataset just out of curiosity.
                        
                        It contains 20,405,216 words, spread across 23,929 files (representing that number of games)  for a total data size of 112+MB
                        
                        I removed the 2,760,185 non-English words as I am only able to speak one language. So that's all the Russian, German etc. words removed.
                        
                        So there are 17,645,031 English words remaining, let's look at these.
                        
                        None of the following proves anything, I just thought it would be interesting to have a look.
                        
                        What are the actual most commonly used words?
                        
                        
                        WORD		COUNT
                        ----------	-------
                        to		1464477
                        sent		1305978
                        mass		743675
                        energy		667411
                        you		274816
                        me		263903
                        i		245113
                        give		223129
                        gg		179505
                        can		151937
                        the		131350
                        
                        Nothing too surprising there. Let's look at some other word counts now.
                        
                        Other words of note very commonly used:
                        
                        WORD		COUNT
                        ----------	-------
                        air		92538
                        units		83868
                        lol		66198
                        unit		60286
                        t3		51683
                        need		49956
                        dont		48234
                        why		30133
                        help		25714
                        
                        
                        
                        How friendly are the games?
                        
                        WORD		COUNT
                        ----------	-------
                        pls		63084
                        gl		41712
                        hf		39946
                        plz		26900
                        ty		26191
                        nice		26087
                        please		26005
                        glhf		19296
                        thx		16672
                        sorry		15812
                        thanks		8638
                        sry		5498
                        
                        
                        
                        How toxic are the games? Actually, not as much as I might have worried..
                        
                        WORD		COUNT
                        ----------	-------
                        fuck		21997
                        shit		19482
                        fucking		16963
                        frustrating	16911
                        fucked		6089
                        ffs		5988
                        damn		5585
                        idiot		5494
                        ass		3068
                        asshole		800
                        
                        
                        
                        What about issues in the game?
                        
                        WORD		COUNT
                        ----------	-------
                        lag		13433
                        re		19892
                        kick		10327
                        afk		5564
                        lagging		5035
                        eject		4792
                        lags		4251
                        
                        
                        
                        How often are the game enders mentioned?
                        
                        WORD		COUNT
                        ----------	-------
                        nuke		28959
                        mavor		13966
                        para		6724
                        paragon		3185
                        scathis		4679
                        yolo		4106
                        novax		1669
                        yolona		1296
                        salvation	1047
                        
                        
                        
                        And the experimental units?
                        
                        WORD		COUNT
                        ----------	-------
                        spider		6160
                        monkey		4272
                        gc		4031
                        chicken		2145
                        fatboy		2340
                        czar		1874
                        mega		1650
                        fatty		1335
                        tempest		1188
                        ahwassa		1083
                        monkeylord	501
                        megalith	473
                        ythotha		449
                        ripper		384
                        atlantis	379
                        asswasher	348
                        colossus	312
                        soulripper	55
                        
                        
                        Which races get talked about most? Presumably due to asking for engineers to make Hives and Kennels:
                        
                        WORD		COUNT
                        ----------	-------
                        cybran		13614
                        uef		13065
                        aeon		8793
                        sera		5768
                        seraphim	1088 (much faster to just type sera!)
                        
                        
                        
                        How are the commanders referred to?
                        
                        WORD		COUNT
                        ----------	-------
                        com		11785
                        acu		11555
                        
                        
                        Does playing FAF give you headaches? Because ibuprofen is mentioned 187 times.
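
For anyone who wants to reproduce counts like the ones above, here is a minimal sketch of one way to do it. It is not the script actually used for this post. It assumes the dump is a directory of plain-text files, one per game; the `*.txt` pattern, the tokenizer, and the `keep_known` vocabulary filter (a stand-in for the non-English word removal) are all hypothetical choices.

```python
import re
from collections import Counter
from pathlib import Path

# Assumed tokenizer: lowercase runs of letters, digits and apostrophes.
TOKEN_RE = re.compile(r"[a-z0-9']+")


def count_words(lines):
    """Count word tokens across an iterable of chat lines."""
    counts = Counter()
    for line in lines:
        counts.update(TOKEN_RE.findall(line.lower()))
    return counts


def count_words_in_dir(chat_dir, pattern="*.txt"):
    """Sum word counts over every file under chat_dir matching pattern.

    The directory layout and file extension are assumptions.
    """
    total = Counter()
    for path in Path(chat_dir).glob(pattern):
        with path.open(encoding="utf-8", errors="replace") as handle:
            total.update(count_words(handle))
    return total


def keep_known(counts, vocabulary):
    """Keep only the words found in a caller-supplied vocabulary set."""
    return Counter({word: n for word, n in counts.items() if word in vocabulary})


def format_table(counts, top=10):
    """Render the most common words as a tab-separated WORD/COUNT table."""
    rows = ["WORD\tCOUNT"]
    rows += [f"{word}\t{n}" for word, n in counts.most_common(top)]
    return "\n".join(rows)


if __name__ == "__main__":
    sample = ["gg wp", "GG all", "Sent mass to you"]
    print(format_table(count_words(sample), top=3))
```

On the real dump, something like `format_table(count_words_in_dir("chat_dump"))` would print the top ten, where `chat_dump` is a placeholder for wherever the files were unpacked. Exact numbers will depend on how words are split, so they may not match the tables above.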
                        
                        • Femboy Promotions team @QuietJoy

                          @scout_more_often Don't apologize! This looks amazing dude! Could even make a graphic out of this data! An interesting FAF Chat Facts sheet!

                          FAF Website Developer

                          • magge Global Moderator

                            Would make an interesting fun-fact news item. The ibuprofen thing is funny.

                            Want to become a Moderator? || Open volunteer positions
                            • RedX

                              How does one find their user-id?
