FAForever Forums

    Replay vault download

    • Brutus5000B Offline
      Brutus5000 FAF Server Admin
      last edited by

      The whole set of all replays (which doesn't really make sense) takes around 500GB, but is not available as a single download.
      The search tools are not available for offline use either.

      "Nerds have a really complicated relationship with change: Change is awesome when WE'RE the ones doing it. As soon as change is coming from outside of us it becomes untrustworthy and it threatens what we think of is the familiar."
      – Benno Rice

      • N Offline
        Nooby
        last edited by Nooby

        How are replays searched? Can metadata about the game be easily extracted? Is that metadata currently extracted and put into a database? I am interested in improving this.

        • BlackYpsB Offline
          BlackYps @Nooby
          last edited by

          @nooby said in Replay vault download:

          It is very slow for me to navigate

          What does very slow mean and what are you trying to do? For the normal replay searches (searching for a player, searching for a map, etc.) it always felt fast enough for me.

          • N Offline
            Nooby @BlackYps
            last edited by

            @blackyps What client are you using? That has not been my experience: for me the search hangs and needs a reFAF to fix. I was searching by player, rating, map, and number of players with the advanced search options.

            • BlackYpsB Offline
              BlackYps
              last edited by

              Typically the latest client, but it shouldn't matter too much, because I don't think there were any client changes to the searches recently. Are you using Player Name "contains" instead of "is"? I just tested different stuff and it was all reasonably fast except that one. The database of all replays is really huge and checking for substrings in the players is really complicated. So if you know the player name you can use "is" and massively speed up the search. If you use the filter search, it already does this for you.
              If you absolutely have to run a complicated query, you can limit the time range of the replays to speed it up. You are probably not that interested in games from a long time ago anyway.
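
To illustrate why "is" is so much cheaper than "contains", here is a minimal Python sketch. It is not how the FAF API or database works internally; the replay ids and the dictionary standing in for an index are assumptions made up for illustration. An exact match can be answered with one lookup, while a substring match has to inspect every stored name.

```python
# Hypothetical data: player name -> replay ids (not real FAF data).
replays_by_player = {
    "Nooby": [101, 102],
    "BlackYps": [103],
    "MazorNoob": [104],
}


def search_is(name):
    """Exact match: a single dictionary (index) lookup."""
    return list(replays_by_player.get(name, []))


def search_contains(fragment):
    """Substring match: every stored name must be checked."""
    fragment = fragment.lower()
    hits = []
    for name, replay_ids in replays_by_player.items():
        if fragment in name.lower():
            hits.extend(replay_ids)
    return sorted(hits)
```

With millions of replays the second function's full scan is what dominates, which is also why narrowing the time range helps: it shrinks the set that has to be scanned.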

              • N Offline
                Nooby @BlackYps
                last edited by

                @blackyps Ah, I was using map "contains" and player "contains", so that must be why. I need to optimise the query.

                I would still like a way to download every replay with:
                say at least two players
                at least one player over 1000 rating
                no astro craters

                for archival purposes.
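
The three criteria above could be written as a single predicate. This is only a sketch: the record layout (`ratings`, `mapname`) and the sample records are hypothetical, and matching "astro" in the map name is an assumed way of spotting Astro Craters maps.

```python
def worth_archiving(replay):
    """Return True if a replay matches the three criteria listed above.

    `replay` is a hypothetical dict with a `ratings` list (one entry per
    player) and a `mapname` string; real metadata may be shaped differently.
    """
    ratings = replay["ratings"]
    has_two_players = len(ratings) >= 2
    has_strong_player = any(r > 1000 for r in ratings)
    is_astro = "astro" in replay["mapname"].lower()
    return has_two_players and has_strong_player and not is_astro


# Made-up sample records, one per case.
candidates = [
    {"id": 1, "ratings": [1200, 800], "mapname": "scmp_009"},
    {"id": 2, "ratings": [1500], "mapname": "scmp_009"},
    {"id": 3, "ratings": [1300, 1100], "mapname": "Astro Crater Battles 4x4"},
    {"id": 4, "ratings": [600, 700], "mapname": "scmp_009"},
]
selected = [r["id"] for r in candidates if worth_archiving(r)]
```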

                • KaletheQuickK Offline
                  KaletheQuick
                  last edited by

                  Would it be possible to "cull" the replay vault? Replays are small, but I would wager games with one player in sandbox mode could stand to be removed. Perhaps a separate long term archive with things over a few years old would be helpful.

                  You must deceive the enemy, sometimes your allies, but you must always deceive yourself!

                  • Brutus5000B Offline
                    Brutus5000 FAF Server Admin
                    last edited by

                    We discussed that a few times. It's technically very challenging for no real benefit. So while it is technically possible, it won't happen.

                    • MazorNoobM Offline
                      MazorNoob
                      last edited by

                      I wouldn't call it very challenging, it's just some work. We have enough space on the disk to last us a few more years, so it's not an urgent issue.

                      • N Offline
                        Nooby
                        last edited by Nooby

                        https://replay.faforever.com/15487505

                        So one could run an incremental wget script over a month or so against https://replay.faforever.com to download them all, rate limited to prevent a DDoS.
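
A rough Python sketch of such an incremental, rate-limited grabber follows. Only the URL pattern comes from the link above; the function names, the `save` callback, and the 2-second default delay are assumptions for illustration, not limits stated by the FAF team.

```python
import time
import urllib.request

BASE_URL = "https://replay.faforever.com"  # URL pattern taken from the link above


def replay_url(replay_id):
    return f"{BASE_URL}/{replay_id}"


def fetch(url):
    """Download one URL and return the raw bytes."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return response.read()


def grab_range(first_id, last_id, save, fetch=fetch, delay_seconds=2.0, sleep=time.sleep):
    """Fetch replays first_id..last_id one at a time, pausing between requests.

    Returns the next id to resume from, so a later run can continue
    where this one stopped.
    """
    for replay_id in range(first_id, last_id + 1):
        try:
            save(replay_id, fetch(replay_url(replay_id)))
        except OSError as error:  # urllib errors are subclasses of OSError
            print(f"skipping {replay_id}: {error}")
        sleep(delay_seconds)
    return last_id + 1
```

Storing the returned id between runs is what makes the download incremental; `save` can be any function that writes the bytes to disk under the replay id.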

                        header
                        {"uid": 15487505, "complete": true, "state": "PLAYING", "featured_mod": "faf", "game_type": "0", "recorder": "Kekomander", "host": "Kekomander", "launched_at": 1633885469.0, "game_end": 1633887433.0, "title": "1.3k pain", "mapname": "scmp_009", "num_players": 8, "teams": {"3": ["ZmeiGorinich", "Kekomander", "PhantomSamurai", "Greedyscoobs"], "2": ["JT_", "AlphaNoob", "DEVOTION", "Nooby"]}, "featured_mod_versions": {"1": 3724, "2": 3724, "3": 3634, "4": 3709, "5": 1, "6": 3724, "8": 1, "9": 1, "11": 3724, "12": 3724, "13": 3724, "14": 3724, "15": 3724, "17": 3677, "18": 3724, "19": 3724, "20": 3724, "21": 3724, "22": 3724}, "version": 2, "compression": "zstd"}

                        So just parse the header line into your PostgreSQL and you're good: you have your own offline searchable vault.
                        How does it figure out the victory condition?
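
As a minimal sketch of that idea, the snippet below loads a shortened copy of the header quoted above into a database and runs one query. It uses Python's built-in SQLite instead of PostgreSQL so that it runs without a server; the table name and the `duration_seconds` column are assumptions for illustration.

```python
import json
import sqlite3

# Shortened copy of the header line quoted above (some fields omitted).
header_line = (
    '{"uid": 15487505, "featured_mod": "faf", "host": "Kekomander", '
    '"launched_at": 1633885469.0, "game_end": 1633887433.0, '
    '"title": "1.3k pain", "mapname": "scmp_009", "num_players": 8}'
)

header = json.loads(header_line)

db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE replay (uid INTEGER PRIMARY KEY, title TEXT, mapname TEXT, "
    "num_players INTEGER, duration_seconds REAL)"
)
db.execute(
    "INSERT INTO replay VALUES (?, ?, ?, ?, ?)",
    (
        header["uid"],
        header["title"],
        header["mapname"],
        header["num_players"],
        header["game_end"] - header["launched_at"],  # elapsed game time
    ),
)

# Example offline query: games on a given map that lasted over 30 minutes.
rows = db.execute(
    "SELECT uid, duration_seconds FROM replay "
    "WHERE mapname = ? AND duration_seconds > ?",
    ("scmp_009", 30 * 60),
).fetchall()
```

Storing the difference between `game_end` and `launched_at` is one way to make elapsed game time searchable.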

                        Suggestions for improvement to search terms

                        Search by elapsed game time
                        Search by map size
                        Have a saveable search profile for advanced search

                        • AskaholicA Offline
                          Askaholic
                          last edited by

                          There are actually 2 different headers. The one added by FAF that you see there, and the original one included in the replay data. The original gpg header will have pretty much all metadata that you could want about the replay including game options, players, mods, etc. To get it though you’ll have to use a parser that knows the binary format of the header. I’ve written one and made a post about it here: https://forum.faforever.com/topic/1551/faf-scfa-replay-parser-library but there are a number of different implementations out there in a variety of languages.
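
A small sketch of separating the two parts, assuming the FAF header is a single JSON line at the very start of the file and everything after the first newline is the replay payload. The sample bytes are made up. The payload would still have to be decompressed according to the `compression` field and then handed to a parser that understands the binary header; neither step is shown here.

```python
import json


def split_replay(raw):
    """Split replay bytes into (FAF header dict, remaining payload bytes).

    Assumes the FAF header is a single JSON line at the start of the file.
    """
    header_line, _, payload = raw.partition(b"\n")
    return json.loads(header_line), payload


# Made-up stand-in for a real file: a tiny header plus placeholder bytes.
sample = b'{"uid": 1, "version": 2, "compression": "zstd"}\n\x00\x01binary-body'
faf_header, payload = split_replay(sample)
```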

                          • N Offline
                            Nooby @Askaholic
                            last edited by Nooby

                            @askaholic Thank you, this is relevant and cool, and something I am going to have a play with.

                            • Brutus5000B Offline
                              Brutus5000 FAF Server Admin @MazorNoob
                              last edited by

                              @mazornoob said in Replay vault download:

                              I wouldn't call it very challenging, it's just some work. We have enough space on the disk to last us a few more years, so it's not an urgent issue.

                              I do - in terms of FAF complexity that is what I call very challenging. It's not just moving around files. Just a few things that come up:

                              • Identifying proper criteria (it must catch enough replays to make a difference, but not remove any "important" cases). There are a few dozen different opinions and once you settle with a common understanding you've got to check that your database is holding that data consistently (which it usually doesn't) + filtering them in a way that it doesn't overload the server
                              • What do you do with table entries? You can't delete them, foreign key constraints don't allow that (review, moderation reports, ...). Maybe you can partition them, but again you're playing with fire on a live system in the 2 biggest tables you have. Then you need to make use of indices in the client so that the partitioning actually has an effect. Or you add some more flags or whatever
                              • In case you move them elsewhere, make sure they are available for download there. Suddenly you need to resolve urls by business logic (right now it's just a redirect into the weird folder structure)
                              • How do you mount additional/external storage on the server?

                              It's a problem that covers almost all areas of FAF: Database, API, Client, Server structure. These are the worst.

                              • MazorNoobM Offline
                                MazorNoob
                                last edited by

                                Picking good criteria is hard, sure, but for the database we have an 'is replay available' flag, don't we? We can just flip it and that's it, database entries can stay as they are. There's also no "move elsewhere" question if we choose to delete them 🙂
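
To make the two positions concrete, here is a toy SQLite sketch. The table and column names (`game`, `review`, `replay_available`) are hypothetical and not the real FAF schema. It shows a delete being rejected by a foreign key while flipping a flag on the same row goes through.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when enabled
db.execute(
    "CREATE TABLE game (id INTEGER PRIMARY KEY, "
    "replay_available INTEGER NOT NULL DEFAULT 1)"
)
db.execute(
    "CREATE TABLE review (id INTEGER PRIMARY KEY, "
    "game_id INTEGER NOT NULL REFERENCES game(id))"
)
db.execute("INSERT INTO game (id) VALUES (1)")
db.execute("INSERT INTO review (id, game_id) VALUES (10, 1)")

# Deleting the game row fails because a review still points at it.
try:
    db.execute("DELETE FROM game WHERE id = 1")
    delete_blocked = False
except sqlite3.IntegrityError:
    delete_blocked = True

# Flipping the flag leaves the row, and everything referencing it, in place.
db.execute("UPDATE game SET replay_available = 0 WHERE id = 1")
flag = db.execute("SELECT replay_available FROM game WHERE id = 1").fetchone()[0]
```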

                                • Brutus5000B Offline
                                  Brutus5000 FAF Server Admin
                                  last edited by Brutus5000

                                  The point in discussion was "Perhaps a separate long term archive with things over a few years old would be helpful."
                                  Just dropping them was heavily opposed when discussed a few months ago.

                                  • MazorNoobM Offline
                                    MazorNoob
                                    last edited by

                                    Alright, I misunderstood what the "that" was, sorry.

                                    • N Offline
                                      Nooby
                                      last edited by Nooby

                                      The replay parser that @askaholic linked can extract a bunch of useful information that could be added to the database and used in filter queries, for example whether a replay desynced and, if so, at what exact time.

                                      Also, for moderation, it could automatically extract the chat of every replay into another database.

                                      • AskaholicA Offline
                                        Askaholic
                                        last edited by

                                        It can also be done on demand with the OG web based parser https://fafafaf.github.io/. So for moderation purposes there isn’t a need to extract everything ahead of time. That would mostly be a ton of data that nobody looks at.

                                        • N Offline
                                          Nooby
                                          last edited by

                                          I have a python3 wget script that grabs the direct link and saves the replays, incrementally. It is rate limited to stop ddos but if anyone would like it here it is.

                                          replaygrabber.zip
